Parallel programming Practical 2012/2013
The information on this website is obsolete! The updated information is available on Blackboard.
Note that starting in November 2012, all the assignments for the PPP course should be tested on DAS-4. More information is available on the PPP page on Blackboard.
The Distributed ASCI Supercomputer 3
This section lists some information on the DAS-3 system used for the practical. For more information on the DAS-3, see the Distributed ASCI Supercomputer 3 (DAS-3) website.
The system that you should use to run and test the parallel applications is called the DAS-3. It is a wide-area distributed cluster designed by the Advanced School for Computing and Imaging (ASCI). For this course, we only use one cluster, located at the VU. Each node contains:
The nodes within a local cluster are connected by a Myrinet network, which serves as the high-speed interconnect and is mapped into user space. In addition, Gigabit Ethernet is used as the OS network (e.g. for file transport).
To run and test parallel applications on the DAS, you first have to log in to fs0.das3.cs.vu.nl (the DAS-3 file server), for example with:
localhost prompt> ssh fs0.das3.cs.vu.nl
If X11 forwarding is enabled, ssh (Secure Shell) automatically sets your display environment variable (DISPLAY). Note that you can only log in to fs0 from within the faculty network.
Loading modules
To use the DAS-3, you must set up your environment using the module command. Add the following lines to your .bashrc script:
module load default-myrinet
module load mpich/mx
module load java/1.6-amd64
module load ant
Running parallel applications
Compiling your parallel program is different for each programming language, so this is described per assignment. To execute a program in parallel on the DAS, use the prun command to reserve CPUs. Its syntax is as follows:
prun [prun options] executable cpus [application command-line options]
For example, to run your a.out program on 4 DAS nodes, use:
DAS prompt> prun -v -1 ./a.out 4
The -1 option reserves a whole machine rather than a single processor (remember that the DAS consists of SMP nodes). If the -1 flag is omitted, prun may place several processes on the same machine; the example above might then run on two machines, using two processors per machine. For the performance measurements in this course, you should always use the -1 option. Any command-line options given after the number of CPUs are passed on to your application.
To terminate a running application, simply press Ctrl-C. The pkill command kills leftover (zombie) processes. It is best to run pkill every once in a while to make sure no leftover processes are running on the DAS, since they may distort the performance measurements of other DAS users. The preserve command can also be used to kill jobs.
The -v flag makes prun display verbose output, including information on the availability of the requested DAS nodes. For MPI applications this flag is required by the run script used (see the MPI assignment for more details). For the Java assignment it is not needed, and it can be turned off with the -np flag.
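For performance measurements you will typically run the same program on several node counts. Below is a minimal sketch of a shell helper that does this, assuming only the prun syntax and the -v and -1 flags described above; the function name scaling_run and the node counts in the usage line are illustrative and not part of the DAS tooling.

```shell
#!/bin/sh
# Sketch: run one program on several node counts, one process per node.
# scaling_run is a hypothetical helper name; it only uses the prun flags
# -v (verbose) and -1 (reserve whole machines) described above.
scaling_run() {
    prog=$1          # executable, e.g. ./a.out
    shift            # the remaining arguments are the node counts
    for n in "$@"; do
        echo "=== $prog on $n node(s) ==="
        prun -v -1 "$prog" "$n"
    done
}
```

Usage: scaling_run ./a.out 1 2 4 8. This sketch does not forward application options; if your program needs them, append them after "$n" in the prun line.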
You can also use the command
DAS prompt> preserve -list
to obtain additional information on the current status of the DAS processors. To learn more about prun, preserve and pkill, read their manual pages (e.g. man prun) and please read this section.
Compiling and running GPU-accelerated applications
To be able to use the CUDA compiler and/or find the OpenCL libraries, add the following lines to your .bashrc (note that you need to log out and log in again for these changes to take effect):
export CUDA=/usr/local/package/cuda-3.2.9/cuda
export PATH=$PATH:$CUDA/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA/lib64
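Because these settings only take effect in a new login shell, it can help to check them before compiling. A minimal sketch follows; check_cuda_env is a hypothetical helper name, not a DAS command, and it relies only on the CUDA variable and the PATH entry set in the lines above.

```shell
#!/bin/sh
# Sketch: check that the CUDA settings from ~/.bashrc are active.
# check_cuda_env is a hypothetical helper name, not a DAS command.
check_cuda_env() {
    if [ -z "$CUDA" ]; then
        echo "CUDA is not set; log out and log in again" >&2
        return 1
    fi
    if ! command -v nvcc >/dev/null 2>&1; then
        echo "nvcc not found in PATH (expected in $CUDA/bin)" >&2
        return 1
    fi
    nvcc --version    # prints the toolkit version
}
```

The function returns a non-zero status on failure, so it can also be used in a script, e.g. check_cuda_env || exit 1.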
The compiler to use for your GPU code depends on your choice of language. For CUDA programs, use the NVIDIA CUDA compiler nvcc. If you choose OpenCL, a regular host C++ compiler such as g++ is sufficient, for example:
g++ -I/usr/local/package/cuda-3.2.9/cuda/include -Wall -m64 -g opencl-sobel.cc -o opencl-sobel -lOpenCL
Simple Makefiles are provided for your convenience, for both CUDA and OpenCL.
To run CUDA or OpenCL programs on the DAS, use prun with the gpu.q queue, like this:
prun -v -np 1 -q gpu.q /[path_to_file]/cuda-sobel /[path_to_file]/image_01.bmp
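When you want to process several input images, the prun call above can be wrapped in a loop. A minimal sketch, assuming the images are .bmp files in a single directory; sobel_all and its arguments are hypothetical, and the prun options are copied unchanged from the example above.

```shell
#!/bin/sh
# Sketch: run a GPU Sobel binary on every .bmp image in a directory,
# submitting one prun job per image to the gpu.q queue.
# sobel_all is a hypothetical helper name.
sobel_all() {
    prog=$1    # full path to the compiled binary, e.g. .../cuda-sobel
    dir=$2     # directory containing the input .bmp images
    for img in "$dir"/*.bmp; do
        [ -e "$img" ] || continue    # no .bmp files: the pattern stays unexpanded
        prun -v -np 1 -q gpu.q "$prog" "$img"
    done
}
```

Since the loop waits for each prun call to return, the images are processed one at a time.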
DAS-3 has 8 NVIDIA GT430 GPUs. The GT430 has 2 streaming multiprocessors with 48 cores each, so each GPU has 96 cores in total. The chips are based on the Fermi architecture and provide compute capability 2.1. They have 1 GB of device memory. The exact specifications are available here.
Running your application without prun will start it on the DAS-3 file server (fs0.das3) rather than on the compute nodes, so it will NOT run in parallel. Also, since many other DAS users are working on fs0.das3, running a heavy program there hurts general system performance and is considered abuse of the system! For the same reason, you should run even a strictly sequential program through prun, as prun -v -1 ./a.out 1. More importantly, fs0.das3 is a completely different computer from an individual DAS node, so timing measurements taken on the two cannot be compared.
Please comply with the DAS-3 Usage Policy.