Grid computing on DAS-2

One of the main research areas for DAS-2 is Grid computing. For this purpose, the Globus toolkit has been installed on all DAS-2 clusters.

The required Globus certificates can be requested from the DutchGrid authority; you are advised to request a "medium-security" certificate, since that is also usable outside of DAS-2. Once you have obtained your certificate and installed it in your .globus directory on DAS-2, you can send mail to das-sysadm@cs.vu.nl asking for your certificate to be included in the DAS-2 Globus configuration files. After that, you should be able to submit jobs to all DAS-2 clusters with the Globus tools.

Examples of how to start using Globus on DAS-2 are currently available via the DAS-2/Globus information page at the UvA. Note that the release of Globus currently installed on DAS-2 is version 3.2 (pre-webservices), located in /usr/local/globus/globus-3.2 (also available via the environment variable $GLOBUS_LOCATION). Documentation for this release is available from the Globus site.

For general questions about Globus on DAS-2, please contact das-sysadm@cs.vu.nl.
For questions specifically related to Globus user certificates, please contact support@dutchgrid.nl with a cc: to das-sysadm@cs.vu.nl.

Running an MPI job on multiple clusters

The MPI implementation that is used in a Globus environment is MPICH-G2.

NOTE: The current version is able to use the fast local DAS-2 interconnect (Myrinet) within the individual clusters; only communication between clusters goes over TCP/IP sockets. Performance is highest when MPICH-G2 can be certain that communication is local to the cluster; using MPI_ANY_SOURCE in MPI_Recv may still cause a (slow) poll of the TCP/IP sockets, since messages might then come from remote clusters as well.
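For example, when it is known which process a message will come from, posting the receive with an explicit source rank instead of MPI_ANY_SOURCE lets MPICH-G2 avoid polling the TCP/IP sockets whenever the sender is in the same cluster. The fragment below is only an illustrative sketch (it is not part of the cpi.c example used later on):


/* Illustrative sketch: process 0 gathers a value from every other
 * process using explicit source ranks rather than MPI_ANY_SOURCE.
 * With an explicit source, a receive from a process in the same
 * cluster does not force a poll of the (slower) inter-cluster
 * TCP/IP sockets. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, src, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        for (src = 1; src < size; src++) {
            /* Explicit source rank, not MPI_ANY_SOURCE. */
            MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD, &status);
            printf("received %d from process %d\n", value, src);
        }
    } else {
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}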

Running applications on multiple DAS-2 clusters requires a bit of work. We will show how to do it by means of a simple MPI example, the same one that was used on the DAS-2 PBS/Prun information page.
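The example program, cpi.c, is the familiar pi-computation demo that comes with MPICH: every process reports the node it runs on, the processes jointly approximate pi by numerical integration, and process 0 prints the result and the wall-clock time (compare the output in step 8 below). For readers who do not have the source at hand, a condensed sketch along these lines is shown here; the actual cpi.c shipped with MPICH differs in detail.


/* Condensed sketch in the spirit of the MPICH cpi.c example: every
 * process reports its host, the processes jointly approximate pi by
 * integrating 4/(1+x^2) over [0,1] with the midpoint rule, and
 * process 0 prints the result and the wall-clock time. */
#include <stdio.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int n = 10000, rank, size, i, namelen;
    double PI25DT = 3.141592653589793238462643;
    double h, sum, x, mypi, pi, start, end;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);
    printf("Process %d on %s\n", rank, name);

    start = MPI_Wtime();
    h = 1.0 / (double) n;
    sum = 0.0;
    /* Each process handles every size-th rectangle of the midpoint rule. */
    for (i = rank + 1; i <= n; i += size) {
        x = h * ((double) i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    /* Combine the partial sums on process 0. */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    end = MPI_Wtime();

    if (rank == 0) {
        printf("pi is approximately %.16f, error is %.16f\n",
               pi, fabs(pi - PI25DT));
        printf("wall clock time = %f\n", end - start);
    }

    MPI_Finalize();
    return 0;
}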

NOTE: To use the Intel Fortran compiler with MPICH-G2, some details are slightly different; this is described on a separate MPICH-G2/Intel page.

Step 1: Make sure MPICH-G2 (Globus) on top of MPICH-GM (Myrinet) is in your path before other MPICH-based implementations


[versto@fs0 MPI]$ which mpicc
/usr/local/mpich/mpich-gm/bin/mpicc
[versto@fs0 MPI]$ PATH=/usr/local/mpich/mpich-g2-gm-gcc/bin:$PATH
[versto@fs0 MPI]$ which mpicc
/usr/local/mpich/mpich-g2-gm-gcc/bin/mpicc

Note: If for some reason (e.g., performance comparisons) you want to use IP-based communication everywhere (i.e., no native Myrinet communication within the DAS-2 clusters), you should instead use the MPICH-G2 version in mpich-g2-ip-gcc.

Step 2: Compile the code with MPICH-G2


[versto@fs0 MPI]$ mpicc -o cpi_globus cpi.c

Step 3: Create a "machines" file that specifies the CPUs to be used


[versto@fs0 MPI]$ cat machines 
"fs0.das2.cs.vu.nl/jobmanager-sge" 4
"fs2.das2.nikhef.nl/jobmanager-sge" 4

Step 4: Transfer the binary to the other DAS-2 sites, in the same directory


[versto@fs0 MPI]$ scp cpi_globus fs2:`pwd`
cpi_globus           100% |*****************************|   827 KB    00:00    

Step 5: Create a Globus "RSL" file, based on the "machines" file


[versto@fs0 MPI]$ mpirun -dumprsl -np 8 cpi_globus arg1 arg2 >cpi_globus.rsl
[versto@fs0 MPI]$ cat cpi_globus.rsl 
+
( &(resourceManagerContact="fs0.das2.cs.vu.nl/jobmanager-sge") 
   (count=4)
   (jobtype=mpi)
   (label="subjob 0")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
                (LD_LIBRARY_PATH /usr/local/globus/globus-3.2/lib/))
   (arguments= "arg1" "arg2")
   (directory="/home2/versto/Projects/Globus/MPI")
   (executable="/home2/versto/Projects/Globus/MPI/cpi_globus")
)
( &(resourceManagerContact="fs2.das2.nikhef.nl/jobmanager-sge") 
   (count=4)
   (jobtype=mpi)
   (label="subjob 4")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 1)
                (LD_LIBRARY_PATH /usr/local/globus/globus-3.2/lib/))
   (arguments= "arg1" "arg2")
   (directory="/home2/versto/Projects/Globus/MPI")
   (executable="/home2/versto/Projects/Globus/MPI/cpi_globus")
)

Note: If MPICH-G2 over IP/sockets is used, the clause "(jobtype=mpi)" will not be included in the above RSL, since that clause requests a special startup procedure for subjobs which is only suitable for the native MPI (i.e., MPICH-GM over Myrinet on DAS-2).

Step 6: Edit the "RSL" if needed

In the version below we extend the RSL specification as follows: each subjob gets a wall-clock time limit (maxWallTime=15), and each subjob is passed an extra environment variable, MY_CLUSTER_ID, identifying the cluster it runs on (a sketch of how an application might use this variable follows the listing below).

Note that the modifications described here are only an example; the application discussed runs for a very short time only, so the default time limit would have been fine, and it does not require any additional environment parameters.


[versto@fs0 MPI]$ vi cpi_globus.rsl 
[versto@fs0 MPI]$ cat cpi_globus.rsl 
+
( &(resourceManagerContact="fs0.das2.cs.vu.nl/jobmanager-sge") 
   (count=4)
   (jobtype=mpi)
   (label="subjob 0")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
                (MY_CLUSTER_ID 0)
                (LD_LIBRARY_PATH /usr/local/globus/globus-3.2/lib/))
   (arguments= "arg1" "arg2")
   (maxWallTime=15)
   (directory="/home2/versto/Projects/Globus/MPI")
   (executable="/home2/versto/Projects/Globus/MPI/cpi_globus")
)
( &(resourceManagerContact="fs2.das2.nikhef.nl/jobmanager-sge") 
   (count=4)
   (jobtype=mpi)
   (label="subjob 4")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 1)
                (MY_CLUSTER_ID 1)
                (LD_LIBRARY_PATH /usr/local/globus/globus-3.2/lib/))
   (arguments= "arg1" "arg2")
   (maxWallTime=15)
   (directory="/home2/versto/Projects/Globus/MPI")
   (executable="/home2/versto/Projects/Globus/MPI/cpi_globus")
)
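Note that MY_CLUSTER_ID is not interpreted by Globus or MPICH-G2 itself; it is simply an extra variable placed in the environment of each subjob's processes. An application that wants to know which cluster it was started on could read it with getenv(), as in the following hypothetical sketch (this is not part of the cpi.c example):


/* Hypothetical sketch: read the MY_CLUSTER_ID variable that the edited
 * RSL passes to each subjob, e.g. to let a process adapt its behaviour
 * to the cluster it was started on. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    const char *cluster = getenv("MY_CLUSTER_ID");

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("process %d runs on cluster %s\n",
           rank, cluster != NULL ? cluster : "(unknown)");

    MPI_Finalize();
    return 0;
}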

Step 7: Make sure your Globus "proxy" exists and is still valid


[versto@fs0 MPI]$ grid-proxy-info -timeleft
ERROR: unable to determine proxy file name
[versto@fs0 MPI]$ grid-proxy-init 
Your identity: /O=dutchgrid/O=users/O=vu/OU=cs/CN=Kees Verstoep
Enter GRID pass phrase for this identity:
Creating proxy ........................................ Done
Your proxy is valid until Thu May 30 23:31:37 2002
[versto@fs0 MPI]$ grid-proxy-info -timeleft
43198

Step 8: Run the job


[versto@fs0 MPI]$ mpirun -globusrsl cpi_globus.rsl 
Process 0 on node071.das2.cs.vu.nl
Process 1 on node071.das2.cs.vu.nl
Process 2 on node070.das2.cs.vu.nl
Process 3 on node070.das2.cs.vu.nl
Process 4 on node231.das2.nikhef.nl
Process 5 on node231.das2.nikhef.nl
Process 6 on node230.das2.nikhef.nl
Process 7 on node230.das2.nikhef.nl
pi is approximately 3.1416009869231249, error is 0.0000083333333318
wall clock time = 0.011847

Many more details are available from the MPICH-G2 information page.

This page is maintained by Kees Verstoep. Last modified: Mon Apr 4 14:29:11 CEST 2005