MPI Grid jobs using OpenMPI/TCP on DAS-3
OpenMPI is a highly configurable MPI implementation that offers many
runtime options to control its exact behavior.
For details, please consult the
OpenMPI website.
OpenMPI example:
- Step 1: Compile the code with OpenMPI instead of MPICH:
$ module load default-myrinet # or "default-ethernet" on DAS-3/Delft
$ module del mpich
$ module add openmpi
$ module list
Currently Loaded Modulefiles:
1) mx/64/1.2.0j 4) prun/default 7) openmpi/gcc/default
2) sge/6.0u8 5) globus/4.0.3
3) cluster-tools/2.0.5 6) default-myrinet
$ which mpicc
/usr/local/package/openmpi-1.2.1/bin/mpicc
$ mpicc -o cpi_openmpi cpi.c
- Step 2: Reserve compute nodes on the clusters you want to use.
In this case we assume that two nodes have already been reserved on each of
DAS-3/VU and DAS-3/Leiden (using manual "preserve" commands or
by letting Koala do the co-allocation) and that the host
names have been put in two variables:
$ hostlist0="node030.das3.cs.vu.nl node031.das3.cs.vu.nl"
$ hostlist1="node110.das3.liacs.nl node111.das3.liacs.nl"
- Step 3: Create a comma-separated list of all hosts to be used,
in the order required.
In this example we want to run two processes on each allocated host:
$ hosts=`for i in $hostlist0 $hostlist1; do for j in 0 1; do echo -n $i,; done; done`
$ echo $hosts
node030.das3.cs.vu.nl,node030.das3.cs.vu.nl,node031.das3.cs.vu.nl,node031.das3.cs.vu.nl,node110.das3.liacs.nl,node110.das3.liacs.nl,node111.das3.liacs.nl,node111.das3.liacs.nl,
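The one-liner above leaves a trailing comma at the end of the list. The following is a minimal sketch of a variant that avoids it, in case a tool is stricter about list syntax. The function name build_hosts is hypothetical and not part of DAS-3 or OpenMPI; the host names are the ones from Step 2.

```shell
#!/bin/sh
# Hypothetical helper (not part of DAS-3 or OpenMPI):
# repeat each host name $1 times and join everything with commas,
# without leaving a trailing comma.
build_hosts() {
    per_host=$1
    shift
    list=""
    for h in "$@"; do
        i=0
        while [ "$i" -lt "$per_host" ]; do
            # Prepend "existing list plus comma" only when the list is non-empty.
            list="${list:+$list,}$h"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$list"
}

hostlist0="node030.das3.cs.vu.nl node031.das3.cs.vu.nl"
hostlist1="node110.das3.liacs.nl node111.das3.liacs.nl"
# Unquoted on purpose: each variable is split on whitespace into host names.
hosts=$(build_hosts 2 $hostlist0 $hostlist1)
echo "$hosts"
```

The first argument is the number of processes per host, so the same function also covers runs with one or four processes per node.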
- Step 4: Transfer the binary to all sites participating in the grid run:
$ rsync -e ssh -avz `pwd`/cpi_openmpi fs1.das3.liacs.nl:`pwd`/cpi_openmpi
- Step 5: Run the binary using the compute nodes' external eth0:0
interfaces across the internet.
NOTE: It is important that this is done by running OpenMPI's
mpirun startup tool on one of the compute nodes, not one of
the DAS-3 headnodes.
NOTE: This only works if you have set up your ssh keys
such that password-less ssh logins are possible between the DAS-3 sites.
$ set $hostlist0
$ starthost=$1
$ echo $starthost
node030.das3.cs.vu.nl
$ incl=eth0:0
$ excl=myri0,eth0
$ ssh $starthost "cd `pwd`; $MPI_HOME/bin/mpirun --prefix $MPI_HOME \
--mca oob tcp,self --mca btl sm,tcp,self \
--mca oob_tcp_include $incl --mca oob_tcp_exclude $excl \
--mca btl_tcp_if_include $incl --mca btl_tcp_if_exclude $excl \
--host $hosts -np 8 `pwd`/cpi_openmpi"
Process 0 on node030
Process 1 on node030
Process 2 on node031
Process 3 on node031
Process 4 on node110
Process 5 on node110
Process 6 on node111
Process 7 on node111
pi is approximately 3.1416009869231241, error is 0.0000083333333309
wall clock time = 0.051411
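The password-less ssh requirement noted in Step 5 can be checked before starting a run. The following is a minimal sketch, not a DAS-3 tool: the function name check_ssh and the SSH_CMD override variable are hypothetical, while BatchMode and ConnectTimeout are standard OpenSSH client options. It only tests logins from the machine it runs on, so it is most useful when run on the node that will start mpirun.

```shell
#!/bin/sh
# Hypothetical helper: report hosts that do not accept a non-interactive login.
# SSH_CMD is an assumed override hook (defaults to ssh) so that the check
# can be dry-run with a stub command.
check_ssh() {
    failed=0
    for h in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # ConnectTimeout keeps an unreachable host from stalling the loop.
        if ! ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$h" true </dev/null; then
            echo "password-less ssh to $h failed" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example use (not executed here):
#   check_ssh $hostlist0 $hostlist1 || exit 1
```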
- Step 6: Alternatively, run the binary using the compute nodes' internal myri0
interfaces across DAS-3's dedicated 10G WAN links (not available on DAS-3/Delft):
$ incl=myri0
$ excl=eth0,eth0:0
$ ssh $starthost "cd `pwd`; $MPI_HOME/bin/mpirun --prefix $MPI_HOME \
--mca oob tcp,self --mca btl sm,tcp,self \
--mca oob_tcp_include $incl --mca oob_tcp_exclude $excl \
--mca btl_tcp_if_include $incl --mca btl_tcp_if_exclude $excl \
--host $hosts -np 8 `pwd`/cpi_openmpi"
Process 0 on node030
Process 1 on node030
Process 2 on node031
Process 3 on node031
Process 4 on node110
Process 5 on node110
Process 6 on node111
Process 7 on node111
pi is approximately 3.1416009869231241, error is 0.0000083333333309
wall clock time = 0.024332
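Steps 5 and 6 differ only in which interface is included and which are excluded. As a sketch, the shared part of the command line can be generated by a small function. The name mpirun_cmd is hypothetical; the mpirun options are exactly the ones used above, and MPI_HOME is assumed to be set by the openmpi module as in the examples.

```shell
#!/bin/sh
# Hypothetical helper: print the mpirun command line used in Steps 5 and 6.
# Arguments: include-interface, exclude-interfaces, host list,
#            process count, path to the binary.
# The body runs in a subshell ( ... ) so its variables do not leak
# into the caller's environment.
mpirun_cmd() (
    incl=$1
    excl=$2
    hosts=$3
    np=$4
    binary=$5
    # printf repeats the format for every argument: one word, one space.
    printf '%s ' \
        "$MPI_HOME/bin/mpirun" --prefix "$MPI_HOME" \
        --mca oob tcp,self --mca btl sm,tcp,self \
        --mca oob_tcp_include "$incl" --mca oob_tcp_exclude "$excl" \
        --mca btl_tcp_if_include "$incl" --mca btl_tcp_if_exclude "$excl" \
        --host "$hosts" -np "$np"
    printf '%s\n' "$binary"
)

# Example use (not executed here), over the external interfaces as in Step 5:
#   ssh $starthost "cd `pwd`; `mpirun_cmd eth0:0 myri0,eth0 $hosts 8 \`pwd\`/cpi_openmpi`"
# and over the Myrinet interfaces as in Step 6:
#   ssh $starthost "cd `pwd`; `mpirun_cmd myri0 eth0,eth0:0 $hosts 8 \`pwd\`/cpi_openmpi`"
```

Printing the command instead of running it makes it easy to inspect the exact options before anything is started on the remote nodes.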