Panda assumes that one node in each cluster, the gateway, is dedicated to
wide-area forwarding. The gateway is part of your parallel
program, so you must reserve one extra node in each cluster that cannot
be used for application work.
This way, the gateway is connected by Myrinet to the local processors.
It services wide-area communication over TCP or over Myrinet with
The wide-area architecture is largely transparent
to the application. Programmers can query the wide-area layout
by calling the Panda functions pan_cluster_of(cpu) and
pan_cluster_nr(); see the Panda
docs (soon to be included).
The nodes are (re)numbered so that the gateway processors receive processor
numbers above the application maximum.
As a rule, libraries do not start the application on the gateway processes.
However, if you program directly on top of Panda, see
the section "Transparency issue for Panda programs", below.
It is a good idea to first run your Wide-Area program on the
DAS simulator. It provides switches to simulate wide-area delays:
latency and maximum bandwidth can be tuned separately.
These are run-time switches that may be provided to the application:
each wide-area message is delayed for t seconds
a wide-area message of size B bytes is additionally
delayed for B / g seconds
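The two switches combine into a simple cost model: a wide-area message of B bytes is held back for t + B / g seconds in total. A minimal sketch of that arithmetic in C, assuming g is a bandwidth in bytes per second; the function name simulated_delay is made up for illustration and is not part of Panda or the simulator:

```c
#include <assert.h>

/* Total simulated delay, in seconds, for one wide-area message.
 *   t          per-message latency in seconds
 *   size_bytes message size B in bytes
 *   g          maximum bandwidth, assumed here to be in bytes per second
 * Hypothetical helper for illustration; not a Panda function. */
double simulated_delay(double t, double size_bytes, double g)
{
    assert(g > 0.0);              /* a zero bandwidth would divide by zero */
    return t + size_bytes / g;    /* latency term plus bandwidth term */
}
```

For example, with t = 0.5 s and g = 4096 bytes/s, a 1024-byte message is delayed for 0.5 + 0.25 = 0.75 s.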
To create a binary with Wide-Area simulator support, pass the flag
-pkt-cluster to your compile script (panc, mpicc, ...).
If there are n simulated clusters of equal size,
pass -pan-cld n to the application.
If the clusters must be of different sizes, contact the
maintainer of Panda for information. Remember that you have to
reserve n additional hosts for the gateways.
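The host count follows directly from the one-gateway-per-cluster rule. A small sketch, assuming n equally sized clusters with the same number of application nodes each; hosts_to_reserve is a hypothetical helper, not a Panda function:

```c
#include <assert.h>

/* Number of hosts to reserve for n equally sized clusters:
 * every cluster needs its application nodes plus one gateway.
 * Hypothetical helper for illustration; not part of Panda. */
int hosts_to_reserve(int n_clusters, int app_nodes_per_cluster)
{
    assert(n_clusters > 0 && app_nodes_per_cluster > 0);
    return n_clusters * (app_nodes_per_cluster + 1);
}
```

With 4 clusters of 6 application nodes each, this gives 4 * (6 + 1) = 28 hosts.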
Real Wide-Area runs
There is no intelligent scheduler that reserves nodes and runs parallel
applications on different clusters. The user must start a run on
each cluster by hand, all at the same time.
To enable cluster support in Panda, pass the flag
-tcp-cluster to your compile script (panc, mpicc, ...).
There is no longer any need to compile the gateway binaries with different
flags. The normal application executable runs the gateway modules
if it deduces that it is a gateway (see also the section
"Transparency issue for Panda programs", below).
If there are n clusters of equal size, pass
-pan-clst-d my_cluster n total_hosts to the application.
If the clusters must be of different sizes, contact
the maintainer of Panda for information.
Remember that you have to reserve n additional hosts for the gateways.
The application uses a system server process that runs on the file server
to synchronize. By default this is the local file server. For wide-area runs,
this must be overridden: pass -das-sync-server das0fs.cs.vu.nl
to your application. To synchronize,
all processes must use an identical key. This key is corrupted whenever your
Wide-Area program does not terminate normally; in particular,
it is corrupted when startup fails on one or more clusters. Provide
a key to your application with -pan-cluster-key key.
A good key is, for instance, "your-name:your-application:counter".
Flow control over the shared TCP link must be handled specially:
the link between each node and its cluster's gateway must
have as much capacity as the total link capacity towards all off-cluster hosts.
Usually, this means that the local link capacity must be reduced.
Pass to your application:
-pan-sys-credits c -pan-gateway-credits g, where
c is chosen by you so that the message receive buffer
space at any node does not exceed pinned memory (the limit is currently
8000 credits), and g = (n - 1) * c * nodes-per-cluster.
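The credit formula can be worked through in a few lines of C. This is a sketch under assumptions: the text does not say whether nodes-per-cluster includes the gateway, so the example counts application nodes only, and gateway_credits is a hypothetical helper rather than a Panda function:

```c
#include <assert.h>

/* g = (n - 1) * c * nodes-per-cluster, as given in the text.
 *   n_clusters         n, the number of clusters
 *   sys_credits        c, the value passed to -pan-sys-credits
 *   nodes_per_cluster  assumed here to count application nodes only
 * Hypothetical helper for illustration; not part of Panda. */
long gateway_credits(int n_clusters, long sys_credits, int nodes_per_cluster)
{
    assert(n_clusters >= 1 && sys_credits >= 0 && nodes_per_cluster >= 1);
    return (long)(n_clusters - 1) * sys_credits * nodes_per_cluster;
}
```

With 4 clusters of 6 nodes and an arbitrarily chosen c = 100, g = 3 * 100 * 6 = 1800, which would correspond to -pan-sys-credits 100 -pan-gateway-credits 1800.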
Example MPI program
Compile two binaries with
mpicc -tcp-cluster mpi_program.c
mpicc -tcp-cluster -ot -o a.out.ot mpi_program.c
Run on each cluster i of 4 clusters, each with 6+1 hosts:
Transparency issue for Panda programs
If you program directly on top of Panda (without using a standard
library/language like Orca, MPI, Manta, PVM), transparency is not complete: the
application is also started on the gateway machines. On the gateways, your
application must terminate immediately, as follows. The
application must initialize Panda in the usual way (by calling
pan_init(), pan_mp_init() etc, and pan_start());
then, if your processor number pan_my_pid() is not less than
the number of application processors pan_nr_processes(), this
process is a gateway. You must immediately terminate the Panda modules by calling
pan_mp_end() etc, then pan_end(). After that, no application
code may be run on the gateway machines.
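The gateway test described above comes down to a single comparison. A minimal sketch in C: is_gateway is a hypothetical helper, not a Panda function, and the surrounding call order is given only as a comment because the exact Panda signatures are not stated here.

```c
#include <assert.h>

/* Returns 1 if this process is a gateway, 0 otherwise.
 *   my_pid  the value returned by pan_my_pid()
 *   nr_app  the value returned by pan_nr_processes()
 * Hypothetical helper for illustration; not part of Panda. */
int is_gateway(int my_pid, int nr_app)
{
    assert(my_pid >= 0 && nr_app > 0);
    return my_pid >= nr_app;   /* "not less than" the application count */
}

/* Intended call order in main(), following the text (arguments omitted):
 *
 *   pan_init(...); pan_mp_init(...); ...; pan_start();
 *   if (is_gateway(pan_my_pid(), pan_nr_processes())) {
 *       pan_mp_end(); ...; pan_end();
 *       return 0;             no application code on a gateway
 *   }
 *   ... application code ...
 */
```

With 6 application processes, processor numbers 0 to 5 run the application, and any number from 6 upwards belongs to a gateway.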
This page is maintained by
Rutger Hofman at the VU Amsterdam.
Last modified: Thu Sep 23 15:11:23 CEST 1999