prun - Reserve compute nodes from cluster and run job
SYNOPSIS
prun [options] application ncpus [application args]
prun [options] -np ncpus application [application args]
DESCRIPTION
prun provides a convenient way to run applications on a cluster.
It reserves the requested number of cpus (or nodes)
and executes a parallel application on them. Host scheduling is exclusive,
i.e., prun does not allocate multiple jobs on one host.
prun builds on a reservation system that is partly based on goodwill:
compute node reservation is implemented, but generally not strictly enforced.
However, users sidestepping the reservation mechanism and accessing
compute nodes directly will incur the wrath of both
fellow users and the system administrators.
Scheduling
prun runs an application in parallel on the requested number of cpus.
The default maximum execution time is 15 minutes on DAS-2 and DAS-3,
which is also the maximum allowed reservation during daytime.
If no start time is specified explicitly, the parallel run is scheduled
as soon as the requested number of cpus is available.
If not enough cpus are available immediately, prun waits until
the reserved time, or until canceled reservations allow earlier execution.
In the latter case, the reservation schedule is compressed.
If a start time is specified explicitly, and the requested resources are
available, the reservation is scheduled, and prun sleeps until the
specified time.
Users are responsible for honoring the required execution
limits, or for requesting an exception by email from the system administrators.
By default, prun rounds the requested number of cpus upwards
to a multiple of the number of cpus per node, i.e., 2 for DAS-2 and DAS-3.
This ensures that there are no other jobs on the reserved nodes
(from the same or another user) that might interfere.
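The rounding rule can be sketched in shell arithmetic. This only illustrates the calculation and is not prun's actual code; the function name round_up_cpus is made up for this example, and the default of 2 cpus per node is the DAS-2/DAS-3 value quoted above.

```shell
# Round a cpu request up to a whole number of nodes.
# $1: requested number of cpus
# $2: cpus per node (assumed default: 2, as on DAS-2 and DAS-3)
round_up_cpus() {
    ncpus=$1
    per_node=${2:-2}
    # Integer division truncates, so add per_node - 1 before dividing.
    echo $(( ($ncpus + $per_node - 1) / $per_node * $per_node ))
}

round_up_cpus 5 2   # prints 6
```

A request for 5 cpus therefore occupies 3 full nodes, the same as a request for 6.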
Rsh peculiarities
prun by default uses rsh(1) for application invocation.
rsh only works if the caller node is trusted by the callee node.
Therefore, the caller node must be present in the user's
.rhosts file, or, as is the case on DAS-2 and DAS-3,
trust must be enabled on a system-wide scale.
Another limitation of rsh is its requirement at the
protocol level for TCP ports within a restricted range, which
can run out when large numbers of processes are to be started.
However, this restriction can be circumvented by letting prun
use ssh instead: see the -rsh option below.
Since rsh has some problems in dealing with standard input,
prun imposes two limitations.
The first is that only cpu 0 of the started parallel application
is allowed to read from standard input. The second is caused by the fact that
standard input is always opened, whether the application wants to read it or
not. Therefore, if prun must run in the background and no input is
necessary, standard input must be redirected to
/dev/null.
See also rsh(1).
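A minimal sketch of the background case, assuming a POSIX shell. The helper name run_in_background is hypothetical; it simply starts whatever command it is given with standard input tied to /dev/null.

```shell
# Start a command in the background with standard input redirected
# from /dev/null, so it never waits on the terminal for input.
# (run_in_background is a name invented for this example.)
run_in_background() {
    "$@" < /dev/null &
}

# Intended use, with ./a.out standing in for the application:
#   run_in_background prun -np 4 ./a.out
```

The process id of the background job is available in $! right after the call, which is useful for a later wait or kill.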
Since prun uses rsh to start remote processes,
the process limits (like memory usage, execution time limit)
are derived from the user's default values. When the process
limits are to be changed, users must change them in their
.cshrc (csh, tcsh users)
or
.bashrc (bash users)
or
.kshrc (ksh users).
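As an illustration for bash users, the following fragment for .bashrc raises the soft core dump and virtual memory limits to the hard limits configured on the system. Which limits need changing, and to what values, depends on the application; the choice made here is only an assumption for the example.

```shell
# Fragment for ~/.bashrc (illustrative choice of limits).
# Raise each soft limit to the corresponding hard limit.
ulimit -S -c "$(ulimit -H -c)"   # maximum core dump size
ulimit -S -v "$(ulimit -H -v)"   # maximum virtual memory
```

Running ulimit -a in a shell started the same way shows the values that remote processes will inherit.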
Single-shot property
prun generates a run-unique key for each parallel run.
This key can be used for synchronization by other software layers,
like the ones based on
Panda.
prun should not be used to invoke scripts that run
multiple parallel programs in sequence, since in that case
the run-unique key would be shared between consecutive runs.
This leads to start-up problems.
Therefore invoke prun for each parallel program run separately.
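One way to follow this advice is a small driver that calls prun once per program, so that each run gets its own key. The sketch below assumes the programs take no Panda-style rank arguments (hence -np); the function name run_sequence and the PRUN variable, which allows substituting another launcher for a dry run, are inventions of this example.

```shell
# PRUN is a hypothetical override for the launcher (default: prun).
PRUN=${PRUN:-prun}

# Run several parallel programs one after another, each in its own
# prun invocation. Stops at the first failing run.
# $1: number of cpus; remaining arguments: programs to run
run_sequence() {
    ncpus=$1
    shift
    for prog in "$@"; do
        "$PRUN" -np "$ncpus" "$prog" || return 1
    done
}

# Intended use, with ./step1 and ./step2 as placeholder programs:
#   run_sequence 4 ./step1 ./step2
```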
OPTIONS
- -c dir
-
Symbolic name of the directory where the parallel application is to be
executed (default: current directory).
prun writes temporary files into the current directory,
with instructions and environment information for the worker processes.
For this reason, worker processes must run from the current directory.
rsh(1) starts its remote execution from the user's home directory,
so prun must remotely change to the desired directory.
However, the current directory name on the local host may differ
from the (symbolic) name in remote hosts (since the file system
may have been differently mounted).
To overcome this, an option -c dir is supplied, in which the
(symbolic) name of the current directory is specified. To determine
the current directory,
prun first inspects the environment variable PWD,
which is set by tcsh(1) and bash(1).
For other shells, it may be necessary to specify the (symbolic)
name of the current directory with -c.
Since prun creates temporary files in this directory, the user must
have write permission in it.
- -core
-
allow application core dumps (default).
- -no-core
-
suppress application core dumps.
- -d time
-
poll every time seconds (default: 1).
- -delay time
-
add a delay of time seconds (default: depends on file size) between
spawns of remote processes. time is a floating point number.
- -export-env
-
export prun's process environment to forked application processes
(default).
- -no-export-env
-
do not export prun's process environment to forked application processes.
- -keep
-
do not return reservation after execution. Generally used in conjunction with
-reserve.
- -no-keep
-
return reservation after execution (default).
- -n
-
echo the rsh and reserve commands, but do not execute them.
- -np ncpus
-
A more natural way to express the (common) case of parallel runs that
do not expect Panda-style cpu rank arguments.
prun -np ncpus app args
is an alias for
prun -no-panda app ncpus args.
- -o outputfile
-
output from each of the parallel processes is diverted to a separate file,
named outputfile.0, outputfile.1, ....
This option does not work in combination with -sge-script.
- -panda
-
feed the application, as its first two command line arguments, its process rank
and the total number of processes (default).
- -no-panda
-
do not add any process ranking arguments to the application command line.
- -sge-script script
-
Runs script on cpu 0. The script should start up the processes on
the other cpus, as is customary for SGE scripts. To ease the development
cycle, prun -sge-script sets a number of environment variables:
PRUN_CPUS contains the total number of cpus; PRUN_CPUS_PER_NODE
contains the number of cpus per node; PRUN_PROG contains the name of the
executable specified to prun; PRUN_PROGARGS contains the list of
application arguments specified to prun.
An example script is found in
/usr/local/sitedep/reserve.sge/sge_script;
it allows the user to run an MPICH/MX or MPICH/GE application
without having to bother about SGE or MPICH configuration.
As is usual with prun, but in contrast to SGE schedules, stdin,
stdout and stderr are redirected to the terminal, and the program is run
from the current directory or the directory indicated with the
-c option.
- -pg dir_prefix
-
each process changes directory to dir_prefixXX, where XX is
the instance number. This can be used, e.g., to generate separate profile
dumps or core dumps.
- -ping
-
ping all hosts on which the program is to run before forking the processes.
If the ping fails, an indication that the host is down is printed (default).
- -no-ping
-
do not ping the hosts before forking.
- -q queue
-
Enter the reservation into the cluster queue named queue.
The default is the system-default queue; typically this is the
queue containing all available nodes.
For prun running on SGE (as on DAS-2 and DAS-3), this option can be
used to enforce scheduling on a specific subset of nodes.
E.g., using the following:
-q "all.q@node001,all.q@node002".
- -reserve id
-
use previously obtained reservation id id.
By default this also sets -keep.
This option can be used to reserve nodes for a time spanning a
number of runs. A reservation id can be obtained by calling
preserve(1)
with the required time and nodes.
- -rsh remote-shell
-
use remote-shell (as an absolute path name) to spawn remote processes,
instead of rsh, e.g., -rsh /usr/bin/ssh.
This is typically used to replace prun's use of rsh by ssh which,
besides being more secure, also does not suffer from rsh limitations
related to the number of TCP ports that can be allocated on the
submitting host.
- -s time
-
start at time [[mm-]dd-]hh:mm (default: now).
- -t time
-
the maximum application walltime is set to time = [[hh:]mm:]ss
(default: 15 minutes).
- -tmk
-
start application in the Treadmarks manner. This means that only the
process on the first cpu is started, and this process forks the
other Treadmarks processes. Also, a file $HOME/.Tmkrc is created to distribute
the list of nodes. Since the name of this file is shared between all parallel
runs of a given user, it is impossible for any user to run more than one
parallel Treadmarks program at the same time.
- -v
-
report host allocation.
- -[124]
-
By default, prun reserves the requested number of cpus,
and starts as many processes per node as there are cpus per node,
i.e., 2 processes matching the 2 cpus per node on DAS-2 and DAS-3.
If -[124] is specified, however,
prun allocates and schedules the specified number
of nodes, and then runs the number of processes per node
specified by this option
(ignoring the number of cpus per node).
- var=value
-
add var=value to application environment.
- -?
-
print usage.
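The [[hh:]mm:]ss format accepted by -t can be converted to seconds as sketched below, which is handy for checking a request against the 15-minute (900-second) default. The helper walltime_to_seconds is hypothetical and not part of prun; it assumes awk is available and does not validate that the fields are numeric.

```shell
# Convert a wall time in [[hh:]mm:]ss form to seconds.
# (walltime_to_seconds is a name invented for this example.)
walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '
        # Accept one to three colon-separated fields only.
        NF < 1 || NF > 3 { exit 1 }
        # Fold the fields left to right: each step multiplies by 60.
        { s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

walltime_to_seconds 15:00   # prints 900
```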
SEE ALSO
rsh(1),
preserve(1).
ENVIRONMENT
prun copies all its own environment variables to the environment
of the spawned processes. It adds some extra variables: PRUN_CPU_RANK
contains the rank of the current spawned process; PRUN_HOSTNAMES contains
a list of host names, one per spawned process. The -sge-script option
adds some more environment variables.
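A worker can inspect these variables as in the sketch below. The separator used in PRUN_HOSTNAMES is not specified here, so the example assumes whitespace-separated names; describe_worker is a made-up helper name, and the fallback values only serve to let the function run outside prun.

```shell
# Report this process's rank and the number of spawned processes.
# Assumption: PRUN_HOSTNAMES holds whitespace-separated host names,
# one per spawned process.
describe_worker() {
    rank=${PRUN_CPU_RANK:-0}
    # Word-split the host list into the positional parameters.
    set -- ${PRUN_HOSTNAMES:-}
    echo "rank $rank of $# processes"
}
```

Because the list has one entry per spawned process, counting its entries gives the total number of processes even when -no-panda is in effect.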
POSSIBLE PITFALLS
- ``Illegal option: 0 16''
-
The user has not specified -no-panda, so prun adds the process rank and
the total number of processes to the command line.
Example:
$ prun -v /bin/echo 2 hello
Reserved 2 hosts for 900 seconds from Tue Mar 27 14:43:02 CEST 2007
: node001 node002
All hosts are alive
1 2 hello
0 2 hello
Another example:
$ prun -no-panda -v /bin/echo 2 hello
Reserved 2 hosts for 900 seconds from Tue Mar 27 14:43:25 CEST 2007
: node001 node002
All hosts are alive
hello
hello
The previous example can also be started using the -np option,
as follows (note that the process count follows -np rather than the
application):
$ prun -np 2 -v /bin/echo hello
- ``Fatal error: cannot stat application a.out''
-
The user has specified an incomplete path for his executable.
- Prolonged silence
-
There is no room in the current schedule for the requested number of cpus
and compute time, so prun waits.
prun -v or preserve -llist
shows information on host allocation and presumed start time.
- ``Out of memory''
-
The user has not changed the process memory limit in his .cshrc
or .bashrc file. Maybe he did it in his .profile or
his .login file, but rsh(1) does not look there.
- No core dumps
-
The user has not changed the process coredump limit in his .cshrc
or .bashrc file. Maybe he did it in his .profile or
his .login file, but rsh(1) does not look there.
- ``watchit fatal error: Cannot open environment file .PRUN_ENVIRONMENT.procid.host''
-
One of two possibilities: either the user has no write permission in the
current directory, or he has specified an illegal -c directory option.
That way, prun is requested to run from a nonexistent directory, a
directory without write permission, or a directory which has not been
remote-mounted: /tmp and /var/tmp are typical examples of
the latter error.
- I cancelled my reservation, but the compute jobs live on!
-
Reservation and execution are two different things.
True, prun(1) obtains a reservation,
executes your jobs and cancels the reservation.
But these are three separate actions. To kill the jobs, interrupt your
prun process by sending it a ^C or a SIGINT (kill(1)).
Your prun process will propagate a SIGINT to your jobs so they
will die, and then cancel your reservation.
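Regarding the first pitfall above: an application started without -no-panda has to consume the two extra leading arguments itself. A minimal sketch, with panda_main as a hypothetical entry point for a shell-script application:

```shell
# Entry point for a script that is started with Panda-style arguments:
# $1 is the process rank, $2 the total number of processes, and the
# remaining arguments are the application's own.
# (panda_main is a name invented for this example.)
panda_main() {
    if [ "$#" -lt 2 ]; then
        echo "expected rank and process count as first arguments" >&2
        return 1
    fi
    rank=$1
    nprocs=$2
    shift 2
    echo "process $rank of $nprocs got: $*"
}
```

Called as panda_main 0 2 hello, mirroring the first example above, it reports rank 0 of 2 and passes hello on as the only application argument.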