Albatross
Wide Area Cluster Computing
The goal of the Albatross project is to better understand application
behavior on wide-area networks. A recent technology trend is Cluster
Computing, where high-speed clusters of workstations are themselves
connected over lower-speed links.
Cluster Computing raises a host of research issues such as fault
tolerance, performance, and programmability.
The focus of Albatross is on programmability and performance. Cluster
Computing is approached from the applications side. It is
obvious that parallel applications that communicate heavily need a
high-speed link to function properly. On the other side of the scale,
parallel applications that hardly communicate at all will also work well
over a slow link. Cluster Computing has both types of
interconnect. In Albatross we try to find out which applications
work, which do not, and, for those that do not, what can be done to make them work.
Our current work is on wide-area programming with MPI
and Java.
Our main experimentation platform is DAS, a cooperation between four
Dutch universities. It is a wide-area supercomputer consisting of
200 Pentium Pro processors running at 200 MHz, divided over four clusters
of 128, 24, 24, and 24 processors at the participating universities.
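To get a feel for how much of a platform like this is "wide area", the following minimal sketch counts which fraction of all process pairs would have to communicate over a slow link. It assumes one process per processor and uses only the cluster sizes given above; the function name is made up for this illustration.

```python
from math import comb

# Cluster sizes as stated above for DAS: 128-24-24-24.
DAS_CLUSTERS = [128, 24, 24, 24]


def wide_area_pair_fraction(cluster_sizes):
    """Fraction of distinct process pairs whose two members sit in different clusters.

    Assumes one process per processor (an assumption of this sketch).
    """
    total = sum(cluster_sizes)
    all_pairs = comb(total, 2)
    # Pairs inside the same cluster only use the fast local interconnect.
    local_pairs = sum(comb(n, 2) for n in cluster_sizes)
    return (all_pairs - local_pairs) / all_pairs


if __name__ == "__main__":
    fraction = wide_area_pair_fraction(DAS_CLUSTERS)
    print(f"fraction of pairs crossing a wide-area link: {fraction:.3f}")
```

An application with uniform all-to-all traffic would send roughly this share of its messages over the slow links, which is one way to see why communication-heavy programs need restructuring.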
The goal of Albatross is to ease the writing of
applications that execute efficiently on wide-area clusters of
workstations, for a wide range of applications.
Albatross builds on experience with a number of parallel and
distributed languages, run-time systems, and applications; in
particular, Orca and Panda from Vrije Universiteit, and Cilk and CRL
from MIT. See the
respective pages for on-line papers with details.
Faculty
Staff
PhD Students
Master Students
Past
- Mirjam Bakker
- Ronald Blankendaal
- Monique Dewanchand
- Peter Dozy
- Koen Langendoen (koen@pds.twi.tudelft.nl)
- Grégory Mounié
- Arnold Nelisse
- Aske Plaat (aske@xs4all.nl)
- Rody Schoonderwoerd
BibTeX entries for the
Albatross publications
- Rody Schoonderwoerd:
Network Performance Measurement Tools - a Comprehensive Comparison,
Master's thesis, Vrije Universiteit Amsterdam, November 2002.
- Mathijs den Burger:
A Monitoring Tool for Grid Networks,
Master's thesis, Vrije Universiteit Amsterdam, February 2002.
- Thilo Kielmann, Henri E. Bal, Sergei Gorlatch, Kees Verstoep,
and Rutger F.H. Hofman:
Network Performance-aware Collective Communication for
Clustered Wide Area Systems.
Accepted for publication in Parallel Computing, 2001.
- Jason Maassen, Thilo Kielmann, and Henri E. Bal:
Parallel Application Experience with
Replicated Method Invocation.
Concurrency & Computation: Practice & Experience,
Vol. 13, No. 8-9, pp. 681-712, 2001.
- Rob V. van Nieuwpoort, Thilo Kielmann, and Henri E. Bal:
Efficient Load Balancing for Wide-area Divide-and-Conquer
Applications.
Proc. PPoPP '01: ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, pp. 34-43, Snowbird, Utah, June 18-20, 2001.
- Arnold Nelisse, Thilo Kielmann, Henri E. Bal, and Jason Maassen:
Object-based Collective Communication in Java.
Joint ACM JavaGrande-ISCOPE 2001 Conference, pp. 11-20,
Stanford University, June 2-4, 2001.
- G. Allen et al.:
Early Experiences with the EGrid Testbed.
Proc. First IEEE/ACM International
Symposium on Cluster Computing and the Grid (CCGrid 2001), pp. 130-137,
May 15-18, 2001, Brisbane, Australia.
- Aske Plaat, Henri E. Bal, Rutger F.H. Hofman, and Thilo Kielmann:
Sensitivity of Parallel Applications to Large Differences in
Bandwidth and Latency in Two-Layer Interconnects,
Future Generation Computer Systems,
Vol. 17, No. 6, pp. 769-782, 2001.
(This is an extended version of the HPCA-5 paper.)
- Henri Bal et al.:
The Distributed ASCI Supercomputer Project,
Operating Systems Review, Vol. 34, No. 4, pp. 76-96, October 2000.
(ACM SIGOPS).
- Rob van Nieuwpoort, Thilo Kielmann, and Henri E. Bal:
Satin: Efficient Parallel Divide-and-Conquer in Java,
Proc. Euro-Par 2000, pp. 690-699, Munich, Germany,
August 29 - September 1, 2000.
Lecture Notes in Computer Science, Vol. 1900.
An abridged version has been published in the proceedings of ASCI 2000,
the sixth annual conference of the
Advanced School for Computing and Imaging, pp. 177-184,
Lommel, Belgium, June 14-16, 2000.
- Rob van Nieuwpoort, Jason Maassen, Henri E. Bal, Thilo Kielmann, and
Ronald Veldema:
Wide-Area Parallel Programming using the Remote Method
Invocation Model,
Concurrency: Practice and Experience,
Vol. 12, No. 8, pp. 643-666, 2000.
- Jason Maassen, Thilo Kielmann, and Henri E. Bal:
Efficient Replicated Method Invocation in Java,
Proc. ACM 2000 Java Grande Conference, pp. 88-96,
San Francisco, CA, June 3-4, 2000.
An abridged version has been published in the proceedings of ASCI 2000,
the sixth annual conference of the
Advanced School for Computing and Imaging, pp. 169-176,
Lommel, Belgium, June 14-16, 2000.
- Thilo Kielmann, Henri E. Bal, and Kees Verstoep:
Fast Measurement of LogP Parameters for Message Passing Platforms,
4th Workshop on Runtime Systems for Parallel Programming (RTSPP),
pp. 1176-1183,
held in conjunction with IPDPS 2000, Cancun, Mexico, May 1-5, 2000.
Lecture Notes in Computer Science, Vol. 1800.
- Thilo Kielmann, Henri E. Bal, and Sergei Gorlatch:
Bandwidth-efficient Collective Communication for Clustered
Wide Area Systems,
International Parallel and Distributed Processing Symposium (IPDPS 2000),
pp. 492-499, Cancun, Mexico, May 1-5, 2000.
- Thilo Kielmann, Henri E. Bal, Jason Maassen, Rob van Nieuwpoort,
Ronald Veldema, Rutger Hofman, Ceriel Jacobs, and Kees Verstoep:
The Albatross Project:
Parallel Application Support for Computational Grids,
Proc. 1st European GRID Forum Workshop, pp. 341-348, Poznan, Poland,
April 12-13, 2000.
- Henri E. Bal, Aske Plaat, Thilo Kielmann, Jason Maassen, Rob van Nieuwpoort,
and Ronald Veldema:
Parallel Computing on Wide-Area Clusters: the Albatross Project,
Proc. Extreme Linux Workshop, pp. 20-24,
Monterey, CA, June 8-10, 1999.
- Rob van Nieuwpoort, Jason Maassen, Henri E. Bal, Thilo Kielmann, and
Ronald Veldema:
Wide-area parallel computing in Java,
Proc. ACM 1999 Java Grande Conference, pp. 8-14,
San Francisco, CA, June 12-14, 1999.
Also published in the proceedings of ASCI'99, the fifth annual conference
of the
Advanced School for Computing and Imaging, pp. 338-347,
Heijen, The Netherlands, June 15-17, 1999.
- Jason Maassen, Rob van Nieuwpoort, Ronald Veldema, Henri E. Bal, and
Aske Plaat:
An Efficient Implementation of Java's Remote
Method Invocation,
Proc. Seventh ACM SIGPLAN Symposium on Principles and Practice of
Parallel Programming (PPoPP'99), pp. 173-182,
Atlanta, GA, May 4-6, 1999.
- Thilo Kielmann, Rutger F.H. Hofman, Henri E. Bal, Aske Plaat, and
Raoul A.F. Bhoedjang:
MagPIe: MPI's Collective
Communication Operations for Clustered Wide Area
Systems,
Proc. Seventh ACM SIGPLAN Symposium on Principles and Practice of
Parallel Programming (PPoPP'99), pp. 131-140,
Atlanta, GA, May 4-6, 1999.
- Thilo Kielmann, Rutger F.H. Hofman, Henri E. Bal, Aske Plaat, and
Raoul A.F. Bhoedjang:
MPI's Reduction Operations in Clustered Wide Area Systems,
Proc. Message Passing Interface Developer's and User's Conference
(MPIDC'99), pp. 43-52, Atlanta, GA, March 10-12, 1999.
An abridged version appeared in the proceedings of ASCI'99, the fifth
annual conference of the
Advanced School for Computing and Imaging, pp. 329-337,
Heijen, The Netherlands, June 15-17, 1999.
- Aske Plaat, Henri E. Bal, and Rutger F.H. Hofman:
Sensitivity of Parallel Applications to Large Differences in
Bandwidth and Latency in Two-Layer Interconnects,
Proc. High Performance Computer Architecture (HPCA-5),
pp. 244-253, Orlando, FL, January 1999.
- Ronald Veldema, Rob van Nieuwpoort, Jason Maassen, Henri E. Bal, and
Aske Plaat:
Efficient
Remote Method Invocation,
Technical Report IR-450,
Vrije Universiteit Amsterdam, September, 1998.
- Ronald Veldema:
Jcc, a native Java compiler,
Master's thesis, Vrije Universiteit Amsterdam, August 1998.
- Jason Maassen and Rob van Nieuwpoort:
Fast Parallel Java,
Master's thesis, Vrije Universiteit Amsterdam, August 1998.
- Thilo Kielmann, Aske Plaat, and Henri E. Bal:
Software Components Enable Wide-Area Supercomputing:
Takeoff of the Albatross,
Position statement, 5th CaberNet Radicals Workshop,
Valadares, near Porto, Portugal, July 1998.
- Henri E. Bal, Aske Plaat, Mirjam G. Bakker, Peter Dozy, and Rutger
F.H. Hofman:
Optimizing Parallel
Applications for Wide-Area Clusters,
Proc. 12th International
Parallel Processing Symposium
(IPPS'98),
Orlando, Florida, April 1998.
A slightly longer version appeared as
Technical Report IR-430,
Vrije Universiteit Amsterdam, September 1997.
A shorter
version is in the proceedings (pages 784-790).
It lacks most of the speedup graphs.
- M.G. Bakker and P. Dozy:
Performance study of
parallel programs on a clustered Wide-Area Network,
Master's thesis, Vrije Universiteit Amsterdam, August 1997.
- Collective Communication Support for Grid Computing,
Thilo Kielmann,
4th Metacomputing Workshop at HLRS, Stuttgart, Germany, May 04, 2001.
- Cluster Computing,
Henri Bal,
NCF Meeting, Tilburg, The Netherlands, April 10, 2001.
- Grid Based Computing in The Netherlands,
Henri Bal,
Keynote lecture, First Global Grid Forum Conference (GGF1),
Amsterdam, The Netherlands, March 4, 2001.
- Programming Support for Distributed Clustercomputing,
Henri Bal,
Keynote lecture, Cluster 2000, Chemnitz, Germany, November 30, 2000.
- Parallel Programming Support for Computational Grids,
Thilo Kielmann,
Dept. of Mathematics and Computer Science,
University of Leipzig, Germany, November 9, 2000.
- Albatross: Parallel Application Support for MPI and Java on
Computational Grids,
Thilo Kielmann,
3rd Metacomputing Workshop at HLRS, Stuttgart, Germany, June 06, 2000.
- The Albatross Project:
Parallel Application Support for Computational Grids,
Thilo Kielmann,
1st European GRID Forum Workshop, Poznan, Poland, April 12, 2000.
- Fast Communication in Java for Parallel Cluster Computing,
Henri Bal,
Keynote lecture, Workshop on Communication, Architecture, and Applications
for Network-based Parallel Computing (CANPC'00),
Toulouse, France, January 8, 2000.
- Wide-Area Parallel Computing in Java,
Henri Bal,
Dept. of Computer Science, Free University, Berlin, Germany,
October 22, 1999.
Dept. of Computer Science, University of Coimbra, Portugal,
October 15, 1999.
- Albatross, MagPIe, and Manta: Parallel Computing on the
Distributed ASCI Supercomputer (DAS),
Thilo Kielmann,
GMD-NEC Workshop
PC Clusters for Scientific and Industrial Applications,
GMD/Schloß Birlinghoven, Germany, September 14, 1999.
- Wide-area parallel computing in Java,
Henri Bal,
ACM 1999 Java Grande Conference, San Francisco, CA, June 1999.
Rob van Nieuwpoort,
Fifth annual conference of the
Advanced School for Computing and Imaging (ASCI'99),
Heijen, The Netherlands, June 1999.
- Parallel Computing on Wide-Area Clusters: the Albatross Project,
Henri Bal,
Extreme Linux Workshop, Monterey, CA, June 1999.
- MagPIe: MPI's Collective Communication Operations
for Clustered Wide Area Systems,
Thilo Kielmann,
DAS Technical Meeting, Delft University of Technology,
Delft, The Netherlands, Nov 30, 1999.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
(PPoPP'99), May 05, 1999.
Henri Bal,
HPCN Europe '99, Distributed Computing and Metacomputing Workshop,
Amsterdam, The Netherlands, April 12, 1999.
- The Albatross Project: Wide Area Parallel Programming,
Thilo Kielmann,
2nd Metacomputing Workshop at HLRS, Stuttgart, Germany, April 26, 1999.
- MPI's Reduction Operations in Clustered Wide Area Systems,
Thilo Kielmann,
Message Passing Interface Developer's and User's Conference (MPIDC'99),
Atlanta, GA, March 10, 1999.
- MagPIe: MPI's Collective Communication Operations
for Clustered Wide Area Systems,
Thilo Kielmann,
Dept. of Systems and Computer Engineering, Carleton University,
Ottawa, Canada, March 23, 1999,
Dept. of Computer Science, University of Tennessee, Knoxville,
March 08, 1999,
Dept. of Mathematics and Computer Science,
University of Münster, Germany, February 4, 1999.
(This is an extended version of the PPoPP talk with the same title.)
- Sensitivity of Parallel Applications to Large Differences in Bandwidth
and Latency in Two-Layer Interconnects,
Aske Plaat,
HPCA-5, Orlando, Florida, January 12, 1999.
- Can you run parallel FFT on a wide-area system (efficiently)?,
Thilo Kielmann,
Dept. of Systems and Computer Engineering, Carleton University, Ottawa,
Canada, July 22, 1998.
- Life in Long Latency Land,
Aske Plaat,
DISH Symposium, Vrije Universiteit Amsterdam, June 4, 1998.
- Optimizing Parallel Applications for Wide-Area Clusters,
Aske Plaat,
IPPS, Orlando, FL, April 2, 1998;
Supercomputing Technologies Group, MIT, Cambridge, MA, March 25, 1998;
DAS Workshop, Vrije Universiteit Amsterdam, March 2, 1998.
- Parallel Applications on a Clustered WAN,
Aske Plaat,
NOW Group, UC Berkeley, CA, November 20, 1997.
A magpie is a black-and-white bird that flies over wide areas to
collect things.
MagPIe is a library of MPI collective communication operations that
are optimized for wide-area systems.
Version 2.0 is designed to be an add-on to any MPI implementation.
MagPIe is built as a separate library and calls the underlying MPI via the
profiling interface.
Applications just have to be linked with MagPIe and with MPI;
changes to application source are not necessary for using MagPIe.
However, you must implement two functions that tell MagPIe
how many clusters your wide-area system has and which MPI process is
located in which cluster.
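The kind of information these two functions have to provide can be sketched as follows. This is an illustration in Python, not MagPIe's actual C interface: the names num_clusters and cluster_of are hypothetical, and the sketch assumes that MPI ranks are assigned contiguously, cluster by cluster, using the DAS layout as example data.

```python
from bisect import bisect_right
from itertools import accumulate

# Example layout (assumption): the four DAS clusters, ranks numbered
# contiguously so that ranks 0..127 are in cluster 0, 128..151 in cluster 1, etc.
CLUSTER_SIZES = [128, 24, 24, 24]

# Exclusive upper rank bound of each cluster: [128, 152, 176, 200].
_UPPER_BOUNDS = list(accumulate(CLUSTER_SIZES))


def num_clusters():
    """How many clusters the wide-area system has (hypothetical name)."""
    return len(CLUSTER_SIZES)


def cluster_of(rank):
    """Which cluster a given MPI rank is located in (hypothetical name)."""
    if not 0 <= rank < _UPPER_BOUNDS[-1]:
        raise ValueError(f"rank {rank} is outside 0..{_UPPER_BOUNDS[-1] - 1}")
    # The first bound strictly greater than the rank identifies its cluster.
    return bisect_right(_UPPER_BOUNDS, rank)


if __name__ == "__main__":
    for rank in (0, 127, 128, 199):
        print(rank, "->", cluster_of(rank))
```

On a real system the mapping would more likely be derived from host names or a configuration file; the contiguous layout is chosen here only to keep the sketch short.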
Download the software here:
MagPIe2.0.tar.gz (current version: 2.0.2)
Learn more about magpies and flawed research projects
here.
The MPI LogP benchmark assesses the performance of message sending and
receiving for given MPI implementations. The performance is expressed in
terms of the parameterized LogP model,
for messages of various sizes.
Our current implementation measures between a pair of processes, assuming
that communication is symmetrical in both directions.
(This assumption may not hold between a pair of heterogeneous
workstations or over a WAN link.)
In addition to measuring, the benchmark can save the performance data to a
file and load it for later reuse. An API for retrieving LogP parameters for
any message size is provided so that parallel MPI applications can adapt to
communication costs.
For details, please
see here.
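As a rough illustration of how an application might use such parameters, the sketch below keeps a table of gap values g(m) measured at a few message sizes and estimates the value for any other size. It is not the API of logp_mpi: the class and method names are invented, the numbers are made-up sample data, the linear interpolation is an assumption of this sketch, and the cost of a single message is approximated as L + g(m).

```python
from bisect import bisect_left


class LogPTable:
    """Hypothetical table of measured gap values g(m), in microseconds."""

    def __init__(self, latency_us, gaps_us):
        # gaps_us maps a message size in bytes to the measured gap g(m).
        self.latency_us = latency_us
        self.sizes = sorted(gaps_us)
        self.gaps = [gaps_us[size] for size in self.sizes]

    def gap(self, size):
        """Estimate g(m) by linear interpolation, clamped to the measured range."""
        if size <= self.sizes[0]:
            return self.gaps[0]
        if size >= self.sizes[-1]:
            return self.gaps[-1]
        i = bisect_left(self.sizes, size)
        lo, hi = self.sizes[i - 1], self.sizes[i]
        fraction = (size - lo) / (hi - lo)
        return self.gaps[i - 1] + fraction * (self.gaps[i] - self.gaps[i - 1])

    def message_time(self, size):
        """Approximate cost of one message as L + g(m) (assumption of this sketch)."""
        return self.latency_us + self.gap(size)


if __name__ == "__main__":
    # Made-up sample values, not measurements.
    table = LogPTable(latency_us=10.0, gaps_us={0: 5.0, 1024: 15.0, 4096: 45.0})
    for size in (512, 1024, 2560):
        print(size, "bytes ->", table.message_time(size), "us")
```

An adaptive application could compare such estimates for different message sizes or algorithms and pick the cheaper option at run time.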
Download the software here:
logp_mpi.tar.gz (current version: 1.4, as of June 20, 2005)
So far we have been busy doing performance evaluations with
applications on the DAS, with interesting results (see
Publications). Now we're working on some prototype
systems.
Java
In doing experiments with Java RMI and JavaParty, we
found the programming model of RMI and JavaParty to be convenient, but
we also found that remote method invocation is far too slow for
programming parallel clusters of workstations.
Two-way latency of Sun's RMI is on the order of 1200 microseconds on
Myrinet. Our Panda library achieves 30 microseconds on the same
hardware. Based on our experience with Orca, we have
written a Java system, called Manta, featuring a full-fledged native
compiler and RMI run-time system, that does a null-RMI in 35
microseconds. A report is here. A preliminary
release of Manta is scheduled towards the end of this millennium.
Click here for Manta.
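To put the latency figures above in perspective, this small calculation derives the relative slowdown and an upper bound on the number of back-to-back synchronous invocations per second. It uses only the three numbers quoted above and assumes, for simplicity, that nothing but the round-trip latency limits the call rate.

```python
# Latencies in microseconds, as quoted above.
LATENCY_US = {
    "Sun RMI": 1200.0,
    "Panda": 30.0,
    "Manta RMI": 35.0,
}


def calls_per_second(latency_us):
    """Upper bound on sequential round trips per second for a given latency."""
    return 1_000_000 / latency_us


def slowdown(name, baseline="Manta RMI"):
    """How many times slower one system is than the baseline."""
    return LATENCY_US[name] / LATENCY_US[baseline]


if __name__ == "__main__":
    for name, latency in LATENCY_US.items():
        print(f"{name}: about {calls_per_second(latency):.0f} calls/s, "
              f"{slowdown(name):.1f}x the Manta latency")
```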
MPI
The performance evaluations with algorithmic restructuring have led
us to come up with a library of optimized collective communication
operations for MPI. The collective operations are optimized for a
hierarchical interconnect or metacomputer, such as our DAS. The
optimized collective operations are much faster than the standard
MPICH 1.1 algorithms. Our library is called MagPIe, like the bird that
collects things. You can simply link it with your application and
your favourite MPI implementation.
You can download the software and papers about MagPIe
right from this page.
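Why a hierarchy-aware collective operation helps can be seen by counting wide-area messages. The sketch below compares a flat binomial-tree broadcast, which ignores cluster boundaries, with a two-level broadcast that sends one message to each remote cluster and then broadcasts locally. It is a simplified illustration under stated assumptions (contiguous rank layout, root at rank 0, message counts only) and not the MagPIe code itself; all function names are made up.

```python
def cluster_of(rank, sizes):
    """Cluster index of a rank, assuming ranks are laid out cluster by cluster."""
    for cluster, size in enumerate(sizes):
        if rank < size:
            return cluster
        rank -= size
    raise ValueError("rank outside the system")


def binomial_tree(first, count):
    """(src, dst) messages of a binomial-tree broadcast over ranks first..first+count-1."""
    messages = []
    step = 1
    while step < count:
        # Every rank that already holds the data forwards it 'step' ranks ahead.
        for offset in range(min(step, count - step)):
            messages.append((first + offset, first + offset + step))
        step *= 2
    return messages


def flat_broadcast(sizes):
    """Binomial tree over all ranks, unaware of cluster boundaries."""
    return binomial_tree(0, sum(sizes))


def cluster_aware_broadcast(sizes):
    """Root sends once to the first rank of each remote cluster, then local trees."""
    messages = []
    first = 0
    for size in sizes:
        if first != 0:
            messages.append((0, first))  # the only wide-area message for this cluster
        messages.extend(binomial_tree(first, size))
        first += size
    return messages


def count_wan(messages, sizes):
    """Number of messages whose endpoints are in different clusters."""
    return sum(cluster_of(s, sizes) != cluster_of(d, sizes) for s, d in messages)


if __name__ == "__main__":
    das = [128, 24, 24, 24]
    print("flat:", count_wan(flat_broadcast(das), das), "wide-area messages")
    print("cluster-aware:", count_wan(cluster_aware_broadcast(das), das))
```

Both variants deliver the data to every process with the same total number of messages; they differ only in how many of those messages cross the slow links.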
The project is still young. Expect to see more on
this page in the near future.
We welcome your input.
If you have questions or comments about Albatross, please send email
to kielmann@cs.vu.nl.
An Albatross is a cool, black-and-white, wide-area bird. The project
is about wide-area programming.
Many other projects in our research group have names of
black-and-white animals (Orca, Panda,
Das,
Hawk, Magpie,
Manta).
Also, albatrosses are featured in some ancient Dutch
legends and myths. The magnificent birds, with a wing span of up to 2
meters (about 7 feet), long inspired
awe among sailors, who ascribed special powers to them and believed that
killing an albatross would bring bad luck.
(In other
stories albatrosses play a less glamorous though more humorous
part).
The Hutchinson Encyclopedia of Science reports on another pioneering
wide area project:
Gossamer Albatross:
The first human-powered aircraft to fly across the English Channel,
in June 1979. Designed by Paul MacCready and piloted and pedaled by
Bryan Allen. The Channel crossing took 2 hours 49 minutes.
- Direct Ancestors
- Metacomputing
- Other
We gratefully acknowledge the following institutions for making our
research possible. This project is supported in part
by the Dutch Organization for Scientific
Research NWO,
through a PIONIER and a SION grant. The
Vrije Universiteit supports this research through a USF grant. The
ASCI research school
for computing and imaging and the Vrije Universiteit support the DAS, a central part of
our research.
Thanks to the rest of the Orca group, Raoul Bhoedjang, John Romein,
Tim Rühl, and Kees
Verstoep, for help and suggestions.
Computer Systems Group
Dept. of Mathematics and Computer Science
Vrije Universiteit,
Amsterdam, the Netherlands
This page is maintained by
Thilo Kielmann.
If you have any remarks, please send them to
kielmann@cs.vu.nl.
Last updated: December 16, 2002, by
Thilo Kielmann
/ URL: http://www.cs.vu.nl/albatross/