
Albatross

Wide Area Cluster Computing


Overview

The goal of the Albatross project is to better understand application behavior on wide-area networks. A recent technology trend is Cluster Computing, where high-speed clusters of workstations are themselves connected over lower-speed links. Cluster Computing raises a host of research issues such as fault tolerance, performance, and programmability.

The focus of Albatross is on programmability and performance, approached from the applications side. Parallel applications that communicate heavily need a high-speed link to perform well, whereas applications that hardly communicate at all also run well over a slow link. Wide-area cluster systems have both types of interconnect: fast links inside each cluster and slow links between clusters. In Albatross we try to find out which applications work, which do not, and, for those that do not, what can be done to make them work. Our current work is on wide-area programming with MPI and Java.

Our main experimentation platform is DAS, a cooperation between 4 Dutch universities. It is a wide-area supercomputer consisting of 200 Pentium Pro processors running at 200 MHz, divided over 4 clusters of 128, 24, 24, and 24 processors at the participating universities. The goal of Albatross is to make it easier to write applications, across a wide range of application types, that execute efficiently on wide-area clusters of workstations.

Albatross builds on experience with a number of parallel and distributed languages, run-time systems, and applications; in particular, Orca and Panda from Vrije Universiteit, and Cilk and CRL from MIT. See the respective pages for on-line papers with details.


People

Faculty
Staff
PhD Students
Master Students
Past

Publications

BibTeX entries for the Albatross publications

Talks


Software

MagPIe 2.0

A magpie is a black-and-white bird that flies over wide areas to collect things. MagPIe is a library of MPI collective communication operations that are optimized for wide-area systems. Version 2.0 is designed to be an add-on to any MPI implementation. MagPIe is built as a separate library and calls the underlying MPI via the profiling interface. Applications only have to be linked with MagPIe and with MPI; no changes to the application source are needed. However, you must implement two functions that tell MagPIe how many clusters your wide-area system has and which MPI process is located in which cluster.
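As a rough illustration of the information those two functions have to supply, here is a minimal Python sketch. It is not MagPIe's interface: MagPIe is linked with MPI applications and its real function names and signatures are not given on this page, so `num_clusters` and `cluster_of` are hypothetical names. The sketch also assumes that MPI ranks are assigned to clusters in contiguous blocks, using the DAS cluster sizes mentioned in the overview.

```python
# Hypothetical sketch: these names are NOT MagPIe's real API.
from bisect import bisect_right
from itertools import accumulate

# Assumed layout: ranks are handed out cluster by cluster, in contiguous
# blocks, using the DAS cluster sizes quoted in the overview.
CLUSTER_SIZES = [128, 24, 24, 24]

# Exclusive upper rank bound of each cluster: [128, 152, 176, 200].
_BOUNDS = list(accumulate(CLUSTER_SIZES))


def num_clusters() -> int:
    """First question: how many clusters does the system have?"""
    return len(CLUSTER_SIZES)


def cluster_of(rank: int) -> int:
    """Second question: which cluster hosts the process with this rank?"""
    if not 0 <= rank < _BOUNDS[-1]:
        raise ValueError(f"rank {rank} is outside 0..{_BOUNDS[-1] - 1}")
    # The number of bounds that are <= rank is the cluster index.
    return bisect_right(_BOUNDS, rank)
```

With this layout, ranks 0 to 127 map to cluster 0 and rank 128 is the first process of cluster 1. A system whose ranks are interleaved across clusters would need a lookup table instead of the block boundaries used here.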

Download the software here: MagPIe2.0.tar.gz (current version: 2.0.2)

Learn more about magpies and flawed research projects here.

The MPI LogP Benchmark

The MPI LogP benchmark assesses the performance of message sending and receiving for a given MPI implementation. The performance is expressed in terms of the parameterized LogP model, for messages of various sizes. Our current implementation measures between a pair of processes and assumes that communication is symmetrical in both directions. (This assumption may not hold between a pair of heterogeneous workstations or over a WAN link.) In addition to taking measurements, the benchmark can save the performance data to a file and load it for later reuse. An API for retrieving LogP parameters for any message size is provided, so that parallel MPI applications can adapt to communication costs.
For details, please see here.
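To make the idea of retrieving a parameter for any message size concrete, here is a minimal Python sketch. It is not the API of logp_mpi: the function names, the table layout, and the sample numbers are all made up for illustration. It assumes that the gap g(m) was measured at a few message sizes and is linearly interpolated in between, and it assumes the reading of parameterized LogP in which a message of size m is fully received L + g(m) after the send starts.

```python
from bisect import bisect_left

# Made-up sample values in microseconds; a real table would come from a
# measurement run. Keys of GAP_US are message sizes in bytes.
LATENCY_US = 20.0
GAP_US = {0: 10.0, 1024: 30.0, 4096: 90.0}


def gap(m: int) -> float:
    """Gap g(m), linearly interpolated between the measured sizes."""
    sizes = sorted(GAP_US)
    if not sizes[0] <= m <= sizes[-1]:
        raise ValueError(f"no measurement covers message size {m}")
    hi = bisect_left(sizes, m)
    if sizes[hi] == m:
        return GAP_US[m]  # exact hit on a measured size
    lo = hi - 1
    fraction = (m - sizes[lo]) / (sizes[hi] - sizes[lo])
    return GAP_US[sizes[lo]] + fraction * (GAP_US[sizes[hi]] - GAP_US[sizes[lo]])


def transfer_time(m: int) -> float:
    """Estimated time until a message of m bytes is received: L + g(m)."""
    return LATENCY_US + gap(m)
```

An adaptive application could call something like `transfer_time` for candidate message sizes and pick the cheaper communication pattern. The sketch refuses to extrapolate beyond the measured range, which is a design choice made here, not a statement about how the benchmark behaves.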

Download the software here: logp_mpi.tar.gz (current version: 1.4, as of June 20, 2005)


Status

So far we have been busy doing performance evaluations with applications on the DAS, with interesting results (see Publications). Now we're working on some prototype systems.

Java

In our experiments with Java RMI and JavaParty, we found their programming model to be convenient, but we also found that remote method invocation is far too slow for parallel programming on clusters of workstations. The two-way latency of Sun's RMI is on the order of 1200 microseconds on Myrinet, whereas our Panda library achieves 30 microseconds on the same hardware. Based on our experience with Orca, we have written a Java system, called Manta, featuring a full-fledged native compiler and RMI run-time system, that does a null-RMI in 35 microseconds. A report is here. A preliminary release of Manta is scheduled towards the end of this millennium.

Click here for Manta.

MPI

The performance evaluations with algorithmic restructuring have led us to develop a library of optimized collective communication operations for MPI. The collective operations are optimized for a hierarchical interconnect or metacomputer, such as our DAS. The optimized collective operations are much faster than the standard MPICH 1.1 algorithms. Our library is called MagPIe, like the bird that collects things. You can simply link it with your application and your favourite MPI implementation. You can download the software and papers about MagPIe right from this page.
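One way to see why awareness of the hierarchy can pay off is to count how many messages of a broadcast cross the slow wide-area links. The Python sketch below is only an illustration of that general idea, not MagPIe's algorithm (the papers describe the real ones). It assumes ranks are assigned to clusters in contiguous blocks and compares a rank-ordered binomial tree rooted at rank 0, a common broadcast scheme, with a scheme that sends a single wide-area message to each remote cluster and forwards locally from there. All function names are hypothetical.

```python
def cluster_of(rank, sizes):
    """Cluster index of a rank, assuming contiguous rank blocks per cluster."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    for index, size in enumerate(sizes):
        if rank < size:
            return index
        rank -= size
    raise ValueError("rank outside the system")


def binomial_wan_messages(sizes):
    """Count wide-area messages in a binomial-tree broadcast from rank 0."""
    total = sum(sizes)
    wan = 0
    step = 1
    while step < total:
        # Ranks 0..step-1 already hold the data; each forwards to rank + step.
        for sender in range(step):
            receiver = sender + step
            if receiver < total and (
                cluster_of(sender, sizes) != cluster_of(receiver, sizes)
            ):
                wan += 1
        step *= 2
    return wan


def cluster_aware_wan_messages(sizes):
    """One wide-area message per remote cluster; the rest stays local."""
    return len(sizes) - 1


das = [128, 24, 24, 24]
print(binomial_wan_messages(das))       # prints 72
print(cluster_aware_wan_messages(das))  # prints 3
```

For the DAS layout, the topology-unaware tree sends 72 messages over wide-area links, all in its last round, while the cluster-aware scheme needs only 3. The count ignores message size and link contention, so it shows the direction of the effect rather than a predicted speedup.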

The project is still young. Expect to see more on this page in the near future. We welcome your input. If you have questions or comments about Albatross, please send email to kielmann@cs.vu.nl.


Name

An albatross is a cool, black-and-white, wide-area bird, and the project is about wide-area programming. Many other projects in our research group are named after black-and-white animals (Orca, Panda, Das, Hawk, Magpie, Manta). Albatrosses are also featured in some ancient Dutch legends and myths. These magnificent birds, with a wing span of up to 2 meters (about 7 feet), have long inspired awe among sailors, who ascribed special powers to them and believed that killing an albatross would bring bad luck. (In other stories, albatrosses play a less glamorous though more humorous part.)

The Hutchinson Encyclopedia of Science reports on another pioneering wide-area project:
Gossamer Albatross: the first human-powered aircraft to fly across the English Channel, in June 1979. It was designed by Paul MacCready and piloted and pedaled by Bryan Allen. The Channel crossing took 2 hours 49 minutes.


Related Work


Acknowledgements

We gratefully acknowledge the following institutions for making our research possible. This project is supported in part by the Dutch Organization for Scientific Research NWO, through a PIONIER and a SION grant. The Vrije Universiteit supports this research through a USF grant. The ASCI research school for computing and imaging and the Vrije Universiteit support the DAS, a central part of our research.

Thanks to the rest of the Orca group, Raoul Bhoedjang, John Romein, Tim Rühl, and Kees Verstoep, for help and suggestions.


Computer Systems Group
Dept. of Mathematics and Computer Science
Vrije Universiteit, Amsterdam, the Netherlands


This page is maintained by Thilo Kielmann. If you have any remarks, please send them to kielmann@cs.vu.nl.


Last updated: December 16, 2002, by Thilo Kielmann / URL: http://www.cs.vu.nl/albatross/