DAS: Distributed ASCI Supercomputing
Cluster at University of Amsterdam

Research projects with DAS at University of Amsterdam


DAS GRAPE Tree

One of the challenges in simulation is the treatment of N-body systems with realistic numbers of particles and long-range interactions. Direct computation on single-processor computers is prohibitively expensive because the number of interactions grows as O(N^2). Improvements can be realised in (at least) three ways: by using more efficient algorithms, such as the tree-type algorithms; by using parallel computers, such as the DAS; and by using very fast special-purpose processors, such as the GRAPE. Ultimately, the three methods can be combined.
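The O(N^2) cost of direct computation can be made concrete with a small sketch. The code below is an illustration only, not the project's implementation: it assumes Newtonian gravity with G = 1 and a small softening length, and the function name `direct_accelerations` is hypothetical. It visits every pair of particles once, so the number of two-body evaluations is N(N-1)/2.

```python
import math

def direct_accelerations(positions, masses, G=1.0, softening=1e-3):
    """Direct-summation accelerations (assumed: Newtonian gravity, softened).

    Returns (accelerations, number_of_pair_evaluations).
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    pair_evaluations = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Separation vector from particle i to particle j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                # One pair evaluation updates both particles (Newton's third law).
                acc[i][k] += G * masses[j] * d[k] * inv_r3
                acc[j][k] -= G * masses[i] * d[k] * inv_r3
            pair_evaluations += 1
    return acc, pair_evaluations
```

Doubling N roughly quadruples `pair_evaluations`, which is the growth that tree algorithms, parallel machines and special-purpose hardware are meant to tame.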

Tree-type algorithms can evaluate the interactions in O(N log N) time. They have been effectively parallelised by (among others) Salmon and Warren, but suffer from an inherent lack of accuracy unless special measures are taken, and these measures increase the number of two-body interaction evaluations. Such two-body interactions can be evaluated very efficiently using GRAPE hardware. GRAPEs are in fact so good at this type of evaluation that, certainly when combined with tree-type algorithms, the host computer tends to become the bottleneck in the simulation.
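The trade-off between accuracy and the number of two-body evaluations can be sketched with a Barnes-Hut style quadtree. This is an assumption made for illustration: the text does not say which tree algorithm the project uses, and the names `Node`, `build_tree`, `accel` and the opening parameter `theta` are hypothetical. The sketch is two-dimensional, uses G = 1 and monopole (centre-of-mass) cells only, and assumes that bodies have positive mass and distinct positions.

```python
class Node:
    """Square quadtree cell that accumulates total mass and centre of mass."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0
        self.mx = self.my = 0.0   # mass-weighted coordinate sums
        self.body = None          # (x, y, m) while the cell holds one body
        self.children = None

    def _child_for(self, x, y):
        return self.children[2 * (x >= self.cx) + (y >= self.cy)]

    def insert(self, x, y, m):
        if self.mass == 0.0 and self.children is None:
            self.body = (x, y, m)                 # empty leaf: store the body
        else:
            if self.children is None:             # occupied leaf: subdivide
                h = self.half / 2.0
                self.children = [Node(self.cx + sx * h, self.cy + sy * h, h)
                                 for sx in (-1, 1) for sy in (-1, 1)]
                old, self.body = self.body, None
                self._child_for(old[0], old[1]).insert(*old)
            self._child_for(x, y).insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y


def build_tree(bodies):
    """Build a quadtree over bodies given as (x, y, mass) tuples."""
    xs = [b[0] for b in bodies]
    ys = [b[1] for b in bodies]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2.0 + 1e-9
    root = Node((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0, half)
    for b in bodies:
        root.insert(*b)
    return root


def accel(node, x, y, theta, eps=1e-3):
    """Return (ax, ay, evaluations) for a particle located at (x, y)."""
    if node.mass == 0.0:
        return 0.0, 0.0, 0
    if node.children is None and node.body[:2] == (x, y):
        return 0.0, 0.0, 0                        # skip self-interaction
    dx = node.mx / node.mass - x
    dy = node.my / node.mass - y
    r2 = dx * dx + dy * dy
    size = 2.0 * node.half
    # Accept a cell as one pseudo-particle when it looks small: size / r < theta.
    if node.children is None or size * size < theta * theta * r2:
        f = node.mass / (r2 + eps * eps) ** 1.5
        return f * dx, f * dy, 1
    ax = ay = 0.0
    count = 0
    for child in node.children:
        cax, cay, c = accel(child, x, y, theta, eps)
        ax, ay, count = ax + cax, ay + cay, count + c
    return ax, ay, count
```

With `theta = 0` no cell is ever accepted, so the result equals direct summation over all other bodies. Increasing `theta` reduces the evaluation count at the price of accuracy; lowering it again is one example of a "special measure" that raises the number of two-body evaluations.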

The aim of our project is to realise an environment that can be used both to perform large-scale N-body simulations and to experiment with the optimum configurations from the viewpoints of accuracy, performance and balanced system utilisation.

Collaboration with:


The Polder Initiative

Fundamental to the acceptance of advanced computing systems like the DAS supercomputer is efficient and effective access to computing resources. The importance of access to, and ease of use of, large, distributed and heterogeneous computing resources motivated research into methods and techniques that provide an integrated environment for computing in all its diversity over wide-area distributed heterogeneous platforms. The application base of these advanced computing systems consists mostly of resource-intensive applications, i.e., applications whose computational requirements exceed the capacity of a single machine. The requirements posed by these programs need to be accommodated in the same easy-to-use, transparent environment.

The objective of the Polder Initiative is to develop new ideas and methods to improve the effective usage of computing resources. The methods and the resulting tools and system components, both hardware and software, join the various computing resources into a single integrated computing platform, resulting in a so-called metacomputing environment. The computing resources can comprise workstations, MPPs and homebrew supercomputers interconnected by various networks: high-speed or regular LANs (e.g. Myrinet or Ethernet) and wide-area networks (e.g. ATM). Besides processing and communication resources, other resources such as storage devices, which require abstraction of the whereabouts of data and management for efficient use, are also included in the metacomputer world view.

In pursuit of the metacomputing goal we have identified a number of challenging open issues. The topics we address in metacomputing range from techniques needed to build a metacomputer to various levels of resource control. A metacomputer job execution environment is being built to provide a framework in which to perform experiments. On the programming level we look at the impact of paradigms on resource management and at how more powerful program structures can improve resource utilization. On the scheduling level we investigate how jobs should be mapped onto distributed resources. Once jobs run, we look at how to schedule the program tasks, providing means to balance the load under the dynamic behaviour of applications and resources. Finally, on the run-time system level we look at how assigned resources can be deployed in the most efficient manner.

Job Resource Management

Computationally intensive and data-intensive applications require careful management of resources in order to execute efficiently within the constraints set on the system. For applications to execute efficiently in a portable and transparent manner on current and future diverse platforms, general techniques for the effective management of resources have to be incorporated in the runtime environment. Since current computing environments are most often heterogeneous in nature, the different characteristics of the resources have to be balanced against each other in order to determine the most suitable available resource. In addition to this optimization problem, multiple jobs compete at the same time for the available computing platforms. This means that a runtime environment has to solve a many-to-many scheduling problem.

We currently recognize two levels of resource management. In the first level, the management of resources involves the distribution of arriving jobs over a possibly wide-area network of computing resources that vary widely in characteristics. The first level provides resource management by finding suitable and available resources through load sharing.
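First-level load sharing can be sketched as a greedy placement of arriving jobs on heterogeneous resources. This is a minimal illustration under stated assumptions, not the Polder scheduler: each job is assumed to carry a known amount of work, each resource a known relative speed, and the function name `share_load` is hypothetical.

```python
def share_load(jobs, speeds):
    """Greedy load sharing (illustrative assumption, not the Polder policy).

    jobs:   list of (job_name, work) in arrival order
    speeds: dict mapping resource name to relative speed
    Returns (placement, busy_until), both keyed as described below.
    """
    busy_until = {resource: 0.0 for resource in speeds}
    placement = {}
    for name, work in jobs:
        # Choose the resource on which this job would finish earliest.
        best = min(speeds, key=lambda r: busy_until[r] + work / speeds[r])
        busy_until[best] += work / speeds[best]
        placement[name] = best
    return placement, busy_until


placement, busy_until = share_load(
    [("a", 4.0), ("b", 3.0), ("c", 4.0)],
    {"fast": 2.0, "slow": 1.0},
)
```

In this example the fast machine receives jobs `a` and `c` and the slow machine receives job `b`, since that choice gives each job its earliest estimated finish time at the moment it arrives.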

In the second level of resource management, a more efficient mapping of the application onto the hardware platform is achieved by capturing the dynamics of the application and resource behaviour. The parallel tasks of an application are migrated from one machine to another as better resources become available.

Both levels of resource management revolve around the open issue of scheduling resource accesses. The research focuses on hierarchical methods and techniques that result in efficient, stable, and scalable algorithms for resource management.

Runtime Environments

To optimize resource utilization of distributed systems under environmental changes, such as varying load per processor or varying number of available processors, it can be necessary to migrate running tasks between processors. We have introduced a scheduling and task migration mechanism into two widely used message passing layers, namely PVM and MPI. The migration algorithm guarantees transparent migration of processes between different workstations without loss of messages. The resulting DynamicPVM/DynamicMPI system supports load balancing for tasks running in parallel on loosely coupled parallel systems.
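The requirement that no messages are lost during migration can be illustrated with a toy model. The sketch below does not use PVM or MPI and is not the DynamicPVM/DynamicMPI protocol; the classes `MigratableTask` and `Router` are hypothetical. It only shows the general idea of holding messages that arrive while a task is in transit and delivering them, in order, once the task runs on its new host.

```python
from collections import deque


class MigratableTask:
    """Toy task with an inbox and a holding queue used during migration."""

    def __init__(self, name, host):
        self.name = name
        self.host = host
        self.inbox = deque()
        self.held = deque()      # messages that arrive while in transit
        self.migrating = False


class Router:
    """Toy message layer that addresses tasks by name, not by host."""

    def __init__(self):
        self.tasks = {}

    def spawn(self, name, host):
        self.tasks[name] = MigratableTask(name, host)

    def send(self, dest, message):
        task = self.tasks[dest]
        # Senders never see the migration: messages are held, not dropped.
        (task.held if task.migrating else task.inbox).append(message)

    def begin_migration(self, name):
        self.tasks[name].migrating = True

    def finish_migration(self, name, new_host):
        task = self.tasks[name]
        task.host = new_host
        task.inbox.extend(task.held)   # deliver held messages in arrival order
        task.held.clear()
        task.migrating = False
```

Because senders address the task by name, they need no knowledge of where it currently runs, which is what makes the migration transparent in this model.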

Our primary objective is to study models describing adaptive systems. To validate these models, experiments with actual implementations of such dynamic systems are required. DynamicPVM and DynamicMPI are pilot implementations of such an experimental adaptive system.

In addition to the developments in DynamicPVM/DynamicMPI, the MOSIX operating system is being considered as a platform for process checkpointing and migration. In MOSIX the checkpoint and migration facilities are implemented in the kernel, which has the advantage that these facilities are transparently available to all applications and not exclusively to PVM or MPI programs. A drawback, however, is that one has to replace the "standard" operating system with MOSIX.

Parallel I/O

One of the most important issues in parallel computing is reducing communication bottlenecks. Parallel tasks need to communicate shared data and require external input and output streams. Systems in which data is communicated on the basis of a logical ordering of the data, rather than the location of tasks, are parallel input/output systems. Examples of such systems are parallel file systems and virtual shared memory.

The purpose of parallel I/O is to provide efficient file access to (parallel) programs, for storage, inter-task communication and communication with other programs. It can furthermore facilitate multi-site applications, visualization, out-of-core computation, metacomputing and so on.

In a parallel I/O system the main concern is to reduce latency and increase bandwidth. Data is stored on multiple disks to increase bandwidth, but this has a negative influence on latency. Furthermore, causal dependencies have to be preserved when one read/write request is divided over multiple I/O servers.
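How a single request ends up divided over multiple I/O servers can be sketched with round-robin striping. The layout is an assumption chosen for illustration (the text does not specify a data placement scheme), and the function name `split_request` and its parameters are hypothetical.

```python
def split_request(offset, length, stripe_size, n_servers):
    """Split a byte range into per-server pieces under round-robin striping.

    Returns a list of (server, offset_on_server, piece_length) tuples in the
    logical order of the data. The striping layout is an assumed example.
    """
    pieces = []
    end = offset + length
    while offset < end:
        stripe = offset // stripe_size            # global stripe index
        within = offset % stripe_size             # position inside the stripe
        piece = min(stripe_size - within, end - offset)
        server = stripe % n_servers               # round-robin placement
        local = (stripe // n_servers) * stripe_size + within
        pieces.append((server, local, piece))
        offset += piece
    return pieces


# A 200-byte request starting at byte 100, with 64-byte stripes on 4 servers.
pieces = split_request(100, 200, 64, 4)
# pieces == [(1, 36, 28), (2, 0, 64), (3, 0, 64), (0, 64, 44)]
```

The request touches all four servers, which raises the available bandwidth, but it completes only when the slowest piece has arrived, and the pieces must be applied consistently to preserve causal ordering.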

Techniques to improve performance include pre-fetching of data, replication and loosening of the causality requirements. In addition, the distribution and locality of the data over the I/O servers is of crucial importance. All the parameters of an I/O system (such as data placement and the number of I/O servers) influence each other, creating a complex system of behaviour.

This research investigates the influence of these parameters, seeks the preconditions for a successful implementation, and aims to gain experience with a parallel I/O system in a metacomputing environment.

Parable: A Parallel Discrete Event Simulation Environment

A distinct application area in scientific computing is Discrete Event Simulation (DES). In recent years the observation that components in many physical systems behave and interact in an asynchronous manner has reinforced the importance of discrete event simulation. The protocol formulated by the DES scheme is able to capture the asynchronous characteristics that are essential to the physical model. Discrete event simulations are known to be computationally intensive, where the control of the computational process is responsible for a significant amount of the complexity and execution time.

The distributed execution of the discrete event simulations introduces interesting problems with respect to causality: cause and effect. Different approaches to guarantee causality in a distributed simulation have been proposed. One of the most promising methods is Time Warp, which guarantees causality by state rollback techniques.
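State rollback can be illustrated with a single optimistic logical process. The sketch is a simplification made for illustration and is not the Parable implementation: the class `LogicalProcess` and its order-dependent event handler are hypothetical, and anti-messages (which a full Time Warp system needs to cancel messages sent during undone computation) are left out.

```python
import bisect


class LogicalProcess:
    """One optimistic logical process with state saving and rollback."""

    def __init__(self, initial_state=0):
        self.state = initial_state
        self.lvt = 0.0            # local virtual time
        self.processed = []       # (timestamp, value, state_before), by time
        self.rollbacks = 0

    def _execute(self, timestamp, value):
        # Save the state before the event so that it can be restored later.
        self.processed.append((timestamp, value, self.state))
        self.state = self.state * 2 + value   # order-dependent toy handler
        self.lvt = timestamp

    def receive(self, timestamp, value):
        if timestamp >= self.lvt:
            self._execute(timestamp, value)   # optimistic: process at once
            return
        # Straggler: undo every processed event with a later timestamp.
        self.rollbacks += 1
        times = [p[0] for p in self.processed]
        i = bisect.bisect_right(times, timestamp)
        undone = self.processed[i:]
        del self.processed[i:]
        self.state = undone[0][2]             # restore the saved state
        self.lvt = self.processed[-1][0] if self.processed else 0.0
        self._execute(timestamp, value)
        for later_time, later_value, _ in undone:
            self._execute(later_time, later_value)   # re-execute in order


lp = LogicalProcess()
for timestamp, value in [(1.0, 1), (3.0, 1), (2.0, 0)]:   # (2.0, 0) is late
    lp.receive(timestamp, value)
```

After the late event arrives, the process rolls back once and ends in the same state as if the events had been received in timestamp order, which is the causality guarantee described above.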


Research papers of the University of Amsterdam relating to the DAS

epvmmpi97.ps.gz
B.J. Overeinder and P.M.A. Sloot: "Breaking the Curse of Dynamics by Task Migration: Pilot Experiments in the Polder Metacomputer," Proceedings of the European PVM/MPI'97 Workshop, Krakow, Poland, Nov. 1997.
parco97.ps.gz
A.W. van Halderen, B.J. Overeinder, P.M.A. Sloot, R. van Dantzig, D.H.J. Epema, and M. Livny: "Hierarchical Resource Management in the Polder Metacomputing Initiative," submitted for publication in Parallel Computing.



Pages at ASCI,
and the participants:
Vrije Universiteit,
Universiteit van Amsterdam,
Delft University of Technology,
University of Leiden,
University of Utrecht.
UvA/PSCS ASCI