StarPU Tutorial and Hands-on Session
Friday, May 13, 2011: 11:30 - 16:30
About StarPU
StarPU is a project developed by INRIA Bordeaux - Sud-Ouest that aims to provide portable, optimized performance to task-based applications on clusters of heterogeneous multicore+accelerator machines. The goal is to relieve the programmer of the technical aspects of data management and task scheduling, while applying theoretical task scheduling algorithms to actual application execution to improve performance. It also provides performance feedback through task profiling and trace analysis.
This approach has been used successfully, for instance, to integrate within a few weeks the PLASMA (CPUs) and MAGMA (GPUs) Cholesky, QR and LU factorizations into a CPU+GPU implementation whose efficiency is very close to peak performance.
For more information on the StarPU project, see the StarPU project website.
Objective of the Tutorial
This tutorial will introduce the usage of StarPU and its performance analysis tools. The data and task model will be introduced first, along with the corresponding programming interface, through a simple example. We will then introduce a few scheduling strategies that optimize execution by taking into account both heterogeneity and data transfer penalties. Performance analysis tools will be demonstrated to show where performance may be lacking and how well application kernels are running. Finally, we will show how MPI communication and StarPU execution can be tightly coupled to achieve optimized overlap of communication and computation.
Hands-on Session
During the hands-on session, the tutorial participants will install StarPU, then implement and run examples on the DAS-4 system. They will have the opportunity to experiment with CPUs + 1 GPU execution and to try CPUs + 2 GPUs execution. They will also implement and run a simple MPI/CPU/GPU example.
The software required for the hands-on session is available here.