
StarPU Tutorial and Hands-on Session

Friday, May 13, 2011: 11:30 - 16:30

About StarPU

StarPU is a project developed at INRIA Bordeaux - Sud-Ouest that aims to provide portable, optimized performance to task-based applications running on clusters of heterogeneous multicore+accelerator machines. The goal is to relieve the programmer of the technical aspects of data management and task scheduling, while applying theoretical task scheduling algorithms to actual application execution to improve performance. StarPU also provides performance feedback through task profiling and trace analysis.

This approach has been used successfully, for instance, to integrate in a few weeks the PLASMA (CPUs) and MAGMA (GPUs) Cholesky, QR and LU factorizations into a CPU+GPU implementation whose efficiency is very close to peak performance.

For more information on the StarPU project, see the StarPU project website.



Objective of the Tutorial

This tutorial introduces the usage of StarPU and its performance analysis tools. The data and task model will be presented first, together with the corresponding programming interface, through a simple example. We will then introduce a few scheduling strategies that optimize execution by taking into account both heterogeneity and data transfer penalties. Performance analysis tools will be demonstrated to show where performance may be lacking and how well application kernels are running. Finally, we will show how MPI communication and StarPU execution can be tightly coupled to achieve optimized overlap of communication and computation.
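To make the idea of a scheduling strategy that accounts for both heterogeneity and data transfer penalties more concrete, here is a minimal sketch of a greedy "earliest estimated finish time" heuristic. It is an illustration only: it does not use StarPU's API and is not claimed to be one of StarPU's schedulers. The `Worker` class, the `pick_worker` and `schedule` functions, and all timings are hypothetical, chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical processing unit (not a StarPU type)."""
    name: str
    kind: str            # "cpu" or "gpu"
    memory_node: str     # memory the worker reads from, e.g. "ram" or "gpu0_mem"
    ready_at: float = 0.0  # time (s) at which the worker becomes idle


def pick_worker(workers, task):
    """Return (worker, finish_time) with the smallest estimated finish time."""
    best = None
    for w in workers:
        # Heterogeneity: the same task has a different cost per worker kind.
        exec_time = task["exec_time"][w.kind]
        # Transfer penalty: paid only if the data is not already local.
        penalty = 0.0 if w.memory_node == task["data_on"] else task["transfer_time"]
        finish = w.ready_at + penalty + exec_time
        if best is None or finish < best[1]:
            best = (w, finish)
    return best


def schedule(tasks, workers):
    """Greedily place tasks in submission order; return (task, worker, finish)."""
    placement = []
    for task in tasks:
        worker, finish = pick_worker(workers, task)
        worker.ready_at = finish  # the worker is busy until the task ends
        placement.append((task["name"], worker.name, finish))
    return placement


# Made-up timings: task A is much faster on the GPU, task B is small enough
# that moving its data to the GPU costs more than it saves.
workers = [
    Worker("cpu0", "cpu", "ram"),
    Worker("gpu0", "gpu", "gpu0_mem"),
]
tasks = [
    {"name": "A", "exec_time": {"cpu": 10.0, "gpu": 1.0},
     "transfer_time": 2.0, "data_on": "ram"},
    {"name": "B", "exec_time": {"cpu": 1.0, "gpu": 0.5},
     "transfer_time": 2.0, "data_on": "ram"},
]
placement = schedule(tasks, workers)
for name, worker_name, finish in placement:
    print(f"task {name} -> {worker_name}, estimated finish at {finish:.1f} s")
```

In this sketch, task A goes to the GPU even though its data must be transferred, while task B stays on the CPU because the transfer would outweigh the faster kernel. A real runtime would obtain the execution and transfer estimates from performance models rather than from hard-coded values.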



Tutorial Contents

11:30 - 12:30  StarPU task-based programming model
12:30 - 13:30  Programming model hands-on session
13:30 - 14:30  Lunch
14:30 - 15:30  StarPU optimizations
15:30 - 16:30  Optimizations hands-on session

Slides



Hands-on Session

During the hands-on session, participants will install StarPU, then implement and run examples on the DAS-4 system. They will have the opportunity to experiment with CPUs + 1 GPU execution and to try CPUs + 2 GPUs execution. They will also implement and run a simple MPI/CPU/GPU example.

The software required for the hands-on session is available here.



Relevant Publications



Tutorial Presenter