|
Parallel programming Practical 2011/2012 |
|
|||
|
|
The MPI assignmentIntroductionThis assignment for the parallel programming practical requires you to implement a parallel SOR (Successive Overrelaxation), using C and the MPI (Message Passing Interface) communication library. The application has to be ran and benchmarked on the DAS3. Send any questions you have for this assignment to Ana Lucia Varbanescu . The Message-Passing Interface or MPI is widely used. It is not a new programming language; rather it is a library of subprograms that can be called from C and Fortran programs. It was developed by an open, international forum consisting of representatives from industry, academia, and government laboratories. It has rapidly received widespread acceptance because it has been carefully designed to permit maximum performance on a wide variety of systems, and it is based on message passing, one of the most powerful and widely used paradigms for programming parallel systems. The introduction of MPI makes it possible for developers of parallel software to write libraries of parallel programs that are both portable and efficient. Use of these libraries will hide many of the details of parallel programming, and, as a consequence, make parallel computing much more accessible to students and professionals in all branches of science and engineering. Successive Overrelaxation (SOR)Successive Overrelaxation is an iterative algorithm for solving Laplace equations on a grid. The sequential algorithm works as follows. For all non-boundary points of the grid, SOR first calculates the average value of the four neighbors of the point: ![]() Then the new value of the point is determined using the following correction:
w is known as the relaxation parameter and a suitable value can be calculated in the following way:
The entire process terminates if during the last iteration no point in the grid has changed more than a certain quantity. Parallel SORThe parallel implementation of SOR is based on the Red/Black SOR algorithm. The grid is treated as a checkerboard and each iteration is split into two phases, red and black. During the red phase only the red points of the grid are updated. Red points only have black neighbors and no black points are changed during the red phase. During the black phase, the black points are updated in a similar way. Using the Red/Black SOR algorithm the grid can be partitioned among the available processors, each processor receiving a number of adjacent rows. All processors can now update different points of the same color in parallel. Before a processor starts the update of a certain color, it exchanges the border points of the opposite color with its neighbor. After each iteration, the processors must decide if the calculation must continue. Each processor determines if it wants to continue and sends its vote to the first processor. The calculation is terminated only if all the processors agree to stop. This means a processor may continue the calculation after the termination condition is reached locally. RequirementsImplement a parallel Successive Overrelaxation Algorithm (SOR) using MPI. Unlike the algorithm discussed in the class, this algorithm should not use a fixed number of iterations but terminate when none of the grid elements have changed more than a certain (constant) value (so the program should compute the maximum change over all processors). Data distribution at the start, and gathering at the end of the process must also be implemented in your program. However, you should not take the time this takes into account when performing speedup measurements. The version of SOR must partition the NxN grid as discussed in the class (i.e. each processor gets approximately N/P consecutive rows), should run on any number of machines, and accept any given problem size. Furthermore, you must implement a solution that overlaps the communication and computation (using boundary optimization). Measure the speedups of the SOR version on DAS3, using up to 16 nodes. A sequential version (for the 'row'-variant) and a Makefile are available in /home/ppp/pub/mpi on the DAS3 for your convenience. This sequential program also contains the 'stencil' operation that should repeatedly be applied to each grid point and it illustrates when the program should terminate (see the 'stopdiff' variable). Consider this version as the reference sequential implementation. Make sure you use the right MPI operations for the best performance. For example, use MPI's collective communication operations whenever that is useful. (e.g., to do a broadcast, use MPI's broadcast primitive, and not N serial send operations). Similarly, pay attention when choosing between synchronous and asynchronous communication. Finally, use the timers provided by MPI to measure the performance of sections of your code. Write a short report, in English, describing your design, implementation, and performance testing for your paralell SOR solution. Describe (briefly) the way the algorithm works, focusing on the MPI implementation details and the optimizations you have applied (if any). Before presenting the performance results, mention the composition of the experimental set-up - number of machines/nodes, input data-sets, application versions, etc. - and the way you measured the execution time - for which operations/regions. Include the performance measurements of your application (including a performance breakdown, if needed). Report execution times as well as speedups, and include speedup graphs as needed. Furthermore, compare the achieved performance and speed-up with the expected ones, and explain the eventual unexpected results. Compiling and running your applicationsTo compile your MPI programs you should use the MPI compiler mpicc, you should also construct a simple Makefile for your needs. Programs are to be written in the C programming language. If you are unfamiliar with C and Makefiles, you may want to read the guide C for Java Programmers by Jason Maassen. If you want to run MPI programs on the DAS, you have to use a special script. The commandline for running SOR looks like this: prun -v -1 -sge-script /usr/local/sitedep/reserve.sge/sge-script ./sor-par 4 SubmittingYou have to submit both the code (preferably containing useful comments that illustrate how the application works) and the report.Important Because we use automatic test scripts to test and benchmark your submissions, you must strictly follow the instructions below.
Documentation
GradingA correct implementation, compliant with all the requirements above (both for code and documentation) is graded with 8. Up to 2 bonus points (to grade 10) can be given for extra work on implementation, optimizations, testing and benchmarking. Examples are:
Important You get bonus points if you find interesting and/or creative ways to improve the parallel application (its implementation, performance, or analysis). However, optimizing the (reference) sequential algorithm, does not count as a bonus. Furthremore, make sure that the basic requiremets for the assignment are still met! Additional features are only graded for working solutions. TODO listIn order make sure you complete this part of the practical, we recommend the following recipe:
|
|