Vu logo

Parallel programming Practical 2012/2013

How to measure application performance

If you are ready to make performance measurements, you should keep the following points in mind:

  • Times should always be measured by using 'wall clock time', which covers the absolute total time, including idle time. Simply store the clock time in a temporary variable at the start of the measured application kernel/region/section, and determine the difference between the clock time at the end of the measured region and the 'start time' variable.

  • Do not include application (de-)initialisation in your measurements. You may measure it as well, of course, but do not add it to the application's kernel timings. Data distribution and data gathering, being part of the application's initialisation and de-initialisation, are not considered part of the application's kernel. Include them in your timing measurements only when they occur repeatedly in between calculations, inside the kernel.

  • Do not print (debugging) text to the screen while measuring, because it can have a big impact on the execution time.

  • Try to minimize data copying and data transfers in your application kernel. Data copying is relatively slow, and storing data multiple times can hurt cache performance, so both decrease the application's overall performance.

  • Remember to compile and run your application with optimisations enabled (for example, the compiler's optimisation flags).

  • When you run your application several times, you will notice that its execution time varies from run to run. Therefore, to make accurate measurements, run your application multiple times (at least 3 times, and more if you intend to leave out extreme values) and determine the mean timings.

  • If you decide to take measurements for a parallel program on all CPUs individually, use the timing of the CPU that takes the longest to complete. For example, when you find that for a dual-processor application CPU1 takes 3.21 seconds to finish its operations, while CPU2 had already finished in 2.89 seconds, you must report the timing of the first processor, since that one determines the application's total execution time.

  • When you are computing speedups, make sure you are comparing the multi-processor application against the provided sequential program. Also, remember that you must always use prun to run your programs. Even the application variants running on just 1 CPU should be started using prun -v -1 ./a.out 1 or something similar.

  • Use various problem sizes. Measure the performance for small, medium, and large problem sizes, and comment on the performance results (we recommend that you also consider different borderline cases, and make sure your application executes correctly). When choosing the maximum problem size, you might want to keep in mind the parameters of the DAS nodes, for example the amount of memory or cache; to find these out, you can use commands like:

    	cat /proc/cpuinfo
    	vmstat
    	free
    

  • When reporting your performance results, explain the experimental set-up: the number of nodes and/or cores you have used, the problem sizes, and what exactly has been measured. Do not forget to report both the numbers for the execution times and the speed-up graphs; include all other relevant measurements you have made in additional tables/graphs. Be sure to explain the relevance and the results of each included graph and/or table.

  • Look for more hints in the FAQ page.
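
The timing advice above (wall clock time around the kernel only, repeated runs, taking the slowest CPU, and computing speedup) can be sketched as follows. This is a minimal illustration in Python, not the practical's required set-up: the function names, `example_kernel`, and the 6.0 second sequential time are all hypothetical. Only the 3.21 and 2.89 second figures come from the dual-processor example above. The same pattern carries over to whatever language your application is written in.

```python
import statistics
import time


def time_kernel(kernel, repeats=3):
    """Return the wall-clock duration (seconds) of each of `repeats` runs of kernel()."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()   # wall clock time at the start of the kernel
        kernel()                      # kernel only: no (de-)initialisation, no printing
        durations.append(time.perf_counter() - start)
    return durations


def trimmed_mean(durations):
    """Mean after dropping the single fastest and slowest run (needs at least 3 runs)."""
    if len(durations) < 3:
        raise ValueError("need at least 3 runs to drop the extremes")
    return statistics.mean(sorted(durations)[1:-1])


def parallel_time(per_cpu_seconds):
    """The slowest CPU determines the execution time of the whole application."""
    return max(per_cpu_seconds)


def speedup(sequential_seconds, parallel_seconds):
    """Speedup of the parallel program relative to the sequential program."""
    return sequential_seconds / parallel_seconds


def example_kernel():
    # Hypothetical stand-in for the real computation.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    runs = time_kernel(example_kernel, repeats=5)
    # Print only after all measurements have been taken.
    print(f"mean kernel time:    {statistics.mean(runs):.6f} s over {len(runs)} runs")
    print(f"trimmed mean:        {trimmed_mean(runs):.6f} s")
    slowest = parallel_time([3.21, 2.89])   # timings from the example above
    print(f"parallel time:       {slowest} s")
    # 6.0 s is a made-up sequential time, used only to show the calculation.
    print(f"speedup (assumed):   {speedup(6.0, slowest):.2f}")
```

`time.perf_counter()` is used because it measures elapsed real time, including time spent idle, which matches the 'wall clock time' requirement; a CPU-time clock would leave idle time out.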

What's new?

October 31, 2012:
The new assignments are available on the blackboard.

October 31, 2012:
The site for PPP has moved to the blackboard.
