Announcements

4 April 2018

The DAS-4 clusters have been updated to CentOS 7, and the Nvidia CUDA development kit has been updated to version 8.0 (version 9.1 is also available for recent GPUs). This update makes DAS-4 software-compatible with DAS-5.

6 Jul 2015

DAS-5 is now fully operational! To make room for DAS-5, DAS-4/UvA and DAS-4/ASTRON have been decommissioned; only their head nodes remain available.

25 April 2013

Slides of the DAS-4 workshop presentations are now available.

GPU programming

GPUs on DAS-4 (see Special Nodes) can be programmed using two paradigms: CUDA and OpenCL.

CUDA

CUDA is supported by Nvidia GPUs. The current CUDA 8.0 implementation can be added to your environment as follows:


$ module load cuda80/toolkit

Documentation for writing and building CUDA applications is then available from $CUDA_INSTALL_PATH/doc/CUDA_C_Programming_Guide.pdf. A SLURM job script to submit a CUDA application on a host with a GTX480 GPU could then look like this:


#!/bin/sh
#SBATCH --time=00:15:00
#SBATCH -N 1
#SBATCH -C GTX480
#SBATCH --gres=gpu:1

. /etc/bashrc
. /etc/profile.d/modules.sh
module load cuda80/toolkit

./cuda-app opts

The option "SBATCH -C GTX480" requests a node with a GTX480 GPU, while the option "SBATCH --gres=gpu:1" lets SLURM allocate the GPU for the job by setting the environment variable CUDA_VISIBLE_DEVICES to 0. Note that without the "--gres" option, SLURM by default sets CUDA_VISIBLE_DEVICES to the value NoDevFiles, which causes the CUDA runtime system to ignore the GPU.
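Since a forgotten "--gres" option only shows up at run time, a job script can check the variable before launching the application. A minimal sketch (the helper function name is illustrative; the NoDevFiles value is the DAS-4 behavior described above):

```shell
# Return success only when SLURM actually allocated a GPU to this job;
# without --gres=gpu:1, CUDA_VISIBLE_DEVICES is "NoDevFiles" (or unset).
gpu_allocated() {
    case "${CUDA_VISIBLE_DEVICES:-}" in
        ""|NoDevFiles) return 1 ;;
        *)             return 0 ;;
    esac
}

# In a job script, before running the CUDA application:
#   gpu_allocated || { echo "no GPU allocated; use --gres=gpu:1" >&2; exit 1; }
```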

Environment modules with settings for additional CUDA components are also available: the BLAS implementation (cuda80/blas), the FFT implementation (cuda80/fft), and the profiler (cuda80/profiler).
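For example, to make the CUDA BLAS library available in addition to the toolkit (the application name and the standard cuBLAS link flag are illustrative; the exact paths the module sets are assumptions):

```
$ module load cuda80/toolkit cuda80/blas
$ nvcc myapp.cu -lcublas -o myapp
```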

OpenCL

OpenCL is supported by Nvidia GPUs, AMD GPUs, Xeon Phis, and regular host CPUs. Three implementations are available: opencl-nvidia, opencl-amd, and opencl-intel. All OpenCL implementations employ a common libOpenCL.so dynamic library, so when switching between OpenCL implementations, be sure to use module unload first to remove the previous settings.
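For example, switching from the Nvidia to the AMD implementation (module names as used elsewhere on this page):

```
$ module unload opencl-nvidia/8.0   # remove the previous libOpenCL settings
$ module load opencl-amd
```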

When running on a host that provides multiple OpenCL device platforms, be sure, when requesting devices with clGetDeviceIDs, to specify either CL_DEVICE_TYPE_CPU (for the host CPU), CL_DEVICE_TYPE_GPU (for the GPU), or CL_DEVICE_TYPE_ACCELERATOR (for the Xeon Phi). In particular, do not rely on CL_DEVICE_TYPE_ALL combined with a fixed device index, since the ordering can change; it can even differ between identically configured hosts.
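The selection by device type can be sketched as follows. This is not taken from cldemo.c; it is a minimal standalone example using the standard OpenCL C API:

```c
/* Select a GPU device explicitly by type instead of indexing into a
 * CL_DEVICE_TYPE_ALL list, whose ordering is not stable across hosts. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint nplatforms;
    if (clGetPlatformIDs(8, platforms, &nplatforms) != CL_SUCCESS)
        return 1;

    for (cl_uint i = 0; i < nplatforms; i++) {
        cl_device_id dev;
        cl_uint ndevs;
        /* Ask for a GPU specifically; use CL_DEVICE_TYPE_CPU or
         * CL_DEVICE_TYPE_ACCELERATOR (Xeon Phi) as appropriate. */
        if (clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU,
                           1, &dev, &ndevs) == CL_SUCCESS && ndevs > 0) {
            char name[256];
            clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof name, name, NULL);
            printf("Using GPU device: %s\n", name);
            return 0;
        }
    }
    fprintf(stderr, "No GPU device found\n");
    return 1;
}
```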

In the examples below, a simple OpenCL demo application cldemo.c is used that displays the platforms found, selects one, and scales an array of integers by a factor of two using OpenCL.

Nvidia

The Nvidia OpenCL implementation supports only Nvidia GPUs. It can be used as follows:


$ module load opencl-nvidia/8.0
$ gcc -I$OPENCL_INCLUDE -c cldemo.c
$ gcc -L$OPENCL_LIB -lOpenCL cldemo.o -o cldemo-nvidia
$ cat cldemo-nvidia.job
#!/bin/sh
#SBATCH --time=00:15:00
#SBATCH -N 1
#SBATCH -C GTX480
#SBATCH --gres=gpu:1

. /etc/bashrc
. /etc/profile.d/modules.sh
module load opencl-nvidia/8.0

./cldemo-nvidia

$ sbatch cldemo-nvidia.job; squeue
Submitted batch job 2707
      JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       2707      defq cldemo-n   versto  R       0:00      1 node026

$ cat slurm-2707.out 
=== 1 OpenCL platform(s) found: ===
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.2 CUDA 8.0.0
  NAME = NVIDIA CUDA
  VENDOR = NVIDIA Corporation
  EXTENSIONS = cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_gl_event
=== 1 OpenCL device(s) found on platform:
  -- 0 --
  DEVICE_NAME = GeForce GTX480
  DEVICE_VENDOR = NVIDIA Corporation
  DEVICE_VERSION = OpenCL 1.2 CUDA
  DRIVER_VERSION = 370.28
  DEVICE_MAX_COMPUTE_UNITS = 24
  DEVICE_MAX_CLOCK_FREQUENCY = 1076
  DEVICE_GLOBAL_MEM_SIZE = 12799180800
Using device 0
Result: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 ...

Alternatively, the same can be accomplished using this prun command:

$ prun -np 1 -native '-C GTX480 --gres=gpu:1' ./cldemo-nvidia

Note that the Nvidia OpenCL implementation also imports the Nvidia CUDA environment.


AMD

The AMD OpenCL implementation supports both AMD GPUs and regular host CPUs. It can be used as follows:


$ module load opencl-amd
$ gcc -I$OPENCL_INCLUDE -c cldemo.c
$ gcc -L$OPENCL_LIB -lOpenCL cldemo.o -o cldemo-amd
$ cat cldemo-amd.job
#!/bin/sh
#SBATCH --time=00:15:00
#SBATCH -N 1

. /etc/bashrc
. /etc/profile.d/modules.sh
module load opencl-amd

./cldemo-amd

$ sbatch cldemo-amd.job; squeue

Intel

Most Intel OpenCL implementations support only regular host CPUs (currently versions 5.0 and 16.0); version 4.5-mic supports both host CPUs and the Xeon Phi. It can be used as follows:


$ module load opencl-intel/4.5-mic
$ gcc -I$OPENCL_INCLUDE -c cldemo.c
$ gcc -L$OPENCL_LIB -lOpenCL cldemo.o -o cldemo-intel
$ cat cldemo-intel.job
#!/bin/sh
#SBATCH --time=00:10:00
#SBATCH -N 1

. /etc/bashrc
. /etc/profile.d/modules.sh
module load opencl-intel/4.5-mic

./cldemo-intel

$ sbatch cldemo-intel.job; squeue