Announcements

2 Jan 2017

DAS-5/VU has been extended with 4 TitanX-Pascal GPUs. Check DAS-5 special nodes for an overview of all DAS-5 special nodes.

9 Nov 2016

CUDA 8 is now available on all DAS-5 sites with GPU nodes. Check the DAS-5 GPU page for usage info.

May 2016

IEEE Computer publishes a paper about 20 years of the Distributed ASCI Supercomputer. See the DAS Achievements page.

28 Sep 2015

DAS-5/VU has been extended with 16 GTX TitanX GPUs.

6 Aug 2015

DAS-5/UvA has been extended with 4 GTX TitanX GPUs: two each in node205 and node206.

6 Jul 2015

DAS-5 is fully operational!


GPU programming

GPUs on DAS-5 (see Special Nodes) can be programmed using two paradigms: CUDA and OpenCL.

CUDA

CUDA is supported by Nvidia GPUs. The current CUDA 8.0 implementation can be added to your environment as follows:


$ module load cuda80/toolkit

Documentation for writing and building CUDA applications is then available in $CUDA_INSTALL_PATH/doc/CUDA_C_Programming_Guide.pdf. A SLURM job script to run a CUDA application on a node with a TitanX GPU could look like this:


#!/bin/sh
#SBATCH --time=00:15:00
#SBATCH -N 1
#SBATCH -C TitanX
#SBATCH --gres=gpu:1

. /etc/bashrc
. /etc/profile.d/modules.sh
module load cuda80/toolkit

./cuda-app opts

The option "SBATCH -C TitanX" specifies a node with a TitanX GPU, while the option "SBATCH --gres=gpu:1" lets SLURM allocate the GPU for the job by setting environment parameter CUDA_VISIBLE_DEVICES to 0. Note that without the "--gres" option, SLURM by default sets CUDA_VISIBLE_DEVICES to value NoDevFiles, which causes the CUDA runtime system to ignore the GPU.

Environment modules with settings for the additional CUDA BLAS implementation (cuda80/blas), FFT implementation (cuda80/fft), and profiler (cuda80/profiler) are also available.
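
A hedged sketch of how these modules might be combined (the application name is a placeholder, the exact link flags may differ per module, and it is assumed here that the profiler module provides nvprof):

$ module load cuda80/toolkit cuda80/blas cuda80/fft cuda80/profiler
$ nvcc myapp.c -lcublas -lcufft -o myapp   # "myapp.c" is a placeholder name
$ nvprof ./myapp                           # profile a run, e.g. inside a GPU job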

OpenCL

OpenCL is supported by Nvidia GPUs, AMD GPUs, Xeon Phis, and regular host CPUs. Three implementations are available: opencl-nvidia, opencl-amd, and opencl-intel. All of them provide a common libOpenCL.so dynamic library, so when switching between implementations, be sure to module unload the previous one first to remove its settings.
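
For example, to switch from the Nvidia implementation to the AMD one:

$ module unload opencl-nvidia    # removes the previous implementation's settings
$ module load opencl-amd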

When running on a host that provides multiple OpenCL device platforms, be sure, when requesting devices with clGetDeviceIDs, to specify either CL_DEVICE_TYPE_CPU (for the host CPU), CL_DEVICE_TYPE_GPU (for the GPU), or CL_DEVICE_TYPE_ACCELERATOR (for the Xeon Phi). In particular, do not rely on CL_DEVICE_TYPE_ALL combined with a fixed device index, since the device ordering can change, and can even differ between identically configured hosts.

In the examples below, a simple OpenCL demo application cldemo.c is used that displays the platforms found, selects one, and scales an array of integers by a factor of two using OpenCL.
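
cldemo.c itself is not reproduced on this page. Purely as an illustration (hypothetical code, not the actual cldemo.c), a comparable program that requests a GPU device explicitly, as recommended above, and scales an integer array by two could look like this:

/* cldemo-sketch.c -- hypothetical example, not the actual cldemo.c.
 * Selects the first platform, requests a GPU device explicitly, and
 * scales an array of integers by a factor of two.
 * Build sketch: gcc -I$OPENCL_INCLUDE cldemo-sketch.c -L$OPENCL_LIB -lOpenCL
 */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void scale2(__global int *a) {\n"
    "    int i = get_global_id(0);\n"
    "    a[i] = 2 * a[i];\n"
    "}\n";

int main(void)
{
    enum { N = 64 };
    cl_int data[N];
    cl_platform_id platform;
    cl_device_id device;
    cl_context ctx;
    cl_command_queue queue;
    cl_mem buf;
    cl_program prog;
    cl_kernel kernel;
    cl_int err;
    size_t global = N;
    int i;

    for (i = 0; i < N; i++)
        data[i] = i;

    /* Pick the first platform and request a GPU device explicitly
     * (use CL_DEVICE_TYPE_CPU or CL_DEVICE_TYPE_ACCELERATOR for other targets). */
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL GPU device found\n");
        return 1;
    }

    ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    queue = clCreateCommandQueue(ctx, device, 0, &err);

    /* Copy the input array into a device buffer. */
    buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                         sizeof data, data, &err);

    /* Build the kernel and run one work-item per array element. */
    prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    kernel = clCreateKernel(prog, "scale2", &err);
    clSetKernelArg(kernel, 0, sizeof buf, &buf);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);

    /* Read the scaled array back and print it. */
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, sizeof data, data, 0, NULL, NULL);
    printf("Result:");
    for (i = 0; i < N; i++)
        printf(" %d", data[i]);
    printf("\n");

    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(prog);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}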

Nvidia

The Nvidia OpenCL implementation supports only Nvidia GPUs. It can be used as follows:


$ module load opencl-nvidia/8.0
$ gcc -I$OPENCL_INCLUDE -c cldemo.c
$ gcc -L$OPENCL_LIB -lOpenCL cldemo.o -o cldemo-nvidia
$ cat cldemo-nvidia.job
#!/bin/sh
#SBATCH --time=00:15:00
#SBATCH -N 1
#SBATCH -C TitanX
#SBATCH --gres=gpu:1

. /etc/bashrc
. /etc/profile.d/modules.sh
module load opencl-nvidia/8.0

./cldemo-nvidia

$ sbatch cldemo-nvidia.job; squeue
Submitted batch job 2707
      JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       2707      defq cldemo-n   versto  R       0:00      1 node026

$ cat slurm-2707.out 
=== 1 OpenCL platform(s) found: ===
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.2 CUDA 8.0.0
  NAME = NVIDIA CUDA
  VENDOR = NVIDIA Corporation
  EXTENSIONS = cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_gl_event
=== 1 OpenCL device(s) found on platform:
  -- 0 --
  DEVICE_NAME = GeForce GTX TITAN X
  DEVICE_VENDOR = NVIDIA Corporation
  DEVICE_VERSION = OpenCL 1.2 CUDA
  DRIVER_VERSION = 370.28
  DEVICE_MAX_COMPUTE_UNITS = 24
  DEVICE_MAX_CLOCK_FREQUENCY = 1076
  DEVICE_GLOBAL_MEM_SIZE = 12799180800
Using device 0
Result: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 ...

Alternatively, the same can be accomplished using this prun command:

$ prun -np 1 -native '-C TitanX --gres=gpu:1' ./cldemo-nvidia

Note that the Nvidia OpenCL implementation also imports the Nvidia CUDA environment.


AMD

The AMD OpenCL implementation supports both AMD GPUs and regular host CPUs. It can be used as follows:


$ module load opencl-amd
$ gcc -I$OPENCL_INCLUDE -c cldemo.c
$ gcc -L$OPENCL_LIB -lOpenCL cldemo.o -o cldemo-amd
$ cat cldemo-amd.job
#!/bin/sh
#SBATCH --time=00:15:00
#SBATCH -N 1

. /etc/bashrc
. /etc/profile.d/modules.sh
module load opencl-amd

./cldemo-amd

$ sbatch cldemo-amd.job; squeue

Intel

The Intel OpenCL implementation supports regular host CPUs (currently versions 5.0 and 16.0); version 4.5-mic supports both host CPUs and the Xeon Phi. It can be used as follows:


$ module load opencl-intel/4.5-mic
$ gcc -I$OPENCL_INCLUDE -c cldemo.c
$ gcc -L$OPENCL_LIB -lOpenCL cldemo.o -o cldemo-intel
$ cat cldemo-intel.job
#!/bin/sh
#SBATCH --time=00:10:00
#SBATCH -N 1

. /etc/bashrc
. /etc/profile.d/modules.sh
module load opencl-intel/4.5-mic

./cldemo-intel

$ sbatch cldemo-intel.job; squeue