DAS-5/VU has been extended with 2 Intel Knights Landing nodes. See DAS-5 special nodes for an overview of all special nodes.
DAS-5/VU has been extended with 4 TitanX-Pascal GPUs.
DAS-5/VU has been extended with 16 GTX TitanX GPUs.
DAS-5/UvA has been extended with 4 GTX TitanX GPUs: two each in node205 and node206.
GPUs on DAS-5 (see Special Nodes) can be programmed using two paradigms: CUDA and OpenCL.
CUDA is supported by Nvidia GPUs. The current CUDA 8.0 implementation can be added to your environment as follows:
$ module load cuda80/toolkit
Documentation for writing and building CUDA applications is then available from $CUDA_INSTALL_PATH/doc/CUDA_C_Programming_Guide.pdf. A SLURM job script to submit a CUDA application on a host with a TitanX GPU could then look like this:
#!/bin/sh
#SBATCH --time=00:15:00
#SBATCH -N 1
#SBATCH -C TitanX
#SBATCH --gres=gpu:1
. /etc/bashrc
. /etc/profile.d/modules.sh
module load cuda80/toolkit
./cuda-app opts
The option "#SBATCH -C TitanX" requests a node with a TitanX GPU, while the option "#SBATCH --gres=gpu:1" lets SLURM allocate the GPU for the job by setting the environment variable CUDA_VISIBLE_DEVICES to 0. Note that without the "--gres" option, SLURM by default sets CUDA_VISIBLE_DEVICES to the value NoDevFiles, which causes the CUDA runtime system to ignore the GPU.
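For reference, a minimal CUDA application matching this job script could look like the sketch below. The file name cuda-app.cu and the kernel are illustrative only (not part of the DAS-5 software); compile it with "nvcc cuda-app.cu -o cuda-app" after loading cuda80/toolkit. It doubles each element of an integer array on the GPU:

#include <stdio.h>
#include <cuda_runtime.h>

/* Illustrative kernel: double each element of an integer array. */
__global__ void scale(int *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] *= 2;
}

int main(void)
{
    int n = 64, host[64], *dev;
    for (int i = 0; i < n; i++)
        host[i] = i;

    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; i++)
        printf("%d ", host[i]);
    printf("\n");
    return 0;
}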
Environment modules with settings for additional CUDA components are also available: a BLAS implementation (cuda80/blas), an FFT implementation (cuda80/fft), and a profiler (cuda80/profiler).
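As an illustration only (assuming that loading cuda80/blas adds the cuBLAS headers and library to the compiler search paths; the file name cublas-demo.cu is hypothetical), the sketch below scales a vector on the GPU with cuBLAS and can be compiled with "nvcc cublas-demo.cu -o cublas-demo -lcublas":

#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    int n = 8;
    float host[8] = {0, 1, 2, 3, 4, 5, 6, 7}, alpha = 2.0f, *dev;

    cudaMalloc(&dev, n * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);
    /* Copy to the device, scale by alpha with cuBLAS, copy back. */
    cublasSetVector(n, sizeof(float), host, 1, dev, 1);
    cublasSscal(handle, n, &alpha, dev, 1);
    cublasGetVector(n, sizeof(float), dev, 1, host, 1);
    cublasDestroy(handle);
    cudaFree(dev);

    for (int i = 0; i < n; i++)
        printf("%g ", host[i]);
    printf("\n");
    return 0;
}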
OpenCL is supported by Nvidia GPUs, AMD GPUs, Xeon Phis, and regular host CPUs. Three implementations are available: opencl-nvidia, opencl-amd, and opencl-intel. All OpenCL implementations share a common libOpenCL.so dynamic library, so when switching between them, be sure to module unload the previous one to remove its settings.
When running on a host that provides multiple OpenCL device platforms, be sure to request devices with clGetDeviceIDs specifying either CL_DEVICE_TYPE_CPU (for the host CPU), CL_DEVICE_TYPE_GPU (for the GPU), or CL_DEVICE_TYPE_ACCELERATOR (for the Xeon Phi). Do not rely on CL_DEVICE_TYPE_ALL and then selecting a fixed device number, since the device ordering can change; it can even differ between identically configured hosts.
In the examples below, a simple OpenCL demo application cldemo.c is used that displays the platforms found, selects one, and scales an array of integers by a factor of two using OpenCL.
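The full cldemo.c source is not reproduced here, but a minimal sketch with the same behavior could look as follows (error handling omitted for brevity; the device type is hard-coded to CL_DEVICE_TYPE_GPU here and should be adapted per the advice above when targeting the CPU or Xeon Phi):

#include <stdio.h>
#include <CL/cl.h>

/* Illustrative kernel source: scale an integer array by two. */
static const char *src =
    "__kernel void scale(__global int *a) {"
    "    a[get_global_id(0)] *= 2;"
    "}";

int main(void)
{
    cl_uint nplatforms;
    cl_platform_id platforms[8];
    char name[256];

    /* Display the OpenCL platforms found. */
    clGetPlatformIDs(8, platforms, &nplatforms);
    printf("=== %u OpenCL platform(s) found: ===\n", nplatforms);
    for (cl_uint i = 0; i < nplatforms; i++) {
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);
        printf("-- %u -- NAME = %s\n", i, name);
    }

    /* Select a device by type, not by a static device number. */
    cl_device_id dev;
    clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* Build the kernel and scale an integer array by two. */
    int data[64];
    for (int i = 0; i < 64; i++)
        data[i] = i;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, NULL);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);
    clSetKernelArg(k, 0, sizeof(buf), &buf);

    size_t global = 64;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);

    printf("Result:");
    for (int i = 0; i < 64; i++)
        printf(" %d", data[i]);
    printf("\n");

    clReleaseMemObject(buf);
    clReleaseKernel(k);
    clReleaseProgram(prog);
    clReleaseCommandQueue(q);
    clReleaseContext(ctx);
    return 0;
}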
The Nvidia OpenCL implementation supports only Nvidia GPUs. It can be used as follows:
$ module load opencl-nvidia/8.0
$ gcc -I$OPENCL_INCLUDE -c cldemo.c
$ gcc -L$OPENCL_LIB -lOpenCL cldemo.o -o cldemo-nvidia
$ cat cldemo-nvidia.job
#!/bin/sh
#SBATCH --time=00:15:00
#SBATCH -N 1
#SBATCH -C TitanX
#SBATCH --gres=gpu:1
. /etc/bashrc
. /etc/profile.d/modules.sh
module load opencl-nvidia/8.0
./cldemo-nvidia
$ sbatch cldemo-nvidia.job; squeue
Submitted batch job 2707
JOBID PARTITION     NAME   USER ST  TIME NODES NODELIST(REASON)
 2707      defq cldemo-n versto  R  0:00     1 node026
$ cat slurm-2707.out
=== 1 OpenCL platform(s) found: ===
-- 0 --
PROFILE = FULL_PROFILE
VERSION = OpenCL 1.2 CUDA 8.0.0
NAME = NVIDIA CUDA
VENDOR = NVIDIA Corporation
EXTENSIONS = cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_gl_event
=== 1 OpenCL device(s) found on platform:
-- 0 --
DEVICE_NAME = GeForce GTX TITAN X
DEVICE_VENDOR = NVIDIA Corporation
DEVICE_VERSION = OpenCL 1.2 CUDA
DRIVER_VERSION = 370.28
DEVICE_MAX_COMPUTE_UNITS = 24
DEVICE_MAX_CLOCK_FREQUENCY = 1076
DEVICE_GLOBAL_MEM_SIZE = 12799180800
Using device 0
Result: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 ...

Alternatively, the same can be accomplished using this prun command:
$ prun -np 1 -native '-C TitanX --gres=gpu:1' ./cldemo-nvidia
Note that the Nvidia OpenCL implementation also imports the Nvidia CUDA environment.
The AMD OpenCL implementation supports both AMD GPUs and regular host CPUs. It can be used as follows:
$ module load opencl-amd
$ gcc -I$OPENCL_INCLUDE -c cldemo.c
$ gcc -L$OPENCL_LIB -lOpenCL cldemo.o -o cldemo-amd
$ cat cldemo-amd.job
#!/bin/sh
#SBATCH --time=00:15:00
#SBATCH -N 1
. /etc/bashrc
. /etc/profile.d/modules.sh
module load opencl-amd
./cldemo-amd
$ sbatch cldemo-amd.job; squeue
The Intel OpenCL implementations support regular host CPUs (currently versions 5.0 and 16.0); version 4.5-mic supports both host CPUs and the Xeon Phi. The latter can be used as follows:
$ module load opencl-intel/4.5-mic
$ gcc -I$OPENCL_INCLUDE -c cldemo.c
$ gcc -L$OPENCL_LIB -lOpenCL cldemo.o -o cldemo-intel
$ cat cldemo-intel.job
#!/bin/sh
#SBATCH --time=00:10:00
#SBATCH -N 1
. /etc/bashrc
. /etc/profile.d/modules.sh
module load opencl-intel/4.5-mic
./cldemo-intel
$ sbatch cldemo-intel.job; squeue