Announcements

4 April 2018

The DAS-4 clusters have been updated to CentOS 7, and the Nvidia CUDA development kit has been updated to version 8.0 (version 9.1 is also available for recent GPUs). This update makes DAS-4 software-compatible with DAS-5.

6 July 2015

DAS-5 is now fully operational! To make room for DAS-5, DAS-4/UvA and DAS-4/ASTRON have been decommissioned; only their headnodes remain available.

25 April 2013

Slides of the DAS-4 workshop presentations are now available.

Hadoop on DAS-4

Hadoop is an open-source implementation of the popular MapReduce paradigm for scalable distributed computing.

Hadoop performance and scaling experiments

Users on DAS-4 can run their own Hadoop setup inside regular DAS-4 jobs, for example to experiment with performance and scaling. For large-scale experiments involving a lot of data, it is advisable to configure Hadoop so that HDFS data is written to the local disks (the /local/$USER directory) and to combine this with a fixed node allocation, so that multiple experiments can reuse the same dataset.
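
As a minimal sketch of the local-disk setup described above, the script below writes a private hdfs-site.xml whose storage directories point to /local/$USER. The property names dfs.namenode.name.dir and dfs.datanode.data.dir are standard Hadoop 2 settings; the CONF_DIR location and the hdfs/name and hdfs/data subdirectories are illustrative assumptions, not DAS-4 conventions.

```shell
#!/bin/sh
# Sketch: generate an hdfs-site.xml that keeps HDFS data on node-local disk.
# LOCAL_BASE and CONF_DIR are hypothetical defaults; override them as needed.
USER_NAME="${USER:-$(id -un)}"
LOCAL_BASE="${LOCAL_BASE:-/local/$USER_NAME}"
CONF_DIR="${CONF_DIR:-$PWD/hadoop-conf}"

mkdir -p "$CONF_DIR"

# Unquoted heredoc delimiter, so $LOCAL_BASE is expanded into the file.
cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://$LOCAL_BASE/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$LOCAL_BASE/hdfs/data</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/hdfs-site.xml"
```

Pointing the HADOOP_CONF_DIR environment variable at this directory, once it also holds the other configuration files a private setup needs, should make the Hadoop tools pick up these settings.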

General purpose Hadoop usage

For users on DAS-4 who want to use Hadoop simply as a tool to run their MapReduce jobs, DAS-4/VU currently supports a Hadoop 2.5.0 configuration that is available for production use. This Hadoop configuration runs on nodes node086 through node093; see the special nodes page for details.

To use the DAS-4/VU Hadoop cluster, ask das-account@cs.vu.nl to create an HDFS home directory for you. The necessary settings can then be imported with

module load hadoop
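
Once the module is loaded, a quick sanity check can confirm that the environment is usable. The helper below is only a sketch: the function name check_hadoop is hypothetical, and it relies on the standard `hadoop version` and `hdfs dfs -ls` commands (the latter lists your HDFS home directory when called without a path).

```shell
#!/bin/sh
# Sketch: verify that the Hadoop tools are on PATH and the HDFS home is readable.
# check_hadoop is a hypothetical helper name, not part of the DAS-4 setup.
check_hadoop() {
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop not found on PATH; run 'module load hadoop' first"
        return 1
    fi
    # Print the first line of the version banner.
    hadoop version | head -n 1
    # With no path argument this lists the user's HDFS home directory.
    hdfs dfs -ls
}
```

Running `check_hadoop` right after `module load hadoop` should print the Hadoop version followed by the contents of your HDFS home directory; if the listing fails, the home directory may not have been created yet.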

The job tracker is reachable from fs0.das4.cs.vu.nl at fs0.das4.cs.vu.nl:8088, and the HDFS namenode at master.ib.cluster:50270. The other Hadoop settings used in the DAS-4/VU setup can be found in

/cm/shared/package/hadoop/hadoop-2.5.0/etc/hadoop
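
To look up an individual setting in that directory without opening the files by hand, a small helper along these lines may be convenient. It is a sketch only: hadoop_prop is a hypothetical function name, and it assumes the usual Hadoop XML layout in which each property has a <name> element followed by a <value> element.

```shell
#!/bin/sh
# Sketch: print the value of one property from a Hadoop XML configuration file.
# Usage: hadoop_prop FILE PROPERTY_NAME   (hadoop_prop is a hypothetical helper)
hadoop_prop() {
    awk -v key="$2" '
        # Remember the most recent property name seen.
        /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
        # On a value line, print it if it belongs to the requested property.
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == key) { print v; exit } }
    ' "$1"
}
```

For example, assuming the directory contains the standard core-site.xml file, `hadoop_prop /cm/shared/package/hadoop/hadoop-2.5.0/etc/hadoop/core-site.xml fs.defaultFS` would print the default file system URI. The function prints nothing when the property is not set in the given file.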