The HPC Hardware
The HPC cluster consists of a single head node, a single login node, and multiple execution nodes. The HPC Status page shows how many cores and how much memory each node has, along with the real-time status of the nodes.
The head node is not accessible to you, but it is important: it manages submitted jobs, working out which jobs to run on each execution node, scheduling them to run, copying your data between the login node and the execution nodes, and emailing you when a job starts or ends.
The login node is the only node that you can log in to directly. From there you can submit your computation jobs. It is identical to the execution nodes, so anything you compile on the login node will run exactly the same on an execution node.
The execution nodes are where your submitted jobs run. They are mostly Dell PowerEdge R6625 servers with dual AMD EPYC 9004 Series CPUs running at 3.85 GHz.
Currently we have fourteen execution nodes for compute.
- Most nodes have 64 cores; the total across all nodes is a bit over 870. (A sketch after this list shows how to check a node's own core and memory counts.)
- Most nodes have 754 GB of RAM, but some have 1,500 GB for applications that require more memory. Total distributed memory is about 12.6 TB.
- Most nodes have at least 11 TB of fast, locally attached disk for “scratch”.
- There is also 700 TB of Isilon storage shared with other eResearch infrastructure.
- Two nodes have GPUs. See details below.
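If you want to confirm what a particular node provides, you can query it directly once you are logged in (or from inside a running job). The following is a minimal Python sketch, assuming a Linux node where /proc/meminfo is readable; note that it reports the whole node's cores and memory, not the share the scheduler has allocated to your job.

```python
#!/usr/bin/env python3
"""Report the core count and total RAM of the node this script runs on.

A minimal sketch: values come from standard Linux interfaces
(os.cpu_count and /proc/meminfo), not from the job scheduler, so they
describe the whole node rather than your job's allocation.
"""
import os


def total_memory_gib():
    """Read MemTotal (in kB) from /proc/meminfo and convert to GiB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                kb = int(line.split()[1])
                return kb / 1024 / 1024
    raise RuntimeError("MemTotal not found in /proc/meminfo")


if __name__ == "__main__":
    print(f"CPU cores: {os.cpu_count()}")
    print(f"Total RAM: {total_memory_gib():.0f} GiB")
```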
The GPU nodes are Dell R740 servers, each with two Tesla V100 GPUs; each V100 has 32 GB of GPU memory. These nodes have CUDA installed. The HPC Status page shows which nodes have GPUs.
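To check whether the node your job landed on can actually see its GPUs, you can query them from Python. This is a minimal sketch, assuming the nvidia-smi utility (installed with the NVIDIA driver) is on the PATH of the GPU nodes; it is not the cluster's official tooling, and on nodes without GPUs it simply reports that none are visible.

```python
#!/usr/bin/env python3
"""List the GPUs visible on the current node via nvidia-smi."""
import subprocess


def list_gpus():
    """Return one 'name, total memory' string per visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        for gpu in list_gpus():
            print(gpu)
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("No GPUs visible on this node (nvidia-smi unavailable or failed).")
```

On one of the GPU nodes described above you would expect two lines of output, one per Tesla V100, each showing roughly 32 GB of GPU memory.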
The node cinode03 is a private node owned by the Centre of Inflammation group.
The node i3node01 is a private node owned by the i3 group.
You should also have a look at the HPC Hardware Layout page, which covers the storage systems and the use of the /scratch directory.