JURON (IBM-NVIDIA pilot)

JURON is one of the two pilot systems developed by IBM and NVIDIA in the Pre-Commercial Procurement during the HBP Ramp-up Phase. It is located at Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Germany.

The Pre-Commercial Procurement (PCP) focused on three areas: dense memory integration, scalable visualization and dynamic resource management. IBM and NVIDIA addressed these topics as follows:

  • Dense memory integration: DSA-attached NVRAM at node-level
  • Scalable visualization: Each node is also a visualization node
  • Dynamic resource management: Integration with LSF

The key technologies used in the pilot system are:

  • POWER8′ + NVIDIA Tesla P100 interconnected via NVLink
  • 100 Gbps network technology (InfiniBand EDR)

Both PCP pilot systems are installed at Jülich Supercomputing Centre and integrated into its data infrastructure: the GPFS cluster JUST is accessible from all nodes, and 10 Gigabit Ethernet connectivity is available for remote visualization.

Tests and performance explorations were performed on the pilot systems JULIA and JURON installed at Jülich Supercomputing Centre (JUELICH-JSC). JULIA is a Cray CS-400 system with four integrated DataWarp nodes, each equipped with two Intel P3600 NVMe drives. The DataWarp nodes and the compute nodes were connected via an Omni-Path network, and Ceph was deployed on the DataWarp nodes. The other system, JURON, is based on IBM Power S822LC HPC (“Minsky”) servers, each containing an HGST Ultrastar SN100 card. On this system, BeeGFS, DSS and various key-value stores were deployed.

The pilot systems are integrated into the HBP HPAC Platform AAI.

JURON has been available since October 2016.

For more information and to request access to JURON, please visit https://trac.version.fz-juelich.de/hbp-pcp/wiki/Public

BeeGFS on JURON

Two partitions of the local SSD on each JURON node are combined into a parallel BeeGFS file system that shares the NVMe disks across all nodes. This parallel file system can be accessed from every node of JURON, so all nodes have equal access to the data. When data are transferred in large chunks, access is almost as fast as access to the local NVMe disk, with the added benefit that the data can be shared across nodes. This makes it easier to share data within a parallel program running on more than one node, and between the different jobs of a workflow.
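Staging data onto the shared file system is an ordinary file copy. The Python sketch below copies a file using large sequential reads and writes, the access pattern described above as coming closest to local NVMe speed. The 4 MiB chunk size and the `BEEGFS_SHARED_DIR` environment variable are assumptions for illustration only; the actual mount point on JURON is not stated here, so the sketch falls back to a temporary directory.

```python
import os
import tempfile

# Assumed chunk size: the text only says that large transfers come close to
# local NVMe speed. 4 MiB is an illustrative value, not a measured optimum.
CHUNK_SIZE = 4 * 1024 * 1024


def shared_dir() -> str:
    """Return the directory used as shared BeeGFS space.

    BEEGFS_SHARED_DIR is a hypothetical environment variable; the real mount
    point is site-specific, so a temporary directory is used as a fallback.
    """
    return os.environ.get("BEEGFS_SHARED_DIR", tempfile.gettempdir())


def copy_in_large_chunks(src: str, dst: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst with large sequential reads and writes.

    Returns the number of bytes copied.
    """
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # end of file reached
                break
            fout.write(chunk)
            copied += len(chunk)
    return copied
```

For example, `copy_in_large_chunks("input.dat", os.path.join(shared_dir(), "input.dat"))` would stage a local file onto the shared space, after which jobs on other nodes can open it directly. Reading and writing a few large blocks instead of many small ones keeps the transfer sequential, which is the case the text identifies as fast.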
The data can be accessed through the usual POSIX interface, so no API changes are required once the data have been copied. Currently, the BeeGFS file system can be used by any user: it is mounted on the compute nodes but not on the login nodes, and every user can simply create files on it. However, since the size of this shared space is limited, users are encouraged to delete their data when they are no longer needed.
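Because the shared space is reached through the ordinary POSIX interface, a workflow needs no special API to use it. The Python sketch below illustrates the usage pattern described above under several assumptions: the mount point is supplied by the caller (via a hypothetical `BEEGFS_SHARED_DIR` variable in the demo), and the per-user, per-job directory layout and the helper names `require_shared_mount`, `user_workdir` and `cleanup` are illustrative choices, not a site convention.

```python
import os
import shutil
import tempfile
from typing import Optional


def require_shared_mount(base: str) -> str:
    """Fail early if the shared directory is missing.

    The BeeGFS space is mounted on compute nodes only, so on a login node
    this check is expected to fail.
    """
    if not os.path.isdir(base):
        raise RuntimeError(
            f"shared file system not found at {base!r}; are you on a compute node?"
        )
    return base


def user_workdir(base: str, job: str, user: Optional[str] = None) -> str:
    """Create (if needed) and return a per-user, per-job directory."""
    user = user or os.environ.get("USER", "anonymous")
    path = os.path.join(require_shared_mount(base), user, job)
    os.makedirs(path, exist_ok=True)
    return path


def cleanup(path: str) -> None:
    """Delete a job directory once its data are no longer needed."""
    shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__":
    # Hypothetical mount point; a temporary directory stands in for it here.
    base = os.environ.get("BEEGFS_SHARED_DIR", tempfile.mkdtemp())
    workdir = user_workdir(base, "workflow-demo")

    # One workflow step writes its result with plain POSIX file I/O ...
    result = os.path.join(workdir, "result.txt")
    with open(result, "w") as f:
        f.write("42\n")

    # ... and a later step, possibly on another node, reads it the same way.
    with open(result) as f:
        print(f.read().strip())

    # Free the limited shared space once the data are no longer needed.
    cleanup(workdir)
```

The `os.path.isdir` check is only a simple stand-in: it cannot tell the BeeGFS mount apart from an ordinary directory at the same path, but it does catch the common mistake of running on a login node where the path does not exist.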

More information about the setup is available in the white paper.
