Data-intensive applications that need HPC resources have been identified as one of the challenges of the HBP and must be taken into account in the further evolution of the HPAC Platform and the upcoming Fenix infrastructure.
Our approach in recent years has been use-case driven. We have addressed the following use cases:
- Deep learning on large images
- Parallel image registration
- Cell detection and feature calculation
- Visualisation of compartment data
The work focused, among other things, on enabling the use of new storage technologies. New storage devices based on dense and non-volatile memory technologies provide much higher performance, both in terms of bandwidth and IOPS (input/output operations per second) rates, while compromising on capacity. To provide access to storage devices integrated into HPC systems such as the HBP PCP pilot systems (JULIA and JURON), a variety of technologies have been deployed, explored and enhanced, ranging from parallel file systems like BeeGFS and object store technologies like Ceph to key-value stores (KVS) enhanced by powerful indexing capabilities and technologies like IBM’s Distributed Shared Storage-class-memory (DSS).
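To make the contrast with POSIX file access concrete, the following minimal Python sketch shows how a use case could store and retrieve an image tile in a Ceph pool through the librados bindings. The configuration path, pool name, object key and attribute are illustrative assumptions, not the actual setup on the pilot systems.

```python
import rados

# Connect to a Ceph cluster; the configuration path is an assumption.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# 'atlas-tiles' is a hypothetical pool for image data.
ioctx = cluster.open_ioctx('atlas-tiles')
try:
    tile = b'...raw tile bytes...'                        # placeholder payload
    ioctx.write_full('section0042_tile17', tile)          # store the object
    ioctx.set_xattr('section0042_tile17', 'voxel_size_um', b'20')
    size, _ = ioctx.stat('section0042_tile17')            # object size and mtime
    data = ioctx.read('section0042_tile17', length=size)  # read it back
finally:
    ioctx.close()
    cluster.shutdown()
```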
High-performance non-volatile memory is a scarce resource; its co-allocation together with compute resources therefore becomes necessary. Scheduling strategies for specific scenarios, based on the needs of the use cases, have been developed and evaluated using simulators.
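The actual strategies and simulators are not reproduced here; the sketch below only illustrates the core idea of co-allocation under an assumed first-come-first-served policy, where a job may start only on a node that can satisfy both its compute and its non-volatile memory request at the same time. Node sizes and job parameters are invented.

```python
import heapq
from collections import namedtuple

# Invented job/node descriptions: a job asks for cores and NVM capacity for a
# given runtime; each node offers a fixed amount of both.
Job = namedtuple('Job', 'id cores nvm_gb runtime')
Node = namedtuple('Node', 'id cores nvm_gb')

def simulate_fcfs(jobs, nodes):
    """Strict FCFS co-allocation: the head-of-queue job starts only once some
    node has enough free cores *and* enough free NVM at the same time."""
    free = {n.id: [n.cores, n.nvm_gb] for n in nodes}
    running, finish, now, queue = [], {}, 0.0, list(jobs)
    while queue or running:
        # Start head-of-queue jobs while a node can co-allocate both resources.
        while queue:
            job = queue[0]
            nid = next((n for n, (c, m) in free.items()
                        if c >= job.cores and m >= job.nvm_gb), None)
            if nid is None:
                break
            free[nid][0] -= job.cores
            free[nid][1] -= job.nvm_gb
            heapq.heappush(running, (now + job.runtime, nid, queue.pop(0)))
        if not running:  # job larger than any node: cannot be co-allocated at all
            raise RuntimeError(f'job {job.id} can never be co-allocated')
        # Advance time to the next completion and release its resources.
        now, nid, done = heapq.heappop(running)
        free[nid][0] += done.cores
        free[nid][1] += done.nvm_gb
        finish[done.id] = now
    return finish

# Two nodes with 16 cores and 800 GB of node-local NVM each (assumed figures).
nodes = [Node('n1', 16, 800), Node('n2', 16, 800)]
jobs = [Job(1, 8, 600, 3600.0), Job(2, 8, 400, 1800.0), Job(3, 16, 200, 900.0)]
print(simulate_fcfs(jobs, nodes))   # {2: 1800.0, 3: 2700.0, 1: 3600.0}
```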
Specific data analytics workflows have been considered and implemented in co-design with data analytics experts.
The impact of our work
The work related to this key result laid the basis for the integration of hierarchical storage architectures and related technologies within the upcoming Fenix infrastructure. Part of this work was performed in collaboration with the HBP PCP partners and with ThinkParQ as an additional commercial partner [1].
Many of the results produced have been deployed on the PCP pilot systems to make them available to early HBP users. This includes different storage technologies, interface libraries such as the newly developed Brion I/O library, and the data analytics workflows for the human brain atlas.
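As an illustration of how such a hierarchical storage setup is typically used by an analytics workflow, the sketch below stages input data from a parallel file system into node-local NVMe before processing and copies results back afterwards. All paths are hypothetical; the actual mount points on JULIA and JURON may differ.

```python
import shutil
from pathlib import Path

# Hypothetical mount points: a parallel file system (e.g. BeeGFS) providing
# capacity and a node-local NVMe scratch area providing bandwidth and IOPS.
PARALLEL_FS = Path('/work/atlas/sections')
LOCAL_NVME = Path('/nvme/scratch/atlas')

def stage_in(filename: str) -> Path:
    """Copy one input file to node-local NVMe so the analysis reads the fast tier."""
    LOCAL_NVME.mkdir(parents=True, exist_ok=True)
    local = LOCAL_NVME / filename
    shutil.copy2(PARALLEL_FS / filename, local)
    return local

def stage_out(result: Path) -> Path:
    """Copy a result file back to the parallel file system for long-term storage."""
    target = PARALLEL_FS / 'results' / result.name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(result, target)
    return target
```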
Results from the simulations of co-allocation scheduling strategies will be used as the basis for the implementation efforts foreseen for the next HBP phase.
[1] See, e.g., the following White Paper: “NVMe and OpenPOWER Performance on JURON”, (https://www.beegfs.io/docs/whitepapers/JURON_OpenPOWER_NVMe_by_ThinkParQ_FZ-Juelich.pdf).