This phase of SGA2 focussed on finalising the data location and transfer services, providing user-level documentation, and documenting the experience gained in SGA2. Together, these results provide a data federation layer for the HBP, leveraging infrastructure provided by the Fenix project as well as mature technologies developed by HBP member sites, in particular the Knowledge Graph, UNICORE, and the Fenix Archival Data Repositories.
For these services, technologies were chosen that are already in production within the HBP, which enables synergies between different teams and user communities.
Further, research into novel storage technologies and interactive data analysis has continued. This research has been made accessible to users and developers within the HBP in a form that enables them to apply our results to their own projects. This has been done in two parts: first, documentation and educational material; second, the release of higher-level abstractions over these technologies, namely extensions to the SLURM resource manager and the Hecuba interface for distributed Key-Value Stores.
HBP/Fenix data transfer services enabled
The Fenix data transfer services have been enabled to allow HBP users to move datasets between all the HPC sites, both in “client to server” and in “server to server” mode (see Figure below). The technology adopted in the HBP to implement the Fenix data transfer service is UNICORE UFTP. UFTP is a high-performance data transfer protocol based on FTP, developed as part of the UNICORE suite.
HBP users are able to use the UFTP service to move files using two different interfaces:
- The UFTP standalone client: a fully featured client for the UNICORE middleware, able to start both direct data transfers (user workstation to server) and third-party (server to server) transfers.
- The UNICORE REST API: provides an API for driving UNICORE and UFTP via HTTP verbs, which can be used from IPython Notebooks and, thus, the Collaboratory. The pyunicore library provides an abstraction over the REST API suitable for end users (see the sketch below).
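For illustration, a minimal sketch of a client-to-server transfer driven through pyunicore is shown below. The site URL, token handling, and file names are placeholders, and the exact method signatures may differ between pyunicore versions.

```python
import pyunicore.client as unicore_client

# Placeholder values: replace with an HBP OIDC access token and the REST
# endpoint of the target HPAC/Fenix site.
token = "..."  # HBP OIDC access token
site_url = "https://unicore.example-site.eu:8080/SITE/rest/core"  # placeholder

# Depending on the pyunicore version, Transport may expect a credential
# object rather than a raw token string.
transport = unicore_client.Transport(token)
client = unicore_client.Client(transport, site_url)

# Select one of the storages offered by the site.
storage = client.get_storages()[0]

# Client-to-server transfer: upload a local file to the remote storage.
storage.upload("results.h5", destination="project/results.h5")

# Server-to-client transfer: fetch the remote file again.
remote_file = storage.stat("project/results.h5")
remote_file.download("results_copy.h5")
```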
Figure: General view of the data transfer services in the HPAC Platform and Fenix.
This figure shows the main components involved in data transfer between the sites of the HPAC Platform and the Fenix infrastructure. Server to server and client to server connections are highlighted in red and blue, respectively.
Technical documentation: https://www.unicore.eu/docstore/uftp-2.2.0/uftp-manual.html
User documentation: https://collab.humanbrainproject.eu/#/collab/3656/nav/275433
HBP/Fenix data location service
The HBP Central Data Location Service provides HBP users with functionality for tracking the location of data across the HBP, for discovery and search of such data collections, and for tracing versions and, to a limited extent, provenance. It leverages the HBP Knowledge Graph (KG) service developed by EPFL, which offers a production-ready service for searching and linking entries that loosely adhere to a collection of schemata. Using the KG as the basis of the data location service also has the benefit of automatically providing a unique, persistent ID for each data collection.
The Knowledge Graph (KG)
- KG Search UI (anonymous access): https://www.humanbrainproject.eu/explore-the-brain/search
- KG Editor & Query Builder (protected access): https://kg.humanbrainproject.eu/editor
- KG Query API (protected access): https://kg.humanbrainproject.org/apidoc/index.html?url=/apispec/spring%3Fgroup%3D00_external
Access to the protected/private sections requires an HBP login; access permission can be requested from kg-team@humanbrainproject.eu.
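As an illustration only, a query against the KG Query API could be issued from Python roughly as follows. The query path, paging parameters, and response layout below are hypothetical placeholders; the actual endpoints are described in the API documentation linked above.

```python
import requests

KG_BASE = "https://kg.humanbrainproject.org"
# Hypothetical query path: the concrete path depends on the queries
# registered in the KG; consult the API documentation linked above.
QUERY_PATH = "/query/minds/core/dataset/v1.0.0/exampleQuery/instances"

token = "..."  # HBP OIDC access token obtained via the Collaboratory login

response = requests.get(
    KG_BASE + QUERY_PATH,
    headers={"Authorization": "Bearer " + token},
    params={"size": 20, "start": 0},  # placeholder paging parameters
)
response.raise_for_status()

# The field names below ("results", "name", "@id") are illustrative.
for result in response.json().get("results", []):
    print(result.get("name"), result.get("@id"))
```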
In the near future, the integration between the HBP Central Data Location Service (CDLS) and the Fenix/HBP Archival Data Repositories (ARD), which are based on OpenStack SWIFT, will be completed. This step is needed to allow HBP workflows to leverage the Fenix ARD for long-term storage of results cross-referenced in the CDLS. To this end, we will provide a collection of metadata schemata suited for tracking the locations, types, and provenance of such data.
We demonstrate how such workflows can be built, given the current state of authentication and federation, in the “KG APIs + SWIFT integration” demo notebook, which HBP users can access at: https://collab.humanbrainproject.eu/#/collab/49291/nav/338055
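Outside the notebook, the storage side of such a workflow can be reproduced with the python-swiftclient library. The sketch below is illustrative only: the SWIFT endpoint, token, container, and object names are placeholders, and the token is assumed to be obtained from the Fenix/HBP authentication infrastructure.

```python
import swiftclient

# Placeholder values: the actual SWIFT endpoint and token come from the
# Fenix/HBP authentication infrastructure (e.g. via the Collaboratory).
swift_url = "https://object.example-fenix-site.eu/swift/v1/AUTH_myproject"
token = "..."  # pre-authenticated token, assumed to be available

conn = swiftclient.Connection(preauthurl=swift_url, preauthtoken=token)

# Upload a result file to an archival container ...
container = "sga2-results"  # hypothetical container name
conn.put_container(container)
with open("simulation_output.h5", "rb") as f:
    conn.put_object(container, "run-001/simulation_output.h5",
                    contents=f, content_type="application/x-hdf5")

# ... and record its location, e.g. for cross-referencing in the CDLS.
object_url = swift_url + "/" + container + "/run-001/simulation_output.h5"
print("Stored at:", object_url)
```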
Further support libraries, mainly in Python, will be investigated to enable end users to leverage these services.
Further, efforts have been made to export HBP metadata from the KG via standard protocols: metadata are exposed via OAI-PMH to an external B2FIND service, so that public KG data can be discovered via the EOSC portal. The documentation for the KG harvester is available at https://gitlab.version.fz-juelich.de/hater1/reaper.
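To give an idea of what the OAI-PMH export looks like from the consumer side, the following sketch harvests Dublin Core records with the Sickle library. The endpoint URL is a hypothetical placeholder; the actual address is documented together with the harvester linked above.

```python
from sickle import Sickle

# Hypothetical OAI-PMH endpoint; see the harvester documentation for the
# real address of the exported KG metadata.
oai_endpoint = "https://kg-oai.example.org/oai"

sickle = Sickle(oai_endpoint)

# Harvest Dublin Core records, the format typically consumed by B2FIND.
for record in sickle.ListRecords(metadataPrefix="oai_dc"):
    metadata = record.metadata  # dict of Dublin Core fields
    print(record.header.identifier, metadata.get("title"))
```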
Technical documentation: https://bbp-nexus.epfl.ch/staging/docs/
User documentation: https://bluebrainnexus.io/
Private Cloud On a Compute Cluster (PCOCC)
PCOCC (pronounced “peacock”: Private Cloud on a Compute Cluster) allows HPC cluster users to host their own clusters of Virtual Machines (VMs) on compute nodes alongside regular HPC jobs through SLURM. Such VMs allow users to fully customise their software environments for development and testing, or to facilitate application deployment. Compute nodes remain managed by the batch scheduler as usual, since the clusters of VMs are treated as regular jobs. From the point of view of the batch scheduler, each VM is a task for which it allocates the requested CPUs and memory, and the resource usage is accounted to the user, just as for any other job. For each virtual cluster, PCOCC instantiates private networks isolated from the host networks, creates temporary disk images from the selected templates (using copy-on-write) and instantiates the requested VMs. PCOCC is able to run virtual clusters of several thousand VMs and enables different usage scenarios for compute clusters, from running complex software stacks packaged in an image to reproducing technical issues that occur at large scale without impacting production servers. PCOCC has been developed at CEA. A version of PCOCC supporting both virtual machines and containers in an HPC context is now in production. This tool provides users with a single, easy-to-use interface for running a virtual HPC cluster with very good performance; the full documentation is available at https://pcocc.readthedocs.io.
Technical documentation: https://github.com/cea-hpc/pcocc
User documentation: https://pcocc.readthedocs.io
Data Access and Efficient IO
Task T7.2.3 has developed a series of guidelines, examples, and user-level documentation describing how to make efficient use of modern I/O technologies. Topics covered are the following:
- Using the SWIFT object storage for multi-site workflows. Educates users on the use of federated data services offered by Fenix. An experience report of the “learning to learn” project team is included, describing the construction of a federated workflow.
- Leveraging modern MPI for reducing I/O load. Demonstration of how to make best use of I/O bandwidth.
- Best practices for using HDF5 for image data. Covers the use of HDF5 in ML/DL, an important use case for the BrainAtlas among others (a minimal sketch follows after this list).
- Using BeeGFS-on-demand for high-performance I/O across node-local NVMe devices. Demonstrates use of an ad hoc high-performance file system for data intensive computing.
- Usage and performance of Apache Pass Optane devices. Leveraging novel memory technologies for persistent, memory intensive workloads.
- Interactive visualisation using the Universal Data Junction. Proof of concept for coupled, lock-step simulation and visualisation.
- Hecuba as an abstraction for distributed Key Value Stores (KVS). Use Case for a high-level library leveraging industry-strength KVS for HPC.
This collection is publicly available in the Collaboratory: https://wiki.humanbrainproject.eu/bin/view/Collabs/how-to-data-access-and-efficient-io.
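As a companion to the HDF5 topic above, the following minimal sketch illustrates the kind of advice given in the guide: chunking an image stack along the expected access pattern and enabling compression. The dataset shape, chunk sizes, and file name are illustrative only.

```python
import h5py
import numpy as np

# Illustrative stack of 2D images (e.g. tiles of a larger atlas section).
images = np.random.randint(0, 255, size=(16, 1024, 1024), dtype=np.uint8)

with h5py.File("image_stack.h5", "w") as f:
    # Chunk within single images so that reading one image touches only a
    # few chunks, and enable gzip compression plus the shuffle filter,
    # which tend to work well on image-like data.
    dset = f.create_dataset(
        "images",
        data=images,
        chunks=(1, 256, 256),
        compression="gzip",
        compression_opts=4,
        shuffle=True,
    )
    dset.attrs["description"] = "illustrative image stack"

# Reading back a single image only touches the chunks of that slice.
with h5py.File("image_stack.h5", "r") as f:
    first_image = f["images"][0]
```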
SLURM Co-Scheduling Plugin
Work on a plug-in for SLURM that couples compute resources with high-end storage resources such as SSD/NVM storage volumes, and adds the ability to schedule both types of resources in a way that balances system utilisation and turn-around time, was delayed due to unforeseeable circumstances. In the last months, a basic version of the SLURM plugin has been implemented and published under the GNU General Public License (GPLv3). The plugin has been tested in a virtual cluster but is not yet integrated into the HPAC Platform.
Technical documentation: https://collab.humanbrainproject.eu/#/collab/8122/nav/61636?state=uuid%3D7f322b30-49f4-473d-8e27-40c42976d1d5
User documentation: https://github.com/HumanBrainProject/coallocation-slurm-plugin/blob/master/README.md