Science has driven the development of the NEST simulator for the past 20 years. NEST was originally created to simulate the propagation of synfire chains on single-processor workstations; since then, we have continuously pushed its capabilities to address new scientific questions and computer architectures. Prominent examples include studies on spike-timing dependent plasticity in large simulations of cortical networks, the verification of mean-field models, and models of Alzheimer’s disease, Parkinson’s disease and tinnitus. Recent developments include a significant reduction in memory requirements, as demonstrated by a record-breaking simulation of 1.86 billion neurons connected by 11.1 trillion synapses on the Japanese K supercomputer, paving the way for brain-scale simulations.
Running on everything from laptops to the world’s largest supercomputers, NEST is configured and controlled by high-level Python scripts, while harnessing the power of C++ under the hood. An extensive test suite and systematic quality assurance ensure the reliability of NEST.
The development of NEST is driven by the demands of neuroscience and carried out in a collaborative fashion at many institutions around the world, coordinated by the non-profit member-based NEST Initiative. NEST is released under the GNU General Public License version 2 or later.
How NEST has been improved in HBP
Continuous dynamics
The continuous dynamics code in NEST enables simulations of rate-based model neurons in the event-based simulation scheme of the spiking simulator NEST. The technology was included and released with NEST 2.14.0.
Furthermore, additional rate-based models for the Co-Design Project “Visuo-Motor Integration” (CDP4) have been implemented and are scheduled for NEST release 2.16.0.
NESTML is a domain-specific language that supports the specification of neuron models in a precise and concise syntax, based on the syntax of Python. Model equations can either be given as a simple string of mathematical notation or as an algorithm written in the built-in procedural language. The equations are analyzed by NESTML to compute an exact solution if possible, or use an appropriate numeric solver otherwise.
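The difference between an exact solution and a numeric solver can be illustrated with a minimal Python sketch. This is not NESTML code and does not reproduce NESTML's own analysis. It assumes a hypothetical leaky membrane equation dV/dt = -V/tau, which is linear and therefore has the closed-form update V(t+h) = V(t) * exp(-h/tau). Forward Euler stands in for a generic numeric solver. All names and parameter values are illustrative.

```python
import math

TAU_MS = 10.0   # hypothetical membrane time constant (ms)
H_MS = 1.0      # hypothetical simulation step (ms)


def step_exact(v, h=H_MS, tau=TAU_MS):
    """Advance dV/dt = -V/tau by one step using the closed-form solution."""
    return v * math.exp(-h / tau)


def step_euler(v, h=H_MS, tau=TAU_MS):
    """Advance the same equation by one forward Euler step (approximate)."""
    return v + h * (-v / tau)


def integrate(step, v0, n_steps):
    """Apply a step function n_steps times, starting from v0."""
    v = v0
    for _ in range(n_steps):
        v = step(v)
    return v


v0 = 1.0
n_steps = 20
analytic = v0 * math.exp(-n_steps * H_MS / TAU_MS)  # reference value at t = 20 ms
v_exact = integrate(step_exact, v0, n_steps)
v_euler = integrate(step_euler, v0, n_steps)

print(f"analytic:         {analytic:.6f}")
print(f"exact propagator: {v_exact:.6f}")
print(f"forward Euler:    {v_euler:.6f}")
```

The exact update matches the analytic value up to floating-point rounding regardless of step size, whereas the Euler result carries a step-size-dependent error. This is why an exact solution is preferable whenever the equations admit one.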
This technology couples the simulation software NEST and UG4 by means of the MUSIC library. NEST can only send spike trains where spiking occurs; UG4 receives those in the form of events arriving at synapses (timestamps). The time course of the extracellular potential in a cube (representing a piece of tissue) is simulated based on the arriving spike data. The evolution of the membrane potential in space and time is described by the Xylouris-Wittum model.
Link to this release (2017): https://github.com/UG4
The development of PyCOMPSs was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated, apart from release notes.
PyCOMPSs is the Python binding of COMPSs (COMP Superscalar), a coarse-grained programming model oriented to distributed environments, with a powerful runtime that leverages low-level APIs (e.g. Amazon EC2) and manages data dependencies (objects and files). Starting from sequential Python code, it can run the application in parallel on distributed resources.
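As a minimal sketch of what such sequential-looking code can look like, the example below marks two functions as tasks and synchronises on the final result. It uses the `task` decorator and `compss_wait_on` from the PyCOMPSs API. The function names and values are hypothetical. The `except ImportError` branch is an assumption added here so the sketch also runs as plain sequential Python when PyCOMPSs is not installed.

```python
# Assumption: PyCOMPSs may not be installed; fall back to sequential stand-ins
# so that the sketch still runs as ordinary Python.
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    def task(*args, **kwargs):
        """Stand-in decorator: leaves the decorated function unchanged."""
        def decorator(func):
            return func
        return decorator

    def compss_wait_on(obj):
        """Stand-in synchronisation: the value is already available."""
        return obj


@task(returns=int)
def square(x):
    """Hypothetical task: square a number."""
    return x * x


@task(returns=int)
def add(a, b):
    """Hypothetical task: add two numbers."""
    return a + b


# The two square() calls do not depend on each other;
# add() depends on both of their results.
a = square(3)
b = square(4)
total = compss_wait_on(add(a, b))  # wait for the final value in the main program
print(total)
```

The main program reads as ordinary sequential Python. The data dependencies between the calls (here, `add` needing both `square` results) are the information a runtime can use to decide which tasks may run concurrently.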
Releases
PyCOMPSs is based on COMPSs. COMPSs version 1.3 was released in November 2015, version 1.4 in May 2016 and version 2.0 in November 2016.
New features in COMPSs v1.3
Runtime
Persistent workers: workers can be deployed on computing nodes and persist during all the application lifetime, thus reducing the runtime overhead. The previous implementation of workers based on a per task process is still supported.
Enhanced logging system
Interoperable communication layer: different inter-node communication protocols are supported by implementing the Adaptor interface (JavaGAT and NIO implementations already included)
Simplified cloud connectors interface
JClouds connector
Python/PyCOMPSs
Added constraints support
Enhanced methods support
Lists accepted as a task parameter type
Support for user decorators
Tools
New monitoring tool with new views, such as workload, and the ability to visualize information about previous runs
Enhanced tracing mechanism
Simplified execution scripts
Simplified installation on supercomputers through better scripts
New features in COMPSs v1.4
Runtime
Added support for Docker
Added support for Chameleon Cloud
Object cache for persistent workers
Improved error management
Added connector for submitting tasks to MN supercomputer from external COMPSs applications
Bug-fixes
Python/PyCOMPSs
General bug-fixes
Tools
Enhanced Tracing mechanism:
Reduced overhead using native Java API
Added support for communications instrumentation
Added support for PAPI hardware counters
Known Limitations
When executing Python applications with constraints in the cloud, the initial VMs must be set to 0
New features in COMPSs v2.0 (released November 2016)
Runtime:
Upgrade to Java 8
Support for remote input files (input files already at workers)
Integration with Persistent Objects
Elasticity with Docker and Mesos
Multi-processor support (CPUs, GPUs, FPGAs)
Dynamic constraints with environment variables
Scheduling taking into account the full tasks graph (not only ready tasks)
Support for SLURM clusters
Initial COMPSs/OmpSs integration
Replicated tasks: Tasks executed in all the workers
Explicit Barrier
Python:
Python user events and HW counters tracing
Improved PyCOMPSs serialization. Added support for lambda and generator parameters.
C:
Constraints support
Tools:
Improved current graph visualization on COMPSs Monitor
Improvements:
Simplified Resource and Project files (not backward compatible)
Improved binding workers execution (use pipes instead of Java Process Builders)
Simplified cluster job scripts and supercomputer configuration
Several bug fixes
Known Limitations:
When executing Python applications with constraints in the cloud, the initial VMs must be set to 0
New features in PyCOMPSs/COMPSs v2.1 (released June 2017)
New features:
Runtime:
New annotations to simplify tasks that call external binaries
Integration with other programming models (MPI, OmpSs, ...)
Support for Singularity containers in Clusters
Extension of the scheduling to support multi-node tasks (MPI apps as tasks)
Support for Grid Engine job scheduler in clusters
Language flag automatically inferred in runcompss script
New schedulers based on tasks’ generation order
Core affinity and over-subscribing thread management in multi-core cluster queue scripts (used with MKL libraries, for example)
Python:
@local annotation to support simpler data synchronizations in master (requires guppy to be installed)
Support for args and kwargs parameters as task dependencies
Task versioning support in Python (multiple behaviors of the same task)
New Python persistent workers that reduce overhead of Python tasks
Support for task-thread affinity
Tracing extended to support Python user events and HW counters (with known issues)
C:
Extension of file management API (compss_fopen, compss_ifstream, compss_ofstream, compss_delete_file)
Support for task-thread affinity
Tools:
Visualization of not-running tasks in current graph of the COMPSs Monitor
Improvements
Improved PyCOMPSs serialization
Improvements in cluster job scripts and supercomputers configuration
Several bug fixes
Known Limitations
When executing Python applications with constraints in the cloud, the <InitialVMs> property must be set to 0
Tasks that invoke NumPy and MKL may experience issues if tasks use a different number of MKL threads. This is because MKL reuses threads across calls and does not change the number of threads from one call to another.
New features in PyCOMPSs/COMPSs v2.3 (released June 2018)
Runtime
Persistent storage API implementation based on Redis (distributed as default implementation with COMPSs)
Support for FPGA constraints and reconfiguration scripts
Support for PBS Job Scheduler and the Archer Supercomputer
Java
New API call to delete objects in order to reduce application memory usage
Python
Support for Python 3
Support for Python virtual environments (venv)
Support for running PyCOMPSs as a Python module
Support for tasks returning multiple elements (returns=#)
Automatic import of dummy PyCOMPSs API
C
Persistent worker with Memory-to-memory transfers
Support for arrays (no serialization required)
Improvements
Distribution with docker images
Source Code and example applications distribution on Github
Automatic inference of task return
Improved obsolete object cleanup
Improved tracing support for applications using persistent memory
Improved finalization process to reduce zombie processes
Several bug fixes
Known limitations
Tasks that invoke NumPy and MKL may experience issues if a different MKL thread count is used in different tasks. This is because MKL reuses threads across calls and does not change the number of threads from one call to another.
New features in PyCOMPSs/COMPSs v2.5 (released June 2019)
Runtime:
New Concurrent direction type for task parameter.
Multi-node tasks support for native (Java, Python) tasks. Previously, multi-node tasks were only possible with @mpi or @decaf tasks.
@Compss decorator for executing COMPSs applications as tasks.
New runtime API to synchronize files without opening them.
Customizable task failure management with the “onFailure” task property.
Enabled master node to execute tasks.
Python:
Partial support of numba in tasks.
Support for collection as task parameter.
Supported task inheritance.
New persistent MPI worker mode (alternative to subprocess).
Support for Arm MAP and DDT tools (with MPI worker mode).
C:
Support for tasks without parameters and applications without src folder.
Improvements:
New task property “targetDirection” to indicate direction of the target object in object methods. Substitutes the “isModifier” task property.
Warnings for deprecated or incorrect task parameters.
Improvements in Jupyter for Supercomputers.
Upgrade of runcompss_docker script to docker stack interface.
Several bug fixes.
Known Limitations:
Tasks that invoke NumPy and MKL may experience issues if a different MKL thread count is used in different tasks. This is because MKL reuses threads across calls and does not change the number of threads from one call to another.
C++ objects declared as arguments in coarse-grain tasks must be passed to the task methods as object pointers to ensure proper dependency management.
Master as worker is not working for executions with persistent worker in C++.
Coherence and concurrent writing in parameters annotated with the “Concurrent” direction must be managed by the underlying distributed storage system.
Delete file calls for files used as input can produce a significant synchronization of the main code.
PyCOMPSs/COMPSs PIP installation package
This is a new feature available since January 2017.
Installation:
Check the dependencies in the PIP section of the PyCOMPSs installation manual (available at the documentation section of compss.bsc.es). Be sure that the target machine satisfies the mentioned dependencies.
The installation can be done in various alternative ways:
Use PIP to install the official PyCOMPSs version from the live PyPI repository:
sudo -E python2.7 -m pip install pycompss -v
Use PIP to install PyCOMPSs from a pycompss.tar.gz
The development of Score-P was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
The Score-P measurement infrastructure is a highly scalable and easy-to-use tool suite for profiling, event tracing, and online analysis of HPC applications. Score-P is developed under a BSD 3-Clause (Open Source) License and governed by a meritocratic governance model.
Score-P offers the user a maximum of convenience by supporting a number of analysis tools. Currently, it works with Periscope, Scalasca, Vampir, and TAU, and is open to other tools. Score-P comes together with the new Open Trace Format Version 2, the Cube4 profiling format and the Opari2 instrumenter.
Score-P is part of a larger set of tools for parallel performance analysis and debugging developed by the “Virtual Institute – High Productivity Supercomputing” (VI-HPS) consortium. Further documentation, training and support are available through VI-HPS.
The new version 1.4.2 provides the following new features (externally funded) as compared to version 1.4 that was part of the HBP-internal Platform Release in M18:
The development of Scalasca was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
Scalasca is a software tool that supports the performance optimisation of parallel programs by measuring and analysing their runtime behaviour. The analysis identifies potential performance bottlenecks, in particular those concerning communication and synchronization, and offers guidance in exploring their causes.
Scalasca targets mainly scientific and engineering applications based on the programming interfaces MPI and OpenMP, including hybrid applications based on a combination of the two. The tool has been specifically designed for use on large-scale systems including IBM Blue Gene and Cray XT, but is also well suited for small- and medium-scale HPC platforms. The software is available for free download under the New BSD open-source license.
Scalasca is part of a larger set of tools for parallel performance analysis and debugging developed by the “Virtual Institute – High Productivity Supercomputing” (VI-HPS) consortium. Further documentation, training and support are available through VI-HPS.
The new version 2.2.2 provides the following new features (externally funded) as compared to version 2.2 that was part of the HBP-internal Platform Release in M18:
The development of CUBE: Score-P / Scalasca was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
Cube, which is used as performance report explorer for Scalasca and Score-P, is a generic tool for displaying a multi-dimensional performance space consisting of the dimensions
Performance metric,
Call path, and
System resource.
Each dimension can be represented as a tree, where non-leaf nodes of the tree can be collapsed or expanded to achieve the desired level of granularity. In addition, Cube can display multi-dimensional Cartesian process topologies.
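The collapse/expand behaviour of a single dimension tree can be sketched in Python as follows. This is an illustrative data structure, not Cube's API or file format. The `Node` class, its fields and the timing values are hypothetical. The sketch also assumes that a collapsed node stands for its whole subtree while an expanded node shows only its own share.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a hypothetical dimension tree (e.g. a call path)."""
    name: str
    own_value: float                      # value attributed to this node alone
    children: List["Node"] = field(default_factory=list)
    expanded: bool = False

    def inclusive(self) -> float:
        """Value of this node plus all of its descendants."""
        return self.own_value + sum(c.inclusive() for c in self.children)

    def shown_value(self) -> float:
        """Collapsed: whole subtree. Expanded: this node's own share only."""
        return self.own_value if self.expanded else self.inclusive()


def visible_rows(node, depth=0):
    """Return (depth, name, value) for every node that is currently visible."""
    rows = [(depth, node.name, node.shown_value())]
    if node.expanded:
        for child in node.children:
            rows.extend(visible_rows(child, depth + 1))
    return rows


# Hypothetical call tree with made-up timings in seconds.
call_tree = Node("main", 1.0, [
    Node("solve", 2.0, [Node("MPI_Allreduce", 3.0)]),
    Node("write_output", 0.5),
])

collapsed_rows = visible_rows(call_tree)   # only the root is visible
call_tree.expanded = True
expanded_rows = visible_rows(call_tree)    # root plus its direct children

for depth, name, value in expanded_rows:
    print("  " * depth + f"{name}: {value}")
```

Expanding a node trades one aggregated value for a finer breakdown, which is the sense in which the user picks the level of granularity per dimension.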
The Cube 4.x series report explorer and the associated Cube4 data format are provided for Cube files produced with the Score-P performance instrumentation and measurement infrastructure or with the Scalasca version 2.x trace analyzer (and other compatible tools). However, for backwards compatibility, Cube 4.x can also read and display Cube 3.x data.
Cube is part of a larger set of tools for parallel performance analysis and debugging developed by the “Virtual Institute – High Productivity Supercomputing” (VI-HPS) consortium. Further documentation, training and support are available through VI-HPS.
The new version 4.3.3 provides the following new features (externally funded) as compared to version 4.3.1 that was part of the HBP-internal Platform Release in M18:
The development of Extrae was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
Extrae is an instrumentation and measurement system that gathers time-stamped information about the events of an application. It is the package devoted to generating Paraver trace files for post-mortem analysis of a code run. It uses different interposition mechanisms to inject probes into the target application in order to gather information about the application performance.
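The general idea of interposing a probe that records time-stamped events can be sketched in Python. This is a conceptual illustration only. It does not use Extrae, does not produce Paraver traces, and the `probe` wrapper, the `events` list and the `compute` routine are all hypothetical names.

```python
import functools
import time

events = []  # (timestamp_ns, event_kind, function_name), in recording order


def probe(func):
    """Wrap func so that each call records time-stamped entry and exit events."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append((time.perf_counter_ns(), "enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if func raises, so every entry has a matching exit.
            events.append((time.perf_counter_ns(), "exit", func.__name__))
    return wrapper


def compute(n):
    """Hypothetical application routine."""
    return sum(i * i for i in range(n))


# Interpose the probe without editing the body of compute() itself.
compute = probe(compute)
result = compute(1000)

for timestamp, kind, name in events:
    print(timestamp, kind, name)
```

The application routine is left untouched; the probe is placed between the caller and the callee, and the recorded events can be analysed after the run.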
The new version 3.2.1 (3rd November 2015) provides the following new features as compared to version 3.1.0 that was part of the HBP-internal Platform Release in M18:
Support for MPI3 immediate collectives
Use Intel PEBS to sample memory references.
The new version 3.4.1 (23rd September 2016) provides the following new features:
Extended Java support through AspectJ and JVMTI
Improved CUDA and OpenCL support
Improved support for MPI-IO operations
Added instrumentation for system I/O and other system calls