Tag Archives: parallel application performance analysis

Arbor

Arbor is a software library designed from the ground up for simulators of large networks of multi-compartment neurons on hybrid, accelerated, and many-core computer architectures.

Performance portability was completed for the three main target HPC architectures available through the HBP: Intel x86 CPUs (AVX2 and AVX512), Intel KNL (AVX512) and NVIDIA GPUs (CUDA).

Optimized kernels are automatically generated to target each architecture, and the system used in Arbor can be extended to new architectures in the future.

The other enhancements and features implemented in Arbor are:

  • Fully parallelized event generation and queuing from spikes.
  • Efficient sampling of model state on CPU and GPU implementations, e.g. voltage and current.
  • Significant refactoring to prepare the code for general release.
  • A Python interface for users.
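As a rough illustration of the first item in the list, the sketch below turns spikes into delivery events using a priority queue. It is not Arbor's API or implementation: the `Connection` type, the `events_from_spikes` function, and all weights and delays are hypothetical, and the sketch runs sequentially where Arbor parallelizes this step.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical synaptic connection between two cells."""
    source: int    # id of the cell that spikes
    target: int    # id of the cell that receives the event
    weight: float  # strength delivered to the target
    delay: float   # time between spike and delivery


def events_from_spikes(spikes, connections):
    """Turn (time, source) spikes into (delivery_time, target, weight) events.

    Events are returned in delivery order, as a simulator would consume them.
    """
    # Index connections by source so each spike only visits its own fan-out.
    by_source = {}
    for conn in connections:
        by_source.setdefault(conn.source, []).append(conn)

    queue = []
    for time, source in spikes:
        for conn in by_source.get(source, []):
            heapq.heappush(queue, (time + conn.delay, conn.target, conn.weight))

    # Pop everything to obtain the events sorted by delivery time.
    return [heapq.heappop(queue) for _ in range(len(queue))]
```

A heap keeps the earliest pending delivery at the front, which is what a simulator needs when it advances time step by step.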

The source code was released publicly on GitHub with an open source BSD license, along with documentation on Read the Docs, and automatic testing was set up on Travis CI.

Date of release: August 2019
Version of software: 0.2.1
Version of documentation: 0.2.1
Software available: https://github.com/eth-cscs/arbor
Documentation: http://arbor.readthedocs.io; https://github.com/eth-cscs/arbor
Responsible: Benjamin Cumming (ETHZ): bcumming@cscs.ch; Alexander Peyser (JUELICH): a.peyser@fz-juelich.de
Requirements & dependencies
Target system(s)

Score-P: HPC Performance Instrumentation and Measurement Tool

The development of Score-P was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.


The Score-P measurement infrastructure is a highly scalable and easy-to-use tool suite for profiling, event tracing, and online analysis of HPC applications. Score-P is developed under a BSD 3-Clause (Open Source) License and governed by a meritocratic governance model.
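The difference between profiling and event tracing can be shown with a small sketch: a trace keeps every time-stamped enter and leave event, while a profile keeps only per-region summaries. The record layout and the `profile_from_trace` function below are assumptions made for illustration. They are not the OTF2 or Cube4 formats, and this is not how Score-P itself is implemented.

```python
from collections import defaultdict


def profile_from_trace(trace):
    """Aggregate a single-thread event trace into a per-region profile.

    trace: list of (timestamp, kind, region) tuples, where kind is
    "enter" or "leave" and the regions are properly nested.
    """
    stack = []  # regions that are currently open, innermost last
    profile = defaultdict(lambda: {"visits": 0, "inclusive_time": 0.0})

    for timestamp, kind, region in trace:
        if kind == "enter":
            stack.append((region, timestamp))
        elif kind == "leave":
            open_region, start = stack.pop()
            if open_region != region:
                raise ValueError(f"unbalanced trace at region {region!r}")
            profile[region]["visits"] += 1
            profile[region]["inclusive_time"] += timestamp - start
        else:
            raise ValueError(f"unknown event kind {kind!r}")

    return dict(profile)
```

The profile is much smaller than the trace but no longer records when each visit happened, which is the trade-off between the two measurement modes.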

Score-P supports a number of analysis tools. Currently, it works with Periscope, Scalasca, Vampir, and TAU, and it is open to other tools. Score-P ships with the Open Trace Format Version 2 (OTF2), the Cube4 profiling format, and the OPARI2 instrumenter.

Score-P is part of a larger set of tools for parallel performance analysis and debugging developed by the “Virtual Institute – High Productivity Supercomputing” (VI-HPS) consortium. Further documentation, training and support are available through VI-HPS.

The new version 1.4.2 provides the following new features (externally funded) as compared to version 1.4 that was part of the HBP-internal Platform Release in M18:

  • Power8, ARM64, and Intel Xeon Phi support
  • Pthread and OpenMP tasking support
  • Prototype OmpSs support
Date of release: February 2014
Version of software: 1.4.2
Version of documentation: 1.x
Software available: http://www.score-p.org, Section “Download section”
Documentation: http://www.score-p.org, Section “Documentation”
Responsible: Score-P consortium: support@score-p.org
Requirements & dependencies: Supported OS: Linux. Needs OTF2 1.5.x series, Cube 4.3 series, and OPARI2 1.1.2 software packages (available at same website)
Target system(s): Supercomputers (Cray, IBM BlueGene, Fujitsu K/FX10), Linux Clusters of all kinds, Linux Workstations or Laptops (for test/training)

Scalasca: HPC Performance Trace Analyzer

The development of Scalasca was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.


Scalasca is a software tool that supports the performance optimisation of parallel programs by measuring and analysing their runtime behaviour. The analysis identifies potential performance bottlenecks – in particular those concerning communication and synchronization – and offers guidance in exploring their causes.
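One communication bottleneck of this kind is a receive that blocks because the matching send starts later, often called a late sender. The sketch below computes the resulting waiting time from enter timestamps. It is a simplified illustration under assumed inputs, not Scalasca's implementation: the function names and the `(receiver, recv_enter, send_enter)` tuple layout are hypothetical.

```python
def late_sender_wait(recv_enter, send_enter):
    """Time a blocking receive waits because its matching send started later.

    Returns 0.0 when the send was already under way when the receive began.
    """
    return max(0.0, send_enter - recv_enter)


def total_wait_by_receiver(messages):
    """Sum late-sender waiting time per receiving process.

    messages: iterable of (receiver, recv_enter, send_enter) tuples.
    """
    totals = {}
    for receiver, recv_enter, send_enter in messages:
        wait = late_sender_wait(recv_enter, send_enter)
        totals[receiver] = totals.get(receiver, 0.0) + wait
    return totals
```

Summing the waiting time per process points to the ranks that lose the most time, which is the kind of guidance the analysis is meant to give.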

Scalasca targets mainly scientific and engineering applications based on the programming interfaces MPI and OpenMP, including hybrid applications based on a combination of the two. The tool has been specifically designed for use on large-scale systems including IBM Blue Gene and Cray XT, but is also well suited for small- and medium-scale HPC platforms. The software is available for free download under the New BSD open-source license.

Scalasca is part of a larger set of tools for parallel performance analysis and debugging developed by the “Virtual Institute – High Productivity Supercomputing” (VI-HPS) consortium. Further documentation, training and support are available through VI-HPS.

The new version 2.2.2 provides the following new features (externally funded) as compared to version 2.2 that was part of the HBP-internal Platform Release in M18:

  • Power8, ARM64, and Intel Xeon Phi support
  • Pthread and OpenMP tasking support
  • Improved analysis
  • Prototype OmpSs support
Date of release: January 2015
Version of software: 2.2.2
Version of documentation: 2.x
Software available: http://www.scalasca.org/software/scalasca-2.x/download.html
Documentation: http://www.scalasca.org/software/scalasca-2.x/documentation.html
Responsible: Scalasca team: scalasca@fz-juelich.de
Requirements & dependencies: Supported OS: Linux. Needs Score-P v1.2 or newer and Cube library v4.3 software packages
Target system(s): Supercomputers (Cray, IBM BlueGene, Fujitsu K/FX10), Linux Clusters of all kinds, Linux Workstations or Laptops (for test/training)

Paraver

The development of Paraver was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.


Paraver is a very flexible data browser. Its metrics are not hardwired into the tool but can be programmed. To compute them, the tool offers a large set of time functions, a filter module, and a mechanism to combine two timelines. This approach allows a huge number of metrics to be displayed from the available data. The analysis display can compute statistics over any timeline and selected region, which allows the information of up to three different time functions to be correlated. To capture the expert’s knowledge, any view or set of views can be saved as a Paraver configuration file, so re-computing the same view with new data is as simple as loading the saved file. The tool has proved very useful in performance analysis studies, giving much more detail about an application’s behaviour than most available performance tools.
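The idea of deriving a metric by combining two timelines and then computing a statistic over a selected region can be sketched as follows. This is not Paraver code or its configuration format: a timeline is assumed here to be a list of `(start, end, value)` segments, and `value_at`, `combine`, and `average` are hypothetical helper names.

```python
def value_at(timeline, t):
    """Value of a piecewise-constant timeline at time t (0.0 outside it).

    timeline: list of (start, end, value) segments covering [start, end).
    """
    for start, end, value in timeline:
        if start <= t < end:
            return value
    return 0.0


def combine(a, b, op):
    """Combine two timelines point by point into a derived timeline."""
    # Every segment boundary of either input is a boundary of the result.
    cuts = sorted({p for start, end, _ in a + b for p in (start, end)})
    return [
        (start, end, op(value_at(a, start), value_at(b, start)))
        for start, end in zip(cuts, cuts[1:])
    ]


def average(timeline, begin, end):
    """Time-weighted average of a timeline over the region [begin, end)."""
    total = 0.0
    for seg_start, seg_end, value in timeline:
        overlap = min(seg_end, end) - max(seg_start, begin)
        if overlap > 0:
            total += value * overlap
    return total / (end - begin)
```

For example, dividing an instructions timeline by a cycles timeline with `combine(instr, cycles, lambda i, c: i / c if c else 0.0)` gives an instructions-per-cycle timeline, and `average` then summarises it over any selected interval.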

Screenshot of Paraver

The new version 4.6.0 (3rd February 2016) provides the following new features (externally funded) as compared to version 4.5.6 (February 2015) that was part of the HBP-internal Platform Release in M18:

  • Automatic workspaces on trace loading
  • Scalability improvements for traces with more than 64K rows
  • Support for wxWidgets 3
  • Traces with the same hierarchy can be combined for analysis
  • External tools integration

The new version 4.6.3 (16th November 2016) provides the following new features:

  • Added punctual information view to timelines
  • Added external tool Run->Spectral from timelines
  • Trace load time reduced by 25%
  • Histogram new features: show only totals and short/long column labels
  • Run app dialog usability improvements
Date of release: 16th of November 2016
Version of software: 4.6.3
Version of documentation: 3.1 (old, from 2001), but tutorials are available for newer versions
Software available: https://tools.bsc.es/downloads
Documentation: Paraver website: https://tools.bsc.es/paraver; Documentation: https://tools.bsc.es/tools_manuals
Responsible: BSC Performance Tools Group: tools@bsc.es
Requirements & dependencies: Boost >= 1.36; Zlib; wxWidgets >= 2.8.0; wxPropertyGrid >= 1.4.0
Target system(s): Any Unix/Linux system (supercomputers, clusters, servers, workstations, laptops, …)

Cube: Score-P / Scalasca

The development of CUBE: Score-P / Scalasca was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.


Cube, which is used as performance report explorer for Scalasca and Score-P, is a generic tool for displaying a multi-dimensional performance space consisting of the dimensions

  1. Performance metric,
  2. Call path, and
  3. System resource.

Each dimension can be represented as a tree, where non-leaf nodes of the tree can be collapsed or expanded to achieve the desired level of granularity. In addition, Cube can display multi-dimensional Cartesian process topologies.
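The effect of collapsing and expanding a tree node can be sketched with a small call tree. The sketch assumes that a collapsed node shows its inclusive value (the node plus its whole subtree) and an expanded node shows only its exclusive value. The `Node` class and `visible_rows` function are hypothetical and are not part of the Cube library.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node, e.g. a call path with its own (exclusive) time."""
    name: str
    exclusive: float
    children: list = field(default_factory=list)
    expanded: bool = False

    def inclusive(self):
        """Value of this node plus all of its descendants."""
        return self.exclusive + sum(c.inclusive() for c in self.children)


def visible_rows(node, depth=0):
    """Rows a tree browser would show: (depth, name, displayed value).

    A collapsed node stands in for its whole subtree, so it shows the
    inclusive value; an expanded node shows only its exclusive value
    because its children are listed separately below it.
    """
    shown = node.exclusive if node.expanded else node.inclusive()
    rows = [(depth, node.name, shown)]
    if node.expanded:
        for child in node.children:
            rows.extend(visible_rows(child, depth + 1))
    return rows
```

With this convention the displayed values always add up to the same total, whatever level of granularity is chosen.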

The Cube 4.x series report explorer and the associated Cube4 data format are provided for Cube files produced with the Score-P performance instrumentation and measurement infrastructure or with the Scalasca version 2.x trace analyzer (and other compatible tools). For backwards compatibility, Cube 4.x can also read and display Cube 3.x data.

Cube is part of a larger set of tools for parallel performance analysis and debugging developed by the “Virtual Institute – High Productivity Supercomputing” (VI-HPS) consortium. Further documentation, training and support are available through VI-HPS.

Screenshot of Cube: Score-P/Scalasca

The new version 4.3.3 provides the following new features (externally funded) as compared to version 4.3.1 that was part of the HBP-internal Platform Release in M18:

  • Derived metrics support
  • Visual plugins
  • Improved performance and scalability
Date of release: April 2015
Version of software: 4.3.3
Version of documentation: 4.x
Software available: http://www.scalasca.org/software/cube-4.x/download.html
Documentation: http://www.scalasca.org/software/cube-4.x/documentation.html
Responsible: Scalasca team: scalasca@fz-juelich.de
Requirements & dependencies: Supported OS: Linux, Windows. Qt
Target system(s): Linux Workstations or Laptops

Extrae

The development of Extrae was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.


Extrae is an instrumentation and measurement system that gathers time-stamped information about the events of an application. It is the package devoted to generating Paraver trace files for post-mortem analysis of a code run. It uses different interposition mechanisms to inject probes into the target application in order to gather information about the application’s performance.
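Interposition can be shown in miniature: a routine is replaced by a wrapper that records time-stamped enter and leave events around the original call, so the calling code does not change. The sketch below only illustrates the idea and is not how Extrae is implemented. The `events` list and the `probe` and `interpose` functions are hypothetical names.

```python
import functools
import time

# Recorded events as (timestamp_ns, kind, routine_name) tuples.
events = []


def probe(func):
    """Wrap func so every call records an enter and a leave event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append((time.perf_counter_ns(), "enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if func raises, so enter/leave stay balanced.
            events.append((time.perf_counter_ns(), "leave", func.__name__))
    return wrapper


def interpose(owner, name):
    """Replace owner.<name> with a probed version of itself.

    Callers keep using owner.<name> unchanged; only the binding differs.
    """
    setattr(owner, name, probe(getattr(owner, name)))
```

After `interpose(some_module, "some_function")`, each call to `some_module.some_function` appends two events that a later step could write out as a trace.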

The new version 3.2.1 (3rd November 2015) provides the following new features as compared to version 3.1.0 that was part of the HBP-internal Platform Release in M18:

  • Support for MPI3 immediate collectives
  • Use Intel PEBS to sample memory references.

The new version 3.4.1 (23rd September 2016) provides the following new features:

  • Extended Java support through AspectJ and JVMTI
  • Improved CUDA and OpenCL support
  • Improved support for MPI-IO operations
  • Added instrumentation for system I/O and other system calls
  • Added support for OMPT
  • Added support for IBM Platform MPI
  • Added instrumentation for memkind allocations
  • Many other small improvements and bug fixes
Date of release: 23 September 2016
Version of software: 3.4.1
Version of documentation: 3.4.1
Software available: https://tools.bsc.es/downloads
Documentation: https://tools.bsc.es/tools_manuals; Extrae website: https://tools.bsc.es/extrae
Responsible: BSC Performance Tools Group: tools@bsc.es
Requirements & dependencies: Dependencies: libxml2 2.5.0; libunwind for Linux x86/x86-64/IA64/ARM. Optional: PAPI; DynInst; libiberty and libbfd; MPI; OpenMP
Target system(s): Any Unix/Linux system (supercomputers, clusters, servers, workstations, laptops, …)