The development of TOUCH was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
Efficient spatial joins are pivotal for many applications and particularly important for geographical information systems and for the simulation sciences, where scientists work with spatial models. Past research has primarily focused on disk-based spatial joins; efficient in-memory approaches, however, are important for two reasons: a) main memory has grown so large that many datasets fit in it and b) the in-memory join is a very time-consuming part of all disk-based spatial joins. In this paper we develop TOUCH, a novel in-memory spatial join algorithm that uses hierarchical data-oriented space partitioning, thereby keeping both its memory footprint and the number of comparisons low. Our results show that TOUCH outperforms known in-memory spatial-join algorithms as well as in-memory implementations of disk-based join approaches. In particular, it improves on the memory-demanding state of the art by one order of magnitude in both the number of pairwise object comparisons and execution time, and it is two orders of magnitude faster than approaches with a similar memory footprint. Furthermore, TOUCH scales better than competing approaches as data density grows.
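To make the notion of pairwise object comparisons concrete, the following minimal sketch counts comparisons for a nested-loop join and for a join over a uniform grid. It does not implement TOUCH: the uniform grid is a simple space-oriented stand-in that only shows how partitioning reduces comparisons. All names (`intersects`, `nested_loop_join`, `grid_join`, the `size` parameter) and the 2D box layout `(xmin, ymin, xmax, ymax)` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def intersects(a, b):
    """True if two axis-aligned boxes (xmin, ymin, xmax, ymax) overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def nested_loop_join(A, B):
    """Baseline: compare every box in A with every box in B."""
    pairs, comparisons = set(), 0
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            comparisons += 1
            if intersects(a, b):
                pairs.add((i, j))
    return pairs, comparisons


def cells(box, size):
    """Integer grid cells overlapped by a box, for square cells of width `size`."""
    xs = range(int(box[0] // size), int(box[2] // size) + 1)
    ys = range(int(box[1] // size), int(box[3] // size) + 1)
    return product(xs, ys)


def grid_join(A, B, size):
    """Hypothetical grid join: only boxes that share a grid cell are compared."""
    grid = defaultdict(list)
    for j, b in enumerate(B):
        for c in cells(b, size):
            grid[c].append(j)
    pairs, comparisons = set(), 0
    for i, a in enumerate(A):
        for c in cells(a, size):
            for j in grid.get(c, ()):
                comparisons += 1
                if intersects(a, B[j]):
                    pairs.add((i, j))  # a set removes duplicates across cells
    return pairs, comparisons


A = [(0, 0, 1, 1), (10, 10, 11, 11)]
B = [(0.5, 0.5, 2, 2), (20, 20, 21, 21), (10.5, 10.5, 12, 12)]
print(nested_loop_join(A, B))
print(grid_join(A, B, size=5))
```

Both functions return the same set of intersecting pairs; the grid variant reaches it with fewer comparisons on this input because distant boxes never share a cell.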
The development of TRANSFORMERS was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
Spatial joins are becoming increasingly ubiquitous in many applications, particularly in the scientific domain. While several approaches have been proposed for joining spatial datasets, each of them performs well only for a particular density ratio between the joined datasets. More generally, no single proposed method joins two spatial datasets efficiently regardless of their data distributions. Some approaches do well for datasets with contrasting densities, while others do better with similar densities. None of them does well when the datasets have locally divergent data distributions.
Therefore, we develop TRANSFORMERS, an efficient and robust spatial join approach that is indifferent to such variations of distribution among the joined data. TRANSFORMERS departs from the state of the art by adapting the join strategy and data layout to local density variations among the joined data. It employs a join method based on data-oriented partitioning when joining areas of substantially different local densities, and it uses big partitions (as in space-oriented partitioning) when the densities are similar, switching seamlessly between these two strategies at runtime.
The development of RUBIK was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
An increasing number of applications in finance, meteorology, science and other fields produce time series as output. Analyzing this vast amount of time series is key to understanding the phenomena studied, particularly in the simulation sciences, where the analysis of time series resulting from a simulation allows scientists to refine the simulated model. Existing approaches to querying time series typically keep a compact representation in main memory, use it to answer queries approximately and then access the exact time series data on disk to validate the result. The more precise the in-memory representation, the fewer disk accesses are needed to validate the result. With the massive sizes of today’s datasets, however, current in-memory representations often no longer fit into main memory. To make them fit, their precision has to be reduced considerably, resulting in substantial disk access, which impedes query execution today and limits scalability for even bigger datasets in the future. In this paper we develop RUBIK, a novel approach to compressing and indexing time series. RUBIK exploits the fact that time series in many applications, and particularly in the simulation sciences, are similar to each other. It compresses similar time series, i.e., observation values as well as time information, achieving better space efficiency and improved precision. RUBIK translates threshold queries into two-dimensional spatial queries and efficiently executes them on the compressed time series by exploiting the pruning power of a tree structure to find the result, thereby outperforming the state of the art by a factor of between 6 and 23. As our experiments further indicate, exploiting similarity within and between time series is crucial to make query execution scale and to ultimately decouple query execution time from the growth of the data (size and number of time series).
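The text does not spell out how a threshold query maps to a spatial query, so the following is only a rough sketch of one possible reading, not RUBIK itself. It assumes each fixed-length segment of a series is summarized by a rectangle in (time, value) space, and that a query "value > tau" can then skip every rectangle lying entirely below the threshold. The function names, the segment length and the flat list of rectangles (in place of a tree) are all hypothetical.

```python
def segment_boxes(series, seg_len):
    """Summarize each run of seg_len samples as (t_start, t_end, v_min, v_max)."""
    boxes = []
    for start in range(0, len(series), seg_len):
        chunk = series[start:start + seg_len]
        boxes.append((start, start + len(chunk) - 1, min(chunk), max(chunk)))
    return boxes


def threshold_query(series, boxes, tau):
    """Return the time steps t with series[t] > tau and the number of segments scanned."""
    hits, scanned = [], 0
    for t_start, t_end, v_min, v_max in boxes:
        if v_max <= tau:
            continue  # rectangle lies entirely at or below the threshold: pruned
        if v_min > tau:
            # rectangle lies entirely above the threshold: accept without scanning
            hits.extend(range(t_start, t_end + 1))
        else:
            # rectangle straddles the threshold: check the exact values
            scanned += 1
            hits.extend(t for t in range(t_start, t_end + 1) if series[t] > tau)
    return hits, scanned


series = [1, 2, 3, 9, 8, 7, 1, 1, 1, 5, 6, 2]
boxes = segment_boxes(series, seg_len=3)
print(threshold_query(series, boxes, tau=4))
```

In this toy input only one of the four segments needs its exact values inspected, which is the kind of pruning a more precise in-memory summary is meant to enable.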
The development of neuroFiReS was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
neuroFiReS is a library for performing search and filtering operations using both data contents and metadata. These search operations will be tightly coupled with visualization to make it easier to gain insight from complex data. A first prototype (named spineRet) for searching and filtering over segmented spine data has been developed.
The development of MonetDB was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
When a database grows into millions of records spread over many tables and business intelligence or science becomes the prevalent application domain, a column-store database management system (DBMS) is called for. Unlike traditional row-stores, such as MySQL and PostgreSQL, a column-store provides a modern and scalable solution without calling for substantial hardware investments.
MonetDB has pioneered column-store solutions for high-performance data warehouses for business intelligence and eScience since 1993. It achieves its goal through innovations at all layers of a DBMS, e.g., a storage model based on vertical fragmentation, a modern CPU-tuned query execution architecture, automatic and adaptive indices, run-time query optimization, and a modular software architecture. It is based on the SQL 2003 standard with full support for foreign keys, joins, views, triggers, and stored procedures. It is fully ACID compliant and supports a rich spectrum of programming interfaces (JDBC, ODBC, PHP, Python, RoR, C/C++, Perl).
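As a rough illustration of vertical fragmentation, the sketch below turns a small row-oriented table into one list per column and runs a filtered aggregate over two of those lists. It is a toy model in plain Python, not MonetDB's storage format or API; the table contents and the helper names `to_columns` and `sum_where` are made up for the example, and every row is assumed to have the same fields.

```python
# Row-oriented layout: one record per row, all attributes stored together.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 5.0},
    {"id": 3, "region": "EU", "amount": 2.5},
]


def to_columns(rows):
    """Vertically fragment the table: one list of values per attribute."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_where(columns, value_col, filter_col, wanted):
    """Aggregate value_col over positions where filter_col equals wanted.

    Only the two columns involved are read; the others are never touched.
    """
    return sum(
        value
        for value, key in zip(columns[value_col], columns[filter_col])
        if key == wanted
    )


columns = to_columns(rows)
print(columns["amount"])
print(sum_where(columns, "amount", "region", "EU"))
```

The point of the layout is that an analytical query over a few attributes scans only those attributes, which is why column-stores suit workloads dominated by aggregates over wide tables.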
The current version provides the following new features as compared to the version that was part of the HBP-internal Platform Release in M18:
The development of SCOUT was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
SCOUT is a structure-aware method for prefetching data along interactive spatial query sequences. Given the user input, which is a spatial range query sequence representing the structure explored (interactively) by the user, and the spatial dataset to be queried, SCOUT reduces the query response time by prefetching the data along the query sequence.
Similarly to FLAT, both the query ranges in the query sequence and the spatial objects should be represented using a minimum bounding rectangle.
SCOUT outperforms related prefetching techniques (e.g., Straight Line Extrapolation or Hilbert prefetching) thanks to its high prefetching accuracy, which translates into a speedup of one order of magnitude.
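For orientation, the sketch below shows a minimum bounding rectangle and a simple form of the Straight Line Extrapolation baseline named above. It is not SCOUT, which is described as structure-aware. The extrapolation rule (shift the latest query range by the displacement between the last two query centers) is an assumption about how such a baseline could work, and the function names and the `(xmin, ymin, xmax, ymax)` layout are hypothetical.

```python
def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of 2D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def center(box):
    """Center point of a rectangle."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def extrapolate_next(prev_box, last_box):
    """Predict the next query range by continuing the last movement in a straight line."""
    (px, py), (lx, ly) = center(prev_box), center(last_box)
    dx, dy = lx - px, ly - py
    return (last_box[0] + dx, last_box[1] + dy, last_box[2] + dx, last_box[3] + dy)


# Two consecutive range queries of an interactive session.
q1 = (0, 0, 2, 2)
q2 = (1, 1, 3, 3)
print(mbr([(0, 0), (2, 1), (1, 3)]))
print(extrapolate_next(q1, q2))  # range whose data would be prefetched
```

A prefetcher built on this rule would load the objects whose rectangles intersect the predicted range while the user is still inspecting the current result. It mispredicts whenever the explored structure bends, which is the weakness a structure-aware method targets.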
Date of release: March 2015
Version of software: 1.0
Version of documentation: 1.0
Software available: Collaboratory, integrated in and part of BBP SDK tool set
The development of Remote Connection Manager (RCM) was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
The Remote Connection Manager (RCM) is an application that allows HPC users to perform remote visualisation on Cineca HPC clusters.
The Remote Connection Manager works on the following operating systems: Windows, Linux, and Mac OS X.
(OS X Mountain Lion users need to install XQuartz: http://xquartz.macosforge.org/landing/)