The development of TRANSFORMERS was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
Spatial joins are increasingly common in many applications, particularly in the scientific domain. While several approaches have been proposed for joining spatial datasets, each performs best for a particular density ratio between the joined datasets. More generally, no existing method joins two spatial datasets efficiently and robustly regardless of their data distributions: some approaches do well for datasets with contrasting densities, while others do better with similar densities. None of them does well when the datasets have locally divergent data distributions.
We therefore develop TRANSFORMERS, an efficient and robust spatial join approach that is indifferent to such variations in distribution among the joined data. TRANSFORMERS departs from the state of the art by adapting the join strategy and data layout to local density variations among the joined data. It uses a join method based on data-oriented partitioning when joining areas of substantially different local densities, and big partitions (as in space-oriented partitioning) when the densities are similar, switching seamlessly between these two strategies at runtime.
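To make the idea of switching strategies by local density concrete, here is a minimal sketch, not TRANSFORMERS itself. It assumes 2D boxes (MBRs), a hypothetical uniform grid of cell size `size`, and a hypothetical `ratio_threshold` for deciding when two local densities count as "contrasting". In cells with similar densities it joins the whole cell as one big partition; in cells with contrasting densities it sorts the dense side once and probes it with each sparse object (a simple stand-in for a data-oriented method).

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

Box = Tuple[float, float, float, float]  # 2D MBR: (xlo, ylo, xhi, yhi)
Cell = Tuple[int, int]


def overlaps(a: Box, b: Box) -> bool:
    """Axis-aligned boxes intersect iff they overlap on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_of(box: Box, size: float) -> Iterator[Cell]:
    """Yield every grid cell of edge length `size` that the box touches."""
    for i in range(int(box[0] // size), int(box[2] // size) + 1):
        for j in range(int(box[1] // size), int(box[3] // size) + 1):
            yield (i, j)


def partition(boxes: List[Box], size: float) -> Dict[Cell, List[int]]:
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for idx, box in enumerate(boxes):
        for cell in cells_of(box, size):
            grid[cell].append(idx)
    return grid


def adaptive_join(a: List[Box], b: List[Box], size: float = 10.0,
                  ratio_threshold: float = 4.0) -> Set[Tuple[int, int]]:
    """Return all index pairs (i, j) such that a[i] and b[j] overlap."""
    grid_a, grid_b = partition(a, size), partition(b, size)
    result: Set[Tuple[int, int]] = set()  # the set drops duplicates from boxes spanning several cells
    for cell in grid_a.keys() & grid_b.keys():
        ids_a, ids_b = grid_a[cell], grid_b[cell]
        ratio = max(len(ids_a), len(ids_b)) / min(len(ids_a), len(ids_b))
        if ratio < ratio_threshold:
            # Similar local densities: treat the whole cell as one big partition.
            for i in ids_a:
                for j in ids_b:
                    if overlaps(a[i], b[j]):
                        result.add((i, j))
        else:
            # Contrasting local densities: sort the dense side by xlo once
            # and probe it with each object of the sparse side.
            a_is_dense = len(ids_a) > len(ids_b)
            dense, dense_ids = (a, ids_a) if a_is_dense else (b, ids_b)
            sparse, sparse_ids = (b, ids_b) if a_is_dense else (a, ids_a)
            order = sorted(dense_ids, key=lambda k: dense[k][0])
            xlos = [dense[k][0] for k in order]
            for s in sparse_ids:
                # Dense boxes that start to the right of s cannot overlap it.
                stop = bisect_right(xlos, sparse[s][2])
                for k in order[:stop]:
                    if overlaps(dense[k], sparse[s]):
                        result.add((k, s) if a_is_dense else (s, k))
    return result
```

Both branches return exactly the overlapping pairs; only the amount of work differs. The decision is made per cell, which is what lets the strategy follow local rather than global density.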
The development of RUBIK was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
Applications in finance, meteorology, science and other fields increasingly produce time series as output. Analyzing these vast amounts of time series is key to understanding the phenomena studied, particularly in the simulation sciences, where analyzing the time series produced by a simulation allows scientists to refine the simulated model. Existing approaches to querying time series typically keep a compact representation in main memory, use it to answer queries approximately, and then access the exact time series data on disk to validate the result. The more precise the in-memory representation, the fewer disk accesses are needed for validation. Given the massive size of today's datasets, however, current in-memory representations often no longer fit into main memory. Making them fit requires reducing their precision considerably, which causes substantial disk access that slows query execution today and limits scalability for even bigger datasets in the future.
In this paper, we develop RUBIK, a novel approach to compressing and indexing time series. RUBIK exploits the fact that time series in many applications, particularly in the simulation sciences, are similar to each other. It compresses similar time series, i.e., both their observation values and their time information, achieving better space efficiency and improved precision. RUBIK translates threshold queries into two-dimensional spatial queries and executes them efficiently on the compressed time series, exploiting the pruning power of a tree structure to find the result; this lets it outperform the state of the art by a factor of 6 to 23. Our experiments further indicate that exploiting similarity within and between time series is crucial to make query execution scale and, ultimately, to decouple query execution time from the growth of the data (in size and number of time series).
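The translation of a threshold query into a two-dimensional spatial query can be illustrated with a small sketch. Asking "at which times in [t0, t1] does a series exceed τ?" is the same as asking which (time, value) points fall inside the rectangle [t0, t1] × (τ, ∞). The sketch below is an assumption-laden simplification, not RUBIK: it omits compression entirely, cuts each series into fixed-size blocks (hypothetical `block_size`), and uses a flat list of block bounding rectangles where RUBIK uses a tree. It still shows the key effect: whole blocks are pruned by their MBR before any raw value is touched.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Block:
    series_id: str
    t_lo: int                # first time stamp covered by the block
    t_hi: int                # last time stamp covered by the block
    v_lo: float              # minimum value in the block
    v_hi: float              # maximum value in the block
    values: Sequence[float]  # raw samples, used only when the block is not pruned


def build_blocks(series: Dict[str, Sequence[float]], block_size: int = 4) -> List[Block]:
    """Cut each series into blocks and record each block's (time, value) MBR."""
    blocks = []
    for sid, vals in series.items():
        for start in range(0, len(vals), block_size):
            chunk = vals[start:start + block_size]
            blocks.append(Block(sid, start, start + len(chunk) - 1, min(chunk), max(chunk), chunk))
    return blocks


def threshold_query(blocks: List[Block], tau: float, t0: int, t1: int) -> Dict[str, List[int]]:
    """Per series, the time stamps in [t0, t1] whose value is > tau,
    i.e. the points inside the 2D range [t0, t1] x (tau, +inf)."""
    hits: Dict[str, List[int]] = {}
    for b in blocks:
        # Prune the whole block if its MBR cannot intersect the query rectangle.
        if b.t_hi < t0 or b.t_lo > t1 or b.v_hi <= tau:
            continue
        for offset, v in enumerate(b.values):
            t = b.t_lo + offset
            if t0 <= t <= t1 and v > tau:
                hits.setdefault(b.series_id, []).append(t)
    return hits
```

For example, with `{"s1": [1, 2, 3, 9, 8, 1, 0, 0], "s2": [0, 0, 0, 0, 5, 6, 7, 1]}` and τ = 4.5, the first block of `s2` is skipped by its MBR alone, since its maximum value is 0.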
The development of MonetDB was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
When a database grows to millions of records spread over many tables, and business intelligence or science becomes the prevalent application domain, a column-store database management system (DBMS) is called for. Unlike traditional row-stores such as MySQL and PostgreSQL, a column-store provides a modern, scalable solution without requiring substantial hardware investment.
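The difference between the two layouts can be sketched in a few lines of plain Python, assuming a hypothetical three-attribute sales table. A row-store keeps each record together; a column-store keeps one dense array per attribute, with the row position acting as the implicit identifier that links the arrays. A query that filters on one attribute and aggregates another then reads only those two arrays. This is an illustration of the layout, not MonetDB's storage engine.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, str, float]  # hypothetical schema: (id, region, revenue)


def to_columns(rows: List[Row], names: Sequence[str] = ("id", "region", "revenue")) -> Dict[str, list]:
    """Vertical fragmentation: one array per attribute, aligned by row position."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def revenue_for_region(cols: Dict[str, list], region: str) -> float:
    # Column-at-a-time style: select matching positions on one column...
    positions = [i for i, r in enumerate(cols["region"]) if r == region]
    # ...then fetch the aggregated values from another column by position.
    return sum(cols["revenue"][i] for i in positions)
```

Usage: `to_columns([(1, "EU", 10.0), (2, "US", 5.0), (3, "EU", 2.5)])` yields a `revenue` array `[10.0, 5.0, 2.5]`, and `revenue_for_region` on it with `"EU"` returns `12.5` without ever reading the `id` column.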
MonetDB has pioneered column-store solutions for high-performance data warehouses for business intelligence and eScience since 1993. It achieves its goal through innovations at all layers of a DBMS, e.g., a storage model based on vertical fragmentation, a modern CPU-tuned query execution architecture, automatic and adaptive indices, run-time query optimization, and a modular software architecture. It is based on the SQL:2003 standard with full support for foreign keys, joins, views, triggers, and stored procedures. It is fully ACID-compliant and supports a rich spectrum of programming interfaces (JDBC, ODBC, PHP, Python, RoR, C/C++, Perl).
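One well-known adaptive indexing technique from the MonetDB research line is database cracking: a copy of a column is physically reorganized as a side effect of each range query, so that repeatedly queried ranges become cheap without any up-front index build. The sketch below is a simplified, single-threaded illustration; the class `CrackedColumn` and its methods are hypothetical and do not reflect MonetDB's internal code.

```python
from bisect import bisect_left
from typing import List


class CrackedColumn:
    """Simplified database cracking over a column of integers."""

    def __init__(self, values: List[int]):
        self.data = list(values)     # cracker column: a reorganizable copy of the base column
        self.bounds: List[int] = []  # sorted pivot values cracked on so far
        self.pos: List[int] = []     # pos[k]: data[:pos[k]] < bounds[k] <= data[pos[k]:]

    def _crack(self, pivot: int) -> int:
        """Partition around `pivot` if needed and return its split position."""
        k = bisect_left(self.bounds, pivot)
        if k < len(self.bounds) and self.bounds[k] == pivot:
            return self.pos[k]  # already cracked on this value
        # Only the piece between the neighbouring cracks can hold values
        # on both sides of the pivot, so only that piece is partitioned.
        lo = self.pos[k - 1] if k > 0 else 0
        hi = self.pos[k] if k < len(self.pos) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.bounds.insert(k, pivot)
        self.pos.insert(k, split)
        return split

    def range_query(self, low: int, high: int) -> List[int]:
        """Return every value v with low <= v < high (in cracked order)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]
```

Each query leaves the column a little more ordered: after `range_query(5, 12)`, all values below 5, in [5, 12), and at or above 12 sit in contiguous pieces, and a later query only partitions the pieces its bounds fall into.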
The current version provides the following new features as compared to the version that was part of the HBP-internal Platform Release in M18:
The development of SCOUT was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
SCOUT is a structure-aware method for prefetching data along interactive spatial query sequences. Its input is a sequence of spatial range queries, which traces the structure the user explores interactively, together with the spatial dataset to be queried. SCOUT reduces query response time by prefetching the data along this query sequence.
As with FLAT, both the query ranges in the query sequence and the spatial objects should be represented as minimum bounding rectangles (MBRs).
SCOUT outperforms related prefetching techniques (e.g., Straight Line Extrapolation or Hilbert prefetching) thanks to its high prefetching accuracy, which translates into a speedup of one order of magnitude.
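Why following the structure can beat straight-line extrapolation is easiest to see on a branch that turns. The sketch below is a hypothetical simplification, not SCOUT's algorithm: it assumes the explored structure is a single 2D polyline ordered in the direction the user is moving, and it prefetches a square box (hypothetical half-width `half`) centred on the first vertex where the polyline leaves the current query box. The baseline simply extends the line through the last two query centres.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # MBR: (xlo, ylo, xhi, yhi)


def inside(p: Point, box: Box) -> bool:
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


def centered_box(c: Point, half: float) -> Box:
    return (c[0] - half, c[1] - half, c[0] + half, c[1] + half)


def straight_line_prefetch(prev: Point, curr: Point, half: float) -> Box:
    """Baseline: assume the next query continues along the line prev -> curr."""
    nxt = (2 * curr[0] - prev[0], 2 * curr[1] - prev[1])
    return centered_box(nxt, half)


def structure_aware_prefetch(polyline: List[Point], query: Box, half: float) -> Optional[Box]:
    """Follow the structure: prefetch around the first vertex at which the
    polyline exits the current query box (None if it never exits)."""
    entered = False
    for p in polyline:
        if inside(p, query):
            entered = True
        elif entered:
            return centered_box(p, half)
    return None
```

For a branch `[(0, 0), (4, 0), (6, 0), (6, 4), (6, 8)]` that turns upward inside the query box `(3, -2, 7, 2)`, the structure-aware guess is centred on `(6, 4)`, where the branch actually goes, while extrapolating from query centres `(1, 0)` and `(5, 0)` predicts `(9, 0)`, where there is no data of interest.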
Date of release
Version of software
Version of documentation
Collaboratory; integrated in and part of the BBP SDK tool set
The development of FLAT was co-funded by the HBP during the Ramp-up Phase. This page is kept for reference but will no longer be updated.
FLAT is a spatial indexing tool that enables scalable range queries on (3D) spatial datasets. Given a query range and the dataset to be queried, FLAT returns all objects that intersect the query range.
Both the query ranges and the spatial objects should be represented as minimum bounding rectangles (MBRs), i.e., the smallest axis-aligned boxes that enclose the underlying spatial objects.
FLAT outperforms state-of-the-art spatial indexing techniques (e.g., R-trees and grid files) on extremely dense datasets.
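The query semantics described above (return every object whose 3D MBR intersects the query MBR) can be sketched as follows. To keep the example short and clearly correct, it uses a plain uniform grid, one of the baseline techniques named above, as a hypothetical index; it is not FLAT's internal structure, and the cell size is an assumption.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterator, List, Set, Tuple

MBR = Tuple[float, float, float, float, float, float]  # (xlo, ylo, zlo, xhi, yhi, zhi)


def intersects(a: MBR, b: MBR) -> bool:
    """Two axis-aligned boxes intersect iff they overlap on every axis."""
    return all(a[d] <= b[d + 3] and b[d] <= a[d + 3] for d in range(3))


class GridIndex:
    """Hypothetical uniform-grid index over 3D MBRs (illustration only)."""

    def __init__(self, objects: List[MBR], cell: float):
        self.objects = objects
        self.cell = cell
        self.cells: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
        for oid, box in enumerate(objects):
            for key in self._cells_of(box):
                self.cells[key].append(oid)

    def _cells_of(self, box: MBR) -> Iterator[Tuple[int, ...]]:
        # All grid cells the box touches, one index range per axis.
        ranges = [range(int(box[d] // self.cell), int(box[d + 3] // self.cell) + 1) for d in range(3)]
        return product(*ranges)

    def range_query(self, query: MBR) -> Set[int]:
        """Ids of all objects whose MBR intersects the query MBR."""
        result: Set[int] = set()  # a set, because large objects are stored in several cells
        for key in self._cells_of(query):
            for oid in self.cells.get(key, ()):  # .get avoids creating empty cells
                if intersects(self.objects[oid], query):
                    result.add(oid)
        return result
```

The final `intersects` check matters: a shared grid cell only means two boxes might intersect, so candidates from the cells are always refined against the exact MBRs.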
Date of release
Version of software
Version of documentation
Collaboratory; integrated in and part of the BBP SDK tool set