Parallel Query Support for Multidimensional Data

Intra-query parallelism is a well-established mechanism for achieving high performance in
(object-)relational database systems. However, these methods have not yet been applied to the
emerging field of multidimensional array databases. Specific properties of multidimensional
array data require new parallel algorithms. This paper presents a number of new techniques
for parallelizing queries in multidimensional array database management systems and discusses
their implementation in the RasDaMan DBMS, the first DBMS for generic multidimensional array data.
The efficiency of the techniques presented is demonstrated using typical queries on large
multidimensional data volumes.

Recently, the integration of an application-domain-independent, generic type constructor
for Multidimensional Discrete Data (MDD) into Database Management Systems (DBMS) has
received growing attention. Current scientific contributions in this area mainly focus on
MDD algebra and specialized storage architectures. Because MDD objects may be several MB
in size or much larger and, compared to scalar values, operations on them can be
very complex, their efficient evaluation becomes a critical factor for the overall query
response time. Beyond query optimization, parallel query processing is the most promising
technique to speed up complex operations on large data volumes.

One outcome of the RasDaMan project, in which the Array DBMS of the same name was developed
and which preceded ESTEDI (European Spatio-Temporal Data Infrastructure), was the insight
that most queries on multidimensional array data are in fact CPU-bound.
Therefore, one major research issue of the successor project ESTEDI is parallel
processing. Furthermore, ESTEDI, an initiative of European software vendors and
supercomputing centers, will establish a European standard for the storage and retrieval
of multidimensional high-performance computing (HPC) data. It addresses a main technical
obstacle, the delivery bottleneck of large HPC results to the users, by augmenting high
volume data generators with a flexible data management and extraction tool for
multidimensional array data. Special properties of array data, e.g. the size of a
single data object combined with expensive cell operations, require adapted algorithms
for parallel processing. Suitable concepts found in relational DBMS were implemented
and evaluated in the RasDaMan Array DBMS.


