19,486 research outputs found

    A Survey on Array Storage, Query Languages, and Systems

    Full text link
    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention.Comment: 44 page

    A Case Study on Array Query Optimisation

    Get PDF
    The development of applications involving multi-dimensional data sets on top of a RDBMS raises several difficulties that are not directly related to the scientific problem being addressed. In particular, an additional effort is needed to solve the mismatch existing between the array-based data model typical for such computations and the set-based data model provided by the RDMBS. The RAM (Relational Array Mapping) system fills this gap, silently providing a mapping layer between the two data models. As expected though, a naive implementation of such an automatic translation cannot compete with the efficiency of queries written by an experienced programmer. In order to make RAM a valid alternative to expensive and time-consuming hand-written solutions, this performance gap should be reduced. We study a real-world application aimed at the ranking of multimedia collections to assess the impact of different implementation strategies. The result of this study provides an illustrative outlook for the development of generally applicable optimisation techniques

    Cold Storage Data Archives: More Than Just a Bunch of Tapes

    Full text link
    The abundance of available sensor and derived data from large scientific experiments, such as earth observation programs, radio astronomy sky surveys, and high-energy physics already exceeds the storage hardware globally fabricated per year. To that end, cold storage data archives are the---often overlooked---spearheads of modern big data analytics in scientific, data-intensive application domains. While high-performance data analytics has received much attention from the research community, the growing number of problems in designing and deploying cold storage archives has only received very little attention. In this paper, we take the first step towards bridging this gap in knowledge by presenting an analysis of four real-world cold storage archives from three different application domains. In doing so, we highlight (i) workload characteristics that differentiate these archives from traditional, performance-sensitive data analytics, (ii) design trade-offs involved in building cold storage systems for these archives, and (iii) deployment trade-offs with respect to migration to the public cloud. Based on our analysis, we discuss several other important research challenges that need to be addressed by the data management community

    Automatic optimization of array queries

    Get PDF
    Non-trivial scientific applications often involve complex computations on large multi-dimensional datasets. Using relational database technology for these datasets is cumbersome since expressing the computations in terms of relational queries is difficult and time-consuming. Moreover, query optimization strategies successful in classical relational domains may not suffice when applied to the multi-dimensional array domain. The RAM (Relational Array Mapping) system hides these issues by providing a transparent mapping between the scientific problem specification and the underlying database system. This paper focuses on the RAM query optimizer which is specifically tuned to exploit the characteristics of the array paradigm. We detail how an intermediate array-algebra and several equivalence rules are used to create efficient query plans and how, with minor extensions, the optimizer can automatically parallelize array operation

    Extending functional databases for use in text-intensive applications

    Get PDF
    This thesis continues research exploring the benefits of using functional databases based around the functional data model for advanced database applications-particularly those supporting investigative systems. This is a growing generic application domain covering areas such as criminal and military intelligence, which are characterised by significant data complexity, large data sets and the need for high performance, interactive use. An experimental functional database language was developed to provide the requisite semantic richness. However, heavy use in a practical context has shown that language extensions and implementation improvements are required-especially in the crucial areas of string matching and graph traversal. In addition, an implementation on multiprocessor, parallel architectures is essential to meet the performance needs arising from existing and projected database sizes in the chosen application area. [Continues.

    Flexible and efficient IR using array databases

    Get PDF
    textabstractThe Matrix Framework is a recent proposal by IR researchers to flexibly represent all important information retrieval models in a single multi-dimensional array framework. Computational support for exactly this framework is provided by the array database system SRAM (Sparse Relational Array Mapping) that works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language, in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables and translates and optimizes array queries into relational queries. In this work, we describe a number of array query optimization rules and demonstrate their effect on text retrieval in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing including compression, score materialization and quantization, such as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, that provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage
    • …
    corecore