
    Query optimization by using derivability in a data warehouse environment

    Materialized summary tables and cached query results are frequently used to optimize aggregate queries in a data warehouse. Query rewriting techniques are incorporated into database systems to exploit these materialized views and thus avoid accessing the possibly huge raw data. A rewriting is only possible if the query is derivable from the views. Several approaches to checking derivability and finding query rewritings can be found in the literature. The specific application scenario of a data warehouse, with its multidimensional perspective, allows much more semantic information to be taken into account, e.g. structural dependencies within the dimension hierarchies and the different characteristics of measures. The motivation of this article is to use this information to present conditions for derivability in a large number of relevant cases that go beyond previous approaches.
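
    To make the derivability idea concrete, the following minimal Python sketch checks one common sufficient condition: an aggregate query is answerable from a materialized view if its aggregate function is distributive and each of its grouping levels can be reached from a view grouping level by rolling up a dimension hierarchy. All hierarchy and attribute names are hypothetical; this illustrates the general technique, not the specific conditions developed in the article.

```python
# Hypothetical dimension hierarchies as child -> parent edges (finer -> coarser).
HIERARCHIES = {
    "day": "month",
    "month": "quarter",
    "quarter": "year",
    "product": "category",
    "category": "all_products",
}

# Distributive aggregates can be re-aggregated from partial results
# (AVG, by contrast, is not directly re-aggregatable from averages).
DISTRIBUTIVE = {"SUM", "MIN", "MAX", "COUNT"}

def rolls_up_to(fine: str, coarse: str) -> bool:
    """True if `coarse` lies on the roll-up path starting at `fine`."""
    level = fine
    while level is not None:
        if level == coarse:
            return True
        level = HIERARCHIES.get(level)
    return False

def derivable(query_group_by, query_agg, view_group_by, view_agg) -> bool:
    """Sufficient (not necessary) condition: same distributive aggregate,
    and every query grouping level is a roll-up of some view level."""
    if query_agg != view_agg or query_agg not in DISTRIBUTIVE:
        return False
    return all(
        any(rolls_up_to(v, q) for v in view_group_by)
        for q in query_group_by
    )

# A view grouping by (month, product) answers a query grouping by
# (year, category), since month -> year and product -> category ...
print(derivable(["year", "category"], "SUM", ["month", "product"], "SUM"))  # True
# ... but not a query at the finer day level.
print(derivable(["day"], "SUM", ["month", "product"], "SUM"))               # False
```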

    Metadata: Key to the Use of Information Systems

    Increasingly, information systems are being built to obtain information that supports decision-making. This raises the question of how the data holdings relevant to a decision can be selected from a constantly growing supply of information. If an information system is defined as the totality of data and methods, then metadata are the component of the system that describes the data it holds. They are intended to serve as a guiding instrument in the use of the information system. Information systems serve the purpose of conducting analyses; the work steps involved in this task are characterized by the problems of adequacy and interpretation. Metadata should support users of information systems in carrying out these steps as far as possible. This article shows, both in general terms and using the examples of social welfare statistics and environmental statistics, to what extent metadata can assist with the individual work steps and how they should be structured for this purpose.

    Dynamic Time Warping Under Translation: Approximation Guided by Space-Filling Curves

    The Dynamic Time Warping (DTW) distance is a popular measure of similarity for a variety of sequence data. For comparing polygonal curves π, σ in ℝ^d, it provides a robust, outlier-insensitive alternative to the Fréchet distance. However, like the Fréchet distance, the DTW distance is not invariant under translations. Can we efficiently optimize the DTW distance of π and σ under arbitrary translations, to compare the curves' shape irrespective of their absolute location? There are surprisingly few works in this direction, which may be due to its computational intricacy: For the Euclidean norm, this problem contains as a special case the geometric median problem, which provably admits no exact algebraic algorithm (that is, no algorithm using only addition, multiplication, and k-th roots). We thus investigate exact algorithms for non-Euclidean norms as well as approximation algorithms for the Euclidean norm. For the L₁ norm in ℝ^d, we provide an 𝒪(n^{2(d+1)})-time algorithm, i.e., an exact polynomial-time algorithm for constant d. Here and below, n bounds the curves' complexities. For the Euclidean norm in ℝ², we show that a simple problem-specific insight leads to a (1+ε)-approximation in time 𝒪(n³/ε²). We then show how to obtain a subcubic 𝒪̃(n^{2.5}/ε²)-time algorithm with significant new ideas; this time comes close to the well-known quadratic-time barrier for computing DTW for fixed translations. Technically, the algorithm is obtained by speeding up repeated DTW distance estimations using a dynamic data structure for maintaining shortest paths in weighted planar digraphs. Crucially, we show how to traverse a candidate set of translations using space-filling curves in a way that incurs only a few updates to the data structure. We hope that our results will facilitate the use of DTW under translation both in theory and practice, and inspire similar algorithmic approaches for related geometric optimization problems.
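
    The core primitive the paper accelerates is the DTW distance for one fixed translation, computable by the classic quadratic dynamic program. The Python sketch below shows that primitive together with a naive grid search over 2D translations as a baseline; the grid resolution and search radius are arbitrary illustrations, and this is emphatically not the authors' space-filling-curve algorithm.

```python
import itertools
import math

def dtw(p, q):
    """DTW distance between point sequences p, q (Euclidean ground
    distance), via the standard O(len(p) * len(q)) dynamic program."""
    n, m = len(p), len(q)
    INF = math.inf
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(p[i - 1], q[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]

def dtw_under_translation_naive(p, q, step=0.5, radius=2.0):
    """Naive baseline: minimize DTW over a uniform grid of 2D translations
    of p. A (1+eps)-approximation would tie the grid resolution to eps;
    here it is hard-coded purely for illustration."""
    shifts = [k * step for k in range(-int(radius / step), int(radius / step) + 1)]
    best = math.inf
    for tx, ty in itertools.product(shifts, shifts):
        shifted = [(x + tx, y + ty) for x, y in p]
        best = min(best, dtw(shifted, q))
    return best

pi = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
sigma = [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0)]
# sigma is pi translated by (1, 1), so the minimum over translations is 0.
print(dtw(pi, sigma), dtw_under_translation_naive(pi, sigma))
```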

    Seventh Biennial Report : June 2003 - March 2005


    Designing algorithms for big graph datasets : a study of computing bisimulation and joins


    Advances in Remote Sensing-based Disaster Monitoring and Assessment

    Remote sensing data and techniques have been widely used for disaster monitoring and assessment. In particular, recent advances in sensor technologies and artificial intelligence-based modeling are very promising for disaster monitoring and for readying responses aimed at reducing the damage caused by disasters. This book contains eleven scientific papers that study novel approaches applied to a range of natural disasters such as forest fires, urban land subsidence, floods, and tropical cyclones.

    Query-Time Data Integration

    Today, data is collected at ever increasing scale and variety, opening up enormous potential for new insights and data-centric products. However, in many cases the volume and heterogeneity of new data sources preclude up-front integration using traditional ETL processes and data warehouses. In some cases, it is even unclear if and in what context the collected data will be utilized. Therefore, there is a need for agile methods that defer the effort of integration until the usage context is established. This thesis introduces Query-Time Data Integration as an alternative to traditional up-front integration. It aims at enabling users to issue ad-hoc queries on their own data as if all potential other data sources were already integrated, without declaring specific sources and mappings to use. Automated data search and integration methods are then coupled directly with query processing on the available data. The ambiguity and uncertainty introduced by fully automated retrieval and mapping methods is compensated for by answering such queries with ranked lists of alternative results. Each result is based on different data sources or query interpretations, allowing users to pick the result most suitable to their information need. To this end, this thesis makes three main contributions. Firstly, we introduce a novel method for Top-k Entity Augmentation, which constructs a top-k list of consistent integration results from a large corpus of heterogeneous data sources. It improves on the state of the art by producing a set of individually consistent but mutually diverse alternative solutions while minimizing the number of data sources used. Secondly, based on this novel augmentation method, we introduce the DrillBeyond system, which is able to process Open World SQL queries, i.e., queries referencing arbitrary attributes not defined in the queried database. The original database is then augmented at query time with Web data sources providing those attributes. Its hybrid augmentation/relational query processing enables the use of ad-hoc data search and integration in data analysis queries, and improves both performance and quality compared to using separate systems for the two tasks. Finally, we study the management of large-scale dataset corpora such as data lakes or Open Data platforms, which serve as data sources for our augmentation methods. We introduce Publish-time Data Integration as a new technique for data curation systems managing such corpora, aimed at improving the individual reusability of datasets without requiring up-front global integration. This is achieved by automatically generating metadata and format recommendations, allowing publishers to enhance their datasets with minimal effort. Collectively, these three contributions form the foundation of a Query-Time Data Integration architecture that enables ad-hoc data search and integration queries over large heterogeneous dataset collections.
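
    As a rough illustration of the top-k entity augmentation idea, the hypothetical Python sketch below covers a set of query entities with attribute values drawn from as few "web table" sources as possible, then generates alternative covers that avoid previously used sources, a crude stand-in for the thesis's consistency and diversity objectives. All names, data, and the greedy scoring heuristic are invented for illustration and are not the thesis's actual algorithm.

```python
def greedy_cover(entities, sources, banned=frozenset()):
    """Greedily pick sources (name -> {entity: value}) until every entity
    is covered, preferring sources that cover many still-missing entities."""
    uncovered, cover = set(entities), {}
    while uncovered:
        best = max(
            (s for s in sources if s not in banned and s not in cover),
            key=lambda s: len(uncovered & sources[s].keys()),
            default=None,
        )
        if best is None or not (uncovered & sources[best].keys()):
            return None  # remaining entities cannot be covered
        cover[best] = {e: sources[best][e] for e in uncovered & sources[best].keys()}
        uncovered -= sources[best].keys()
    return cover

def top_k_augmentations(entities, sources, k=3):
    """Build up to k alternative covers; banning each result's sources for
    the next round is a crude stand-in for the diversity objective."""
    results, banned = [], set()
    for _ in range(k):
        cover = greedy_cover(entities, sources, frozenset(banned))
        if cover is None:
            break
        results.append(cover)
        banned |= cover.keys()
    return results

# Hypothetical web tables providing a 'population' attribute per city.
sources = {
    "web_table_1": {"Berlin": 3_700_000, "Paris": 2_100_000},
    "web_table_2": {"Paris": 2_150_000, "Rome": 2_800_000},
    "web_table_3": {"Berlin": 3_650_000, "Rome": 2_850_000, "Paris": 2_100_000},
}
# Rank 1 uses web_table_3 alone; rank 2 combines web_table_1 and web_table_2.
for rank, cover in enumerate(top_k_augmentations(["Berlin", "Paris", "Rome"], sources), 1):
    print(rank, cover)
```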