35 research outputs found

    Efficient Evaluation of Sparse Data Cubes

    Computing data cubes requires the aggregation of measures over arbitrary combinations of dimensions in a data set. Efficient data cube evaluation remains challenging because of the potentially very large sizes of input datasets (e.g., in the data warehousing context), the well-known curse of dimensionality, and the complexity of the queries that need to be supported. This paper proposes a new dynamic data structure called SST (Sparse Statistics Trees) and a novel, interactive, and fast cube evaluation algorithm called CUPS (Cubing by Pruning SST), which is especially well suited to computing aggregates in cubes whose data sets are sparse. SST stores only the aggregations of non-empty cube cells instead of the detailed records. Furthermore, it retains in memory the dense cubes (a.k.a. iceberg cubes) whose aggregate values are above a threshold, while sparse cubes are stored on disk. This allows fast, accurate approximate answers to queries; if users desire more refined answers, the related sparse cubes are aggregated. SST is incrementally maintainable, which makes CUPS suitable for data warehousing and the analysis of streaming data. Experimental results demonstrate the excellent performance and good scalability of our approach.
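
    As a rough illustration of the general idea (this is not the paper's SST/CUPS implementation; the class name, threshold handling, and sample data below are assumptions), the following sketch stores aggregates only for cube cells that actually occur in the data and reports the "dense" cells whose aggregate meets a threshold:

```python
# Minimal sketch of a sparse-statistics-tree-style structure: it keeps
# aggregates only for non-empty cube cells and marks "dense" cells whose
# aggregate meets a threshold. The real SST/CUPS design (tree layout,
# disk-resident sparse cells, pruning) differs in detail.
from collections import defaultdict
from itertools import combinations


class SparseStatsTree:  # hypothetical name, for illustration only
    def __init__(self, density_threshold):
        self.density_threshold = density_threshold
        self.cells = defaultdict(float)  # group-by cell -> aggregated measure

    def insert(self, record, measure):
        """Aggregate `measure` into every group-by cell the record belongs to."""
        dims = sorted(record.items())
        for k in range(len(dims) + 1):
            for cell in combinations(dims, k):
                self.cells[cell] += measure

    def dense_cells(self):
        """Cells the paper would keep in memory: aggregate above the threshold."""
        return {c: v for c, v in self.cells.items() if v >= self.density_threshold}


tree = SparseStatsTree(density_threshold=100.0)
tree.insert({"region": "EU", "product": "A"}, measure=80.0)
tree.insert({"region": "EU", "product": "B"}, measure=50.0)
print(tree.dense_cells())  # the all-cell and the region=EU cell reach 130.0
```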

    Temporal and Evolving Data Warehouse Design

    Advance of the Access Methods

    The goal of this paper is to outline the advances in access methods over the last ten years and to review all of the methods available in the accessible bibliography.

    Data Mining for Fog Prediction and Low Clouds Detection

    This paper describes our contribution to the research on parametrized models and methods for the detection and prediction of significant meteorological phenomena, especially fog and low cloud cover. The project covered methods for integrating the distributed meteorological data necessary for running the prediction models, training the models, and then mining the data in order to efficiently and quickly predict even sparsely occurring phenomena. The detection and prediction methods are based on knowledge discovery -- data mining of meteorological data using neural networks and decision trees. The mined data were mainly METAR aerodrome messages, meteorological data from specialized stations, and cloud data from special airport sensors -- laser ceilometers.
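
    As a loose illustration of the decision-tree side of such a pipeline (the features, data, and labels below are invented and do not reflect the project's actual models or METAR feature engineering), one might train a classifier as follows:

```python
# Illustration only: a tiny decision tree over invented METAR-style features
# (visibility, dew-point spread, wind speed). Requires scikit-learn.
from sklearn.tree import DecisionTreeClassifier

# hypothetical training rows: [visibility_m, dewpoint_spread_C, wind_speed_kt]
X = [
    [8000, 5.0, 12],
    [400, 0.5, 2],
    [6000, 3.0, 8],
    [200, 0.2, 1],
]
y = [0, 1, 0, 1]  # 1 = fog observed, 0 = no fog (invented labels)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([[300, 0.4, 3]]))  # low visibility and spread -> likely fog
```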

    Enabling instant- and interval-based semantics in multidimensional data models: the T+MultiDim Model

    Time is a vital facet of every human activity. Data warehouses, which are huge repositories of historical information, must provide analysts with rich mechanisms for managing the temporal aspects of information. In this paper, we (i) propose T+MultiDim, a multidimensional conceptual data model enabling both instant- and interval-based semantics over temporal dimensions, and (ii) provide suitable OLAP (On-Line Analytical Processing) operators for querying temporal information. T+MultiDim allows one to design the typical concepts of a data warehouse, including temporal dimensions, and offers the new possibility of conceptually connecting different temporal dimensions in order to exploit temporally aggregated data. The proposed approach allows one to specify and evaluate powerful OLAP queries over information from data warehouses. In particular, we define a set of OLAP operators to deal with interval-based temporal data. Such operators allow the user to derive new measure values associated with different intervals/instants, according to different temporal semantics. Moreover, we propose the SQL specification of all the temporal OLAP operators we define and discuss them through examples from the healthcare domain.
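
    As a minimal, purely illustrative sketch of one interval-based semantics (not the paper's T+MultiDim operators or their SQL specification; the facts below are invented), a measure value for a query interval can be derived by aggregating the instant-stamped facts that fall inside it:

```python
# Illustration only: derive a measure for a query interval by aggregating
# instant-stamped facts that fall inside it. T+MultiDim defines richer
# operators and temporal semantics than this single example.
from datetime import date

# hypothetical instant-stamped facts: (instant, number of hospital admissions)
facts = [
    (date(2019, 1, 3), 12),
    (date(2019, 1, 15), 7),
    (date(2019, 2, 2), 9),
]

def aggregate_over_interval(facts, start, end, agg=sum):
    """Aggregate (sum by default) measures whose instant lies in [start, end]."""
    return agg(m for t, m in facts if start <= t <= end)

print(aggregate_over_interval(facts, date(2019, 1, 1), date(2019, 1, 31)))  # 19
```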

    Holistic integration and automatic warehousing of Open Data

    Statistical Open Data provide useful information for feeding a decision-support system. Their integration and storage within such systems is achieved through ETL processes, which need to be automated in order to make them accessible to non-experts. These processes must also cope with the lack of schemas and the structural and semantic heterogeneity that characterize Open Data. To address these issues, we propose a new graph-based ETL approach. For the extraction of a table's graph, we propose automatic detection and annotation activities. For the transformation, we propose a linear program that performs the holistic matching of structural data coming from several graphs; this model yields an optimal and unique solution. For the loading, we propose a progressive process for defining the multidimensional schema and augmenting the integrated graph. Finally, we present a prototype and the results of experimental evaluations.
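
    The holistic matching step is formulated as a linear program over several graphs; as a much simplified, purely illustrative stand-in (a pairwise assignment over string similarity, not the thesis's actual model; labels and data are invented), one could match the node labels of two table graphs as follows:

```python
# Simplified illustration: match node labels of two table graphs by maximizing
# pairwise string similarity with an assignment solver (SciPy's Hungarian
# algorithm). The actual approach is a holistic linear program over several
# graphs at once; this pairwise sketch is not it.
from difflib import SequenceMatcher

import numpy as np
from scipy.optimize import linear_sum_assignment

labels_a = ["country", "year", "unemployment rate"]
labels_b = ["Country name", "Year", "Rate of unemployment"]

# similarity matrix (higher is better), negated because the solver minimizes
sim = np.array([[SequenceMatcher(None, a.lower(), b.lower()).ratio()
                 for b in labels_b] for a in labels_a])
rows, cols = linear_sum_assignment(-sim)

for i, j in zip(rows, cols):
    print(f"{labels_a[i]!r} <-> {labels_b[j]!r} (score {sim[i, j]:.2f})")
```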

    Semantic metadata for supporting exploratory OLAP

    Cotutela Universitat Politècnica de Catalunya i Aalborg Universitet.
    On-Line Analytical Processing (OLAP) is an approach widely used for data analysis. OLAP is based on the multidimensional (MD) data model, where factual data are related to their analytical perspectives, called dimensions, and together they form an n-dimensional data space referred to as a data cube. MD data are typically stored in a data warehouse, which integrates data from in-house data sources, and are then analyzed by means of OLAP operations; e.g., sales data can be (dis)aggregated along the location dimension. As OLAP proved to be quite intuitive, it became broadly accepted by non-technical and business users. However, as users still encountered difficulties in their analysis, different approaches focused on providing user assistance. These approaches collect situational metadata about users and their actions and provide suggestions and recommendations that can help users' analysis. Yet, although extensively exploited and evidently needed, metadata have received little attention in this context. Furthermore, new emerging tendencies call for expanding the use of OLAP to consider external data sources and heterogeneous settings. This leads to the Exploratory OLAP approach, which especially argues for the use of Semantic Web (SW) technologies to facilitate the description and integration of external sources. With data becoming publicly available on the (Semantic) Web, the number and diversity of non-technical users are also significantly increasing, so the metadata to support their analysis become even more relevant. This PhD thesis focuses on metadata for supporting Exploratory OLAP. The study explores the kinds of metadata artifacts used for user assistance purposes and how they are exploited to provide assistance. Based on these findings, the study then aims at providing theoretical and practical means, such as models, algorithms, and tools, to address the gaps and challenges identified. First, based on a survey of existing user assistance approaches related to OLAP, the thesis proposes the Analytical Metadata (AM) framework. The framework includes the definition of the assistance process, the AM artifacts that are classified in a taxonomy, and the artifacts' organization and related types of processing to support user assistance. Second, the thesis proposes a semantic metamodel for AM: the Resource Description Framework (RDF) is used to represent the AM artifacts in a flexible and reusable manner, while the metamodeling abstraction level is used to overcome the heterogeneity of (meta)data models in the Exploratory OLAP context. Third, focusing on the schema as a fundamental metadata artifact for enabling OLAP, the thesis addresses some important challenges in constructing an MD schema on the SW using RDF. It provides the algorithms, method, and tool to construct an MD schema over statistical linked open data sets; in particular, the focus is on enabling even non-technical users to perform this task. Lastly, the thesis deals with queries as the second most relevant artifact for user assistance. In the spirit of Exploratory OLAP, the thesis proposes an RDF-based model for OLAP queries, created by instantiating the previously proposed metamodel. This model supports the sharing and reuse of queries across the SW and facilitates the metadata preparation for assistance exploitation purposes. Finally, the results of this thesis provide metadata foundations for supporting Exploratory OLAP and advocate for greater attention to the modeling and use of semantics related to metadata.
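
    As a loose illustration of representing an OLAP query as shareable RDF (using rdflib and an invented vocabulary; this is not the thesis's actual metamodel or its terms), a query's cube, dice, and measure could be recorded as triples:

```python
# Illustration only: describe an OLAP query as RDF triples with rdflib.
# The "aq" namespace and its terms are invented; the thesis defines its own
# RDF-based metamodel for analytical metadata and OLAP queries.
from rdflib import RDF, Graph, Literal, Namespace

AQ = Namespace("http://example.org/analytical-query#")  # hypothetical vocabulary
g = Graph()
g.bind("aq", AQ)

q = AQ["query1"]
g.add((q, RDF.type, AQ.OLAPQuery))
g.add((q, AQ.onCube, AQ.SalesCube))
g.add((q, AQ.diceDimension, AQ.Location))
g.add((q, AQ.diceValue, Literal("Denmark")))
g.add((q, AQ.aggregatesMeasure, AQ.Revenue))

print(g.serialize(format="turtle"))
```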

    Constructing an XML database of linguistics data

    Multi-Objective Materialized View Selection in Data-Intensive Flows

    In this thesis we present Forge, a tool for automating the multi-objective materialization of intermediate results in data-intensive flows, driven by a set of different quality objectives. We report initial evaluation results showing the feasibility and efficiency of our approach.
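
    The abstract does not detail Forge's algorithm; purely as an illustration of multi-objective selection in general (candidate views, objectives, and numbers below are invented), one way to keep only the non-dominated candidates is a Pareto filter:

```python
# Illustration only (not Forge's algorithm): keep Pareto-optimal candidate
# views under two objectives -- maximize saved recomputation cost, minimize
# storage footprint. All candidates and numbers are invented.
candidates = {  # view name -> (cost_saved, storage_mb)
    "v_daily_sales": (120.0, 40.0),
    "v_region_agg": (90.0, 10.0),
    "v_raw_join": (130.0, 400.0),
    "v_redundant": (80.0, 60.0),
}

def dominates(a, b):
    """a dominates b: saves at least as much cost with no more storage, a != b."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

pareto = {name: obj for name, obj in candidates.items()
          if not any(dominates(other, obj) for other in candidates.values())}
print(sorted(pareto))  # ['v_daily_sales', 'v_raw_join', 'v_region_agg']
```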