
    Developing a model and a language to identify and specify the integrity constraints in spatial datacubes

    Data quality in spatial datacubes matters because these data serve as the basis for decision making in large organizations. Indeed, poor data quality in these cubes can lead to poor decisions. Integrity constraints play a key role in improving the logical consistency of any database, one of the main elements of data quality. Different spatial datacube models have been proposed in recent years, but none explicitly includes integrity constraints. As a result, the integrity constraints of spatial datacubes are handled in a non-systematic, pragmatic way, which makes the process of verifying data consistency in spatial datacubes inefficient. This thesis provides a theoretical framework for identifying integrity constraints in spatial datacubes as well as a formal language for specifying them. To do so, we first proposed a formal model for spatial datacubes that describes their different components. Based on this model, we then identified and categorized the different types of integrity constraints in spatial datacubes. In addition, since spatial datacubes typically contain both spatial and temporal data, we proposed a classification of integrity constraints for databases dealing with space and time. We then presented a formal language for specifying the integrity constraints of spatial datacubes. This language is based on a controlled natural language hybridized with pictograms. Several examples of spatial datacube integrity constraints are defined using this language. Spatial datacube designers (analysts) can use the proposed framework to identify integrity constraints and specify them at the design stage of spatial datacubes. Moreover, the proposed formal language for specifying integrity constraints is close to the way end users express their integrity constraints. Consequently, using this language, end users can check and validate the integrity constraints defined by the analyst at the design stage
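
    To make the idea of a spatial integrity constraint concrete, here is a minimal sketch in Python of checking one topological constraint over spatial dimension members, e.g. "every store must lie within its assigned sales region". The thesis itself specifies constraints in a controlled natural language with pictograms, not in code, and the member names and data layout below are illustrative assumptions.

```python
# Minimal sketch of checking one spatial integrity constraint:
# "every store must lie spatially within its assigned sales region".
# Member names and data layout are illustrative assumptions, not the
# thesis's controlled-natural-language formalism.
from shapely.geometry import Point, Polygon

# Spatial dimension members: store -> location, region -> footprint.
stores = {
    "store_1": Point(2.0, 3.0),
    "store_2": Point(9.5, 9.5),
}
regions = {
    "north": Polygon([(0, 0), (5, 0), (5, 5), (0, 5)]),
}
assignment = {"store_1": "north", "store_2": "north"}  # store -> region

def violations():
    """Yield (store, region) pairs that break the topological constraint."""
    for store, region in assignment.items():
        if not stores[store].within(regions[region]):
            yield store, region

print(list(violations()))  # -> [('store_2', 'north')]
```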

    Integration of heterogeneous multidimensional data marts

    Data analysts often require access to integrated multidimensional data from local and external data warehouses. The integration process is often undertaken by expert database practitioners, who need to analyze the structure of the data and match schemas and data before creating an integrated view for visualization and analysis. Such a manual process may be acceptable for databases used in transaction processing applications, but it does not help decision makers who need access to information quickly and cost-effectively in a constantly changing environment. This thesis addresses several challenges towards automating the integration of data warehouses based on a dimensional model known as the Star schema. We recognize that the structure of multidimensional data, namely dimension hierarchies, is critical to the accuracy of the integration but is not always available or accessible. To address this problem, we infer dimension hierarchies from their instances, and demonstrate that they are sufficient to ensure the accuracy of the integration even though they may vary from the intended hierarchies. To improve the accuracy of matching Star schemas, we propose a more precise representation of Star schemas and demonstrate its effectiveness by comparing it against existing approaches that treat Star schemas as relational models. To match instances of dimensions, we demonstrate that a graph matching algorithm is effective and performs with a high level of accuracy. We propose algorithms that enforce the tree structure of integrated data, which is necessary for correct aggregation, and that reduce the false positives occurring during instance matching. The effectiveness of our algorithms is shown through experiments with real-life data. Even when schemas and hierarchies match perfectly, there are often dimensions with mismatching data that restrict the scope of the integration. We propose to relax the requirement for dimension compatibility, and introduce measures that quantify the loss of data resulting from the less strict requirement. These measures enable data analysts to identify lossless fragments of data and thereby extend the scope of the integrated data. To provide a more comprehensive view of the data for analysis, we link the integrated data with the data exclusive to each source by extending the navigation operation for multidimensional data. These contributions help shift the integration problem away from expert database practitioners, empowering data analysts to combine multidimensional data from multiple sources in real time and in a cost-effective manner
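
    A core step above is inferring dimension hierarchies from instances. Here is a minimal sketch of one plausible approach, assuming a hierarchy edge between two attributes can be detected by testing the functional dependency "each child value rolls up to exactly one parent value" over a dimension table; the column names and rows are illustrative, not the thesis's algorithm.

```python
# Minimal sketch: infer child -> parent hierarchy edges between dimension
# attributes by testing the functional dependency "each child value maps
# to exactly one parent value" over instances. Column names and rows are
# illustrative assumptions.

def rolls_up(rows, child, parent):
    """True if child -> parent is functional (a valid hierarchy edge)."""
    seen = {}
    for row in rows:
        c, p = row[child], row[parent]
        if seen.setdefault(c, p) != p:
            return False  # one child value under two parents: not a tree
    return True

dimension = [
    {"city": "Leeds",  "state": "Yorkshire", "country": "UK"},
    {"city": "York",   "state": "Yorkshire", "country": "UK"},
    {"city": "Bath",   "state": "Somerset",  "country": "UK"},
    {"city": "Ottawa", "state": "Ontario",   "country": "Canada"},
]

# Candidate edges between attribute pairs; keep the functional ones.
attrs = ["city", "state", "country"]
edges = [(c, p) for c in attrs for p in attrs
         if c != p and rolls_up(dimension, c, p)]
print(edges)  # [('city', 'state'), ('city', 'country'), ('state', 'country')]
```

    A real implementation would additionally prune the transitive edge city -> country to recover the tree city -> state -> country.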

    A unified view of data-intensive flows in business intelligence systems: a survey

    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next-generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time, operational data flows that integrate source data at runtime. Both academia and industry thus need a clear understanding of the foundations of data-intensive flows and of the challenges of moving towards next-generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next-generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that are still to be addressed and how current solutions can be applied to address them
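
    The batch/runtime combination described above can be illustrated with a minimal sketch in which the same transformation step is reused by a batched ETL load and by an operational flow that integrates records as they arrive. Function and table names are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch: one shared transformation serving both a batched ETL
# load and a runtime ("right-time") flow. Names are illustrative.

def transform(record):
    """Shared cleaning/conforming step for both flows."""
    return {"customer": record["name"].strip().title(),
            "amount": round(float(record["amount"]), 2)}

warehouse = []  # stands in for the DW fact table

def batch_etl(source_rows):
    """Traditional ETL: extract all rows, transform, load in one pass."""
    warehouse.extend(transform(r) for r in source_rows)

def on_event(record):
    """Operational flow: integrate a single source record at runtime."""
    warehouse.append(transform(record))

batch_etl([{"name": " alice ", "amount": "10.5"}])
on_event({"name": "BOB", "amount": "3.337"})
print(warehouse)  # both flows land conformed records in the same table
```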

    SOLAP+: extending the interaction model

    Thesis submitted to the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfillment of the requirements for the degree of Master in Computer Science. Decision making is a crucial process that can dictate success or failure in today’s businesses and organizations. Decision Support Systems (DSS) are designed to help human users with decision-making activities. Inside the big family of DSSs there is OnLine Analytical Processing (OLAP), an approach to answering multidimensional queries quickly and effectively. Even though OLAP is recognized as an efficient technique and is widely used in almost every area, it offers neither spatial analysis nor spatial data visualization and exploration. Geographic Information Systems (GIS) have grown enormously in recent years, and acquiring and storing spatial data is easier than ever. To exploit this potential and add spatial data and spatial analysis features to OLAP, Bédard introduced Spatial OLAP (SOLAP). Although it is a relatively new area, many proposals towards SOLAP’s standardization and consolidation have been made, as well as functional tools for different application areas. There are, however, many issues and topics in SOLAP that are either not covered or addressed only by incompatible or non-general proposals. We propose to define a generic model for SOLAP interaction based on previous works, extending it to include new visualization options, components and cases; to create and present a component-driven architecture proposal for such a tool, including descriptive metamodels, an aggregate navigator to increase performance, and a communication protocol; and finally, to develop an example prototype that partially implements the proposed interaction features, following guidelines for a user-friendly yet powerful and flexible application
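
    The aggregate navigator mentioned above is a classic warehouse component: given a query's grouping levels, it routes the query to the smallest pre-aggregated table that can answer it. A minimal sketch follows, using a simplified coverage test (queried levels must be a subset of the aggregate's levels); table names, levels, and row counts are illustrative assumptions, not the thesis's protocol.

```python
# Minimal sketch of an aggregate navigator: route a query to the smallest
# pre-aggregate whose grouping levels cover the query. Table names,
# levels, and row counts are illustrative assumptions.

# Available aggregates: grouping levels -> (table name, row count).
AGGREGATES = [
    ({"day", "store"}, ("sales_day_store", 10_000_000)),
    ({"month", "store"}, ("sales_month_store", 400_000)),
    ({"month", "region"}, ("sales_month_region", 12_000)),
]

def navigate(query_levels):
    """Pick the cheapest aggregate that retains all queried levels."""
    candidates = [(rows, table) for levels, (table, rows) in AGGREGATES
                  if query_levels <= levels]
    if not candidates:
        raise LookupError("no aggregate can answer this query")
    return min(candidates)[1]  # fewest rows wins

print(navigate({"month"}))         # sales_month_region (smallest cover)
print(navigate({"day", "store"}))  # sales_day_store
```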

    Flexible Integration and Efficient Analysis of Multidimensional Datasets from the Web

    If numeric data from the Web are brought together, natural scientists can compare climate measurements with estimations, financial analysts can evaluate companies based on balance sheets and daily stock market values, and citizens can explore the GDP per capita from several data sources. However, heterogeneities and size of data remain a problem. This work presents methods to query a uniform view - the Global Cube - of available datasets from the Web and builds on Linked Data query approaches
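
    The Global Cube builds on Linked Data querying of statistical datasets, which are commonly published with the W3C RDF Data Cube (qb:) vocabulary. A minimal sketch of querying such observations with the rdflib library follows; the dataset URI and measure property are illustrative assumptions, and a real Global Cube would span many sources.

```python
# Minimal sketch: query observations of one RDF Data Cube dataset with
# rdflib. The graph content (dataset URI, measure property) is an
# illustrative assumption.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/> .
ex:obs1 a qb:Observation ; qb:dataSet ex:gdp ;
        ex:country "DE" ; ex:gdpPerCapita 46200 .
ex:obs2 a qb:Observation ; qb:dataSet ex:gdp ;
        ex:country "FR" ; ex:gdpPerCapita 40500 .
""", format="turtle")

rows = g.query("""
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX ex: <http://example.org/>
SELECT ?country ?value WHERE {
  ?obs a qb:Observation ; qb:dataSet ex:gdp ;
       ex:country ?country ; ex:gdpPerCapita ?value .
} ORDER BY ?country
""")
for country, value in rows:
    print(country, value)  # DE 46200, FR 40500
```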

    Designing Cross-Company Business Intelligence Networks

    Business Intelligence (BI) is a well-established term for methods, concepts and tools to retrieve, store, deliver and analyze data for management and business purposes. Although collaboration across company borders has increased substantially over the past decades, little research has been conducted specifically on Cross-Company BI (CCBI). In this thesis, a working definition of CCBI and its distinction from general collaborative decision making are proposed. Based on a reference model that takes existing research and related approaches of adjacent fields into account, a peer-to-peer network design is created. Extensive simulation and parameter testing show that the design proves valuable and competitive with existing centralized approaches, and that obtaining a critical mass of participants improves the usefulness of the network. To quantify the observations, appropriate quality measures, rigorously derived from respected concepts on data and information quality and from multidimensional data models, are introduced and validated
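
    To give a feel for the peer-to-peer setting, here is a minimal sketch of an analytical query fanned out across peers, merging partial results and tracking how many peers answered. All names and the coverage measure are illustrative assumptions, not the thesis's reference model or quality measures.

```python
# Minimal sketch of a peer-to-peer BI query fan-out: a peer forwards an
# analytical query to its neighbors, merges partial results, and tracks
# a crude coverage count. Names and measure are illustrative assumptions.

class Peer:
    def __init__(self, name, facts):
        self.name, self.facts, self.neighbors = name, facts, []

    def query(self, product, visited=None):
        """Sum a measure for `product` across this peer and its network."""
        visited = visited if visited is not None else set()
        if self.name in visited:
            return 0, 0
        visited.add(self.name)
        total, answering = self.facts.get(product, 0), 1
        for peer in self.neighbors:
            t, a = peer.query(product, visited)
            total, answering = total + t, answering + a
        return total, answering

a, b, c = Peer("A", {"widget": 10}), Peer("B", {"widget": 5}), Peer("C", {})
a.neighbors, b.neighbors = [b], [c]
total, peers_reached = a.query("widget")
print(total, peers_reached)  # 15 3
```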

    A model-driven approach for enforcing summarizability in multidimensional modeling

    The development of a data warehouse system is based on a conceptual multidimensional model, which provides a high level of abstraction in the accurate and expressive description of real-world situations. Once this model has been designed, the corresponding logical representation must be obtained as the basis of the implementation of the data warehouse according to one specific technology. However, there is a semantic gap between the dimension hierarchies modeled in a conceptual multidimensional model and their implementation. This gap particularly complicates a suitable treatment of summarizability issues, which may in turn lead to erroneous results from business intelligence tools. Therefore, it is crucial not only to capture adequate dimension hierarchies in the conceptual multidimensional model of the data warehouse, but also to correctly transform these multidimensional structures into a summarizability-compliant representation. A model-driven normalization process is therefore defined in this paper to address this summarizability-aware transformation of the dimension hierarchies in rich conceptual models. This work has been partially supported by the following projects: SERENIDAD (PEII-11-0327-7035) from the Junta de Comunidades de Castilla-La Mancha (Spain) and MESOLAP (TIN2010-14860) from the Spanish Ministry of Education and Science
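
    Summarizability requires, among other conditions, that every member of a lower level roll up to exactly one member of the parent level and that the mapping be complete; otherwise totals pre-aggregated at the parent level are wrong. A minimal sketch of such a check follows, with illustrative level names and members; the paper itself works at the level of model-driven transformations, not runtime checks.

```python
# Minimal sketch of a summarizability check on one hierarchy edge:
# every child member must map to exactly one parent member (strict and
# complete mapping), or aggregates at the parent level are wrong.
# Level names and members are illustrative assumptions.

def summarizability_issues(child_members, parent_of):
    """Return members that break completeness or strictness."""
    missing = [c for c in child_members if c not in parent_of]
    non_strict = [c for c, p in parent_of.items()
                  if isinstance(p, (list, set)) and len(p) != 1]
    return missing, non_strict

cities = ["Leeds", "York", "Geneva"]
parent_of = {
    "Leeds": "Yorkshire",
    "York": "Yorkshire",
    # "Geneva" has no parent (incomplete mapping); a multi-parent entry
    # such as "Basel": {"Basel-Stadt", "Basel-Landschaft"} would
    # violate strictness.
}
print(summarizability_issues(cities, parent_of))  # (['Geneva'], [])
```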