
    Data warehouse automation trick or treat?

    Data warehousing systems have been around for 25 years, playing a crucial role in collecting data and transforming it into value, allowing users to make decisions based on informed business facts. It is widely accepted that a data warehouse is a critical component of a data-driven enterprise and becomes part of the organisation’s information systems strategy, with a significant impact on the business. However, after 25 years, building a data warehouse is still painful: it is too time-consuming, too expensive and too difficult to change after deployment. Data Warehouse Automation appears with the promise of addressing the limitations of traditional approaches, turning data warehouse development from a prolonged effort into an agile one, with gains in efficiency and effectiveness in data warehousing processes. So, is Data Warehouse Automation a Trick or a Treat? To answer this question, a case study of a data warehousing architecture using a data warehouse automation tool, WhereScape, was developed. In addition, a survey was conducted among organisations that use data warehouse automation tools, in order to understand their motivation for adopting this kind of tool in their data warehousing systems. Based on the results of the survey and the case study, automation of the data warehouse building process is necessary to deliver data warehouse systems faster, and it is a solution to consider when modernising data warehouse architectures as a way to achieve results faster while keeping costs controlled and reducing risk. Data Warehouse Automation may well be a Treat.

    A multidimensional and multiversion structure for OLAP applications

    When changes occur in data organization, conventional multidimensional structures are not well suited because dimensions are assumed to be static. In many cases, especially when the period covered by the data warehouse is long, the dimensions of the hypercube must be redesigned in order to integrate these evolutions. We propose an approach that allows history to be tracked and data to be compared by mapping them into static structures. We define a conceptual model that builds a Multiversion Fact Table from the Temporal Multidimensional Schema, and we introduce the notion of temporal modes of representation, corresponding to different ways of analysing data and their evolution
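
    The effect of such temporal modes can be illustrated with a small sketch. The schema and queries below are illustrative assumptions, not the paper's Multiversion Fact Table or Temporal Multidimensional Schema: facts are keyed to the dimension version in force when they occurred, and each query maps them onto a different static view of the dimension (Python's built-in sqlite3 stands in for an OLAP engine).

```python
# Minimal sketch (assumed schema): a fact table keyed to versioned dimension
# members, so history is tracked and facts can be re-mapped onto a chosen
# static view of the dimension at query time.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_store (
    store_id   INTEGER,
    version    INTEGER,
    region     TEXT,        -- attribute that evolves over time
    valid_from TEXT,
    valid_to   TEXT,        -- NULL = current version
    PRIMARY KEY (store_id, version)
);
CREATE TABLE fact_sales (
    store_id INTEGER,
    version  INTEGER,       -- store version in force at transaction time
    sale_dt  TEXT,
    amount   REAL
);
""")
con.executemany("INSERT INTO dim_store VALUES (?,?,?,?,?)", [
    (1, 1, "North", "2020-01-01", "2021-12-31"),
    (1, 2, "East",  "2022-01-01", None),          # store 1 re-assigned
])
con.executemany("INSERT INTO fact_sales VALUES (?,?,?,?)", [
    (1, 1, "2021-06-01", 100.0),
    (1, 2, "2022-06-01", 120.0),
])

# Mode 1: "as was" -- aggregate against the dimension version in force when
# each fact occurred, so history is preserved.
for row in con.execute("""
    SELECT d.region, SUM(f.amount)
    FROM fact_sales f JOIN dim_store d
      ON d.store_id = f.store_id AND d.version = f.version
    GROUP BY d.region"""):
    print("as-was", row)

# Mode 2: "as is" -- map all facts onto the current version, so periods stay
# comparable under today's structure.
for row in con.execute("""
    SELECT d.region, SUM(f.amount)
    FROM fact_sales f JOIN dim_store d
      ON d.store_id = f.store_id AND d.valid_to IS NULL
    GROUP BY d.region"""):
    print("as-is", row)
```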

    Temporal and Evolving Data Warehouse Design

    Ontology based data warehousing for mining of heterogeneous and multidimensional data sources

    Heterogeneous and multidimensional big-data sources are prevalent in virtually all business environments, yet system and data analysts are unable to access and exploit them quickly. A robust and versatile data warehousing system is developed that integrates domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oilfield solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion-dollar resource businesses worldwide. This work has been recognised by the IEEE Industrial Electronics Society and has appeared in more than 50 international conference proceedings and journals

    The Noetic Prism

    Definitions of ‘knowledge’ and its relationships with ‘data’ and ‘information’ are varied, inconsistent and often contradictory. In particular the traditional hierarchy of data-information-knowledge and its various revisions do not stand up to close scrutiny. We suggest that the problem lies in a flawed analysis that sees data, information and knowledge as separable concepts that are transformed into one another through processing. We propose instead that we can describe collectively all of the materials of computation as ‘noetica’, and that the terms data, information and knowledge can be reconceptualised as late-binding, purpose-determined aspects of the same body of material. Changes in complexity of noetica occur due to value-adding through the imposition of three different principles: increase in aggregation (granularity), increase in set relatedness (shape), and increase in contextualisation through the formation of networks (scope). We present a new model in which granularity, shape and scope are seen as the three vertices of a triangular prism, and show that all value-adding through computation can be seen as movement within the prism space. We show how the conceptual framework of the noetic prism provides a new and comprehensive analysis of the foundations of computing and information systems, and how it can provide a fresh analysis of many of the common problems in the management of intellectual resources

    Relational model of temporal data based on 6th normal form

    This paper brings together two different research areas: temporal data and relational modelling. Temporal data is data that represents a state in time, while a temporal database is a database with built-in support for handling data involving time. Most temporal systems provide sufficient temporal features, but their relational models are improperly normalised, and modelling approaches are missing or unconvincing. This proposal offers advantages for temporal database modelling, primarily for use in analytics and reporting, where typical queries involve a small subset of attributes and a large number of records. The paper defines a distinctive logical model, based on vertical decomposition and sixth normal form (6NF), which supports temporal data and consistency. The use of 6NF allows attribute values to change independently of each other, thus preventing redundancy and anomalies. The proposal is evaluated against other temporal models, and super-fast querying, achieved by database join elimination, is demonstrated. The paper is intended to help database professionals in the practice of temporal modelling
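
    A minimal sketch of the vertical-decomposition idea follows; the table and attribute names are assumptions for illustration, not the model defined in the paper. Each time-varying attribute gets its own relation keyed by entity id and valid_from, so attributes change independently and a query that needs only one attribute never touches the others.

```python
# Minimal sketch (assumed schema) of 6NF-style vertical decomposition:
# one relation per time-varying attribute, keyed by (entity id, valid_from).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee        (emp_id INTEGER PRIMARY KEY);
CREATE TABLE employee_salary (emp_id INTEGER, valid_from TEXT, salary REAL,
                              PRIMARY KEY (emp_id, valid_from));
CREATE TABLE employee_dept   (emp_id INTEGER, valid_from TEXT, dept TEXT,
                              PRIMARY KEY (emp_id, valid_from));
""")
con.execute("INSERT INTO employee VALUES (1)")
con.executemany("INSERT INTO employee_salary VALUES (?,?,?)", [
    (1, "2020-01-01", 50000), (1, "2022-01-01", 55000)])   # salary changed twice
con.executemany("INSERT INTO employee_dept VALUES (?,?,?)", [
    (1, "2020-01-01", "R&D")])                             # department did not

# A report that only needs salary history reads only that relation and joins
# nothing else -- the join-elimination effect the abstract highlights.
print(con.execute("""
    SELECT valid_from, salary FROM employee_salary
    WHERE emp_id = 1 ORDER BY valid_from""").fetchall())
```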

    Towards development of fuzzy spatial datacubes : fundamental concepts with example for multidimensional coastal erosion risk assessment and representation

    Current Geospatial Business Intelligence (GeoBI) systems typically do not take into account the uncertainty related to the vagueness and fuzziness of objects; they assume that objects have well-defined, exact semantics, geometry and temporality. The representation of fuzzy zones by polygons with well-defined boundaries is an example of such approximation. This thesis uses an application in Coastal Erosion Risk Analysis (CERA) to illustrate the problems. CERA polygons are created by aggregating a set of spatial units defined by either stakeholders’ interests or national census divisions. Despite the spatiotemporal variation of the multiple criteria involved in estimating the extent of coastal erosion risk, each polygon typically has a single risk value attributed homogeneously across its spatial extent. In reality, the risk value changes gradually within polygons and when moving from one polygon to another, so the transition between zones is not properly represented with crisp object models. The main objective of this thesis is to develop a new approach combining the GeoBI paradigm and fuzzy concepts to account for spatial uncertainty in the representation of risk zones; ultimately, we assume this should improve coastal erosion risk assessment. To do so, a comprehensive GeoBI-based conceptual framework is developed, with an application to Coastal Erosion Risk Assessment. A fuzzy-based risk representation approach is then developed to handle the inherent spatial uncertainty related to the vagueness and fuzziness of objects. Fuzzy membership functions are defined from an expert-based vulnerability index. Instead of determining well-defined boundaries between risk zones, the proposed approach permits a smooth transition from one zone to another. The membership values of multiple indicators (e.g. slope and elevation of the region under study, infrastructure, houses, the hydrological network, and so on) are then aggregated, based on the risk formula and fuzzy IF-THEN rules, to represent the risk zones. The key elements of a fuzzy spatial datacube are also formally defined by combining fuzzy set theory and the GeoBI paradigm, and some fuzzy spatial aggregation operators are formally defined as well. The main contribution of this study is the combination of fuzzy set theory and GeoBI, which makes spatial knowledge discovery more compatible with human reasoning and perception. Hence, an analytical conceptual framework was proposed, based on the GeoBI paradigm, to develop a fuzzy spatial datacube within a Spatial Online Analytical Processing (SOLAP) system to assess coastal erosion risk. This requires developing a framework to design a conceptual model based on the risk parameters, implementing fuzzy spatial objects in a spatial multidimensional database, and aggregating fuzzy spatial objects to deal with multi-scale representation of risk zones. To validate the proposed approach, it is applied to the Perce region (Eastern Quebec, Canada) as a case study
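
    A small sketch of the fuzzy representation idea follows; the membership-function shapes, the indicators and the min-based rule are assumptions for illustration, whereas the thesis derives its membership functions from an expert-based vulnerability index and aggregates them with the risk formula and fuzzy IF-THEN rules.

```python
# Minimal sketch under assumed membership shapes and rules: each indicator maps
# to a degree of membership in "high risk", and memberships are combined with a
# fuzzy AND (min) instead of assigning one crisp value per polygon.

def triangular(x: float, a: float, b: float, c: float) -> float:
    """Degree of membership for a triangular fuzzy set peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def high_risk(vulnerability: float, exposure: float) -> float:
    """Toy rule: IF vulnerability is high AND exposure is high THEN risk is
    high. AND = min, over illustrative (assumed) membership functions."""
    mu_vuln = triangular(vulnerability, 0.3, 1.0, 1.7)   # assumed shape
    mu_expo = triangular(exposure,      0.2, 0.9, 1.6)   # assumed shape
    return min(mu_vuln, mu_expo)

# Neighbouring cells get gradually different risk degrees rather than one
# homogeneous value with a sharp boundary between zones.
for cell, (v, e) in {"cell_A": (0.9, 0.8), "cell_B": (0.7, 0.5)}.items():
    print(cell, round(high_risk(v, e), 2))
```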

    Testing Data Vault-Based Data Warehouse

    Data warehouse (DW) projects are undertakings that require integration of disparate sources of data, a well-defined mapping of the source data to the reconciled data, and effective Extract, Transform, and Load (ETL) processes. Owing to the complexity of data warehouse projects, great emphasis must be placed on an agile-based approach with properly developed and executed test plans throughout the various stages of designing, developing, and implementing the data warehouse, to mitigate budget overruns, missed deadlines, low customer satisfaction, and outright project failures. Yet there are often attempts to test the data warehouse exactly like traditional back-end databases and legacy applications, or to downplay the role of quality assurance (QA) and testing, which only serve to fuel the frustration and mistrust of data warehouse and business intelligence (BI) systems. Nevertheless, there are a number of steps that can be taken to ensure DW/BI solutions are successful, highly trusted, and stable. In particular, adopting a Data Vault (DV)-based Enterprise Data Warehouse (EDW) can simplify and enhance various aspects of testing, and curtail delays common in non-DV-based DW projects. A major area of focus in this research is raw DV loads from source systems, keeping transformations to a minimum in the ETL process that loads the DV from the source. Certain load errors, classified as permissible errors and enforced by business rules, are kept in the Data Vault until correct values are supplied. Major transformation activities are pushed further downstream to the next ETL process, which loads and refreshes the Data Mart (DM) from the Data Vault
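
    The raw-load idea can be sketched as follows; the hub and satellite structures, column names and error rule are assumptions for illustration rather than the loading pattern used in this research. A permissible error is landed in the vault with a flag instead of being rejected, and the correction is left to the downstream process that refreshes the Data Mart.

```python
# Minimal sketch (assumed structures, not a full Data Vault implementation):
# a raw load adds only hash keys and audit columns, keeps permissible errors
# flagged in a satellite, and defers heavy transformations to the mart load.
from datetime import datetime, timezone
from hashlib import sha256

def hub_key(business_key: str) -> str:
    return sha256(business_key.encode()).hexdigest()[:16]

hub_customer, sat_customer = [], []

def raw_load(row: dict) -> None:
    """Land the source row as-is; no business-rule cleansing happens here."""
    hk = hub_key(row["customer_no"])
    hub_customer.append({"hk": hk, "customer_no": row["customer_no"]})
    sat_customer.append({
        "hk": hk,
        "load_ts": datetime.now(timezone.utc).isoformat(),
        "birth_date": row["birth_date"],                    # kept even if invalid
        "permissible_error": row["birth_date"] in ("", "0000-00-00"),
    })

raw_load({"customer_no": "C-1001", "birth_date": "0000-00-00"})  # flagged, not dropped

# Downstream, the mart load applies the business rules and excludes or repairs
# flagged rows -- the place where major transformation activities live.
clean_rows = [s for s in sat_customer if not s["permissible_error"]]
print(len(sat_customer), "rows in vault,", len(clean_rows), "rows promoted to mart")
```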