4,858 research outputs found

    A unified view of data-intensive flows in business intelligence systems: a survey

    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges. Peer reviewed. Postprint (author's final draft).

    Quality measures for ETL processes: from goals to implementation

    Extraction transformation loading (ETL) processes play an increasingly important role for the support of modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and there is a need for a more human-centric approach to bring them closer to business users' requirements. In this paper, we take a first step towards this direction by defining a sound model for ETL process quality characteristics and quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, where we employ a goal model that includes quantitative components (i.e., indicators) for evaluation and analysis of alternative design decisions. Peer reviewed. Postprint (author's final draft).

    An MDA approach for developing Secure OLAP applications: metamodels and transformations

    Decision makers query enterprise information stored in Data Warehouses (DW) by using tools (such as On-Line Analytical Processing (OLAP) tools) which employ specific views or cubes from the corporate DW or Data Marts, based on multidimensional modelling. Since the information managed is critical, security constraints have to be correctly established in order to avoid unauthorized access. In previous work we defined a Model-Driven based approach for developing a secure DW repository by following a relational approach. Nevertheless, it is also important to define security constraints in the metadata layer that connects the DW repository with the OLAP tools; that is, over the same multidimensional structures that end users manage. This paper incorporates a proposal for developing secure OLAP applications within our previous approach: it improves a UML profile for conceptual modelling; it defines a logical metamodel for OLAP applications; and it defines and implements transformations from conceptual to logical models, as well as from logical models to secure implementation in a specific OLAP tool (SQL Server Analysis Services). This research is part of the following projects: SIGMA-CC (TIN2012-36904), GEODAS-BC (TIN2012-37493-C01) and GEODAS-BI (TIN2012-37493-C03) funded by the Ministerio de Economía y Competitividad and Fondo Europeo de Desarrollo Regional FEDER. SERENIDAD (PEII11-037-7035) and MOTERO (PEII11-0399-9449) funded by the Consejería de Educación, Ciencia y Cultura de la Junta de Comunidades de Castilla La Mancha, and Fondo Europeo de Desarrollo Regional FEDER.

    Integration of Data Mining and Data Warehousing: a practical methodology

    The ever-growing repository of data in all fields poses new challenges to modern analytical systems. Real-world datasets, with mixed numeric and nominal variables, are difficult to analyze and require effective visual exploration that conveys the semantic relationships of the data. Traditional data mining techniques such as clustering handle only numeric data. Little research has been carried out on the problem of clustering high-cardinality nominal variables to gain better insight into the underlying dataset. Several works in the literature have shown the feasibility of integrating data mining with warehousing to discover knowledge from data. For seamless integration, the mined data has to be modeled in the form of a data warehouse schema. The schema generation process is a complex manual task and requires domain and warehousing familiarity. Automated techniques are required to generate the warehouse schema and overcome these dependencies. To fulfill the growing analytical needs and to overcome the existing limitations, we propose a novel methodology in this paper that permits efficient analysis of mixed numeric and nominal data, effective visual data exploration, automatic warehouse schema generation and integration of data mining and warehousing. The proposed methodology is evaluated through a case study on a real-world data set. Results show that multidimensional analysis can be performed in an easier and more flexible way to discover meaningful knowledge from large datasets.

    Framework for Interoperable and Distributed Extraction-Transformation-Loading (ETL) Based on Service Oriented Architecture

    Extraction, Transformation and Loading (ETL) are the major functionalities in data warehouse (DW) solutions. Lack of component distribution and interoperability is a gap that leads to many problems in the ETL domain, which is due to tightly-coupled components in the current ETL framework. This research discusses how to distribute the Extraction, Transformation and Loading components so as to achieve distribution and interoperability of these ETL components. In addition, it shows how the ETL framework can be extended. To achieve that, Service Oriented Architecture (SOA) is adopted to address the mentioned missing features of distribution and interoperability by restructuring the current ETL framework. This research contributes towards the field of ETL by adding the distribution and interoperability concepts to the ETL framework. This leads to contributions towards the area of data warehousing and business intelligence, because ETL is a core concept in this area. The Design Science Approach (DSA) and Scrum methodologies were adopted for achieving the research goals. The integration of DSA and Scrum provides suitable methods for achieving the research objectives. The new ETL framework is realized by developing and testing a prototype that is based on it. This prototype is successfully evaluated using three case studies that are conducted using the data and tools of three different organizations. These organizations use data warehouse solutions for the purpose of generating statistical reports that help their top management to make decisions. Results of the case studies show that distribution and interoperability can be achieved by using the new ETL framework.

    Designing secure data warehouses by using MDA and QVT

    The Data Warehouse (DW) design is based on multidimensional (MD) modeling which structures information into facts and dimensions. Due to the confidentiality of the data that it stores, it is crucial to specify security and audit measures from the early stages of design and to enforce them throughout the lifecycle. Moreover, the standard framework for software development, Model Driven Architecture (MDA), allows us to define transformations between models by proposing Query/View/Transformations (QVT). This proposal permits the definition of formal, elegant and unequivocal transformations between Platform Independent Models (PIM) and Platform Specific Models (PSM). This paper introduces a new framework for the design of secure DWs based on MDA and QVT, which covers all the design phases (conceptual, logical and physical) and specifies security measures in all of them. We first define two metamodels with which to represent security and audit measures at the conceptual and logical levels. We then go on to define a transformation between these models through which to obtain the traceability of the security rules from the early stages of development to the final implementation. Finally, in order to show the benefits of our proposal, it is applied to a case study. This work has been partially supported by the METASIGN project (TIN2004-00779) from the Spanish Ministry of Education and Science, of the Regional Government of Valencia, and by the QUASIMODO and MISTICO projects of the Regional Science and Technology Ministry of Castilla-La Mancha (Spain).

    Graph BI & analytics: current state and future challenges

    In an increasingly competitive market, making well-informed decisions requires the analysis of a wide range of heterogeneous, large and complex data. This paper focuses on the emerging field of graph warehousing. Graphs are widespread structures that yield a great expressive power. They are used for modeling highly complex and interconnected domains, and efficiently solving emerging big data applications. This paper presents the current status and open challenges of graph BI and analytics, and motivates the need for new warehousing frameworks aware of the topological nature of graphs. We survey the topics of graph modeling, management, processing and analysis in graph warehouses. Then we conclude by discussing future research directions and positioning them within a unified architecture of a graph BI and analytics framework. Peer reviewed. Postprint (author's final draft).

    An MDA approach for developing secure OLAP applications: Metamodels and transformations

    Decision makers query enterprise information stored in Data Warehouses (DW) by using tools (such as On-Line Analytical Processing (OLAP) tools) which employ specific views or cubes from the corporate DW or Data Marts, based on multidimensional modelling. Since the information managed is critical, security constraints have to be correctly established in order to avoid unauthorized access. In previous work we defined a Model-Driven based approach for developing a secure DW repository by following a relational approach. Nevertheless, it is also important to define security constraints in the metadata layer that connects the DW repository with the OLAP tools; that is, over the same multidimensional structures that end users manage. This paper incorporates a proposal for developing secure OLAP applications within our previous approach: it improves a UML profile for conceptual modelling; it defines a logical metamodel for OLAP applications; and it defines and implements transformations from conceptual to logical models, as well as from logical models to secure implementation in a specific OLAP tool (SQL Server Analysis Services). © 2015 ComSIS Consortium. All rights reserved. This research is part of the following projects: SIGMA-CC (TIN2012-36904), GEODAS-BC (TIN2012-37493-C01) and GEODAS-BI (TIN2012-37493-C03) funded by the Ministerio de Economía y Competitividad and Fondo Europeo de Desarrollo Regional FEDER.

    Towards Conceptual Multidimensional Design in Decision Support Systems

    International audience. Multidimensional databases efficiently support on-line analytical processing (OLAP). In this paper, we depict a model dedicated to multidimensional databases. The approach we present designs decisional information through a constellation of facts and dimensions. Each dimension may be shared between several facts and is organised according to multiple hierarchies. In addition, we define a comprehensive query algebra grouping the most popular multidimensional operations in current commercial systems and research approaches. We introduce new operators dedicated to a constellation. Finally, we describe a prototype that allows managers to query constellations of facts, dimensions and multiple hierarchies.