136 research outputs found

    Computer-Aided Warehouse Engineering (CAWE): Leveraging MDA and ADM for the Development of Data Warehouses

    During the last decade, data warehousing has reached a high level of maturity and has become a well-accepted technology in decision support systems. Nevertheless, development and maintenance are still tedious tasks, since the systems grow over time and complex architectures have been established. This paper adapts the concepts of Model Driven Architecture (MDA) and Architecture Driven Modernization (ADM) from the software engineering discipline to the data warehousing discipline. We review the work already available, outline further research directions, and give hints for the implementation of Computer-Aided Warehouse Engineering systems.

    Integrating data warehouses with web data : a survey

    This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies that are currently being used to integrate, store, query, and retrieve Web data and their application to DWs. The paper reviews different DW distributed architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover the main limitations and opportunities offered by the combination of the DW and Web fields, as well as to identify open research lines.

    XML Encoding and Web Services for Spatial OLAP Data Cube Exchange: an SOA Approach

    XML and Web Services technologies have revolutionized the way data are exchanged on the Internet. Meanwhile, Spatial OLAP (SOLAP) tools have emerged to bridge the gap between the Business Intelligence and Geographic Information Systems domains. While Web Services specifications such as XML for Analysis enable the use of OLAP tools in Service Oriented Architecture (SOA) environments, no solution addresses the exchange of complete SOLAP data cubes (comprising both spatial and descriptive data and metadata) in an interoperable fashion. This paper proposes a new XML grammar for the exchange of SOLAP data cubes, containing both spatial and descriptive data and metadata. It enables the delivery of the cube schema, dimension members (including the geometry of spatial members) and fact data. The use of this XML format is then demonstrated in the context of a Web Service. Such services can be deployed in various situations, including not only traditional client-server platforms but also ubiquitous mobile computing environments.

    Conceptual design of an XML FACT repository for dispersed XML document warehouses and XML marts

    Since the introduction of eXtensible Markup Language (XML), XML repositories have gained a foothold in many global (and government) organizations, where e-Commerce and e-business models have matured in handling daily transactional data among heterogeneous information systems in multiple data formats. As a result, the amount of data available for enterprise decision-making is increasing exponentially and is being stored and/or communicated in XML. This presents an interesting challenge: to investigate models, frameworks and techniques for organizing and analyzing such voluminous, yet distributed, XML documents for business intelligence in the form of XML warehouse repositories and XML marts. In this paper, we address this issue by proposing a view-driven approach for modelling and designing a Global XML FACT (GxFACT) repository under the MDA initiatives. We build the GxFACT from logically grouped, geographically dispersed XML document warehouses and Document Marts in a global enterprise setting. To deal with organizations' evolving decision-making needs, we also provide three design strategies for building and managing such a GxFACT in the context of modelling further hierarchical dimensions and/or global document warehouses.

    A Goal and Ontology Based Approach for Generating ETL Process Specifications

    Data warehouse (DW) systems development involves several tasks such as defining requirements, designing DW schemas, and specifying data transformation operations. Indeed, the success of DW systems is very much dependent on the proper design of the extracting, transforming, and loading (ETL) processes. However, common design-related problems in the ETL processes, such as defining user requirements and data transformation specifications, are far from being resolved. These problems are due to data heterogeneity in data sources, ambiguity of user requirements, and the complexity of data transformation activities. Current approaches are limited in reconciling DW requirement semantics when designing the ETL processes, which prolongs the generation of ETL process specifications. The semantic framework of DW systems established in this study is used to develop the requirement analysis method for designing the ETL processes (RAMEPs) from the different perspectives of organization, decision-maker, and developer by using goal and ontology approaches. The correctness of the RAMEPs approach was validated by using modified and newly developed compliant tools. RAMEPs was evaluated in three real case studies, i.e., Student Affairs System, Gas Utility System, and Graduate Entrepreneur System. These case studies were used to illustrate how the RAMEPs approach can be implemented for designing and generating ETL process specifications. Moreover, the RAMEPs approach was reviewed by DW experts to assess the strengths and weaknesses of the method, and the new approach was accepted. The RAMEPs method shows that ETL process specifications can be derived from the early phases of DW systems development by using the goal-ontology approach.

    Flexible Integration and Efficient Analysis of Multidimensional Datasets from the Web

    If numeric data from the Web are brought together, natural scientists can compare climate measurements with estimations, financial analysts can evaluate companies based on balance sheets and daily stock market values, and citizens can explore the GDP per capita from several data sources. However, the heterogeneity and size of the data remain a problem. This work presents methods to query a uniform view - the Global Cube - of available datasets from the Web and builds on Linked Data query approaches.

    A unified view of data-intensive flows in business intelligence systems : a survey

    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges. Peer reviewed. Postprint (author's final draft).

    A BPMN-Based Design and Maintenance Framework for ETL Processes

    Business Intelligence (BI) applications require the design, implementation, and maintenance of processes that extract, transform, and load suitable data for analysis. The development of these processes (known as ETL) is an inherently complex problem that is typically costly and time consuming. In previous work, we proposed a vendor-independent language for reducing the design complexity caused by disparate ETL languages tailored to specific design tools with steep learning curves. Nevertheless, the designer still faces two major issues during the development of ETL processes: (i) how to implement the designed processes in an executable language, and (ii) how to maintain the implementation when the organization's data infrastructure evolves. In this paper, we propose a model-driven framework that provides automatic code generation and improved maintenance support for our ETL language. We present a set of model-to-text transformations able to produce code for different commercial ETL tools, as well as model-to-model transformations that automatically update the ETL models in order to support the maintenance of the generated code as data sources evolve. A demonstration using an example is conducted as an initial validation to show that the framework, covering modeling, code generation and maintenance, could be used in practice.

    A Business Intelligence Framework for Analyzing Educational Data

    Currently, universities are being forced to change the paradigms of education, where knowledge is mainly based on the experience of the teacher. This change includes the development of quality education focused on students’ learning. These factors have forced universities to look for a solution that allows them to extract data from different information systems and convert them into the knowledge necessary to make decisions that improve learning outcomes. The information systems administered by the universities store a large volume of data on the socioeconomic and academic variables of the students. In the university field, these data are generally not used to generate knowledge about their students, unlike in the business field, where the data are intensively analyzed in business intelligence to gain a competitive advantage. These success stories in the business field can be replicated by universities through an analysis of educational data. This document presents a method that combines models and techniques of data mining within an architecture of business intelligence to make decisions about variables that can influence the development of learning. In order to test the proposed method, a case study is presented, in which students are identified and classified according to the data they generate in the different information systems of a university.

    Beiträge zu Business Intelligence und IT-Compliance

    [no abstract]