
    Integrating data warehouses with web data : a survey

    This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies that are currently being used to integrate, store, query, and retrieve Web data and their application to DWs. The paper reviews different distributed DW architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to reveal the main limitations and opportunities offered by combining the DW and Web fields, as well as to identify open research lines.

    Xcube: XML for data warehouses

    Data warehouse systems are nowadays a well-known and widespread approach to supporting management decisions. Within companies, and even across companies, there is growing interest in integrating several data warehouses into a virtual or federated data warehouse. However, the technical and semantic problems are very demanding. An essential part of the solution is a standardized, vendor-independent format for describing multidimensional data. This paper introduces XCube, a family of XML-based document templates for exchanging data warehouse data, i.e., data cubes, over any kind of network. XCube is organized in a modular fashion, so the multidimensional schema, the descriptions of the individual dimensions, and the fact data itself can be transmitted in separate steps. In addition to these descriptive formats, XCube also offers two kinds of dynamic document types that can be used to explore the (multidimensional) content of another warehouse in a vendor-independent way. They are primarily meant to reduce the amount of data transferred over the network.
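    To make the modular exchange idea concrete, the following minimal Python sketch emits the schema, one dimension description, and the fact data as three separate XML documents using the standard library's xml.etree.ElementTree. The element and attribute names (cubeSchema, dimensionRef, member, facts, cell, coord, value) are hypothetical placeholders chosen for illustration. They are not the actual XCube vocabulary, which this sketch does not attempt to reproduce.

```python
import xml.etree.ElementTree as ET

# NOTE: all tag and attribute names below are hypothetical, not the real XCube templates.


def schema_doc(cube_name, dimensions, measures):
    """Document 1: the multidimensional schema (dimension and measure names only)."""
    root = ET.Element("cubeSchema", name=cube_name)
    for dim in dimensions:
        ET.SubElement(root, "dimensionRef", name=dim)
    for measure in measures:
        ET.SubElement(root, "measure", name=measure)
    return ET.tostring(root, encoding="unicode")


def dimension_doc(dim_name, members):
    """Document 2: the description of a single dimension (member id -> label)."""
    root = ET.Element("dimension", name=dim_name)
    for member_id, label in members.items():
        ET.SubElement(root, "member", id=member_id).text = label
    return ET.tostring(root, encoding="unicode")


def fact_doc(cube_name, facts):
    """Document 3: the fact data; each cell references dimension members by id."""
    root = ET.Element("facts", cube=cube_name)
    for coords, values in facts:
        cell = ET.SubElement(root, "cell")
        for dim, member in coords.items():
            ET.SubElement(cell, "coord", dimension=dim, member=member)
        for measure, value in values.items():
            ET.SubElement(cell, "value", measure=measure).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Each document can be sent in its own step, e.g. schema first, facts last.
    print(schema_doc("sales", ["region", "product"], ["revenue"]))
    print(dimension_doc("region", {"EU": "Europe", "US": "United States"}))
    print(fact_doc("sales", [({"region": "EU", "product": "A"}, {"revenue": 10})]))
```

    Because each document stands alone, a receiver could fetch the schema first, request only the dimension descriptions it needs, and transfer the fact data last. This matches the kind of staged exchange the abstract describes.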

    CubiST: A New Algorithm for Improving the Performance of Ad-hoc OLAP Queries

    Being able to efficiently answer arbitrary OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes has been a continued, major concern in data warehousing. In this paper, we introduce a new data structure, called Statistics Tree (ST), together with an efficient algorithm called CubiST, for evaluating ad-hoc OLAP queries on top of a relational data warehouse. We focus on a class of queries called cube queries, which generalize the data cube operator. CubiST represents a drastic departure from existing relational (ROLAP) and multidimensional (MOLAP) approaches in that it does not use the familiar view lattice to compute and materialize new views from existing views in some heuristic fashion. CubiST is the first OLAP algorithm that needs only one scan over the detailed data set and can efficiently answer any cube query without additional I/O when the ST fits into memory. We have implemented CubiST, and our experiments have demonstrated significant improvements in performance and scalability over existing ROLAP/MOLAP approaches.
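    The one-scan idea behind CubiST can be illustrated with a minimal sketch. The code below is a hypothetical, dictionary-based simplification, not the paper's Statistics Tree implementation. It assumes that every record carries one categorical value per dimension and a single numeric measure, and that the string "*" never occurs as a real value. Each node has one child per observed value plus a "*" (ALL) child, so inserting a record updates 2^d root-to-leaf paths for d dimensions. Afterwards, any cube query that selects a set of values or ALL per dimension is answered from the tree alone, without rescanning the detail data.

```python
from itertools import product

STAR = "*"  # placeholder meaning "ALL values" of a dimension (assumed never a real value)


class StatisticsTree:
    """Hypothetical, simplified statistics tree: one level per dimension, sums at the leaves."""

    def __init__(self, num_dims):
        self.num_dims = num_dims
        self.root = {}

    def insert(self, coords, measure):
        """Add one detail record; called once per record, i.e. a single scan."""
        if len(coords) != self.num_dims:
            raise ValueError("expected one value per dimension")
        # At each level follow both the record's own value and the STAR child,
        # so 2**num_dims leaves are updated per record.
        for path in product(*[(value, STAR) for value in coords]):
            node = self.root
            for key in path[:-1]:
                node = node.setdefault(key, {})
            node[path[-1]] = node.get(path[-1], 0) + measure

    def query(self, conditions):
        """Answer a cube query: per dimension, STAR or an iterable of selected values."""

        def walk(node, level):
            cond = conditions[level]
            keys = [STAR] if cond == STAR else set(cond)
            total = 0
            for key in keys:
                if key not in node:
                    continue  # value never occurred under this prefix
                child = node[key]
                total += child if level == self.num_dims - 1 else walk(child, level + 1)
            return total

        return walk(self.root, 0)


if __name__ == "__main__":
    st = StatisticsTree(num_dims=2)  # dimensions: region, product
    for region, prod, amount in [("EU", "A", 10), ("EU", "B", 5), ("US", "A", 7)]:
        st.insert((region, prod), amount)
    print(st.query([STAR, STAR]))           # 22: all records
    print(st.query([["EU"], STAR]))         # 15: EU, all products
    print(st.query([["EU", "US"], ["A"]]))  # 17: product A in EU or US
```

    Every record is inserted exactly once, so the detail data is scanned only once. The cost is that each insert touches 2^d leaves, so this simplified form is only practical for a small number of dimensions and when the whole tree fits in memory. This is consistent with the memory condition stated in the abstract.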

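    Development of a Data Warehouse for Production Controlling: Concepts and Experiences (original German title: Entwicklung eines Data Warehouse für das Produktionscontrolling: Konzepte und Erfahrungen)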

    The task of a data warehouse is the fast and flexible provision of decision-relevant data. Depending on the interpretation, it therefore represents either a further development of decision support systems or a database oriented toward analysis tasks. To fulfill its task, a data warehouse must consolidate heterogeneous data sources into a stable, consistent data foundation, (pre-)aggregate detailed data for analytical evaluation, and also support period-based longitudinal analyses. The development of a warehouse therefore differs in many respects from the development of a traditional, transaction-oriented application system. This contribution discusses which tasks arise in the various phases of data warehouse development and how these tasks can be carried out. The concepts and experiences presented are the results of a cooperative project between the institute and a large mechanical engineering company.

    Modeling Large Scale OLAP Scenarios

    In the recent past, different multidimensional data models have been introduced to model OLAP (‘Online Analytical Processing’) scenarios. Design problems arise when the modeled OLAP scenarios become very large and their dimensionality increases, which greatly reduces the support for an efficient ad-hoc data analysis process. Therefore, we extend the classical multidimensional model by grouping functionally dependent attributes within single dimensions, yielding truly orthogonal dimensions that are easy to create and maintain at the schema design level. During the multidimensional data analysis phase, this technique yields nested data cubes reflecting an intuitive two-step navigation process: classification-oriented ‘drill-down’/‘roll-up’ and description-oriented ‘split’/‘merge’ operators on data cubes. Thus, the proposed Nested Multidimensional Data Model provides great modeling flexibility during the schema design phase and application-oriented restrictiveness during the data analysis phase.
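    The difference between the two navigation steps can be illustrated on a toy product dimension. In the hypothetical Python sketch below, the article-to-group mapping plays the role of a classification hierarchy (navigated with roll-up/drill-down), while brand is a descriptive attribute (navigated with split/merge). The data, names, and functions are invented for illustration and do not reproduce the formal definitions of the Nested Multidimensional Data Model.

```python
from collections import defaultdict

# Hypothetical product dimension (illustrative data only).
ARTICLE_GROUP = {"a1": "TV", "a2": "TV", "a3": "Radio"}     # classification hierarchy: article -> group
ARTICLE_BRAND = {"a1": "Acme", "a2": "Bolt", "a3": "Acme"}  # descriptive attribute of an article

SALES = [("a1", 100), ("a2", 50), ("a3", 30), ("a1", 20)]   # (article, revenue) fact rows


def roll_up(sales, level_map):
    """Classification-oriented step: aggregate article facts to a coarser hierarchy level."""
    totals = defaultdict(int)
    for article, amount in sales:
        totals[level_map[article]] += amount
    return dict(totals)


def split(sales, level_map, attr_map):
    """Description-oriented step: break each node's total down by a descriptive attribute."""
    totals = defaultdict(int)
    for article, amount in sales:
        totals[(level_map[article], attr_map[article])] += amount
    return dict(totals)


def merge(split_totals):
    """Inverse of split: sum away the descriptive attribute again."""
    totals = defaultdict(int)
    for (node, _attr), amount in split_totals.items():
        totals[node] += amount
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(SALES, ARTICLE_GROUP))               # {'TV': 170, 'Radio': 30}
    print(split(SALES, ARTICLE_GROUP, ARTICLE_BRAND))  # totals per (group, brand)
```

    Because brand is kept out of the classification hierarchy, merging a split view gives back exactly the rolled-up totals. In this toy setting, that is what keeps the two kinds of navigation independent of each other.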

    On-line analytical processing in distributed data warehouses

    The concepts of 'data warehousing' and 'on-line analytical processing' have seen growing interest in the research and commercial product communities. Today, the trend is moving away from complex centralized data warehouses toward distributed data marts integrated in a common conceptual schema. However, as the first part of this paper demonstrates, there are many problems and few solutions for large distributed decision support systems in corporations operating worldwide. After showing the benefits and problems of the distributed approach, this paper outlines possibilities for achieving performance in distributed online analytical processing. Finally, the architectural framework of the prototypical distributed OLAP system CUBESTAR is outlined.

    Applying UML and XML for designing and interchanging information for data warehouses and OLAP applications

    Multidimensional (MD) modeling is the basis for data warehouses (DW), multidimensional databases (MDB) and on-line analytical processing (OLAP) applications. In this paper, we present how the unified modeling language (UML) can be successfully used to represent both structural and dynamic properties of these systems at the conceptual level. The structure of the system is specified by means of a UML class diagram that considers the main properties of MD modeling with minimal use of constraints and extensions of the UML. If the system to be modeled is too complex, leading to a considerable number of classes and relationships, we describe how to use the package grouping mechanism provided by the UML to simplify the final model. Furthermore, we provide a UML-compliant class notation (called cube class) to represent OLAP users’ initial requirements. We also describe how we can use the UML state and interaction diagrams to model the behavior of a data warehouse system. To facilitate the interchange of conceptual MD models, we provide a Document Type Definition (DTD) that allows us to represent the same MD modeling properties that can be considered by using our approach. From this DTD, we can directly generate valid eXtensible Markup Language (XML) documents that represent MD models at the conceptual level. We believe that our innovative approach provides a theoretical foundation for simplifying the conceptual design of MD systems, and the examples included in this paper clearly illustrate the use of our approach.

    A survey of logical models for OLAP databases


    A UML profile for multidimensional modeling in data warehouses

    Multidimensional (MD) modeling, which is the foundation of data warehouses (DWs), MD databases, and On-Line Analytical Processing (OLAP) applications, is based on several properties that differ from those in traditional database modeling. In the past few years, there have been several proposals, each providing its own formal and graphical notation, for representing the main MD properties at the conceptual level. Unfortunately, however, none of them has been accepted as a standard for conceptual MD modeling. In this paper, we present an extension of the Unified Modeling Language (UML) using a UML profile. This profile is defined by a set of stereotypes, constraints and tagged values to elegantly represent the main MD properties at the conceptual level. We make use of the Object Constraint Language (OCL) to specify the constraints attached to the defined stereotypes, thereby avoiding an arbitrary use of these stereotypes. We have based our proposal on UML for two main reasons: (i) UML is a well-known standard modeling language familiar to most database designers, so designers can avoid learning a new notation, and (ii) UML can be easily extended and tailored to a specific domain with concrete peculiarities, such as multidimensional modeling for data warehouses. Moreover, our proposal is Model Driven Architecture (MDA) compliant, and we use the Query View Transformation (QVT) approach for an automatic generation of the implementation in a target platform. Throughout the paper, we describe how to easily accomplish the MD modeling of DWs at the conceptual level. Finally, we show how to use our extension in Rational Rose for MD modeling.
    This work has been partially supported by the METASIGN project (TIN2004-00779) from the Spanish Ministry of Education and Science, by the DADASMECA project (GV05/220) from the Regional Government of Valencia, and by the MESSENGER (PCC-03-003-1) and DADS (PBC-05-012-2) projects from the Regional Science and Technology Ministry of Castilla-La Mancha (Spain).