8 research outputs found

    Integrating data warehouses with web data : a survey

    Get PDF
    This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies currently being used to integrate, store, query, and retrieve Web data and their application to DWs. The paper reviews different distributed DW architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to uncover the main limitations and opportunities offered by the combination of the DW and Web fields, as well as to identify open research lines

    ANALYTICAL PROCESSING OVER XML DATA

    Get PDF
    The use of online analytical processing (OLAP) tools for the strategic analysis of an organisation's data enables decision makers to identify trends and patterns, and thus to better conduct the business of the company in which they operate. However, the analytical-processing systems for XML data developed in academia and industry do not offer all the functionality of OLAP tools for traditional data, nor do they handle interdependent XML documents. Therefore, the need to develop OLAP systems that support the strategic analysis of an organisation's data, represented in XML format and interconnected by a set of references, is the main motivation for this work. Research is currently being carried out in the academic context with the goal of performing analytical processing over data represented in XML. However, because these technologies were originally conceived for distinct purposes, this is not a trivial task. To assist in the development of such OLAP systems, this work discusses the challenges that must be solved to achieve efficient analytical processing over XML data and evaluates several academic works that set out to perform this task

    IDEAS-1997-2021-Final-Programs

    Get PDF
    This document records the final program for each of the 26 meetings of the International Database Engineering and Applications Symposium (IDEAS) from 1997 through 2021. These meetings were held in various locations on three continents. Most of the papers published during these years are in the digital libraries of the IEEE (1997-2007) or the ACM (2008-2021)

    The XFM view adaptation mechanism: An essential component for XML data warehouses

    Get PDF
    In the past few years, with many organisations providing web services for business and communication purposes, large volumes of XML transactions take place on a daily basis. In many cases, organisations maintain these transactions in their native XML format due to its flexibility for exchanging data between heterogeneous systems. This XML data provides an important resource for decision support systems. As a consequence, XML technology has gradually been incorporated into decision support and data warehouse systems. The problem encountered is that existing native XML database systems suffer from poor performance in terms of managing data volume and response time for complex analytical queries. Although materialised XML views can be used to improve the performance of XML data warehouses, update problems then become the bottleneck of using materialised views. Specifically, synchronising materialised views in the face of changing view definitions remains a significant issue. In this dissertation, we provide a method for XML-based data warehouses to manage updates caused by changes to view definitions (view redefinitions), which is referred to as the view adaptation problem. In our approach, views are defined using XPath and then modelled using a set of novel algebraic operators and fragments. XPath views are integrated into a single view graph called the XML Fragment Materialisation (XFM) View Graph, where common parts of different views are shared and appear only once in the graph. Fragments within the view graph can be selected for materialisation to facilitate the view adaptation process. When changes are applied, our view adaptation algorithms can quickly determine which part of the XFM view graph is affected. The adaptation algorithms then perform a structural adaptation to update the view graph, followed by a data adaptation to update the materialised fragments
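The fragment-sharing idea behind the XFM view graph can be sketched as a prefix-sharing structure over XPath steps. The sketch below is illustrative only, under assumed names (`Fragment`, `ViewGraph`); it is not the dissertation's actual model, but it shows how a common prefix is stored once and how a change can be localised to the views that pass through the affected fragment.

```python
# Hypothetical sketch of a shared-fragment view graph in the spirit of the
# XFM approach described above. Class and method names are illustrative.

class Fragment:
    """One XPath step, shared between views that use the same prefix."""
    def __init__(self, step):
        self.step = step
        self.children = {}   # next step -> Fragment
        self.views = set()   # names of views passing through this fragment

class ViewGraph:
    def __init__(self):
        self.root = Fragment("/")

    def add_view(self, name, xpath):
        """Register an XPath view, sharing common prefixes with existing views."""
        node = self.root
        node.views.add(name)
        for step in xpath.strip("/").split("/"):
            node = node.children.setdefault(step, Fragment(step))
            node.views.add(name)

    def affected_views(self, xpath_prefix):
        """Views whose definition passes through the changed fragment."""
        node = self.root
        for step in xpath_prefix.strip("/").split("/"):
            if step not in node.children:
                return set()
            node = node.children[step]
        return node.views

g = ViewGraph()
g.add_view("v1", "/warehouse/orders/item")
g.add_view("v2", "/warehouse/orders/customer")
# The /warehouse/orders prefix is stored once and shared by both views, so a
# redefinition touching it is localised to exactly the views that use it.
print(g.affected_views("/warehouse/orders"))
```

A structural adaptation would then rewrite only the affected subgraph, and a data adaptation would refresh only the materialised fragments below it.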

    Just-in-time Analytics Over Heterogeneous Data and Hardware

    Get PDF
    Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of datasets to gain insights. At the same time, data variety increases continuously across multiple axes. First, data comes in multiple formats, such as the binary tabular data of a DBMS, raw textual files, and domain-specific formats. Second, different datasets follow different data models, such as the relational and the hierarchical one. Data location also varies: Some datasets reside in a central "data lake", whereas others lie in remote data sources. In addition, users execute widely different analysis tasks over all these data types. Finally, the process of gathering and integrating diverse datasets introduces several inconsistencies and redundancies in the data, such as duplicate entries for the same real-world concept. In summary, heterogeneity significantly affects the way data analysis is performed. In this thesis, we aim for data virtualization: Abstracting data out of its original form and manipulating it regardless of the way it is stored or structured, without a performance penalty. To achieve data virtualization, we design and implement systems that i) mask heterogeneity through the use of heterogeneity-aware, high-level building blocks and ii) offer fast responses through on-demand adaptation techniques. Regarding the high-level building blocks, we use a query language and algebra to handle multiple collection types, such as relations and hierarchies, express transformations between these collection types, as well as express complex data cleaning tasks over them. In addition, we design a location-aware compiler and optimizer that masks away the complexity of accessing multiple remote data sources. Regarding on-demand adaptation, we present a design to produce a new system per query. 
The design uses customization mechanisms that trigger runtime code generation to mimic the system most appropriate to answer a query fast: query operators are thus created based on the query workload and the underlying data models; the data access layer is created based on the underlying data formats. In addition, we exploit emerging hardware by customizing the system implementation based on the available heterogeneous processors (CPUs and GPGPUs). We thus pair each workload with its ideal processor type. The end result is a just-in-time database system that is specific to the query, data, workload, and hardware instance. This thesis redesigns the data management stack to natively cater for data heterogeneity and exploit hardware heterogeneity. Instead of centralizing all relevant datasets, converting them to a single representation, and loading them into a monolithic, static, suboptimal system, our design embraces heterogeneity. Overall, our design decouples the type of analysis performed from the original data layout; users can perform their analysis across data stores, data models, and data formats, but at the same time experience the performance offered by a custom system built on demand to serve their specific use case
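The "new system per query" idea can be illustrated with a toy dispatcher: pick a data-access routine per source format at query time, then compose a small pipeline specialised to that one query. This is a minimal sketch under assumed names (`build_query`, `ACCESS_LAYER`), not the thesis's actual code-generation machinery, which emits native code rather than closures.

```python
# Illustrative sketch: choose the scan operator from the source format, then
# compose filter and projection for just this query. Names are hypothetical.
import csv
import io
import json

def scan_csv(raw):
    return [dict(r) for r in csv.DictReader(io.StringIO(raw))]

def scan_json(raw):
    return json.loads(raw)

ACCESS_LAYER = {"csv": scan_csv, "json": scan_json}   # format -> scan operator

def build_query(fmt, predicate, project):
    """'Generate' a query plan specialised to the source format."""
    scan = ACCESS_LAYER[fmt]
    def run(raw):
        return [{k: row[k] for k in project}
                for row in scan(raw) if predicate(row)]
    return run

# The same logical query runs over CSV without the caller knowing the format.
q = build_query("csv", lambda r: int(r["qty"]) > 1, ["sku"])
rows = q("sku,qty\nA,1\nB,3\n")
print(rows)   # only the row with qty > 1, projected to sku
```

Swapping `"csv"` for `"json"` changes only the access layer, which is the virtualization point: the analysis is decoupled from the layout.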

    Achieving Adaptivity For OLAP-XML Federations

    No full text
    Motivated by the need for more flexible OLAP systems, this paper presents work on logical integration of external data in OLAP databases, carried out in cooperation between the Danish OLAP client vendor TARGIT and Aalborg University. Flexibility is ensured by supporting XML as the external data format, since almost all data sources can be efficiently wrapped in XML. Earlier work has resulted in an extension of the TARGIT system, allowing external XML data to be used as dimensions and measures in OLAP databases. This work has led to a number of new ideas for improving the system’s ability to adapt to changes in its surroundings. This paper describes the potential problems that may interrupt the operation of the integration system, in particular those caused by the often autonomous and unreliable nature of external XML data sources, and methods for handling these problems. Specifically, we describe techniques for handling changes in external XML data sources. We also describe techniques for improving the reliability of external XML sources, e.g., when these are found on the Internet, by dynamically trying to locate alternative sources during the evaluation of a query. Finally, we discuss solutions to a number of other possible problems, and show how the techniques can be integrated in the TARGIT architecture. Experiments performed with a prototype implementation of central functionality show the viability of the proposed solutions
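The reliability technique of dynamically locating alternative sources during query evaluation can be sketched as a simple fallback loop. This is a hedged illustration with hypothetical names (`query_with_fallback`, the fetcher interface, the example URLs), not the TARGIT extension's actual mechanism.

```python
# Minimal sketch: when an external XML source fails mid-query, fall back to
# alternative sources instead of aborting the query. Names are hypothetical.

def query_with_fallback(sources, fetch):
    """Try each candidate source in turn; return data from the first that answers."""
    errors = []
    for url in sources:
        try:
            return fetch(url)
        except IOError as exc:
            errors.append((url, exc))   # remember the failure and keep going
    raise IOError(f"all sources failed: {errors}")

# Simulated fetcher: the primary mirror times out, the backup answers.
def fake_fetch(url):
    if "primary" in url:
        raise IOError("timeout")
    return "<dim><member>EUR</member></dim>"

data = query_with_fallback(
    ["http://primary.example/rates.xml", "http://backup.example/rates.xml"],
    fake_fetch)
print(data)
```

A real federation would additionally validate that the alternative source delivers structurally compatible XML before substituting it into the OLAP query.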


    Adaptive monitoring and control framework in Application Service Management environment

    Get PDF
    The economics of data centres and cloud computing services have pushed hardware and software requirements to the limits, leaving only a very small performance overhead before systems reach saturation. For Application Service Management (ASM), this carries a growing risk of impacting the execution times of various processes. In order to deliver a stable service at times of great demand for computational power, enterprise data centres and cloud providers must implement fast and robust control mechanisms that are capable of adapting to changing operating conditions while satisfying service-level agreements. In ASM practice, there are normally two methods for dealing with increased load: increasing computational power or shedding load. The first approach typically involves allocating additional machines, which must be available, waiting idle, to deal with high-demand situations. The second approach is implemented by terminating incoming actions that are less important to current activity demand patterns, throttling, or rescheduling jobs. Although most modern cloud platforms and operating systems do not allow adaptive/automatic termination of processes, tasks, or actions, it is common practice for administrators to manually end or stop tasks or actions at any level of the system, such as at the level of a node, function, or process, or to kill a long session executing on a database server. In this context, adaptive control of action termination remains a significantly underexplored subject of Application Service Management and deserves further consideration. For example, this approach may be eminently suitable for systems with strict execution-time Service Level Agreements, such as real-time systems, systems running under hard pressure on power supplies, systems running under variable priority, or systems subject to constraints set by the green computing paradigm.
Along this line of work, the thesis investigates the potential of dimension relevance and metrics signal decomposition as methods that would enable more efficient action termination. These methods are integrated into adaptive control emulators and actuators, powered by neural networks, that adjust the operation of the system towards better conditions in environments with established goals, seen from both system performance and economics perspectives. The behaviour of the proposed control framework is evaluated using complex load and service-agreement scenarios for systems compatible with the requirements of on-premises deployments, elastic compute cloud deployments, serverless computing, and microservice architectures
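The SLA-driven termination policy motivated above can be sketched as a greedy load-shedding rule: when the predicted response time exceeds the service-level limit, end the least important running actions first until the prediction fits. All names, fields, and thresholds below are illustrative assumptions, not the framework's actual parameters (which are learned by the neural-network actuators rather than hand-coded).

```python
# Hedged sketch of SLA-driven action termination: shed the lowest-priority
# actions until the predicted execution time fits the SLA. Illustrative only.

def actions_to_terminate(actions, predicted_ms, sla_ms):
    """Pick low-priority actions to end until the predicted time meets the SLA."""
    if predicted_ms <= sla_ms:
        return []                 # already within the agreement, shed nothing
    victims = []
    # lowest priority first; each termination reclaims its estimated cost
    for act in sorted(actions, key=lambda a: a["priority"]):
        victims.append(act["id"])
        predicted_ms -= act["cost_ms"]
        if predicted_ms <= sla_ms:
            break
    return victims

running = [
    {"id": "report",  "priority": 1, "cost_ms": 400},
    {"id": "billing", "priority": 9, "cost_ms": 300},
    {"id": "index",   "priority": 2, "cost_ms": 200},
]
print(actions_to_terminate(running, predicted_ms=1200, sla_ms=700))
```

The high-priority `billing` action survives because terminating `report` and `index` already brings the prediction under the limit, which is the essence of priority-aware shedding.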