4,065 research outputs found

    Incremental View Maintenance For Collection Programming

    Get PDF
    In the context of incremental view maintenance (IVM), delta query derivation is an essential technique for speeding up the processing of large, dynamic datasets. The goal is to generate delta queries that, given a small change in the input, can update the materialized view more efficiently than via recomputation. In this work we propose the first solution for the efficient incrementalization of positive nested relational calculus (NRC+) on bags (with integer multiplicities). More precisely, we model the cost of NRC+ operators and classify queries as efficiently incrementalizable if their delta has a strictly lower cost than full re-evaluation. Then, we identify IncNRC+; a large fragment of NRC+ that is efficiently incrementalizable and we provide a semantics-preserving translation that takes any NRC+ query to a collection of IncNRC+ queries. Furthermore, we prove that incremental maintenance for NRC+ is within the complexity class NC0 and we showcase how recursive IVM, a technique that has provided significant speedups over traditional IVM in the case of flat queries [25], can also be applied to IncNRC+.Comment: 24 pages (12 pages plus appendix

    The XFM view adaptation mechanism: An essential component for XML data warehouses

    Get PDF
    In the past few years, with many organisations providing web services for business and communication purposes, large volumes of XML transactions take place on a daily basis. In many cases, organisations maintain these transactions in their native XML format due to its flexibility for xchanging data between heterogeneous systems. This XML data provides an important resource for decision support systems. As a consequence, XML technology has slowly been included within decision support systems of data warehouse systems. The problem encountered is that existing native XML database systems suffer from poor performance in terms of managing data volume and response time for complex analytical queries. Although materialised XML views can be used to improve the performance for XML data warehouses, update problems then become the bottleneck of using materialised views. Specifically, synchronising materialised views in the face of changing view definitions, remains a significant issue. In this dissertation, we provide a method for XML-based data warehouses to manage updates caused by the change of view definitions (view redefinitions), which is referred to as the view adaptation problem. In our approach, views are defined using XPath and then modelled using a set of novel algebraic operators and fragments. XPath views are integrated into a single view graph called the XML Fragment Materialisation (XFM) View Graph, where common parts between different views are shared and appear only once in the graph. Fragments within the view graph can be selected for materialisation to facilitate the view adaptation process. While changes are applied, our view adaptation algorithms can quickly determine what part of the XFM view graph is affected. The adaptation algorithms then perform a structural adaptation to update the view graph, followed by data adaptation to update materialised fragments

    A unified view of data-intensive flows in business intelligence systems : a survey

    Get PDF
    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges.Peer ReviewedPostprint (author's final draft

    State-of-the-art on evolution and reactivity

    Get PDF
    This report starts by, in Chapter 1, outlining aspects of querying and updating resources on the Web and on the Semantic Web, including the development of query and update languages to be carried out within the Rewerse project. From this outline, it becomes clear that several existing research areas and topics are of interest for this work in Rewerse. In the remainder of this report we further present state of the art surveys in a selection of such areas and topics. More precisely: in Chapter 2 we give an overview of logics for reasoning about state change and updates; Chapter 3 is devoted to briefly describing existing update languages for the Web, and also for updating logic programs; in Chapter 4 event-condition-action rules, both in the context of active database systems and in the context of semistructured data, are surveyed; in Chapter 5 we give an overview of some relevant rule-based agents frameworks

    Incremental schema integration for data wrangling via knowledge graphs

    Get PDF
    Virtual data integration is the current approach to go for data wrangling in data-driven decision-making. In this paper, we focus on automating schema integration, which extracts a homogenised representation of the data source schemata and integrates them into a global schema to enable virtual data integration. Schema integration requires a set of well-known constructs: the data source schemata and wrappers, a global integrated schema and the mappings between them. Based on them, virtual data integration systems enable fast and on-demand data exploration via query rewriting. Unfortunately, the generation of such constructs is currently performed in a largely manual manner, hindering its feasibility in real scenarios. This becomes aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged semi-automatic and incremental approach grounded on knowledge graphs to generate the required schema integration constructs in four main steps: bootstrapping, schema matching, schema integration, and generation of system-specific constructs. We also present NextiaDI, a tool implementing our approach. Finally, a comprehensive evaluation is presented to scrutinize our approach.This work was partly supported by the DOGO4ML project, funded by the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00, and D3M project, funded by the Spanish Agencia Estatal de Investigación (AEI) under project PDC2021-121195-I00. Javier Flores is supported by contract 2020-DI-027 of the Industrial Doctorate Program of the Government of Catalonia and Consejo Nacional de Ciencia y Tecnología (CONACYT, Mexico). Sergi Nadal is partly supported by the Spanish Ministerio de Ciencia e Innovación, as well as the European Union – NextGenerationEU, under project FJC2020-045809-I.Peer ReviewedPostprint (published version

    Provenance in Collaborative Data Sharing

    Get PDF
    This dissertation focuses on recording, maintaining and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely-coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support the capability of update exchange --- which publishes a participant\u27s updates and then translates others\u27 updates to the participant\u27s local schema and imports them --- while tolerating disagreement between them and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in their propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange, to answer a variety of user queries, about the quality, uncertainty or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases. To address these challenges, in this dissertation we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the Orchestra prototype system. We define ProQL, a query language for provenance graphs that can be used by CDSS users to combine data querying with provenance testing as well as to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation ProQL over an RDBMS and indexing techniques to speed up provenance querying, evaluate experimentally the performance of provenance querying and the benefits of our indexing techniques

    Generating collaborative systems for digital libraries: A model-driven approach

    Get PDF
    This is an open access article shared under a Creative Commons Attribution 3.0 Licence (http://creativecommons.org/licenses/by/3.0/). Copyright @ 2010 The Authors.The design and development of a digital library involves different stakeholders, such as: information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negotiate the services the library has to offer. To this end, high-level, language-neutral models have to be devised. Metamodeling techniques favor the definition of domainspecific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. This paper describes CRADLE (Cooperative-Relational Approach to Digital Library Environments), a metamodel-based framework and visual language for the definition of notions and services related to the development of digital libraries. A collection of tools allows the automatic generation of several services, defined with the CRADLE visual language, and of the graphical user interfaces providing access to them for the final user. The effectiveness of the approach is illustrated by presenting digital libraries generated with CRADLE, while the CRADLE environment has been evaluated by using the cognitive dimensions framework

    ODIN: A dataspace management system

    Get PDF
    ODIN is a system that supports the incremental pay-as-you-go integration of data sources into dataspaces and provides user-friendly querying mechanisms on top of them. We describe its main characteristics and underlying assumptions, including the user interactions required. Odin’s novelty lies in a largely automated bottom-up approach (i.e., driven by the sources at hand) that includes the user in the loop for disambiguation purposes. The on-site demonstration will feature an ongoing project with the World Health Organization (WHO). Online demo and videos: www.essi.upc.edu/dtim/odin/Peer ReviewedPostprint (published version
    corecore