
    Exchange-Repairs: Managing Inconsistency in Data Exchange

    In a data exchange setting with target constraints, it is often the case that a given source instance has no solutions. In such cases, the semantics of target queries trivialize. The aim of this paper is to introduce and explore a new framework that gives meaningful semantics in such cases by using the notion of exchange-repairs. Informally, an exchange-repair of a source instance is another source instance that differs minimally from the first, but has a solution. Exchange-repairs give rise to a natural notion of exchange-repair certain answers (XR-certain answers) for target queries. We show that for schema mappings specified by source-to-target GAV dependencies and target equality-generating dependencies (egds), the XR-certain answers of a target conjunctive query can be rewritten as the consistent answers (in the sense of standard database repairs) of a union of conjunctive queries over the source schema with respect to a set of egds over the source schema, making it possible to use a consistent query-answering system to compute XR-certain answers in data exchange. We then examine the general case of schema mappings specified by source-to-target GLAV constraints, a weakly acyclic set of target tgds, and a set of target egds. The main result asserts that, for such settings, the XR-certain answers of conjunctive queries can be rewritten as the certain answers of a union of conjunctive queries with respect to the stable models of a disjunctive logic program over a suitable expansion of the source schema.
    Comment: 29 pages, 13 figures, submitted to the Journal on Data Semantics.
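
    A brute-force sketch of the exchange-repair notion (the schema, mapping, and query below are hypothetical and chosen for illustration; this is not the paper's rewriting technique): a GAV dependency copies Emp(name, dept) to a target relation Works(name, dept), a target egd requires each name to have a single dept, and the XR-certain answers are obtained by intersecting query answers over the targets of all maximal subset repairs of the source.

    from itertools import combinations

    source = {("alice", "sales"), ("alice", "hr"), ("bob", "it")}  # Emp facts

    def target_of(src):
        """GAV mapping Emp(n, d) -> Works(n, d): the canonical target instance."""
        return set(src)

    def satisfies_egd(target):
        """Egd: Works(n, d1) & Works(n, d2) -> d1 = d2."""
        dept_of = {}
        for n, d in target:
            if dept_of.setdefault(n, d) != d:
                return False
        return True

    def exchange_repairs(src):
        """Maximal subsets of the source whose target instance satisfies the egd."""
        consistent = [set(s) for r in range(len(src) + 1)
                      for s in combinations(src, r)
                      if satisfies_egd(target_of(set(s)))]
        return [s for s in consistent if not any(s < t for t in consistent)]

    def xr_certain(query, src):
        """Intersect the query answers over the targets of all exchange-repairs."""
        answers = [query(target_of(rep)) for rep in exchange_repairs(src)]
        return set.intersection(*answers) if answers else set()

    print(xr_certain(lambda tgt: {n for n, _ in tgt}, source))  # q(n) :- Works(n, d)    -> {'alice', 'bob'}
    print(xr_certain(lambda tgt: set(tgt), source))             # q(n, d) :- Works(n, d) -> {('bob', 'it')}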

    Discovery Agent: An Interactive Approach for the Discovery of Inclusion Dependencies

    Information integration is a hard yet important problem in the field of databases: its goal is to provide unified views over diverse data spread across several sources. The subject has been studied for a long time, and integration can be performed in several ways; schema integration using inclusion dependency constraints is one of them. The problem of discovering inclusion dependencies among input relations is NP-complete in the number of attributes. Two significant algorithms address it: FIND2 by Andreas Koeller and Zigzag by Fabien De Marchi. Both discover inclusion dependencies on small-scale databases with relatively few attributes; because of discrepancies in the data, they do not scale well to larger numbers of attributes. We propose incorporating human intelligence into the algorithmic discovery of inclusion dependencies. To that end, we design an agent, called the discovery agent, that provides a communication bridge between an algorithm and a user: it reports the progress of the discovery process and offers sufficient controls for the user to steer the process in the right direction. In this thesis, we present a prototype of the discovery agent built on the FIND2 algorithm, which exploits the phase-wise behavior of that algorithm, and we demonstrate how a human observer and the algorithm work together to achieve higher performance and better output accuracy. The goal of the discovery agent is to make the discovery process truly interactive between system and user and to produce accurate, desired results. With a suitable algorithm and appropriate human expertise, the discovery agent can deliver a practical, feasible approximation for an NP-complete problem.
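
    For reference, the unary core of the discovery problem can be solved by an exhaustive check, as in the sketch below (relation and column names are hypothetical); FIND2 and Zigzag target the much harder n-ary case, whose candidate space grows exponentially with the number of attributes, which is exactly where interactive pruning becomes valuable.

    from itertools import product

    def unary_inds(relations):
        """Yield R[a] <= S[b] whenever every value of column a in R also occurs
        in column b of S (a unary inclusion dependency)."""
        columns = {(rel, col): {row[col] for row in rows}
                   for rel, rows in relations.items()
                   for col in (rows[0].keys() if rows else [])}
        for (r, a), (s, b) in product(columns, repeat=2):
            if (r, a) != (s, b) and columns[(r, a)] <= columns[(s, b)]:
                yield f"{r}[{a}] <= {s}[{b}]"

    orders = [{"customer_id": 1}, {"customer_id": 2}]
    customers = [{"id": 1}, {"id": 2}, {"id": 3}]
    print(list(unary_inds({"orders": orders, "customers": customers})))
    # ['orders[customer_id] <= customers[id]']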

    Many-Task Computing and Blue Waters

    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters system, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects for middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, MTC applications are by definition structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications; in particular, different engineering constraints for hardware and software must be met in order to support them. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware.
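
    As a minimal illustration of the task-graph structure described above (file names and tasks are hypothetical), the sketch below runs discrete tasks whose input/output files form the edges of a DAG, dispatching each task as soon as its inputs exist; in real MTC workloads the tasks are external programs and the dispatch and I/O costs dominate.

    from concurrent.futures import ThreadPoolExecutor

    # task name -> (input files produced by other tasks, simulated work returning outputs)
    tasks = {
        "split":  (set(),               lambda: ["part0", "part1"]),
        "align0": ({"part0"},           lambda: ["hits0"]),
        "align1": ({"part1"},           lambda: ["hits1"]),
        "merge":  ({"hits0", "hits1"},  lambda: ["result"]),
    }

    def run(tasks):
        done, produced = set(), set()
        with ThreadPoolExecutor() as pool:
            while len(done) < len(tasks):
                ready = [t for t, (inputs, _) in tasks.items()
                         if t not in done and inputs <= produced]
                if not ready:
                    raise RuntimeError("cycle or missing input in task graph")
                # dispatch every ready task; with very short tasks, this dispatch
                # step is exactly the overhead MTC middleware must minimize
                for name, outputs in zip(ready, pool.map(lambda t: tasks[t][1](), ready)):
                    done.add(name)
                    produced.update(outputs)

    run(tasks)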

    Parallelization of dynamic programming recurrences in computational biology

    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years the DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are common in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are 15x to 130x faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3-GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
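
    For context, the Nussinov recurrence that the accelerators implement has the following sequential form (a standard textbook formulation, not the authors' FPGA design); the dependence of each cell on the cells below it, to its left, and on every split point k is the structure that the polyhedral analysis exploits for pipelining.

    def nussinov(seq, min_loop=3):
        """Maximum number of non-crossing base pairs in an RNA sequence."""
        pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
        n = len(seq)
        N = [[0] * n for _ in range(n)]
        for span in range(min_loop + 1, n):              # fill by increasing subsequence length
            for i in range(n - span):
                j = i + span
                best = max(N[i + 1][j],                  # i left unpaired
                           N[i][j - 1],                  # j left unpaired
                           N[i + 1][j - 1] + ((seq[i], seq[j]) in pairs))  # i paired with j
                for k in range(i + 1, j):                # bifurcation into independent subproblems
                    best = max(best, N[i][k] + N[k + 1][j])
                N[i][j] = best
        return N[0][n - 1]

    print(nussinov("GGGAAAUCC"))  # -> 3 (a small hairpin: G-C, G-C, G-U)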

    Physical Data Independence, Constraints and Optimization with Universal Plans

    We present an optimization method and algorithm designed for three objectives: physical data independence, semantic optimization, and generalized tableau minimization. The method relies on generalized forms of chase and backchase with constraints (dependencies). By using dictionaries (finite functions) in physical schemas, we can capture with constraints useful access structures such as indexes, materialized views, source capabilities, access support relations, gmaps, etc. The search space for query plans is defined and enumerated in a novel manner: the chase phase rewrites the original query into a universal plan that integrates all the access structures and alternative pathways that are allowed by applicable constraints. Then, the backchase phase produces optimal plans by eliminating various combinations of redundancies, again according to constraints. This method is applicable (sound) to a large class of queries, physical access structures, and semantic constraints. We prove that it is in fact complete for path-conjunctive queries and views with complex objects, classes and dictionaries, going beyond previous theoretical work on processing queries using materialized views.
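
    As a toy illustration of the two phases (restricted to full tgds and a single hypothetical materialized view, far simpler than the dictionaries and complex objects the method actually handles), the sketch below chases a query into a universal plan and then backchases it to recover the minimal equivalent subqueries, one of which uses only the view.

    from itertools import combinations, product

    def homomorphisms(src_atoms, dst_atoms, fixed=()):
        """All variable mappings sending every atom of src_atoms into dst_atoms,
        leaving the 'fixed' (head) variables unchanged."""
        free = sorted({t for a in src_atoms for t in a[1:] if t not in fixed})
        terms = {t for a in dst_atoms for t in a[1:]} | set(fixed)
        for image in product(terms, repeat=len(free)):
            h = dict(zip(free, image), **{v: v for v in fixed})
            if all((a[0], *(h[t] for t in a[1:])) in dst_atoms for a in src_atoms):
                yield h

    def chase(atoms, tgds):
        """Saturate the atom set with full tgds (body, head) up to a fixpoint."""
        atoms = set(atoms)
        changed = True
        while changed:
            changed = False
            for body, head in tgds:
                for h in list(homomorphisms(body, atoms)):
                    new = {(a[0], *(h[t] for t in a[1:])) for a in head}
                    if not new <= atoms:
                        atoms |= new
                        changed = True
        return atoms

    def backchase(universal, head_vars, query, tgds):
        """Minimal subsets of the universal plan that stay equivalent to the query."""
        atoms = sorted(universal)
        plans = [set(sub) for r in range(1, len(atoms) + 1)
                 for sub in combinations(atoms, r)
                 if next(homomorphisms(query, chase(set(sub), tgds), head_vars), None) is not None]
        return [p for p in plans if not any(q < p for q in plans)]

    # View V(x, z, y) materializes the join R(x, z), S(z, y); both directions are full tgds.
    tgds = [
        ([("R", "x", "z"), ("S", "z", "y")], [("V", "x", "z", "y")]),
        ([("V", "x", "z", "y")], [("R", "x", "z"), ("S", "z", "y")]),
    ]
    query = [("R", "x", "z"), ("S", "z", "y")]            # q(x, y) :- R(x, z), S(z, y)
    universal = chase(query, tgds)                        # chase phase: adds ('V', 'x', 'z', 'y')
    print(backchase(universal, ("x", "y"), query, tgds))  # two minimal plans: {V(x,z,y)} and {R(x,z), S(z,y)}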