2,040 research outputs found
Exchange-Repairs: Managing Inconsistency in Data Exchange
In a data exchange setting with target constraints, it is often the case that
a given source instance has no solutions. In such cases, the semantics of
target queries trivialize. The aim of this paper is to introduce and explore a
new framework that gives meaningful semantics in such cases by using the notion
of exchange-repairs. Informally, an exchange-repair of a source instance is
another source instance that differs minimally from the first, but has a
solution. Exchange-repairs give rise to a natural notion of exchange-repair
certain answers (XR-certain answers) for target queries. We show that for
schema mappings specified by source-to-target GAV dependencies and target
equality-generating dependencies (egds), the XR-certain answers of a target
conjunctive query can be rewritten as the consistent answers (in the sense of
standard database repairs) of a union of conjunctive queries over the source
schema with respect to a set of egds over the source schema, making it possible
to use a consistent query-answering system to compute XR-certain answers in
data exchange. We then examine the general case of schema mappings specified by
source-to-target GLAV constraints, a weakly acyclic set of target tgds and a
set of target egds. The main result asserts that, for such settings, the
XR-certain answers of conjunctive queries can be rewritten as the certain
answers of a union of conjunctive queries with respect to the stable models of
a disjunctive logic program over a suitable expansion of the source schema.
Comment: 29 pages, 13 figures, submitted to the Journal on Data Semantics
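The notion of XR-certain answers can be illustrated with a small brute-force sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the source relation `Emp(name, dept)`, a single GAV copy rule into a target relation `T`, one target egd saying that a name has a single department, and the reading of "differs minimally" as "subset-maximal sub-instance". The sketch enumerates repairs exhaustively and is not the rewriting technique described in the abstract.

```python
from itertools import combinations

def exchange(source):
    """Hypothetical GAV copy rule: Emp(name, dept) -> T(name, dept)."""
    return set(source)

def satisfies_egd(target):
    """Hypothetical target egd: T(n, d1) and T(n, d2) imply d1 = d2."""
    dept_of = {}
    for name, dept in target:
        if dept_of.setdefault(name, dept) != dept:
            return False
    return True

def exchange_repairs(source):
    """Subset-maximal sub-instances of `source` that have a solution."""
    facts = sorted(source)
    consistent = [
        frozenset(subset)
        for size in range(len(facts), -1, -1)
        for subset in combinations(facts, size)
        if satisfies_egd(exchange(subset))
    ]
    # Keep only sub-instances not strictly contained in another consistent one.
    return [s for s in consistent if not any(s < other for other in consistent)]

def xr_certain_answers(source, query):
    """Answers that hold in the exchanged instance of every exchange-repair."""
    answers = [query(exchange(repair)) for repair in exchange_repairs(source)]
    return set.intersection(*answers)

def all_tuples(target):
    """Query q(n, d) :- T(n, d)."""
    return set(target)

source = {("ann", "hr"), ("ann", "it"), ("bob", "it")}
repairs = exchange_repairs(source)
certain = xr_certain_answers(source, all_tuples)
```

In this toy instance the two conflicting facts about `ann` cannot both survive, so there are two repairs and only the fact about `bob` is an XR-certain answer.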
Discovery Agent: An Interactive Approach for the Discovery of Inclusion Dependencies
The information integration problem is a hard yet important problem in the field of databases. The goal of information integration is to provide unified views of diverse data across several resources. This subject has been studied for a long time. Integration can be performed in several ways; schema integration using inclusion dependency constraints is one of them. The problem of discovering inclusion dependencies among input relations is NP-complete in terms of the number of attributes.
Two significant algorithms address this problem: FIND2 by Andreas Koeller and Zigzag by Fabien De Marchi. Both algorithms discover inclusion dependencies among input relations on small-scale databases with relatively few attributes. Because of the data discrepancy, they do not scale well with higher numbers of attributes.
We propose an approach that incorporates human intelligence into the algorithmic discovery of inclusion dependencies. To use human intelligence, we design an agent, called the discovery agent, that provides a communication bridge between an algorithm and a user. The discovery agent shows the progress of the discovery process and provides sufficient user controls to steer the discovery process in the right direction. In this thesis, we present a prototype of the discovery agent based upon the FIND2 algorithm, which utilizes most of the phase-wise behavior of the algorithm, and demonstrate how a human observer and the algorithm work together to achieve higher performance and better output accuracy.
The goal of the discovery agent is to make the discovery process truly interactive between system and user, as well as to produce the desired and accurate result. The discovery agent can deliver an applicable and feasible approximation of an NP-complete problem with the help of a suitable algorithm and appropriate human expertise.
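For readers unfamiliar with inclusion dependencies, the simplest case can be sketched as follows. This is a naive pairwise check for unary dependencies only and is not FIND2 or Zigzag; the function name, the dictionary layout, and the sample relations are hypothetical.

```python
from itertools import permutations

def unary_inclusion_dependencies(relations):
    """Return pairs (lhs, rhs) of columns whose value sets satisfy lhs <= rhs.

    `relations` maps a relation name to a dict of attribute -> list of values.
    """
    columns = {
        (rel, attr): set(values)
        for rel, table in relations.items()
        for attr, values in table.items()
    }
    return sorted(
        (lhs, rhs)
        for lhs, rhs in permutations(columns, 2)
        if columns[lhs] <= columns[rhs]
    )

relations = {
    "orders": {"customer_id": [1, 2, 2]},
    "customers": {"id": [1, 2, 3]},
}
found = unary_inclusion_dependencies(relations)
```

Here every `orders.customer_id` value also occurs in `customers.id`, so that single dependency is reported. Dependencies over several attributes at once require many more candidates to be examined, which is the part the thesis addresses with user guidance.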
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters system, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware.
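The "graph of discrete tasks with explicit input and output dependencies" structure can be sketched with the Python standard library. This is a toy dispatcher under stated assumptions, not Blue Waters middleware: `run_task_graph`, the task names, and the use of a local thread pool are all hypothetical, and no attempt is made to measure or minimize dispatch overhead.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+

def run_task_graph(tasks, deps, max_workers=4):
    """Run `tasks` (name -> callable) respecting `deps` (name -> set of names).

    Each callable receives a dict holding the results of its dependencies.
    """
    sorter = TopologicalSorter({name: deps.get(name, ()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Dispatch every task whose inputs are all available.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(tasks[name], inputs)] = name
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)  # unlocks the tasks that depend on `name`
    return results

tasks = {
    "split": lambda inputs: [1, 2, 3, 4],
    "square_a": lambda inputs: [x * x for x in inputs["split"][:2]],
    "square_b": lambda inputs: [x * x for x in inputs["split"][2:]],
    "merge": lambda inputs: sum(inputs["square_a"]) + sum(inputs["square_b"]),
}
deps = {
    "square_a": {"split"},
    "square_b": {"split"},
    "merge": {"square_a", "square_b"},
}
results = run_task_graph(tasks, deps)
```

The two `square_*` tasks become ready at the same moment and may run concurrently. With very short tasks like these, the cost of submitting and collecting each task dominates the useful work, which is the dispatch-overhead concern the report raises.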
Parallelization of dynamic programming recurrences in computational biology
The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15x and 130x faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors.
Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
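The kind of parallelism available in such recurrences can be seen in a plain software version of the textbook Nussinov recurrence. This sketch is sequential Python, not the FPGA design from the thesis; the pairing set (Watson-Crick plus G-U wobble), the unit score per pair, and the `min_loop` parameter are assumptions chosen for illustration.

```python
# Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=0):
    """Maximum number of nested base pairs in `seq` (textbook Nussinov)."""
    n = len(seq)
    if n == 0:
        return 0
    score = [[0] * n for _ in range(n)]
    # Sweep by span = j - i. Every cell on one anti-diagonal reads only cells
    # with a strictly smaller span, so the inner loop over i has no
    # dependencies between iterations and could be evaluated in parallel.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(score[i + 1][j], score[i][j - 1])
            if span > min_loop and (seq[i], seq[j]) in PAIRS:
                best = max(best, score[i + 1][j - 1] + 1)
            for k in range(i + 1, j):  # bifurcation into two sub-structures
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best
    return score[0][n - 1]

pairs_found = nussinov("GGGAAACCC")
```

For `"GGGAAACCC"` the three G bases pair with the three C bases in a nested fashion, giving a score of 3. The anti-diagonal sweep is one valid schedule; the thesis uses polyhedral analysis to derive and compare such schedules systematically.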
Physical Data Independence, Constraints and Optimization with Universal Plans
We present an optimization method and algorithm designed for three objectives: physical data independence, semantic optimization, and generalized tableau minimization. The method relies on generalized forms of chase and backchase with constraints (dependencies). By using dictionaries (finite functions) in physical schemas we can capture with constraints useful access structures such as indexes, materialized views, source capabilities, access support relations, gmaps, etc. The search space for query plans is defined and enumerated in a novel manner: the chase phase rewrites the original query into a universal plan that integrates all the access structures and alternative pathways that are allowed by applicable constraints. Then, the backchase phase produces optimal plans by eliminating various combinations of redundancies, again according to constraints. This method is applicable (sound) to a large class of queries, physical access structures, and semantic constraints. We prove that it is in fact complete for path-conjunctive queries and views with complex objects, classes and dictionaries, going beyond previous theoretical work on processing queries using materialized views.
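The idea of eliminating redundancies can be illustrated with classical tableau minimization of a conjunctive query. This sketch handles only the constraint-free case with variables as arguments; it is not the chase and backchase with dictionaries described in the abstract, and the query representation and function names are hypothetical.

```python
def find_homomorphism(atoms, target_atoms, mapping):
    """Extend `mapping` so that every atom in `atoms` maps into `target_atoms`."""
    if not atoms:
        return mapping
    (rel, args), rest = atoms[0], atoms[1:]
    for t_rel, t_args in target_atoms:
        if t_rel != rel or len(t_args) != len(args):
            continue
        extended = dict(mapping)
        # Bind each variable, rejecting bindings that clash with earlier ones.
        if all(extended.setdefault(a, b) == b for a, b in zip(args, t_args)):
            result = find_homomorphism(rest, target_atoms, extended)
            if result is not None:
                return result
    return None

def minimize(head, body):
    """Drop body atoms while the query stays equivalent to the original."""
    body = list(body)
    fixed = {v: v for v in head}  # head variables must map to themselves
    changed = True
    while changed:
        changed = False
        for i in range(len(body)):
            candidate = body[:i] + body[i + 1:]
            # The smaller query is equivalent if the full body maps into it.
            if find_homomorphism(body, candidate, dict(fixed)) is not None:
                body, changed = candidate, True
                break
    return body

# q(x) :- R(x, y), R(x, z): one of the two atoms is redundant.
minimal = minimize(("x",), [("R", ("x", "y")), ("R", ("x", "z"))])
# q(x, y) :- R(x, y), R(x, z): only R(x, z) can be dropped.
minimal_xy = minimize(("x", "y"), [("R", ("x", "y")), ("R", ("x", "z"))])
```

Removing an atom never excludes answers, so the only thing to verify is that the original body can be mapped into what remains while the head variables stay fixed. The backchase in the abstract performs an analogous elimination, but judges redundancy with respect to the applicable constraints.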