Characterization of XML Functional Dependencies and their Interaction with DTDs
With the rise of XML as a standard model of data exchange, XML
functional dependencies (XFDs) have become important to areas such as key analysis, document normalization, and data integrity. XFDs are more complicated than relational functional dependencies because the set of XFDs satisfied by an XML document depends not only on the document values, but also on the tree structure and the corresponding DTD. In particular, constraints imposed by DTDs may alter the implications from a base set of XFDs, and may even be inconsistent with a set of XFDs. In this paper we examine the interaction between XFDs and DTDs. We present a sound and complete axiomatization for XFDs, both alone and in the presence of certain classes of DTDs. We show that these DTD classes form an axiomatic hierarchy, with the axioms at each level a proper superset of the previous. Furthermore, we show that consistency checking with respect to a set of XFDs is feasible for these same classes.
The Complexity of Social Coordination
Coordination is a challenging everyday task; just think of the last time you
organized a party or a meeting involving several people. As a growing part of
our social and professional life goes online, an opportunity for an improved
coordination process arises. Recently, Gupta et al. proposed entangled queries
as a declarative abstraction for data-driven coordination, where the difficulty
of the coordination task is shifted from the user to the database.
Unfortunately, evaluating entangled queries is very hard, and thus previous
work considered only a restricted class of queries that satisfy safety (the
coordination partners are fixed) and uniqueness (all queries need to be
satisfied). In this paper we significantly extend the class of feasible
entangled queries beyond uniqueness and safety. First, we show that we can
simply drop uniqueness and still efficiently evaluate a set of safe entangled
queries. Second, we show that as long as all users coordinate on the same set
of attributes, we can give an efficient algorithm for coordination even if the
set of queries does not satisfy safety. In an experimental evaluation we show
that our algorithms are feasible for a wide spectrum of coordination scenarios.
Comment: VLDB201
The Homeostasis Protocol: Avoiding Transaction Coordination Through Program Analysis
Datastores today rely on distribution and replication to achieve improved
performance and fault-tolerance. But correctness of many applications depends
on strong consistency properties, something that can impose substantial
overheads, since it requires coordinating the behavior of multiple nodes. This
paper describes a new approach to achieving strong consistency in distributed
systems while minimizing communication between nodes. The key insight is to
allow the state of the system to be inconsistent during execution, as long as
this inconsistency is bounded and does not affect transaction correctness. In
contrast to previous work, our approach uses program analysis to extract
semantic information about permissible levels of inconsistency and is fully
automated. We then employ a novel homeostasis protocol to allow sites to
operate independently, without communicating, as long as any inconsistency is
governed by appropriate treaties between the nodes. We discuss mechanisms for
optimizing treaties based on workload characteristics to minimize
communication, as well as a prototype implementation and experiments that
demonstrate the benefits of our approach on common transactional benchmarks.
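The idea of sites operating independently while a treaty holds can be illustrated with a toy sketch. This is not the paper's protocol or its program analysis: it assumes a single global invariant (total purchases never exceed the stock) that is split into per-site budgets, and the names `Site`, `try_purchase`, and `renegotiate` are hypothetical.

```python
class Site:
    """A replica that may act without communicating while its local treaty holds."""

    def __init__(self, name, budget):
        self.name = name
        self.budget = budget   # hypothetical local treaty: max units this site may sell
        self.consumed = 0      # units sold locally so far

    def try_purchase(self, quantity):
        """Apply the purchase locally if it keeps the treaty; otherwise refuse."""
        if self.consumed + quantity <= self.budget:
            self.consumed += quantity
            return True
        return False  # treaty would be violated: the sites must synchronize


def renegotiate(sites, total_stock):
    """Synchronize: split the remaining stock evenly into new local budgets."""
    remaining = total_stock - sum(site.consumed for site in sites)
    share, extra = divmod(remaining, len(sites))
    for index, site in enumerate(sites):
        site.budget = site.consumed + share + (1 if index < extra else 0)
    return remaining


# Two sites share 10 units, 5 each under the initial treaty.
site_a, site_b = Site("a", 5), Site("b", 5)
first = site_a.try_purchase(4)    # fits in the local budget, no communication
second = site_a.try_purchase(2)   # would exceed the budget, so it is refused
remaining = renegotiate([site_a, site_b], total_stock=10)
third = site_a.try_purchase(2)    # fits under the renegotiated treaty
```

In this sketch communication happens only in `renegotiate`, which is the point the abstract makes: as long as each site stays within its local bound, the global invariant holds without any coordination.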
Fine-grained disclosure control for app ecosystems
The modern computing landscape contains an increasing number of app ecosystems, where users store personal data on platforms such as Facebook or smartphones. APIs enable third-party applications (apps) to utilize that data. A key concern associated with app ecosystems is the confidentiality of user data.
In this paper, we develop a new model of disclosure in app ecosystems. In contrast with previous solutions, our model is data-derived and semantically meaningful. Information disclosure is modeled in terms of a set of distinguished security views. Each query is labeled with the precise set of security views that is needed to answer it, and these labels drive policy decisions.
We explain how our disclosure model can be used in practice and provide algorithms for labeling conjunctive queries for the case of single-atom security views. We show that our approach is useful by demonstrating the scalability of our algorithms and by applying it to the real-world disclosure control system used by Facebook.
Entangled queries: enabling declarative data-driven coordination
Many data-driven social and Web applications involve collaboration and coordination. The vision of Declarative Data-Driven Coordination (D3C), proposed in Kot et al. [2010], is to support coordination in the spirit of data management: to make it data-centric and to specify it using convenient declarative languages. This article introduces entangled queries, a language that extends SQL with constraints that allow for the coordinated choice of result tuples across queries originating from different users or applications. It is nontrivial to define a declarative coordination formalism without arriving at the general (NP-complete) Constraint Satisfaction Problem from AI. In this article, we propose an efficiently enforceable syntactic safety condition that we argue is at the sweet spot where interesting declarative power meets applicability in large-scale data management systems and applications. The key computational problem of D3C is to match entangled queries to achieve coordination. We present an efficient matching algorithm which statically analyzes query workloads and merges coordinating entangled queries into compound SQL queries. These can be sent to a standard database system and return only coordinated results. We present the overall architecture of an implemented system that contains our evaluation algorithm. We also describe a proof-of-concept Facebook application we have built on top of this system to allow friends to coordinate flight plans. Finally, we evaluate the performance of the matching algorithm experimentally on realistic coordination workloads.
Kleene Algebra and Bytecode Verification
Most standard approaches to the static analysis of programs, such as
the popular worklist method, are first-order methods that inductively annotate program points with abstract values. In a recent paper we introduced a second-order approach based on Kleene algebra. In this approach, the primary objects of interest are not the abstract data values, but the transfer functions that manipulate them. These elements form a Kleene algebra. The dataflow labeling is not achieved by inductively labeling the program with abstract values, but rather by computing the star (Kleene closure) of a matrix of transfer functions. In this paper we show how this general framework applies to the problem of Java bytecode verification. We show how to specify transfer functions arising in Java bytecode verification in such a way that the Kleene algebra operations (join, composition, star) can be computed efficiently. We also give a hybrid dataflow analysis algorithm that computes the closure of a matrix on a cutset of the control flow graph, thereby avoiding the recalculation of dataflow information along long paths. This method could potentially improve performance over the standard worklist algorithm when a small cutset can be found.
Second-Order Abstract Interpretation via Kleene Algebra
Most standard approaches to the static analysis of programs, such as
the popular worklist method, are first-order methods that inductively annotate program points with abstract values. In this paper we introduce a second-order approach based on Kleene algebra. In this approach, the primary objects of interest are not the abstract data values, but the transfer functions that manipulate them. These elements form a Kleene algebra. The dataflow labeling is not achieved by inductively labeling the program with abstract values, but rather by computing the star (Kleene closure) of a matrix of transfer functions. In this paper we introduce the method and prove soundness and completeness with respect to the standard worklist algorithm.
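The star of a matrix can be illustrated over the simplest Kleene algebra, the Boolean one, where join is `or` and composition is `and`. The sketch below is only an analogy under that assumption: the matrices in the abstract hold transfer functions rather than Booleans, and the name `kleene_star` and the three-node example are hypothetical.

```python
def kleene_star(matrix):
    """Star of a square Boolean matrix: I + A + A^2 + ... (reflexive-transitive closure)."""
    n = len(matrix)
    # Start from I + A; the identity accounts for paths of length zero.
    star = [[matrix[i][j] or i == j for j in range(n)] for i in range(n)]
    # Allow each node k in turn as an intermediate point (Warshall-style elimination).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                star[i][j] = star[i][j] or (star[i][k] and star[k][j])
    return star


# A straight-line control flow graph with edges 0 -> 1 -> 2.
edges = [
    [False, True, False],
    [False, False, True],
    [False, False, False],
]
closure = kleene_star(edges)
# closure[i][j] is True exactly when node j is reachable from node i.
```

Replacing the Boolean entries with elements of another Kleene algebra, and `or`/`and` with that algebra's join and composition, gives the general shape of the computation; the entry-wise star that the general case also requires is trivial for Booleans and is omitted here.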
Youtopia: A Community Database Management System
This thesis introduces Youtopia, a system for collaborative management of relational data. In the age of Web 2.0, the sharing of relational data by communities is an increasingly important phenomenon. As a data management setting, it poses unique technical challenges which warrant dedicated solutions. We focus on two key aspects of Youtopia functionality: data cleanliness maintenance and handling interference between user tasks (or transactions) that access the same data. We present a set of operations for maintaining data cleanliness in a collaborative manner. A unique feature of our operation set is the mechanism for enforcing tuple-generating dependencies (a formalism for constraint maintenance). In Youtopia, these are enforced through a process based on the classical chase that involves both automated and human-assisted steps. Next, we examine the issue of transaction interference. We provide a suitable definition of serializability for the Youtopia transaction model and present an algorithm framework that can be used to enforce it. We illustrate our framework with an extended case study that includes experimental results from the implementation of our algorithms. Finally, we begin the investigation of concurrency control notions other than serializability. We present several isolation levels for transactions which are less restrictive than full serializability, but can be implemented with much more lightweight mechanisms and thus enable the system to achieve better throughput and latency in the processing of user operations.