20 research outputs found

    Process-Centric Views of Data-Driven Business Artifacts

    Get PDF
    Declarative, data-aware workflow models are becoming increasingly pervasive. While these have numerous benefits, classical process-centric specifications retain certain advantages. Workflow designers are used to development tools such as BPMN or UML diagrams, that focus on control flow. Views describing valid sequences of tasks are also useful to provide stake-holders with high-level descriptions of the workflow, stripped of the accompanying data. In this paper we study the problem of recovering process-centric views from declarative, data-aware workflow specifications in a variant of IBM\u27s business artifact model. We focus on the simplest and most natural process-centric views, specified by finite-state transition systems, and describing regular languages. The results characterize when process-centric views of artifact systems are regular, using both linear and branching-time semantics. We also study the impact of data dependencies on regularity of the views

    Semantic Optimization of Conjunctive Queries

    Get PDF
    This work deals with the problem of semantic optimization of the central class of conjunctive queries (CQs). Since CQ evaluation is NP-complete, a long line of research has focussed on identifying fragments of CQs that can be efficiently evaluated. One of the most general restrictions corresponds to generalized hypetreewidth bounded by a fixed constant k ≄ 1; the associated fragment is denoted GHWk. A CQ is semantically in GHWk if it is equivalent to a CQ in GHWk. The problem of checking whether a CQ is semantically in GHWk has been studied in the constraint-free case, and it has been shown to be NP-complete. However, in case the database is subject to constraints such as tuple-generating dependencies (TGDs) that can express, e.g., inclusion dependencies, or equality-generating dependencies (EGDs) that capture, e.g., key dependencies, a CQ may turn out to be semantically in GHWk under the constraints, while not being semantically in GHWk without the constraints. This opens avenues to new query optimization techniques. In this article, we initiate and develop the theory of semantic optimization of CQs under constraints. More precisely, we study the following natural problem: Given a CQ and a set of constraints, is the query semantically in GHWk, for a fixed k ≄ 1, under the constraints, or, in other words, is the query equivalent to one that belongs to GHWk over all those databases that satisfy the constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not a sufficient condition for the decidability of the problem in question. In particular, we show that checking whether a CQ is semantically in GHW1 is undecidable in the presence of full TGDs (i.e., Datalog rules) or EGDs. In view of the above negative results, we focus on the main classes of TGDs for which CQ containment is decidable and that do not capture the class of full TGDs, i.e., guarded, non-recursive, and sticky sets of TGDs, and show that the problem in question is decidable, while its complexity coincides with the complexity of CQ containment. We also consider key dependencies over unary and binary relations, and we show that the problem in question is decidable in elementary time. Furthermore, we investigate whether being semantically in GHWk alleviates the cost of query evaluation. Finally, in case a CQ is not semantically in GHWk, we discuss how it can be approximated via a CQ that falls in GHWk in an optimal way. Such approximations might help finding “quick” answers to the input query when exact evaluation is intractable

    Dependencies for Graphs

    Get PDF

    Information-Based Distance Measures and the Canonical Reflection of View Updates

    Get PDF
    For the problem of reflecting an update on a database view to the main schema, the constant-complement strategies are precisely those which avoid all update anomalies, and so define the gold standard for well-behaved solutions to the problem. However, the families of view updates which are supported under such strategies are limited, so it is sometimes necessary to go beyond them, albeit in a systematic fashion. In this work, an investigation of such extended strategies is initiated for relational schemata. The approach is to characterize the information content of a database instance, and then require that the optimal reflection of a view update to the main schema embody the least possible change of information. The key property is identified to be strong monotonicity of the view, meaning that view insertions may always be reflected as insertions to the main schema, and likewise for deletions. In that context it is shown that for insertions and deletions, an optimal update, entailing the least change of information, exists and is unique up to isomorphism for wide classes of constraints

    Semantic Optimization in Tractable Classes of Conjunctive Queries

    Get PDF
    This paper reports on recent advances in semantic query optimization. We focus on the core class of conjunctive queries (CQs). Since CQ evaluation is NP-complete, a long line of research has concentrated on identifying fragments of CQs that can be efficiently evaluated. One of the most general such restrictions corresponds to bounded generalized hypertreewidth, which extends the notion of acyclicity. Here we discuss the problem of reformulating a CQ into one of bounded generalized hypertreewidth. Furthermore, we study whether knowing that such a reformulation exists alleviates the cost of CQ evaluation. In case a CQ cannot be reformulated as one of bounded generalized hypertreewidth, we discuss how it can be approximated in an optimal way. All the above issues are examined both for the constraint-free case, and the case where constraints, in fact, tuple-generating and equality-generating dependencies, are present.Millennium Nucleus Center for Semantic Web Research NC120004 EPSRC Programme EP/M025268

    Cleaning data with Llunatic

    Get PDF
    Data cleaning (or data repairing) is considered a crucial problem in many database-related tasks. It consists in making a database consistent with respect to a given set of constraints. In recent years, repairing methods have been proposed for several classes of constraints. These methods, however, tend to hard-code the strategy to repair conflicting values and are specialized toward specific classes of constraints. In this paper, we develop a general chase-based repairing framework, referred to as Llunatic, in which repairs can be obtained for a large class of constraints and by using different strategies to select preferred values. The framework is based on an elegant formalization in terms of labeled instances and partially ordered preference labels. In this context, we revisit concepts such as upgrades, repairs and the chase. In Llunatic, various repairing strategies can be slotted in, without the need for changing the underlying implementation. Furthermore, Llunatic is the first data repairing system which is DBMS-based. We report experimental results that confirm its good scalability and show that various instantiations of the framework result in repairs of good quality

    Semantic Acyclicity Under Constraints

    Get PDF
    A conjunctive query (CQ) is semantically acyclic if it is equivalent to an acyclic one. Semantic acyclicity has been studied in the constraint-free case, and deciding whether a query enjoys this property is NP-complete. However, in case the database is subject to constraints such as tuple-generating dependencies (tgds) that can express, e.g., inclusion dependencies, or equality-generating dependencies (egds) that capture, e.g., functional dependencies, a CQ may turn out to be semantically acyclic under the constraints while not semantically acyclic in general. This opens avenues to new query optimization techniques. In this paper we initiate and develop the theory of semantic acyclicity under constraints. More precisely, we study the following natural problem: Given a CQ and a set of constraints, is the query semantically acyclic under the constraints, or, in other words, is the query equivalent to an acyclic one over all those databases that satisfy the set of constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not sufficient condition for the decidability of semantic acyclicity. In particular, we show that semantic acyclicity is undecidable in presence of full tgds (i.e., Datalog rules). In view of this fact, we focus on the main classes of tgds for which CQ containment is decidable, and do not capture the class of full tgds, namely guarded, non-recursive and sticky tgds. For these classes we show that semantic acyclicity is decidable, and its complexity coincides with the complexity of CQ containment. In the case of egds, we show that semantic acyclicity is undecidable even over unary and binary predicates. When restricted to keys the problem becomes decidable (NP-complete) over such schemas. We finally consider the problem of evaluating a semantically acyclic query over a database that satisfies a set of constraints. For guarded tgds the evaluation problem is tractable. © Association Computing for Machiner

    Data Quality: From Theory to Practice

    Get PDF
    Data quantity and data quality, like two sides of a coin, are equally important to data management. This paper provides an overview of recent advances in the study of data quality, from theory to practice. We also address challenges intro-duced by big data to data quality management. 1
    corecore