23 research outputs found

    Query Stability in Monotonic Data-Aware Business Processes [Extended Version]

    Get PDF
    Organizations continuously accumulate data, often according to some business processes. If one poses a query over such data for decision support, it is important to know whether the query is stable, that is, whether the answers will stay the same or may change in the future because business processes may add further data. We investigate query stability for conjunctive queries. To this end, we define a formalism that combines an explicit representation of the control flow of a process with a specification of how data is read and inserted into the database. We consider different restrictions of the process model and the state of the system, such as negation in conditions, cyclic executions, read access to written data, presence of pending process instances, and the possibility to start fresh process instances. We identify for which facet combinations stability of conjunctive queries is decidable and provide encodings into variants of Datalog that are optimal with respect to the worst-case complexity of the problem.Comment: This report is the extended version of a paper accepted at the 19th International Conference on Database Theory (ICDT 2016), March 15-18, 2016 - Bordeaux, Franc

    DPIF: A framework for distinguishing unintentional quality problems from potential shilling attacks

    Full text link
    Copyright © 2019 Tech Science Press. Maliciously manufactured user profiles are often generated in batch for shilling attacks. These profiles may bring in a lot of quality problems but not worthy to be repaired. Since repairing data always be expensive, we need to scrutinize the data and pick out the data that really deserves to be repaired. In this paper, we focus on how to distinguish the unintentional data quality problems from the batch generated fake users for shilling attacks. A two-steps framework named DPIF is proposed for the distinguishment. Based on the framework, the metrics of homology and suspicious degree are proposed. The homology can be used to represent both the similarities of text and the data quality problems contained by different profiles. The suspicious degree can be used to identify potential attacks. The experiments on real-life data verified that the proposed framework and the corresponding metrics are effective

    Computing Possible and Certain Answers over Order-Incomplete Data

    Full text link
    This paper studies the complexity of query evaluation for databases whose relations are partially ordered; the problem commonly arises when combining or transforming ordered data from multiple sources. We focus on queries in a useful fragment of SQL, namely positive relational algebra with aggregates, whose bag semantics we extend to the partially ordered setting. Our semantics leads to the study of two main computational problems: the possibility and certainty of query answers. We show that these problems are respectively NP-complete and coNP-complete, but identify tractable cases depending on the query operators or input partial orders. We further introduce a duplicate elimination operator and study its effect on the complexity results.Comment: 55 pages, 56 references. Extended journal version of arXiv:1707.07222. Up to the stylesheet, page/environment numbering, and possible minor publisher-induced changes, this is the exact content of the journal paper that will appear in Theoretical Computer Scienc

    Detecting Ambiguity in Prioritized Database Repairing

    Get PDF
    In its traditional definition, a repair of an inconsistent database is a consistent database that differs from the inconsistent one in a "minimal way." Often, repairs are not equally legitimate, as it is desired to prefer one over another; for example, one fact is regarded more reliable than another, or a more recent fact should be preferred to an earlier one. Motivated by these considerations, researchers have introduced and investigated the framework of preferred repairs, in the context of denial constraints and subset repairs. There, a priority relation between facts is lifted towards a priority relation between consistent databases, and repairs are restricted to the ones that are optimal in the lifted sense. Three notions of lifting (and optimal repairs) have been proposed: Pareto, global, and completion. In this paper we investigate the complexity of deciding whether the priority relation suffices to clean the database unambiguously, or in other words, whether there is exactly one optimal repair. We show that the different lifting semantics entail highly different complexities. Under Pareto optimality, the problem is coNP-complete, in data complexity, for every set of functional dependencies (FDs), except for the tractable case of (equivalence to) one FD per relation. Under global optimality, one FD per relation is still tractable, but we establish Pi-2-p-completeness for a relation with two FDs. In contrast, under completion optimality the problem is solvable in polynomial time for every set of FDs. In fact, we present a polynomial-time algorithm for arbitrary conflict hypergraphs. We further show that under a general assumption of transitivity, this algorithm solves the problem even for global optimality. The algorithm is extremely simple, but its proof of correctness is quite intricate

    Possible and Certain Answers for Queries over Order-Incomplete Data

    Get PDF
    To combine and query ordered data from multiple sources, one needs to handle uncertainty about the possible orderings. Examples of such "order-incomplete" data include integrated event sequences such as log entries; lists of properties (e.g., hotels and restaurants) ranked by an unknown function reflecting relevance or customer ratings; and documents edited concurrently with an uncertain order on edits. This paper introduces a query language for order-incomplete data, based on the positive relational algebra with order-aware accumulation. We use partial orders to represent order-incomplete data, and study possible and certain answers for queries in this context. We show that these problems are respectively NP-complete and coNP-complete, but identify many tractable cases depending on the query operators or input partial orders
    corecore