Search CORE

23 research outputs found

Determining the Currency of Data

Authors: Berti-Equille L., Floris Geerts, Jef Wijsen, van der Meyden R., Wenfei Fan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2011

Available from: Crossref; Edinburgh Research Explorer; Institutional Repository Universiteit Antwerpen

A Uniform Dependency Language for Improving Data Quality

Authors: Fan Wenfei, Geerts Floris
Publication date: 01/01/2011

Available from: Edinburgh Research Explorer; Institutional Repository Universiteit Antwerpen

Query Stability in Monotonic Data-Aware Business Processes [Extended Version]

Authors: Marengo Elisa, Nutt Werner, Savkovic Ognjen
Publication date: 21/12/2015

Organizations continuously accumulate data, often according to some business processes. If one poses a query over such data for decision support, it is important to know whether the query is stable, that is, whether the answers will stay the same or may change in the future because business processes may add further data. We investigate query stability for conjunctive queries. To this end, we define a formalism that combines an explicit representation of the control flow of a process with a specification of how data is read from and inserted into the database. We consider different restrictions of the process model and the state of the system, such as negation in conditions, cyclic executions, read access to written data, presence of pending process instances, and the possibility to start fresh process instances. We identify for which combinations of these facets stability of conjunctive queries is decidable and provide encodings into variants of Datalog that are optimal with respect to the worst-case complexity of the problem. Comment: This report is the extended version of a paper accepted at the 19th International Conference on Database Theory (ICDT 2016), March 15-18, 2016, Bordeaux, France.

Available from: arXiv.org e-Print Archive; Dagstuhl Research Online Publication Server

DPIF: A framework for distinguishing unintentional quality problems from potential shilling attacks

Authors: Li M, Su S, Sun Y, Tian Z, Wang X, Wang Y
Publication venue: Computers, Materials and Continua (Tech Science Press)
Publication date: 01/01/2019

Copyright © 2019 Tech Science Press. Maliciously manufactured user profiles are often generated in batches for shilling attacks. These profiles may introduce many data quality problems that are not worth repairing. Since data repair is always expensive, we need to scrutinize the data and pick out the data that really deserves to be repaired. In this paper, we focus on how to distinguish unintentional data quality problems from batch-generated fake users created for shilling attacks. A two-step framework named DPIF is proposed to make this distinction. Within this framework, two metrics, homology and suspicious degree, are proposed. Homology represents both the textual similarity and the data quality problems contained in different profiles. Suspicious degree can be used to identify potential attacks. Experiments on real-life data verify that the proposed framework and the corresponding metrics are effective.

Available from: OPUS - University of Technology Sydney

Computing Possible and Certain Answers over Order-Incomplete Data

Authors: Amarilli Antoine, Ba Mouhamadou Lamine, Deutch Daniel, Senellart Pierre
Publication date: 01/01/2019

This paper studies the complexity of query evaluation for databases whose relations are partially ordered; the problem commonly arises when combining or transforming ordered data from multiple sources. We focus on queries in a useful fragment of SQL, namely positive relational algebra with aggregates, whose bag semantics we extend to the partially ordered setting. Our semantics leads to the study of two main computational problems: the possibility and certainty of query answers. We show that these problems are respectively NP-complete and coNP-complete, but identify tractable cases depending on the query operators or input partial orders. We further introduce a duplicate elimination operator and study its effect on the complexity results. Comment: 55 pages, 56 references. Extended journal version of arXiv:1707.07222. Up to the stylesheet, page/environment numbering, and possible minor publisher-induced changes, this is the exact content of the journal paper that will appear in Theoretical Computer Science.

Available from: arXiv.org e-Print Archive; INRIA a CCSD electronic archive server; Hal-Diderot
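To make the possibility and certainty problems concrete, here is a minimal brute-force sketch; it is not the paper's formalism or algorithm. It assumes a simplified model in which an order-incomplete list is a set of distinct items plus a strict partial order, its possible worlds are the linear extensions of that order, and the query is a hypothetical "first k items" query (`top_k`). An answer is possible if some world yields it and certain if every world does. Enumerating all permutations is exponential, which is consistent with, but does not demonstrate, the NP/coNP hardness stated in the abstract.

```python
from itertools import permutations


def linear_extensions(items, before):
    """Yield every total order of `items` that respects the strict partial
    order `before`, given as a set of (x, y) pairs meaning "x precedes y"."""
    for perm in permutations(items):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[x] < pos[y] for x, y in before):
            yield list(perm)


def top_k(order, k):
    """Hypothetical example query: the first k elements of a list."""
    return tuple(order[:k])


def is_possible(items, before, query, answer):
    """True if `answer` is the query result in at least one linear extension."""
    return any(query(w) == answer for w in linear_extensions(items, before))


def is_certain(items, before, query, answer):
    """True if `answer` is the query result in every linear extension."""
    return all(query(w) == answer for w in linear_extensions(items, before))


def first(order):
    """The query used in the example: the top-ranked item only."""
    return top_k(order, 1)


# Two sources each rank their own hotels; how the lists interleave is unknown.
items = ["h1", "h2", "h3", "h4"]
before = {("h1", "h2"), ("h3", "h4")}

print(is_possible(items, before, first, ("h1",)))  # True
print(is_possible(items, before, first, ("h2",)))  # False
print(is_certain(items, before, first, ("h1",)))   # False
```

Adding the pair ("h1", "h3") to `before` leaves only worlds that start with h1, so ("h1",) then becomes a certain answer.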

Detecting Ambiguity in Prioritized Database Repairing

Authors: Kimelfeld Benny, Livshits Ester, Peterfreund Liat
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 20th International Conference on Database Theory (ICDT 2017)
Publication date: 01/01/2017

In its traditional definition, a repair of an inconsistent database is a consistent database that differs from the inconsistent one in a "minimal way." Often, repairs are not equally legitimate, and one may wish to prefer one over another; for example, one fact is regarded as more reliable than another, or a more recent fact should be preferred to an earlier one. Motivated by these considerations, researchers have introduced and investigated the framework of preferred repairs, in the context of denial constraints and subset repairs. There, a priority relation between facts is lifted to a priority relation between consistent databases, and repairs are restricted to the ones that are optimal in the lifted sense. Three notions of lifting (and optimal repairs) have been proposed: Pareto, global, and completion. In this paper we investigate the complexity of deciding whether the priority relation suffices to clean the database unambiguously, or in other words, whether there is exactly one optimal repair. We show that the different lifting semantics entail highly different complexities. Under Pareto optimality, the problem is coNP-complete, in data complexity, for every set of functional dependencies (FDs), except for the tractable case of (equivalence to) one FD per relation. Under global optimality, one FD per relation is still tractable, but we establish Π₂ᵖ-completeness for a relation with two FDs. In contrast, under completion optimality the problem is solvable in polynomial time for every set of FDs. In fact, we present a polynomial-time algorithm for arbitrary conflict hypergraphs. We further show that under a general assumption of transitivity, this algorithm solves the problem even for global optimality. The algorithm is extremely simple, but its proof of correctness is quite intricate.

Available from: Dagstuhl Research Online Publication Server
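The ambiguity question under completion optimality can be illustrated with an exponential brute-force sketch; it is not the polynomial-time algorithm the abstract refers to. The sketch assumes that a completion-optimal repair is exactly the result of greedily scanning the facts in some total order that extends the priority, keeping each fact unless it would complete a conflict hyperedge. Conflicts are given as sets of facts that cannot all be kept (for FDs, these are pairs), and the priority is a set of hypothetical (preferred, other) pairs, assumed acyclic and, as in the framework, relating only facts that share a conflict. The database is cleaned unambiguously when exactly one repair survives.

```python
from itertools import permutations


def linear_extensions(facts, priority):
    """Yield every total order of `facts` compatible with `priority`,
    a set of (f, g) pairs meaning "f is preferred to g"."""
    for perm in permutations(facts):
        pos = {f: i for i, f in enumerate(perm)}
        if all(pos[f] < pos[g] for f, g in priority):
            yield perm


def greedy_repair(order, conflicts):
    """Scan facts in order, keeping each fact unless it would complete
    a conflict hyperedge together with facts already kept."""
    kept = set()
    for f in order:
        if not any(f in e and e <= kept | {f} for e in conflicts):
            kept.add(f)
    return frozenset(kept)


def completion_optimal_repairs(facts, conflicts, priority):
    """Distinct greedy repairs over all orders extending `priority`
    (assumed characterization of completion-optimal repairs)."""
    conflicts = [frozenset(e) for e in conflicts]
    return {greedy_repair(order, conflicts)
            for order in linear_extensions(facts, priority)}


def is_unambiguous(facts, conflicts, priority):
    """True iff exactly one completion-optimal repair exists."""
    return len(completion_optimal_repairs(facts, conflicts, priority)) == 1


# b conflicts with both a and c (e.g., violations of a single FD).
facts = ["a", "b", "c"]
conflicts = [{"a", "b"}, {"b", "c"}]

print(is_unambiguous(facts, conflicts, {("a", "b")}))  # True: only {a, c}
print(is_unambiguous(facts, conflicts, {("b", "a")}))  # False: {b} or {a, c}
```

In the second call, preferring b over a does not settle the conflict between b and c, so both {b} and {a, c} remain optimal.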

Possible and Certain Answers for Queries over Order-Incomplete Data

Authors: Amarilli Antoine, Ba Mouhamadou Lamine, Deutch Daniel, Senellart Pierre
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th International Symposium on Temporal Representation and Reasoning (TIME 2017)
Publication date: 01/01/2017

To combine and query ordered data from multiple sources, one needs to handle uncertainty about the possible orderings. Examples of such "order-incomplete" data include integrated event sequences such as log entries; lists of properties (e.g., hotels and restaurants) ranked by an unknown function reflecting relevance or customer ratings; and documents edited concurrently with an uncertain order on edits. This paper introduces a query language for order-incomplete data, based on the positive relational algebra with order-aware accumulation. We use partial orders to represent order-incomplete data, and study possible and certain answers for queries in this context. We show that these problems are respectively NP-complete and coNP-complete, but identify many tractable cases depending on the query operators or input partial orders.

Available from: arXiv.org e-Print Archive; INRIA a CCSD electronic archive server; Dagstuhl Research Online Publication Server; Hal-Diderot