A SAT-based System for Consistent Query Answering
An inconsistent database is a database that violates one or more integrity
constraints, such as functional dependencies. Consistent Query Answering is a
rigorous and principled approach to the semantics of queries posed against
inconsistent databases. The consistent answers to a query on an inconsistent
database are the intersection of the answers to the query on every repair, i.e.,
on every consistent database that differs from the given inconsistent one in a
minimal way. Computing the consistent answers of a fixed conjunctive query on a
given inconsistent database can be a coNP-hard problem, even though every fixed
conjunctive query is efficiently computable on a given consistent database.
We designed, implemented, and evaluated CAvSAT, a SAT-based system for
consistent query answering. CAvSAT leverages a set of natural reductions from
the complement of consistent query answering to SAT and to Weighted MaxSAT. The
system is capable of handling unions of conjunctive queries and arbitrary
denial constraints, which include functional dependencies as a special case. We
report results from experiments evaluating CAvSAT on both synthetic and
real-world databases. These results provide evidence that a SAT-based approach
can give rise to a comprehensive and scalable system for consistent query
answering.
Comment: 25 pages including appendix, to appear in the 22nd International
Conference on Theory and Applications of Satisfiability Testing.
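To make the reduction concrete, here is a minimal sketch, not the authors' CAvSAT implementation, of how the complement of consistent query answering for a single key constraint can be encoded as a SAT instance, assuming the python-sat package; the relation, facts, and candidate answers are toy assumptions.
```python
# Minimal sketch (not CAvSAT itself) of reducing the complement of consistent
# query answering to SAT, assuming the `python-sat` package.
# Toy instance: relation R(key, val) with key-violating facts
#   f1 = R(a, 1), f2 = R(a, 2)   (conflict: same key, different values)
#   f3 = R(b, 1)                 (conflict-free)
# Query q(y) :- R(x, y); candidate answer y = 1 has witnesses {f1} and {f3}.
from pysat.solvers import Glucose3

FACT_VARS = {"f1": 1, "f2": 2, "f3": 3}  # x_f true iff fact f is kept in the repair

def is_consistent_answer(witness_sets):
    """The candidate is a consistent answer iff NO repair kills all of its
    witnesses, i.e., iff the formula below is unsatisfiable."""
    with Glucose3() as solver:
        # Repair constraints: keep exactly one fact per key-equal group.
        solver.add_clause([1, 2])      # at least one of f1, f2
        solver.add_clause([-1, -2])    # at most one of f1, f2
        solver.add_clause([3])         # f3 is conflict-free, so every repair keeps it
        # Complement of the answer: every witness loses at least one fact.
        for witness in witness_sets:
            solver.add_clause([-FACT_VARS[f] for f in witness])
        return not solver.solve()

print(is_consistent_answer([{"f1"}, {"f3"}]))  # True:  y = 1 holds in every repair
print(is_consistent_answer([{"f2"}]))          # False: repair {f1, f3} drops y = 2
```
For arbitrary denial constraints, where repairs are maximal consistent subsets rather than one-fact-per-group choices, the same idea would rely on Weighted MaxSAT instead of the exactly-one clauses above.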
Efficient Top-k Cloud Services Query Processing Using Trust and QoS
Reinforcement Learning for Data Preparation with Active Reward Learning
Data cleaning and data preparation have been long-standing challenges in data science to avoid incorrect results, biases, and misleading conclusions obtained from “dirty” data. For a given dataset and data analytics task, a plethora of data preprocessing techniques and alternative data cleaning strategies are available, but they may lead to dramatically different outputs with unequal result quality. For adequate data preparation, users generally do not know where to start or which methods to use. Most current work can be classified into two categories: 1) new data cleaning algorithms specific to certain types of data anomalies, usually considered in isolation and without a “pipeline vision” of the entire data preprocessing strategy; 2) automated machine learning (AutoML) approaches that can optimize the hyperparameters of a considered ML model with a list of default preprocessing methods. We argue that more effort should be devoted to a principled and adaptive data preparation approach that helps and learns from the user to select the optimal sequence of data preparation tasks and obtain the best quality of the final result. In this paper, we extend Learn2Clean, a method based on Q-Learning, a model-free reinforcement learning technique that selects, for a given dataset, a given ML model, and a pre-selected quality performance metric, the optimal sequence of tasks for preprocessing the data such that the quality metric is maximized. We discuss new results of Learn2Clean for semi-automating data preparation with “the human in the loop” using active reward learning and Q-learning.
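As an illustration of the selection mechanism, the following is a minimal tabular Q-learning sketch, not the Learn2Clean implementation; the pipeline steps and the quality_of reward are hypothetical stand-ins for the paper's preprocessing methods and the user- or model-supplied quality metric.
```python
# Minimal sketch of Q-learning that picks an order of data-preparation tasks so
# that a quality metric is maximized. PIPELINE_STEPS and quality_of are
# hypothetical placeholders, not taken from Learn2Clean.
import random
from collections import defaultdict

PIPELINE_STEPS = ["impute", "dedupe", "normalize", "outlier_filter"]

def quality_of(applied_steps):
    # Placeholder reward: in Learn2Clean this would be the chosen quality metric
    # of the downstream ML model (possibly refined by active reward learning).
    bonus = 0.2 if applied_steps[:1] == ("impute",) else 0.0
    return 0.1 * len(set(applied_steps)) + bonus

def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2):
    Q = defaultdict(float)                      # Q[(state, action)]
    for _ in range(episodes):
        state, score = (), quality_of(())
        while len(state) < len(PIPELINE_STEPS):
            remaining = [s for s in PIPELINE_STEPS if s not in state]
            action = (random.choice(remaining) if random.random() < epsilon
                      else max(remaining, key=lambda a: Q[(state, a)]))
            next_state = state + (action,)
            new_score = quality_of(next_state)
            reward = new_score - score          # improvement of the quality metric
            future = max((Q[(next_state, a)] for a in PIPELINE_STEPS
                          if a not in next_state), default=0.0)
            Q[(state, action)] += alpha * (reward + gamma * future - Q[(state, action)])
            state, score = next_state, new_score
    return Q

Q = train()
state = ()
while len(state) < len(PIPELINE_STEPS):        # greedily read off the learned order
    best = max((s for s in PIPELINE_STEPS if s not in state), key=lambda a: Q[(state, a)])
    state += (best,)
print("learned preparation order:", state)
```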
Answering the Min-Cost Quality-Aware Query on Multi-Sources in Sensor-Cloud Systems
In sensor-based systems, the data of an object is often provided by multiple sources. Since the data quality of these sources might differ, it is necessary, when querying the observations, to carefully select the sources so that high-quality data is accessed. A solution is to perform a quality evaluation in the cloud and select a set of high-quality, low-cost data sources (i.e., sensors or small sensor networks) that can answer the queries. This paper studies the min-cost quality-aware query problem, which aims to find high-quality results from multiple sources at minimized cost. A measurement of query result quality is provided, and two methods for answering the min-cost quality-aware query are proposed. How to obtain a reasonable parameter setting is also discussed. Experiments on real-life data verify that the proposed techniques are efficient and effective.
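For illustration, below is a minimal greedy sketch of choosing low-cost, high-quality sources until a quality threshold is met; the fusion model, the example source list, and the threshold are assumptions and not the paper's actual methods or data.
```python
# Minimal sketch (not the paper's algorithm) of a greedy answer to a min-cost
# quality-aware query: pick sources whose combined quality reaches a threshold
# at low total cost. The fusion model 1 - prod(1 - q_i) is an assumption.
from math import prod

SOURCES = {                 # source -> (estimated quality, access cost)
    "sensor_a": (0.90, 5.0),
    "sensor_b": (0.70, 1.0),
    "sensor_c": (0.60, 0.8),
    "net_d":    (0.95, 9.0),
}

def combined_quality(selected):
    return 1.0 - prod(1.0 - SOURCES[s][0] for s in selected) if selected else 0.0

def min_cost_selection(threshold):
    chosen, rest = [], set(SOURCES)
    while combined_quality(chosen) < threshold and rest:
        # Pick the source with the best quality gain per unit of cost.
        def gain_per_cost(s):
            return (combined_quality(chosen + [s]) - combined_quality(chosen)) / SOURCES[s][1]
        best = max(rest, key=gain_per_cost)
        chosen.append(best)
        rest.remove(best)
    return chosen, sum(SOURCES[s][1] for s in chosen)

sources, cost = min_cost_selection(threshold=0.95)
print(sources, round(combined_quality(sources), 3), cost)
```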
Extending Graph Pattern Matching with Regular Expressions
Graph pattern matching, which is to compute the set M(Q, G) of matches of Q in G for a given pattern graph Q and data graph G, has been increasingly used in emerging applications, e.g., social network analysis. As the matching semantics is typically defined in terms of subgraph isomorphism, two key issues arise: the semantics is often too rigid to identify meaningful matches, and the problem is intractable, which calls for efficient matching methods. Motivated by this, this paper extends the matching semantics with regular expressions and investigates the top-k graph pattern matching problem. (1) We introduce regular patterns, which revise traditional pattern graphs by incorporating regular expressions, and extend the traditional matching semantics by allowing an edge to map to a regular path. With this extension, more meaningful matches can be captured. (2) We propose a relevance function, defined in terms of tightness of connectivity, for ranking matches. Based on this ranking function, we introduce the top-k graph pattern matching problem, denoted by TopK. (3) We show that TopK is intractable. Despite the hardness, we develop an algorithm with an early termination property, i.e., it finds the top-k matches without identifying the entire match set. (4) Using real-life and synthetic data, we experimentally verify that our top-k matching algorithms are effective and outperform traditional counterparts.
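To illustrate the edge-to-regular-path mapping on which the extended semantics relies, here is a minimal sketch, not the paper's TopK algorithm; the toy data graph, label regular expression, and hop bound are assumptions, and path length is used only as a rough proxy for tightness of connectivity.
```python
# Minimal sketch of matching one regular-pattern edge: map a pattern edge
# labeled by a regular expression to a data-graph path whose edge-label
# sequence matches it, preferring short (tightly connected) paths.
import re
from collections import deque

# Toy data graph: node -> list of (neighbor, edge label).
GRAPH = {
    "ann":  [("bob", "follows"), ("carl", "likes")],
    "bob":  [("carl", "follows")],
    "carl": [("dana", "follows")],
    "dana": [],
}

def shortest_regular_path(src, dst, label_regex, max_hops=4):
    """BFS over (node, label string); returns the fewest hops whose concatenated
    labels match `label_regex`, or None. Shorter paths mean higher relevance."""
    pattern = re.compile(label_regex)
    queue = deque([(src, "", 0)])
    while queue:
        node, labels, hops = queue.popleft()
        if node == dst and hops > 0 and pattern.fullmatch(labels):
            return hops
        if hops < max_hops:
            for nxt, lab in GRAPH[node]:
                queue.append((nxt, labels + lab + " ", hops + 1))
    return None

# Pattern edge ann -> dana labeled (follows )+ : dana is reachable from ann via a
# chain of "follows" edges; the 3-hop path is the tightest match for this edge.
print(shortest_regular_path("ann", "dana", r"(follows )+"))
```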