Structurally Tractable Uncertain Data
Many data management applications must deal with data which is uncertain,
incomplete, or noisy. However, on existing uncertain data representations, we
cannot tractably perform the important query evaluation tasks of determining
query possibility, certainty, or probability: these problems are hard on
arbitrary uncertain input instances. We thus ask whether we could restrict the
structure of uncertain data so as to guarantee the tractability of exact query
evaluation. We present our tractability results for tree and tree-like
uncertain data, and a vision for probabilistic rule reasoning. We also study
uncertainty about order, proposing a suitable representation, and study
uncertain data conditioned by additional observations.
Comment: 11 pages, 1 figure, 1 table. To appear in SIGMOD/PODS PhD Symposium
201
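To make the three query evaluation tasks concrete, here is a minimal sketch that decides possibility, certainty, and probability by enumerating every possible world. It assumes a tuple-independent model (each fact is present independently with a given probability), which the abstract does not specify; the function and variable names are hypothetical. The enumeration is exponential in the number of facts, which illustrates why the general problem is hard and why structural restrictions are of interest.

```python
from itertools import product

def evaluate_over_worlds(facts, query):
    """Brute-force query evaluation over all possible worlds.

    facts: dict mapping each fact to its (independent) presence probability.
           The tuple-independent model is an assumption of this sketch.
    query: function taking a set of present facts and returning a bool.
    Returns (possible, certain, probability).
    """
    names = list(facts)
    possible, certain, probability = False, True, 0.0
    # One world per subset of facts: 2**len(facts) worlds in total.
    for choices in product([False, True], repeat=len(names)):
        weight = 1.0
        world = set()
        for name, present in zip(names, choices):
            p = facts[name]
            weight *= p if present else 1.0 - p
            if present:
                world.add(name)
        if weight == 0.0:
            continue  # this world cannot occur, so it is ignored
        holds = query(world)
        possible = possible or holds
        certain = certain and holds
        if holds:
            probability += weight
    return possible, certain, probability

# Hypothetical example: two uncertain facts, query asks whether both hold.
facts = {("R", "a"): 0.5, ("S", "a"): 0.5}
both = lambda world: ("R", "a") in world and ("S", "a") in world
possible, certain, probability = evaluate_over_worlds(facts, both)
```

In this example the query is possible but not certain, and it holds in exactly one of the four equally likely worlds.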
Learning Moore Machines from Input-Output Traces
The problem of learning automata from example traces (but no equivalence or
membership queries) is fundamental in automata learning theory and practice. In
this paper we study this problem for finite state machines with inputs and
outputs, and in particular for Moore machines. We develop three algorithms for
solving this problem: (1) the PTAP algorithm, which transforms a set of
input-output traces into an incomplete Moore machine and then completes the
machine with self-loops; (2) the PRPNI algorithm, which uses the well-known
RPNI algorithm for automata learning to learn a product of automata encoding a
Moore machine; and (3) the MooreMI algorithm, which directly learns a Moore
machine using PTAP extended with state merging. We prove that MooreMI has the
fundamental identification in the limit property. We also compare the
algorithms experimentally in terms of the size of the learned machine and
several notions of accuracy, introduced in this paper. Finally, we compare with
OSTIA, an algorithm that learns a more general class of transducers, and find
that OSTIA generally does not learn a Moore machine, even when fed with a
characteristic sample.
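The first of the three algorithms, as described above, builds a prefix tree from the traces and then completes it with self-loops. The following is a rough sketch of that idea only, not the authors' implementation. It assumes that each trace is a pair of equal-length input and output sequences, that the i-th output is the output of the state reached after the i-th input, and that the initial state's output is left unspecified; all names are hypothetical.

```python
def build_prefix_tree(traces):
    """Build an incomplete Moore machine as a prefix tree of the traces.

    traces: iterable of (inputs, outputs) pairs of equal length.
    Returns (delta, output), where delta maps (state, input) -> state
    and output maps state -> output symbol. State 0 is the initial state.
    """
    delta = {}
    output = {0: None}  # assumption: initial output is unspecified
    for inputs, outputs in traces:
        if len(inputs) != len(outputs):
            raise ValueError("inputs and outputs must have equal length")
        state = 0
        for symbol, out in zip(inputs, outputs):
            nxt = delta.get((state, symbol))
            if nxt is None:
                nxt = len(output)  # fresh state identifier
                delta[(state, symbol)] = nxt
                output[nxt] = out
            elif output[nxt] != out:
                raise ValueError("traces are inconsistent")
            state = nxt
    return delta, output

def complete_with_self_loops(delta, output, alphabet):
    """Add a self-loop for every missing (state, input) transition."""
    completed = dict(delta)
    for state in output:
        for symbol in alphabet:
            completed.setdefault((state, symbol), state)
    return completed

def run(delta, output, inputs, start=0):
    """Return the outputs of the states visited while reading inputs."""
    state, produced = start, []
    for symbol in inputs:
        state = delta[(state, symbol)]
        produced.append(output[state])
    return produced

# Hypothetical traces over the input alphabet {a, b}.
traces = [("ab", "01"), ("aa", "00")]
delta, output = build_prefix_tree(traces)
machine = complete_with_self_loops(delta, output, "ab")
```

The completed machine reproduces every training trace and, because of the self-loops, also produces some output for inputs that were never observed.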
Learning Linear Temporal Properties
We present two novel algorithms for learning formulas in Linear Temporal
Logic (LTL) from examples. The first learning algorithm reduces the learning
task to a series of satisfiability problems in propositional Boolean logic and
produces a smallest LTL formula (in terms of the number of subformulas) that is
consistent with the given data. Our second learning algorithm, on the other
hand, combines the SAT-based learning algorithm with classical algorithms for
learning decision trees. The result is a learning algorithm that scales to
real-world scenarios with hundreds of examples, but can no longer guarantee to
produce minimal consistent LTL formulas. We compare both learning algorithms
and demonstrate their performance on a wide range of synthetic benchmarks.
Additionally, we illustrate their usefulness on the task of understanding
executions of a leader election protocol.
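The notion of a formula being consistent with the given data can be sketched as a small checker. This is only an illustration of the consistency condition, not of the SAT-based or decision-tree algorithms. It assumes finite, non-empty traces with a strong next operator, which is a simplification chosen for this sketch; the tuple encoding of formulas and all names are hypothetical.

```python
def holds(formula, trace, i=0):
    """Evaluate an LTL formula at position i of a finite, non-empty trace.

    trace: list of sets of atomic propositions, one set per time step.
    formula: nested tuples such as ("F", ("ap", "leader")).
    Finite-trace semantics are an assumption of this sketch.
    """
    op = formula[0]
    if op == "ap":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "or":
        return holds(formula[1], trace, i) or holds(formula[2], trace, i)
    if op == "X":  # strong next: false at the last position
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if op == "F":
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "G":
        return all(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "U":
        for j in range(i, len(trace)):
            if holds(formula[2], trace, j):
                return True
            if not holds(formula[1], trace, j):
                return False
        return False
    raise ValueError(f"unknown operator: {op}")

def consistent(formula, positive, negative):
    """True if the formula accepts all positive and no negative traces."""
    return (all(holds(formula, t) for t in positive)
            and not any(holds(formula, t) for t in negative))

# Hypothetical sample: a leader is eventually elected only in the positive trace.
positive = [[set(), {"leader"}]]
negative = [[set(), set()]]
eventually_leader = ("F", ("ap", "leader"))
always_leader = ("G", ("ap", "leader"))
```

A learner in this setting would search for a smallest formula for which `consistent` returns true; here `eventually_leader` qualifies and `always_leader` does not.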
DESQ: Frequent Sequence Mining with Subsequence Constraints
Frequent sequence mining methods often make use of constraints to control
which subsequences should be mined. A variety of such subsequence constraints
has been studied in the literature, including length, gap, span,
regular-expression, and hierarchy constraints. In this paper, we show that many
subsequence constraints---including and beyond those considered in the
literature---can be unified in a single framework. A unified treatment allows
researchers to study jointly many types of subsequence constraints (instead of
each one individually) and helps to improve usability of pattern mining systems
for practitioners. In more detail, we propose a set of simple and intuitive
"pattern expressions" to describe subsequence constraints and explore
algorithms for efficiently mining frequent subsequences under such general
constraints. Our algorithms translate pattern expressions to compressed finite
state transducers, which we use as a computational model, and simulate these
transducers in a way suitable for frequent sequence mining. Our experimental
study on real-world datasets indicates that our algorithms---although more
general---are competitive with existing state-of-the-art algorithms.
Comment: Long version of the paper accepted at the IEEE ICDM 2016 conference
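As an illustration of one of the constraint types listed above, the sketch below counts the support of a pattern under a maximum-gap constraint. It does not use pattern expressions or finite state transducers, so it is not the DESQ approach; it is a direct check, with hypothetical names, where `max_gap` is assumed to mean the largest number of items that may be skipped between two consecutive matched items.

```python
def occurs_with_max_gap(pattern, sequence, max_gap):
    """Check whether pattern is a subsequence of sequence with bounded gaps.

    Tracks every position where the current pattern prefix can end, so the
    result does not depend on a greedy choice of the first match.
    """
    if not pattern:
        return True
    ends = [i for i, item in enumerate(sequence) if item == pattern[0]]
    for wanted in pattern[1:]:
        ends = [
            j for j, item in enumerate(sequence)
            if item == wanted
            and any(0 < j - i <= max_gap + 1 for i in ends)
        ]
        if not ends:
            return False
    return bool(ends)

def support(pattern, database, max_gap):
    """Number of input sequences that contain the pattern under the constraint."""
    return sum(occurs_with_max_gap(pattern, seq, max_gap) for seq in database)

# Hypothetical database of three sequences.
database = ["abc", "axxc", "ac"]
loose = support("ac", database, max_gap=1)
strict = support("ac", database, max_gap=0)
```

With `max_gap=1` the pattern is supported by "abc" and "ac" but not by "axxc"; with `max_gap=0` only "ac" supports it.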
A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database
Constraint-based pattern discovery is at the core of numerous data mining
tasks. Patterns are extracted with respect to a given set of constraints
(frequency, closedness, size, etc.). In the context of sequential pattern
mining, a large number of dedicated techniques have been developed for solving
particular classes of constraints. The aim of this paper is to investigate the
use of Constraint Programming (CP) to model and mine sequential patterns in a
sequence database. Our CP approach offers a natural way to combine, in a
single framework, a large set of constraints coming from various
origins. Experiments show the feasibility and the interest of our approach.
Prefix-Projection Global Constraint for Sequential Pattern Mining
Sequential pattern mining under constraints is a challenging data mining
task. Many efficient ad hoc methods have been developed for mining sequential
patterns, but they all suffer from a lack of genericity. Recent works
have investigated Constraint Programming (CP) methods, but these are still not
effective because of their encoding. In this paper, we propose a global
constraint based on the projected databases principle which remedies this
drawback. Experiments show that our approach clearly outperforms CP approaches
and competes well with ad hoc methods on large datasets.
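The projected databases principle mentioned above can be sketched outside of any CP solver as a plain PrefixSpan-style recursion. This is not the global constraint proposed in the paper; it only shows how a database is projected on a growing prefix. It assumes sequences of single items (no itemsets), and all names are hypothetical.

```python
def prefix_span(database, min_support):
    """Mine frequent sequential patterns by prefix projection.

    database: list of sequences, each a list of comparable items.
    Returns a list of (pattern, support) pairs, with patterns as tuples.
    """
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs: the suffix of
        # each supporting sequence after the first occurrence of the prefix.
        counts = {}
        for sid, start in projected:
            for item in set(database[sid][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            new_prefix = prefix + [item]
            new_projected = []
            for sid, start in projected:
                seq = database[sid]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue  # this sequence does not extend the prefix
                new_projected.append((sid, pos + 1))
            results.append((tuple(new_prefix), counts[item]))
            mine(new_prefix, new_projected)

    mine([], [(sid, 0) for sid in range(len(database))])
    return results

# Hypothetical database of three sequences.
database = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
patterns = dict(prefix_span(database, min_support=2))
```

Each recursive call only scans the suffixes that remain after the current prefix, which is the pruning effect that the projected database provides.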