1,715 research outputs found

Structurally Tractable Uncertain Data

Author: Abiteboul S.
Agrawal R.
Amarilli A.
Carlson A.
Courcelle B.
Deutch D.
Dong X.
Galárraga L.
Gottlob G.
Lauritzen S. L.
Maniu S.
Raedt L. D.
Robertson N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/07/2015

Many data management applications must deal with data which is uncertain, incomplete, or noisy. However, on existing uncertain data representations, we cannot tractably perform the important query evaluation tasks of determining query possibility, certainty, or probability: these problems are hard on arbitrary uncertain input instances. We thus ask whether we could restrict the structure of uncertain data so as to guarantee the tractability of exact query evaluation. We present our tractability results for tree and tree-like uncertain data, and a vision for probabilistic rule reasoning. We also study uncertainty about order, proposing a suitable representation, and study uncertain data conditioned by additional observations. Comment: 11 pages, 1 figure, 1 table. To appear in SIGMOD/PODS PhD Symposium 201

arXiv.org e-Print Archive

Constraining the Search Space in Temporal Pattern Mining

Author: Herzog Otthein
Lattner Andreas D.
Publication date: 28/04/2011

Agents in dynamic environments have to deal with complex situations including various temporal interrelations of actions and events. Discovering frequent patterns in such scenes can be useful for creating prediction rules that predict future activities or situations. We present the algorithm MiTemP, which learns frequent patterns based on a time interval-based relational representation. Additionally, the problem has been transferred to a pure relational association rule mining task which can be handled by WARMR. The two approaches are compared in a number of experiments. The experiments show the advantage of avoiding the creation of impossible or redundant patterns with MiTemP. While fewer patterns have to be explored on average with MiTemP, more frequent patterns are found at an earlier refinement level.

University of Hildesheim

Putting Context into Schema Matching

Author: Bohannon Philip
Elnahrawy Eiman
Fan Wenfei
Flaster Michael
Publication date: 01/01/2006

Interactive Constrained Association Rule Mining

Author: Bussche Jan Van den
Goethals Bart
Publication date: 01/01/2003

We investigate ways to support interactive mining sessions, in the setting of association rule mining. In such sessions, users specify conditions (queries) on the associations to be generated. Our approach is a combination of the integration of querying conditions inside the mining phase, and the incremental querying of already generated associations. We present several concrete algorithms and compare their performance. Comment: A preliminary report on this work was presented at the Second International Conference on Knowledge Discovery and Data Mining (DaWaK 2000

arXiv.org e-Print Archive

CiteSeerX

Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

Author: Lee M. S.
Moore A.
Publication date: 01/01/1997

This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and loglinear in the number of non-zero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use. We provide analytical worst-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not bothering to expand the tree fully near its leaves. We show how the ADtree can be used to accelerate Bayes net structure finding algorithms, rule learning algorithms, and feature selection algorithms, and we provide a number of empirical results comparing ADtree methods against traditional direct counting approaches. We also discuss the possible uses of ADtrees in other machine learning methods, and discuss the merits of ADtrees in comparison with alternative representations such as kd-trees, R-trees and Frequent Sets. Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX