62 research outputs found

Making large information sources better accessible using fuzzy set theory

Author: B.P. Buckles
G. Tré De
H. Prade
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Crossref

Ghent University Academic Bibliography

Putting Humpty-Dumpty together again: Reconstructing functions from their projections.

Author: Mehrotra Kishan
Menon Anil Ravindran
Mohan Chilukuri K.
Ranka Sanjay
Publication venue: SURFACE at Syracuse University
Publication date: 15/06/1993

We present a problem decomposition approach to reduce neural net training times. The basic idea is to train neural nets in parallel on marginal distributions obtained from the original distribution (via projection), and then reconstruct the original table from the marginals (via a procedure similar to the join operator in database theory). A function is said to be reconstructible if it can be recovered without error from its projections. Most distributions are non-reconstructible. The main result of this paper is the Reconstruction theorem, which enables non-reconstructible functions to be expressed in terms of reconstructible ones, and thus facilitates the application of decomposition methods.

Syracuse University Research Facility and Collaborative Environment
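
To illustrate the projection-and-join idea in the abstract of "Putting Humpty-Dumpty together again", here is a minimal sketch on a small relational table, not on the distributions or neural nets the paper treats. The function names (`project`, `join`, `is_reconstructible`) and the example tables are hypothetical. This is not the authors' procedure or their Reconstruction theorem. It only checks whether joining two projections gives back the original table exactly.

```python
def project(rows, idx):
    """Project a set of tuples onto the column positions in idx."""
    return {tuple(row[i] for i in idx) for row in rows}


def join(left, left_idx, right, right_idx):
    """Natural join of two projections on the column positions they share.

    Assumes left_idx and right_idx together cover columns 0..width-1.
    """
    width = len(set(left_idx) | set(right_idx))
    shared = [i for i in left_idx if i in right_idx]
    out = set()
    for l in left:
        lmap = dict(zip(left_idx, l))
        for r in right:
            rmap = dict(zip(right_idx, r))
            # Keep the pair only if it agrees on every shared column.
            if all(lmap[i] == rmap[i] for i in shared):
                merged = {**lmap, **rmap}
                out.add(tuple(merged[i] for i in range(width)))
    return out


def is_reconstructible(rows, left_idx, right_idx):
    """True if joining the two projections recovers the table without error."""
    rows = set(rows)
    rebuilt = join(project(rows, left_idx), left_idx,
                   project(rows, right_idx), right_idx)
    return rebuilt == rows


# Columns are (x, y, z). In the first table z simply copies y.
copy_table = {(0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1)}
# In the second table z = x XOR y, which needs all three columns at once.
xor_table = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}

print(is_reconstructible(copy_table, (0, 1), (1, 2)))  # True
print(is_reconstructible(xor_table, (0, 1), (1, 2)))   # False
```

The XOR table fails because joining its (x, y) and (y, z) projections yields eight tuples, four of which were never in the original table.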

Duplicate Detection in Probabilistic Data

Author: Keijzer Ander de
Keulen Maurice van
Panse Fabian
Ritter Norbert
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2009

Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML). There is no work on the integration of uncertain (esp. probabilistic) source data so far. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, to increase the efficiency of the duplicate detection process, we introduce search space reduction methods adapted to probabilistic data.

CiteSeerX

Crossref

University of Twente Research Information

An Answer Explanation Model for Probabilistic Database Queries

Author: Apers Peter M.G.
Bunningen Arthur H. van
Feng Ling
Fokkinga Maarten M.
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007

In recent years, huge amounts of uncertain data have become available from a diverse range of applications such as sensors, machine learning or mining approaches, and information extraction and integration, and we have seen a revival of interest in probabilistic databases. Queries over these databases result in probabilistic answers. As the process of arriving at these answers is based on the underlying stored uncertain data, we argue that from the standpoint of an end user, it is helpful for such a system to give an explanation of how it arrives at an answer and on which uncertainty assumptions the derived answer is based. In this way, the user with his/her own knowledge can decide how much confidence to place in this probabilistic answer. The aim of this paper is to design such an answer explanation model for probabilistic database queries. We report our design principles and show the methods to compute the answer explanations. One of the main contributions of our model is that it fills the gap between giving only the answer probability and giving the full derivation. Furthermore, we show how to balance verifiability and influence of explanation components through the concept of verifiable views. The behavior of the model and its computational efficiency are demonstrated through an extensive performance study.

University of Twente Research Information

Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms

Author: Boley Mario
Mandros Panagiotis
Vreeken Jilles
Publication date: 01/01/2018

The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the usage of worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance for both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of the time needed for complete branch-and-bound style search. Comment: Accepted to Proceedings of the IEEE International Conference on Data Mining (ICDM'18)

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Crossref

MPG.PuRe
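
As background for the score named in the abstract of "Discovering Reliable Dependencies from Data", the sketch below computes the plain (plug-in) fraction of information, F(X;Y) = (H(Y) - H(Y|X)) / H(Y), from paired samples. This is an assumption-laden illustration. The "reliable" variant studied in the paper applies a further correction that the abstract does not spell out, and that correction is not implemented here. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def fraction_of_information(xs, ys):
    """Plug-in estimate of (H(Y) - H(Y|X)) / H(Y) from paired samples.

    Uncorrected score only; no reliability correction is applied.
    """
    h_y = entropy(ys)
    if h_y == 0:
        raise ValueError("Y is constant, so the score is undefined")
    # H(Y|X) = H(X, Y) - H(X)
    h_y_given_x = entropy(list(zip(xs, ys))) - entropy(xs)
    return (h_y - h_y_given_x) / h_y


# Y is fully determined by X: the score is 1.
print(fraction_of_information([0, 0, 1, 1], [0, 0, 1, 1]))
# X tells us nothing about Y in this sample: the score is 0.
print(fraction_of_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

On small samples the plug-in estimate tends to overstate dependence, which is a common motivation for corrected scores.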