4,552 research outputs found

Duplicate Detection in Probabilistic Data

Authors: Ander de Keijzer, Maurice van Keulen, Fabian Panse, Norbert Ritter
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2009

Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, probabilistic data must be integrated. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML). There is no work so far on the integration of uncertain (especially probabilistic) source data. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, to increase the efficiency of the duplicate detection process, we introduce search space reduction methods adapted to probabilistic data.

CiteSeerX

Crossref

University of Twente Research Information

Reasoning & Querying – State of the Art

Authors: François Bry, Tim Furche, Klara Weiand
Publication date: 31/08/2008

Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet, where keyword search is used in many applications such as search engines, has familiarized casual users with using keyword queries to retrieve information. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming to enable simple querying of semi-structured data, which is relevant, for example, in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.

Open Access LMU

Adaptive Bloom filter

Authors: Jehoshua Bruck, Jie Gao, Anxiao (Andrew) Jiang
Publication venue: California Institute of Technology Library
Publication date: 06/01/2006

A Bloom filter is a simple randomized data structure that answers membership queries with no false negatives and a small false positive probability. It is an elegant data compression technique for membership information and has broad applications. In this paper, we generalize the traditional Bloom filter to the Adaptive Bloom Filter, which incorporates information on the query frequencies and the membership likelihood of the elements into its optimal design. It has been widely observed that in many applications, some popular elements are queried much more often than others. The traditional Bloom filter can be further optimized for data sets with irregular query patterns and non-uniform membership likelihood. We derive the optimal configuration of the Bloom filter with query-frequency and membership-likelihood information, and show that the adapted Bloom filter always outperforms the traditional Bloom filter. Under reasonable frequency models such as the step distribution or the Zipf distribution, the improvement in false positive probability of the adaptive Bloom filter over the traditional Bloom filter is usually orders of magnitude.
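
For orientation, the traditional Bloom filter that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration and not the authors' adaptive design: every element uses the same number of hash functions, whereas the abstract describes a configuration that depends on query frequency and membership likelihood, the details of which are not given here. The class name, the SHA-256 double-hashing scheme, and the helper expected_false_positive_rate are assumptions made for this example.

```python
import hashlib
import math


class BloomFilter:
    """Traditional Bloom filter: one bit array, the same k hashes for every element."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption for this sketch: derive k positions from one SHA-256
        # digest via double hashing, h1 + i * h2 (mod m).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        # Set every bit selected by the k hash positions.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if all k bits are set; an added item always passes,
        # so there are no false negatives, but false positives can occur.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def expected_false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    # Standard approximation (1 - e^(-k*n/m))^k for a traditional Bloom filter.
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes
```

Adding a few strings with add and then testing them with the in operator always returns True for the added strings. A string that was never added may occasionally also return True; that is the false positive probability the abstract refers to.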

Caltech Authors

A logic programming framework for modeling temporal objects

Authors: F. N. Kesim, M. Sergot
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1996


Crossref

Bilkent University Institutional Repository

Spiral - Imperial College Digital Repository

A Multi-faceted Provenance Solution for Science on the Web

Authors: Peter Edwards, Edoardo Pignotti, Richard Reid
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Peer reviewed

Aberdeen University Research

Crossref