Duplicate Detection in Probabilistic Data
Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain (i.e., non-probabilistic) source data, relational or XML. There is no work so far on the integration of uncertain (especially probabilistic) source data. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, to increase the efficiency of the duplicate detection process, we introduce search space reduction methods adapted to probabilistic data.
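The search space reduction the abstract refers to is, in the general duplicate-detection literature, typically some form of blocking: rather than comparing all record pairs (quadratic cost), only records sharing a blocking key are compared. The sketch below illustrates that generic idea only; the paper's probabilistic-data-specific methods are not reproduced here, and the `name`-prefix key is a made-up example.

```python
from itertools import combinations

def blocking_key(record):
    # Hypothetical blocking key: first three letters of the name attribute.
    return record["name"][:3].lower()

def candidate_pairs(records):
    """Group records by a blocking key so that only records sharing
    a key are compared, shrinking the quadratic search space."""
    blocks = {}
    for r in records:
        blocks.setdefault(blocking_key(r), []).append(r)
    for block in blocks.values():
        yield from combinations(block, 2)

records = [
    {"id": 1, "name": "Smith, John"},
    {"id": 2, "name": "Smith, Jon"},
    {"id": 3, "name": "Doe, Jane"},
]
pairs = list(candidate_pairs(records))
# Only the two "Smi..." records form a candidate pair;
# "Doe, Jane" is never compared against them.
```

A detailed similarity function would then be applied only to the surviving candidate pairs.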
Indeterministic Handling of Uncertain Decisions in Duplicate Detection
In current research, duplicate detection is usually treated as a deterministic process in which tuples are either declared to be duplicates or not. However, most often it is not completely clear whether two tuples represent the same real-world entity. In deterministic approaches this uncertainty is ignored, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impact of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic, and human effort can be reduced to a large extent. Unfortunately, a fully indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministically handled decisions in a meaningful way.
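The core idea, keeping all possible worlds instead of forcing a hard duplicate/non-duplicate decision, can be sketched as follows. This is an illustrative simplification under assumed names (`indeterministic_merge`, a naive dictionary merge); the paper's probabilistic target schema is considerably richer.

```python
def indeterministic_merge(t1, t2, match_prob):
    """Return the possible worlds induced by one uncertain duplicate
    decision: one world where the tuples are merged, one where they
    stay distinct, each weighted by its probability."""
    merged = {**t1, **t2}  # naive merge for illustration only
    return [
        (match_prob, [merged]),        # world: t1 and t2 are duplicates
        (1.0 - match_prob, [t1, t2]),  # world: t1 and t2 are distinct
    ]

worlds = indeterministic_merge({"name": "Smith, John"},
                               {"name": "Smith, Jon"},
                               match_prob=0.8)
# Two weighted worlds whose probabilities sum to 1.
```

A deterministic pipeline would keep only one of these worlds; the indeterministic approach defers the decision by carrying both forward in the probabilistic result.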
Quality Assessment of Linked Datasets using Probabilistic Approximation
With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time-consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters, and Clustering Coefficient estimation to implement a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated in the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance.
Comment: 15 pages, 2 figures, to appear in ESWC 2015 proceedings
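Reservoir sampling, one of the probabilistic techniques the abstract names, maintains a uniform random sample of fixed size k over a stream of unknown length in O(k) memory, which is what makes approximate metric computation feasible on large datasets. A minimal sketch of the classic Algorithm R (not the Luzzu implementation):

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Algorithm R: keep a uniform random sample of k items from a
    stream of unknown length, using O(k) memory and one pass."""
    rng = rng or random.Random(42)  # fixed seed for reproducibility
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)           # inclusive on both ends
            if j < k:
                reservoir[j] = item         # replace with prob. k/(i+1)
    return reservoir

sample = reservoir_sample(range(10_000), 100)
# sample now holds exactly 100 items drawn uniformly from the stream.
```

A quality metric is then estimated on the sample instead of the full dataset, trading a small, quantifiable error for a large reduction in cost.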
Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information extraction, data integration, and uncertain data management are distinct research areas that have received considerable attention over the last two decades. Most research has tackled these areas individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration process is an important issue for enhancing the quality of the data in such integrated systems. This article presents the state of the art of the mentioned research areas, shows their common ground, and discusses how to integrate information extraction and data integration under an uncertainty management umbrella.
Boosting Linear-Optical Bell Measurement Success Probability with Pre-Detection Squeezing and Imperfect Photon-Number-Resolving Detectors
Linear optical realizations of Bell state measurement (BSM) on two single-photon qubits succeed with probability $P_s$ no higher than $1/2$. However, pre-detection quadrature squeezing, i.e., quantum-noise-limited phase-sensitive amplification, in the usual linear-optical BSM circuit can yield $P_s \approx 0.643$. The ability to achieve $P_s > 1/2$ has been found to be critical in resource-efficient realizations of linear optical quantum computing and all-photonic quantum repeaters. Yet, the aforesaid value of $P_s$ is not known to be the maximum achievable using squeezing, thereby leaving it open whether close-to-unit-efficiency BSM might be achievable using squeezing as a resource. In this paper, we report new insights on why squeezing-enhanced BSM achieves $P_s > 1/2$. Using this, we show that the previously reported $P_s \approx 0.643$ at a particular single-mode squeezing strength $r$, for unambiguous state discrimination (USD) of all four Bell states, is an experimentally unachievable point result, which drops to a strictly smaller value with the slightest change in $r$. We however show that squeezing-induced boosting of $P_s$ with USD operation is still possible over a continuous range of $r$, with an experimentally achievable maximum occurring at a nonzero squeezing strength. Finally, deviating from USD operation, we explore a trade-space between $P_s$, the probability with which the BSM circuit declares a "success", versus the probability of error $P_e$, the probability of an input Bell state being erroneously identified given that the circuit declares a success. Since quantum error correction could correct for some nonzero $P_e$, this tradeoff may enable better quantum repeater designs by potentially increasing the entanglement generation rates with $P_s$ exceeding what is possible with traditionally studied USD operation of BSMs.
Comment: 13 pages, 10 figures
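For context, the well-known $1/2$ ceiling for static linear optics (no ancillae, no squeezing) follows from a textbook argument, not from this paper: interfering the two photonic qubits on a balanced beamsplitter produces click patterns that identify the two antisymmetric-sector Bell states but leave the other two indistinguishable.

```latex
% The four Bell states of two qubits:
\[
  |\psi^{\pm}\rangle = \tfrac{1}{\sqrt{2}}\bigl(|01\rangle \pm |10\rangle\bigr),
  \qquad
  |\phi^{\pm}\rangle = \tfrac{1}{\sqrt{2}}\bigl(|00\rangle \pm |11\rangle\bigr).
\]
% A 50:50 beamsplitter BSM yields distinct detector signatures for
% |\psi^+\rangle and |\psi^-\rangle, while |\phi^+\rangle and
% |\phi^-\rangle produce identical ones. With equiprobable inputs,
\[
  P_s \;=\; \Pr\bigl[\text{input is } |\psi^{+}\rangle \text{ or } |\psi^{-}\rangle\bigr]
  \;=\; \tfrac{1}{4} + \tfrac{1}{4} \;=\; \tfrac{1}{2}.
\]
```

Squeezing-enhanced schemes beat this bound precisely because the pre-detection amplification breaks the symmetry that makes the $|\phi^{\pm}\rangle$ signatures coincide.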
UNH Monitoring Activities that Support the National Coastal Assessment in 2007
The National Coastal Assessment is an Environmental Protection Agency program to monitor the health of the nation’s estuaries using nationally standardized methods and a probabilistic sampling design. Dedicated EPA funding for the National Coastal Assessment ceased after 2006. Therefore, the NH Department of Environmental Services and the New Hampshire Estuaries Project contributed funds to continue a portion of the National Coastal Assessment in 2007. Water quality measurements were successfully made during 2007 at 25 randomly located stations throughout the Great Bay Estuary and Hampton-Seabrook Harbor. These data will be combined with samples collected in 2006 for probabilistic assessments of estuarine water quality during the 2006-2007 period in the NHEP Water Quality Indicators Report in 2009.
Strategies for estimating human exposure to mycotoxins via food
In this review, five strategies to estimate mycotoxin exposure of a (sub-)population via food, including data collection, are discussed with the aim of identifying the added value and limitations of each strategy for risk assessment of these chemicals. The well-established point estimate, observed individual mean, probabilistic, and duplicate diet strategies are addressed, as well as the emerging human biomonitoring strategy. All five exposure assessment strategies allow the estimation of chronic (long-term) exposure to mycotoxins and, with the exception of the observed individual mean strategy, also acute (short-term) exposure. Methods for data collection, i.e., food consumption surveys, food monitoring studies, and total diet studies, are discussed. In food monitoring studies, the driving force is often enforcement of legal limits, and, consequently, data are often generated with relatively high limits of quantification and targeted at products suspected to contain mycotoxin levels above these legal limits. Total diet studies provide a solid base for chronic exposure assessments since they provide mycotoxin levels in food based on well-defined samples and include the effect of food preparation. Duplicate diet studies and human biomonitoring studies reveal the actual exposure but often involve a restricted group of human volunteers and a limited time period. Human biomonitoring studies may also include exposure to mycotoxins from sources other than food, and exposure to modified mycotoxins that may not be detected with current analytical methods. Low limits of quantification are required for analytical methods applied for data collection to avoid large uncertainties in the exposure estimate due to high numbers of left-censored data, i.e., levels below the limit of quantification.
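The probabilistic strategy the review names generally combines distributions of food consumption and contaminant concentration by Monte Carlo simulation to obtain a distribution of exposure per kg body weight. The sketch below shows only that generic mechanic; all distributions and parameters are made-up placeholders, not survey or monitoring data.

```python
import random

def simulate_exposure(n, rng=None):
    """Monte Carlo sketch of probabilistic dietary exposure:
    daily intake per kg body weight =
        consumption (kg food/day) * concentration (ug toxin/kg food)
        / body weight (kg).
    All distribution parameters below are illustrative placeholders."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    exposures = []
    for _ in range(n):
        consumption = rng.lognormvariate(-3.0, 0.5)   # kg food per day
        concentration = rng.lognormvariate(1.0, 1.0)  # ug toxin per kg food
        body_weight = rng.gauss(70.0, 10.0)           # kg
        exposures.append(consumption * concentration / body_weight)
    return exposures

exp_dist = simulate_exposure(10_000)
p95 = sorted(exp_dist)[int(0.95 * len(exp_dist))]  # high-percentile exposure
```

In practice the resulting high percentiles (e.g., P95) are compared against health-based guidance values; handling of left-censored concentration data (values below the limit of quantification) strongly affects the outcome, as the review notes.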
Platform Dependent Verification: On Engineering Verification Tools for 21st Century
The paper overviews recent developments in platform-dependent explicit-state LTL model checking.
Comment: In Proceedings PDMC 2011, arXiv:1111.006