    Indeterministic Handling of Uncertain Decisions in Duplicate Detection

    In current research, duplicate detection is usually considered a deterministic approach in which tuples are either declared to be duplicates or not. Most often, however, it is not completely clear whether two tuples represent the same real-world entity. Deterministic approaches ignore this uncertainty, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impact of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic, and human effort can be reduced to a large extent. Unfortunately, a full-indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministically handled decisions in a meaningful way.
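
    As an illustration of the core idea, the following minimal sketch keeps both outcomes of an uncertain match as weighted possible worlds instead of forcing a yes/no decision. All names (PossibleWorld, indeterministic_decide) and the naive merge are illustrative assumptions, not the paper's concrete technique.

        from dataclasses import dataclass

        @dataclass
        class PossibleWorld:
            tuples: list          # the tuples that exist in this world
            probability: float    # probability that this world is the true one

        def indeterministic_decide(t1, t2, score):
            """Keep both outcomes of an uncertain duplicate decision.

            `score` is assumed to be a normalized similarity in [0, 1],
            interpreted directly as the probability that t1 and t2 are
            duplicates.
            """
            merged = {**t1, **t2}  # naive merge; real systems resolve conflicts
            return [
                PossibleWorld([merged], score),        # "duplicate" world
                PossibleWorld([t1, t2], 1.0 - score),  # "non-duplicate" world
            ]

        worlds = indeterministic_decide(
            {"name": "J. Smith", "city": "Berlin"},
            {"name": "John Smith", "city": "Berlin"},
            score=0.7,
        )
        for w in worlds:
            print(w.probability, w.tuples)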

    Duplicate Detection in Probabilistic Data

    Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML); there is no work so far on the integration of uncertain (especially probabilistic) source data. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, to increase the efficiency of the duplicate detection process, we introduce search space reduction methods adapted to probabilistic data.
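
    One natural way to compare two probabilistic tuples, sketched below under the assumption that each attribute is a distribution over alternative values, is to take the expectation of an ordinary string similarity over all value combinations; the function names are hypothetical, not taken from the paper.

        from difflib import SequenceMatcher

        def string_sim(a, b):
            """Ordinary (certain-data) string similarity in [0, 1]."""
            return SequenceMatcher(None, a, b).ratio()

        def expected_attr_sim(dist1, dist2):
            """Expected similarity of one attribute, where each argument maps
            an alternative value to its probability (probabilities sum to 1)."""
            return sum(p1 * p2 * string_sim(v1, v2)
                       for v1, p1 in dist1.items()
                       for v2, p2 in dist2.items())

        # Two probabilistic representations of (possibly) the same person:
        t1_name = {"John Smith": 0.8, "Jon Smith": 0.2}
        t2_name = {"John Smith": 0.6, "J. Smith": 0.4}
        print(round(expected_attr_sim(t1_name, t2_name), 3))

    A search space reduction in this setting could, for example, block tuples on their most probable attribute value so that only tuples sharing a blocking key are compared; this, too, is an assumption about one plausible adaptation rather than the paper's method.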

    Data Incompleteness Due to the Insufficient Modeling Power of Currently Dominant Data Models

    In the most widely used data models, especially the relational data model, information about the values of individual object properties is stored in attributes. In many cases (e.g., with partial information), however, a representation by single elements of the attribute's domain is not possible and requires the use of special concepts (e.g., null values). In currently used models, these concepts are only insufficiently designed for the actual requirements, so the originally available information often cannot be recovered from the stored data. Previous approaches to remedying this problem have, for various reasons, failed to gain acceptance. The work described here therefore makes a proposal intended both to reduce the loss of information during data storage and to avoid the weaknesses of previous solutions with regard to their lack of acceptance. The former is made possible by the use of several distinct null values; the latter rests mainly on avoiding serious deviations from the currently prevailing models. Since this in turn requires retaining important and fundamental concepts, the different null values must be evaluated in three-valued logic. Besides compatibility with the prevailing models, this approach also offers the advantage of low model complexity and enables intuitive handling of systems based on the newly designed model.
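
    A minimal sketch of the proposal as the abstract describes it: several distinguishable null values are stored, yet comparisons involving any of them still evaluate in three-valued logic, which is what keeps the model close to existing systems. The null kinds and function names below are illustrative assumptions.

        from enum import Enum

        class Null(Enum):
            UNKNOWN = "a value exists but is not known"
            INAPPLICABLE = "the attribute does not apply to this object"
            NO_INFORMATION = "not even known whether a value exists"

        TRUE, FALSE, MAYBE = "true", "false", "maybe"  # three truth values

        def equals(a, b):
            """Three-valued equality: any null operand yields MAYBE; the
            different null kinds matter when interpreting the stored data,
            not for the logic of the comparison itself."""
            if isinstance(a, Null) or isinstance(b, Null):
                return MAYBE
            return TRUE if a == b else FALSE

        print(equals("Berlin", "Berlin"))              # true
        print(equals(Null.UNKNOWN, "Berlin"))          # maybe
        print(equals(Null.INAPPLICABLE, Null.UNKNOWN)) # maybe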

    Indeterministic Handling of Uncertain Decisions in Deduplication

    In current research and practice, deduplication is usually considered a deterministic approach in which tuples are either declared to be duplicates or not. In ambiguous situations, however, it is often not completely clear-cut which tuples represent the same real-world entity. In deterministic approaches, many realistic possibilities may be ignored, which in turn can lead to false decisions. In this paper, we present an indeterministic approach to deduplication that uses a probabilistic target model, including techniques for a proper probabilistic interpretation of similarity matching results. Thus, instead of deciding on the single most likely situation, all realistic situations are modeled in the resultant data. This approach minimizes the negative impact of false decisions. Furthermore, the deduplication process becomes almost fully automatic, and human effort can be reduced to a large extent. To increase applicability, we introduce several semi-indeterministic methods that heuristically reduce the set of indeterministically handled decisions in several meaningful ways. We also describe a full-indeterministic method for theoretical and presentational reasons.
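
    One plausible semi-indeterministic heuristic, sketched here as an assumption rather than as one of the paper's actual methods, resolves clearly matching and clearly non-matching pairs deterministically and keeps only the ambiguous middle band of match scores indeterministic, which bounds the number of possible worlds that must be stored.

        def classify(score, t_low=0.3, t_high=0.9):
            """Two-threshold decision rule over a match score in [0, 1];
            the threshold values are arbitrary example choices."""
            if score >= t_high:
                return "duplicate"        # deterministic: merge the tuples
            if score <= t_low:
                return "non-duplicate"    # deterministic: keep them apart
            return "indeterministic"      # keep both possible worlds

        for s in (0.95, 0.55, 0.10):
            print(s, "->", classify(s))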
