31 research outputs found

Exclusive Strategy for Generalization Algorithms in Micro-data Disclosure

Author: A. Dobra
D.P. Dobkin
F. Chin
I.P. Fellegi
J. Byun
L. Sweeney
N.R. Adam
T. Dalenius
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Abstract. When generalization algorithms are known to the public, an adversary can obtain a more precise estimation of the secret table than what can be deduced from the disclosed generalization result. Therefore, whether a generalization algorithm can satisfy a privacy property should be judged based on such an estimation. In this paper, we show that the computation of the estimation is inherently a recursive process that exhibits a high complexity when generalization algorithms take a straightforward inclusive strategy. To facilitate the design of more efficient generalization algorithms, we suggest an alternative exclusive strategy, which adopts a seemingly drastic approach to eliminate the need for recursion. Surprisingly, the data utility of the two strategies are actually not comparable and the exclusive strategy can provide better data utility in certain cases.

CiteSeerX

Crossref

Searching for Radio Pulsars in 3EG Sources at Urumqi Observatory

Author: B. Doerr
B.D. Causey
D.E. Knuth
G. Steiner
I.P. Fellegi
J. Beck
J. Spencer
J.L. Bentley
K. Sadakane
L. Willenborg
L.H. Cox
L.R. Ford Jr.
M. Bacharach
N. Brauner
P. Raghavan
T. Asano
Y. Monden
Publication date: 01/01/2006

Since mid-2005, a pulsar searching system has been operating at 18 cm on the 25-m radio telescope of Urumqi Observatory. Test observations on known pulsars show that the system can perform the intended task. The prospect of using this system to observe 3EG sources and other target searching tasks is discussed. Comment: a training project about MSc thesi

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Scaling Record Linkage to Non-uniform Distributed Class Sizes

Author: A. Culotta
H. Newcombe
I.P. Fellegi
P. Christen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Crossref

A logical formalisation of the Fellegi-Holt method of data cleaning

Author: A. Nerode
I.P. Fellegi
L. Simon
R. Bruni
R.S. Garfinkel
Publication venue: Springer-Verlag
Publication date: 01/01/2003

The Fellegi-Holt method automatically "corrects" data that fail some predefined requirements. Computer implementations of the method were used in many national statistics agencies but are less used now because they are slow. We recast the method in propositional logic, and show that many of its results are well-known results in propositional logic. In particular we show that the Fellegi-Holt method of "edit generation" is essentially the same as a technique for automating logical deduction called resolution. Since modern implementations of resolution are capable of handling large problems efficiently, they might lead to more efficient implementations of the Fellegi-Holt method.

CiteSeerX

Crossref

Integration of semantically annotated data by the KnoFuss architecture

Author: A.K. Elmagarmid
E. Motta
I.P. Fellegi
J. Euzenat
M. Ehrig
U. Straccia
W. Kim
Y. Lee
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Most of the existing work on information integration in the Semantic Web concentrates on resolving schema-level problems. Specific issues of data-level integration (instance coreferencing, conflict resolution, handling uncertainty) are usually tackled by applying the same techniques as for ontology schema matching or by reusing the solutions produced in the database domain. However, data structured according to OWL ontologies has its specific features: e.g., the classes are organized into a hierarchy, the properties are inherited, data constraints differ from those defined by database schema. This paper describes how these features are exploited in our architecture KnoFuss, designed to support data-level integration of semantic annotations.

CiteSeerX

Crossref

Open Research Online (The Open University)

L-Cover: Preserving Diversity by Anonymity

Author: A. Dobra
A. Slavkovic
I.P. Fellegi
L. Sweeney
S. Chawla
T. Dalenius
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Crossref

Dynamic Similarity-Aware Inverted Indexing for Real-Time Entity Resolution

Author: A.K. Elmagarmid
D. Dey
I. Bhattacharya
I.P. Fellegi
P. Christen
S.E. Whang
T.N. Herzog
Publication date: 01/01/2013

Abstract. Entity resolution is the process of identifying groups of records in a single or multiple data sources that represent the same real-world entity. It is an important tool in data de-duplication, in linking records across databases, and in matching query records against a database of existing entities. Most existing entity resolution techniques complete the resolution process offline and on static databases. However, real-world databases are often dynamic, and increasingly organizations need to resolve entities in real-time. Thus, there is a need for new techniques that facilitate working with dynamic databases in real-time. In this paper, we propose a dynamic similarity-aware inverted indexing technique (DySimII) that meets these requirements. We also propose a frequency-filtered indexing technique where only the most frequent attribute values are indexed. We experimentally evaluate our techniques on a large real-world voter database. The results show that when the index size grows, no appreciable increase is found in the average record insertion time (around 0.1 msec) and in the average query time (less than 0.1 sec). We also find that applying the frequency-filtered approach reduces the index size with only a slight drop in recall.

CiteSeerX

Crossref