
    RefConcile – automated online reconciliation of bibliographic references

    Comprehensive bibliographies often rely on community contributions. In such a setting, de-duplication is mandatory for the bibliography to be useful. Ideally, it works online, i.e., during the addition of new references, so the bibliography remains duplicate-free at all times. While de-duplication is well researched, generic approaches do not achieve the result quality required for automated reconciliation. To overcome this problem, we propose a new duplicate detection and reconciliation technique called RefConcile. Aimed specifically at bibliographic references, it uses dedicated blocking and matching techniques tailored to this type of data. Our evaluation, based on a large real-world collection of bibliographic references, shows that RefConcile scales well and that it detects and reconciles duplicates with high accuracy.
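    The abstract does not disclose RefConcile's concrete blocking and matching rules, so the following is only a generic sketch of the blocking-then-matching pattern it builds on: references are first grouped by a cheap key so that only plausible duplicates are ever compared, and candidate pairs within a block are then scored with a title similarity. All names (block_key, jaccard, find_duplicates) and the surname/year key are illustrative assumptions, not the paper's method.

        # Generic blocking + matching sketch; not RefConcile's actual rules.
        from collections import defaultdict
        from itertools import combinations

        def block_key(ref):
            # Cheap blocking key: normalized first-author surname plus year
            # (assumes authors are stored as "Surname, Given").
            surname = ref["authors"][0].split(",")[0].strip().lower()
            return (surname, ref.get("year"))

        def jaccard(a, b):
            # Token-set similarity between two titles.
            ta, tb = set(a.lower().split()), set(b.lower().split())
            return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

        def find_duplicates(refs, threshold=0.8):
            # Compare pairs only within a block, never across the whole set.
            blocks = defaultdict(list)
            for ref in refs:
                blocks[block_key(ref)].append(ref)
            return [(r1, r2)
                    for group in blocks.values()
                    for r1, r2 in combinations(group, 2)
                    if jaccard(r1["title"], r2["title"]) >= threshold]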

    A Latent Class Approach for Allocation of Employees to Local Units


    Towards Evaluating an Ontology-Based Data Matching Strategy for Retrieval and Recommendation of Security Annotations for Business Process Models

    In the Trusted Architecture for Securely Shared Services (TAS3) EC FP7 project, we have developed a method to provide semantic support to the process modeler during the design of secure business process models. Its supporting tool, called Knowledge Annotator (KA), uses ontology-based data matching algorithms and a matching strategy to infer, from a dedicated knowledge base, the recommendations best fitted to the user's design intent. The paper illustrates how the strategy is used to perform the similarity (matching) check in order to retrieve the best design recommendation. We illustrate the concept with trust policy specification in the security and privacy domain. Finally, the paper discusses the evaluation of the results using the Ontology-based Data Matching Framework evaluation benchmark.
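    As a rough illustration of the similarity check described above (the abstract does not give the KA's actual algorithms), one plausible reading is that both the modeler's design intent and each knowledge-base entry are described by sets of ontology concepts, and stored recommendations are ranked by concept overlap. Every name below (concept_overlap, recommend, the (recommendation, concept_set) representation) is a hypothetical stand-in.

        # Hypothetical ontology-based matching step, not the actual KA code.
        def concept_overlap(intent, annotation):
            # Jaccard overlap between two sets of ontology concept IDs.
            if not (intent or annotation):
                return 0.0
            return len(intent & annotation) / len(intent | annotation)

        def recommend(intent, knowledge_base, top_k=3):
            # knowledge_base: list of (recommendation, concept_set) pairs.
            scored = sorted(((concept_overlap(intent, concepts), rec)
                             for rec, concepts in knowledge_base),
                            reverse=True, key=lambda pair: pair[0])
            return [rec for score, rec in scored[:top_k] if score > 0]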

    Outlier Protection in Continuous Microdata Masking

    Masking methods protect data sets against disclosure by perturbing the original values before publication. Masking causes some information loss (masked data are not exactly the same as the original data) and does not completely suppress the risk of disclosure for the individuals behind the data set. Information loss can be measured by observing the differences between original and masked data, while disclosure risk can be measured by means of record linkage and confidentiality intervals.
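    A minimal sketch of the two measurements, under simplifying assumptions not taken from the paper: records are numeric vectors aligned by index, information loss is the mean absolute difference between original and masked values, and disclosure risk is the fraction of masked records whose nearest original record (by squared Euclidean distance) is their true source, i.e., a naive record-linkage attack.

        # Toy measures of information loss and record-linkage disclosure risk.
        def information_loss(original, masked):
            # Mean absolute difference across all values.
            diffs = [abs(o - m)
                     for orig_rec, msk_rec in zip(original, masked)
                     for o, m in zip(orig_rec, msk_rec)]
            return sum(diffs) / len(diffs)

        def linkage_risk(original, masked):
            # Share of masked records re-identified by nearest-neighbor linkage.
            def dist(a, b):
                return sum((x - y) ** 2 for x, y in zip(a, b))
            hits = sum(
                min(range(len(original)),
                    key=lambda j: dist(rec, original[j])) == i
                for i, rec in enumerate(masked))
            return hits / len(masked)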

    An Efficient Duplicate Record Detection Using q-Grams Array Inverted Index

    Duplicate record detection is a crucial task in the data cleaning process of data warehouse systems. Many approaches have been presented to address this problem: some focus on the accuracy of the results, others on the efficiency of the comparison process. Following the first direction, we introduce two similarity functions based on the concept of q-grams that improve the accuracy of the duplicate detection process with respect to other well-known measures. We also reduce the number of record comparisons and their running time by building an inverted index on a sorted list of q-grams, called the q-grams array. We then extend this approach to perform a clustering process based on the proposed q-grams array. Finally, an experimental analysis on synthetic and real data shows the efficiency of the novel indexing method for both the record comparison process and clustering.
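    To make the q-gram machinery concrete, here is a small sketch of the general idea (not the paper's exact q-grams array or its two similarity functions, which the abstract does not specify): each record is indexed under its q-grams, so candidate duplicates are exactly the record pairs sharing at least one q-gram, and only those pairs need a full similarity comparison instead of all pairs.

        # Generic q-gram inverted index for candidate generation; illustrative only.
        from collections import defaultdict
        from itertools import combinations

        def qgrams(s, q=2):
            # Pad the string so boundary characters also form q-grams.
            padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
            return {padded[i:i + q] for i in range(len(padded) - q + 1)}

        def candidate_pairs(records, q=2):
            index = defaultdict(set)  # q-gram -> ids of records containing it
            for rid, text in enumerate(records):
                for gram in qgrams(text, q):
                    index[gram].add(rid)
            pairs = set()
            for ids in index.values():
                pairs.update(combinations(sorted(ids), 2))
            return pairs

    A pair surviving this filter would then be scored, for instance with the Jaccard coefficient of the two records' q-gram sets, before being declared a duplicate.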