Search CORE

47,261 research outputs found

Query-Driven Sampling for Collective Entity Resolution

Author: Grant Christan
Wang Daisy Zhe
Wick Michael L.
Publication date: 13/08/2015

Probabilistic databases play a preeminent role in the processing and management of uncertain data. Recently, many database research efforts have integrated probabilistic models into databases to support tasks such as information extraction and labeling. Many of these efforts are based on batch-oriented inference, which inhibits a real-time workflow. One important task is entity resolution (ER), the process of determining which records (mentions) in a database correspond to the same real-world entity. Traditional pairwise ER methods can lead to inconsistencies and low accuracy due to localized decisions. Leading ER systems solve this problem by collectively resolving all records using a probabilistic graphical model and Markov chain Monte Carlo (MCMC) inference. However, for large datasets this is an extremely expensive process. One key observation is that such an exhaustive ER process incurs a huge up-front cost, which is wasteful in practice because most users are interested in only a small subset of entities. In this paper, we advocate pay-as-you-go entity resolution by developing a number of query-driven collective ER techniques. We introduce two classes of SQL queries that involve ER operators: selection-driven ER and join-driven ER. We implement novel variations of the MCMC Metropolis-Hastings algorithm to generate biased samples, and selectivity-based scheduling algorithms to support the two classes of ER queries. Finally, we show that query-driven ER algorithms can converge and return results within minutes over a database populated with the extraction from a newswire dataset containing 71 million mentions.
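
The idea of biasing Metropolis-Hastings proposals toward the mentions a query cares about can be illustrated with a toy sketch. This is not the authors' algorithm: the token-overlap `similarity`, the 0.3 affinity threshold, the `temperature`, and every function name here are assumptions made for illustration only. The proposal picks a mention with a probability that depends only on its similarity to the query (not on the current clustering) and then a uniformly random label, so the proposal is symmetric and the plain Metropolis acceptance ratio applies.

```python
import math
import random

def similarity(a, b):
    """Jaccard similarity over lowercase tokens (a stand-in affinity, assumed)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def score(mentions, labels):
    """Log-score of a clustering: sum of (similarity - 0.3) over co-clustered pairs."""
    total = 0.0
    n = len(mentions)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                total += similarity(mentions[i], mentions[j]) - 0.3
    return total

def query_driven_mh(mentions, query, steps=2000, temperature=0.1, seed=0):
    """Sample a clustering, spending most proposals on mentions similar to the query."""
    rng = random.Random(seed)
    n = len(mentions)
    labels = list(range(n))  # start with every mention in its own cluster
    # State-independent selection weights keep the proposal symmetric.
    weights = [0.05 + similarity(m, query) for m in mentions]
    current = score(mentions, labels)
    for _ in range(steps):
        i = rng.choices(range(n), weights=weights)[0]
        old_label, new_label = labels[i], rng.randrange(n)
        if new_label == old_label:
            continue
        labels[i] = new_label
        proposed = score(mentions, labels)
        # Metropolis acceptance for a target proportional to exp(score / temperature).
        accept = math.exp(min(0.0, (proposed - current) / temperature))
        if rng.random() < accept:
            current = proposed
        else:
            labels[i] = old_label  # reject: restore the previous label
    return labels
```

Mentions that share a label in the returned list are treated as the same entity. Recomputing the full score at every step is quadratic in the number of mentions, so this sketch only suits tiny inputs.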

arXiv.org e-Print Archive

Crossref

International Governance of the Internet: An Economic Analysis.

Author: Gordon L. Brady

ICANN currently determines which top level domains are available on the A-root server and so restricts the choices facing Internet users. Thus ICANN redistributes wealth and has become the focus of rent-seeking activities. Yet, despite my belief that the Internet will become substantially more regulated in the future, I am convinced that technology will trump the best efforts of regulators to “promote the public interest”.

Research Papers in Economics

`Iconoclastic', Categorical Quantum Gravity

Author: A. Ashtekar
A. Connes
A. Einstein
A. K. Guts
A. Kock
A. Mallios
A. S. Eddington
A. Schopenhauer
A. Strominger
C. G. Torre
C. H. Taubes
C. J. Isham
C. J. S. Clarke
D. Ivanenko
D. Kastler
E. Vassiliou
I. Raptis
Ioannis Raptis
J. Butterfield
J. C. Baez
J. J. Stachel
J. Stachel
L. Bombelli
L. Crane
L. D. Faddeev
L. Wittgenstein
M. Kriele
P. A. M. Dirac
P. G. Bergmann
R. D. Sorkin
R. Geroch
R. Jackiw
R. Lavendhomme
R. P. Feynman
R. Penrose
S. A. Selesnick
S. MacLane
S. S. Chern
W. Heisenberg
W. Pauli
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/09/2005

This is a two-part, `2-in-1' paper. In Part I, the introductory talk at the `Glafka--2004: Iconoclastic Approaches to Quantum Gravity' international theoretical physics conference is presented in paper form (without references). In Part II, the more technical talk, originally titled ``Abstract Differential Geometric Excursion to Classical and Quantum Gravity'', is presented in paper form (with citations). The two parts are closely entwined, as Part I makes general motivating remarks for Part II. Comment: 34 pages; paper form of two talks given at the ``Glafka--2004: Iconoclastic Approaches to Quantum Gravity'' international theoretical physics conference, Athens, Greece (summer 2004).

arXiv.org e-Print Archive

Crossref

CERN Document Server

Uburyo : Delivering of a Sustainable System of Loans for Education : Volume I y II

Author: Martinez-Larraz Eduardo
Ramírez Robles Máximo
Publication venue: Facultad de Informática (UPM)
Publication date: 02/12/2010

Integrating a Grants Manager and an Employment Bureau. Uburyo, a sustainable mini-loan system, will help academic institutions in developing countries administer grants so that they can grow economically and enroll more students.

Archivo Digital UPM

Schema-agnostic progressive entity resolution

Author: Bergamaschi S.
Palpanas T.
Papadakis G.
Simonini G.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Entity Resolution (ER) is the task of finding entity profiles that correspond to the same real-world entity. Progressive ER aims to efficiently resolve large datasets when limited time and/or computational resources are available. In practice, its goal is to provide the best possible partial solution by approximating the optimal comparison order of the entity profiles. So far, Progressive ER has only been examined in the context of structured (relational) data sources, as the existing methods rely on schema knowledge to save unnecessary comparisons: they restrict their search space to similar entities with the help of schema-based blocking keys (i.e., signatures that represent the entity profiles). As a result, these solutions are not applicable in Big Data integration applications, which involve large and heterogeneous datasets, such as relational and RDF databases, JSON files, Web corpora, etc. To cover this gap, we propose a family of schema-agnostic Progressive ER methods, which do not require schema information and thus apply to heterogeneous data sources of any schema variety. First, we introduce two naïve schema-agnostic methods, showing that straightforward solutions exhibit poor performance that does not scale well to large volumes of data. Then, we propose four different advanced methods. Through an extensive experimental evaluation over 7 real-world, established datasets, we show that all the advanced methods significantly outperform both the naïve and the state-of-the-art schema-based ones. We also investigate the relative performance of the advanced methods, providing guidelines on method selection.
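
As a rough illustration of what "schema-agnostic" and "progressive" mean together, the sketch below ignores attribute names entirely, uses every token of every value as a blocking key, and emits candidate pairs in descending order of shared tokens so that the most promising comparisons come first. It is a generic token-blocking sketch and not one of the paper's methods; the function names and the weighting by shared-token count are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

def tokens(profile):
    """Collect lowercase tokens from all attribute values, ignoring attribute names."""
    out = set()
    for value in profile.values():
        out.update(str(value).lower().split())
    return out

def progressive_pairs(profiles):
    """Return candidate pairs ordered by number of shared tokens, highest first."""
    blocks = defaultdict(set)  # token -> ids of profiles containing it
    for pid, profile in profiles.items():
        for tok in tokens(profile):
            blocks[tok].add(pid)
    weights = defaultdict(int)  # (id_a, id_b) -> number of shared tokens
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            weights[(a, b)] += 1
    # Ties are broken by pair id so the order is deterministic.
    return sorted(weights, key=lambda pair: (-weights[pair], pair))

profiles = {
    1: {"name": "John Smith", "city": "Rome"},
    2: {"fullName": "john smith", "loc": "Paris"},
    3: {"title": "Rome guide"},
}
ordered = progressive_pairs(profiles)
```

A consumer with a limited budget would compare pairs from the front of `ordered` and stop when time runs out. Profiles 1 and 2 use different attribute names but are still paired, because only the values are tokenized.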

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia