
Noise-tolerant approximate blocking for dynamic real-time entity resolution

Author: Christen Peter
Gayler Ross
Liang Huizhi
Wang Yanzhe
Publication date: 01/01/2014

Entity resolution is the process of identifying records in one or multiple data sources that represent the same real-world entity. This process needs to deal with noisy data that contain, for example, wrong pronunciation or spelling errors. Many real-world applications require rapid responses to entity queries on dynamic datasets. This poses challenges for existing approaches, which are mainly aimed at the batch matching of records in static data. Locality sensitive hashing (LSH) is an approximate blocking approach that hashes objects within a certain distance into the same block with high probability. How to make approximate blocking approaches scalable to large datasets and effective for entity resolution in real time remains an open question. Targeting this problem, we propose a noise-tolerant approximate blocking approach to index records based on their distance ranges using LSH and sorting trees within large-sized hash blocks. Experiments conducted on both synthetic and real-world datasets show the effectiveness of the proposed approach.
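The LSH blocking idea in the abstract above can be illustrated with a minimal sketch. It assumes MinHash over character bigrams with banding as the hash family, which is a common choice but not necessarily the one used in the paper, and it omits the paper's sorting trees inside large blocks. All names (`bigrams`, `minhash_signature`, `LshBlocker`) and parameter values are hypothetical.

```python
import hashlib
from collections import defaultdict


def bigrams(text):
    """Character bigrams of a lower-cased string (the string itself if too short)."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def _hash(seed, token):
    # Deterministic 64-bit hash; the seed selects one member of the hash family.
    digest = hashlib.blake2b(f"{seed}:{token}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens, num_hashes=32):
    """One minimum per seeded hash; similar token sets tend to share minima."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


class LshBlocker:
    """Dynamic blocking index: records can be inserted and queried one at a time."""

    def __init__(self, num_hashes=32, bands=16):
        if num_hashes % bands != 0:
            raise ValueError("num_hashes must be divisible by bands")
        self.num_hashes = num_hashes
        self.rows = num_hashes // bands  # signature rows per band
        self.blocks = defaultdict(set)   # block key -> record ids

    def _keys(self, text):
        sig = minhash_signature(bigrams(text), self.num_hashes)
        for start in range(0, self.num_hashes, self.rows):
            # Each band of the signature is one block key.
            yield (start, tuple(sig[start:start + self.rows]))

    def insert(self, record_id, text):
        for key in self._keys(text):
            self.blocks[key].add(record_id)

    def candidates(self, text):
        """Ids of all records sharing at least one block with the query."""
        found = set()
        for key in self._keys(text):
            found |= self.blocks.get(key, set())
        return found
```

With this sketch, a query that differs from an indexed record only by a small spelling error is likely, though not guaranteed, to share at least one band and so be returned as a candidate. More bands with fewer rows each raise that likelihood at the cost of larger candidate sets.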

Good approximate quantum LDPC codes from spacetime circuit Hamiltonians

Author: Bohdanowicz Thomas C.
Crosson Elizabeth
Nirkhe Chinmay
Yuen Henry
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2018

We study approximate quantum low-density parity-check (QLDPC) codes, which are approximate quantum error-correcting codes specified as the ground space of a frustration-free local Hamiltonian, whose terms do not necessarily commute. Such codes generalize stabilizer QLDPC codes, which are exact quantum error-correcting codes with sparse, low-weight stabilizer generators (i.e. each stabilizer generator acts on a few qubits, and each qubit participates in a few stabilizer generators). Our investigation is motivated by an important question in Hamiltonian complexity and quantum coding theory: do stabilizer QLDPC codes with constant rate, linear distance, and constant-weight stabilizers exist? We show that obtaining such optimal scaling of parameters (modulo polylogarithmic corrections) is possible if we go beyond stabilizer codes: we prove the existence of a family of $[[N,k,d,\varepsilon]]$ approximate QLDPC codes that encode $k = \widetilde{\Omega}(N)$ logical qubits into $N$ physical qubits with distance $d = \widetilde{\Omega}(N)$ and approximation infidelity $\varepsilon = \mathcal{O}(1/\textrm{polylog}(N))$. The code space is stabilized by a set of 10-local noncommuting projectors, with each physical qubit only participating in $\mathcal{O}(\textrm{polylog} N)$ projectors. We prove the existence of an efficient encoding map, and we show that arbitrary Pauli errors can be locally detected by circuits of polylogarithmic depth. Finally, we show that the spectral gap of the code Hamiltonian is $\widetilde{\Omega}(N^{-3.09})$ by analyzing a spacetime circuit-to-Hamiltonian construction for a bitonic sorting network architecture that is spatially local in $\textrm{polylog}(N)$ dimensions. Comment: 51 pages, 13 figures

arXiv.org e-Print Archive

Caltech Authors

Feedback Controlled Software Systems

Author: Dunbar William B.
Klavins Dr. Eric
Waydo Stephen
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/2003

Software systems generally suffer from a certain fragility in the face of disturbances such as bugs, unforeseen user input, unmodeled interactions with other software components, and so on. A single such disturbance can make the machine on which the software is executing hang or crash. We postulate that what is required to address this fragility is a general means of using feedback to stabilize these systems. In this paper we develop a preliminary dynamical systems model of an arbitrary iterative software process along with the conceptual framework for stabilizing it in the presence of disturbances. To keep the computational requirements of the controllers low, randomization and approximation are used. We describe our initial attempts to apply the model to a faulty list sorter, using feedback to improve its performance. We also examine methods by which software robustness can be enhanced by distributing a task among nodes, each of which is capable of selecting the best input to process, and in particular the case of a sorting system consisting of a network of partial sorters, some of which may be buggy or even malicious.

CiteSeerX

Caltech Authors

Learning a Complete Image Indexing Pipeline

Author: Delong Liu (31808)
Jean Feng (90943)
Jen-Wei Chiao (90947)
Marietta Y Lee (90946)
Quanyi Lu (90941)
Ruth Gallagher (90945)
Xianghua Lin (90942)
Xiangmin Zhao (90944)
Publication date: 12/12/2017

To work at scale, a complete image indexing system comprises two components: an inverted file index to restrict the actual search to only a subset that should contain most of the items relevant to the query; and an approximate distance computation mechanism to rapidly scan these lists. While supervised deep learning has recently enabled improvements to the latter, the former continues to be based on unsupervised clustering in the literature. In this work, we propose a first system that learns both components within a unifying neural framework of structured binary encoding.

arXiv.org e-Print Archive

FigShare
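The two-component structure described in the abstract above (an inverted file index that narrows the search, then a scan of the selected lists) can be sketched as follows. This is only an illustration of the classic, non-learned layout: the centroids are assumed to be given (for example from a prior clustering step), the scan uses exact Euclidean distance rather than an approximate distance on compressed codes, and the class name `InvertedFileIndex` and the parameter `n_probe` are hypothetical.

```python
import math


def nearest(centroids, vector, n=1):
    """Indices of the n centroids closest to the vector."""
    order = sorted(range(len(centroids)),
                   key=lambda c: math.dist(centroids[c], vector))
    return order[:n]


class InvertedFileIndex:
    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one inverted list per centroid

    def add(self, item_id, vector):
        # Component 1: file the item under its closest centroid.
        (cell,) = nearest(self.centroids, vector)
        self.lists[cell].append((item_id, vector))

    def search(self, query, k=1, n_probe=1):
        # Restrict the search to the n_probe lists closest to the query.
        cells = nearest(self.centroids, query, n_probe)
        candidates = [entry for c in cells for entry in self.lists[c]]
        # Component 2: scan the selected lists. Exact distance is used here;
        # a real system would substitute an approximate distance on codes.
        candidates.sort(key=lambda entry: math.dist(entry[1], query))
        return [item_id for item_id, _ in candidates[:k]]
```

Probing more lists (a larger `n_probe`) trades speed for recall, since items filed under a neighbouring centroid are otherwise never scanned.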

Learning a Complete Image Indexing Pipeline

Author: Gribonval Rémi
Jain Himalaya
Pérez Patrick
Zepeda Joaquin
Publication date: 12/12/2017

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1

On Euclidean Norm Approximations

Author: Celebi M. Emre
Celiker Fatih
Kingravi Hassan A.
Publication date: 28/08/2010

Euclidean norm calculations arise frequently in scientific and engineering applications. Several approximations for this norm with differing complexity and accuracy have been proposed in the literature. Earlier approaches were based on minimizing the maximum error. Recently, Seol and Cheun proposed an approximation based on minimizing the average error. In this paper, we first examine these approximations in detail, show that they fit into a single mathematical formulation, and compare their average and maximum errors. We then show that the maximum errors given by Seol and Cheun are significantly optimistic. Comment: 9 pages, 1 figure, Pattern Recognition

arXiv.org e-Print Archive

CiteSeerX
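As a small illustration of the kind of approximation discussed in the last abstract above, the sketch below uses the well-known two-dimensional "alpha max plus beta min" estimate, which approximates sqrt(x^2 + y^2) by alpha * max(|x|, |y|) + beta * min(|x|, |y|), and measures its maximum and average relative error empirically. This is not the formulation from the paper; the function names and the sampling scheme (equally spaced points on the unit circle) are assumptions made for this example.

```python
import math


def approx_norm(x, y, alpha, beta):
    """Square-root-free estimate of the Euclidean norm of (x, y)."""
    a, b = abs(x), abs(y)
    return alpha * max(a, b) + beta * min(a, b)


def relative_errors(alpha, beta, samples=3600):
    """Maximum and average relative error over points on the unit circle."""
    errors = []
    for i in range(samples):
        theta = 2.0 * math.pi * i / samples
        x, y = math.cos(theta), math.sin(theta)
        # The true norm is 1, so the absolute error is also the relative error.
        errors.append(abs(approx_norm(x, y, alpha, beta) - 1.0))
    return max(errors), sum(errors) / len(errors)


# Coefficients that minimize the maximum error (about 3.96 percent).
_C = math.cos(math.pi / 8)
MINIMAX = (2 * _C / (1 + _C), 2 * math.sin(math.pi / 8) / (1 + _C))

# Cheap coefficients that need only a shift and an add in fixed-point code.
SIMPLE = (1.0, 0.5)

if __name__ == "__main__":
    for name, (alpha, beta) in (("minimax", MINIMAX), ("simple", SIMPLE)):
        worst, mean = relative_errors(alpha, beta)
        print(f"{name}: max error {worst:.4f}, average error {mean:.4f}")
```

Reporting both figures side by side mirrors the distinction the abstract draws between designs that minimize the maximum error and designs that minimize the average error.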