Noise-tolerant approximate blocking for dynamic real-time entity resolution
Entity resolution is the process of identifying records in one or multiple data sources that represent the same real-world entity. This process needs to deal with noisy data that contain, for example, wrong pronunciation or spelling errors. Many real-world applications require rapid responses to entity queries on dynamic datasets. This poses challenges for existing approaches, which are mainly aimed at the batch matching of records in static data. Locality sensitive hashing (LSH) is an approximate blocking approach that hashes objects within a certain distance into the same block with high probability. How to make approximate blocking approaches scalable to large datasets and effective for entity resolution in real time remains an open question. Targeting this problem, we propose a noise-tolerant approximate blocking approach that indexes records based on their distance ranges, using LSH and sorting trees within large hash blocks. Experiments conducted on both synthetic and real-world datasets show the effectiveness of the proposed approach.
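The blocking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes MinHash over character bigrams as the LSH family, and it uses a sorted Python list as a simple stand-in for the sorting trees kept inside each hash block. The class name `LshBlockIndex` and all parameters are hypothetical.

```python
import bisect
import hashlib
from collections import defaultdict


def bigrams(text):
    """Character bigrams of a lowercased string (the string itself if too short)."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def minhash_signature(tokens, num_hashes=8):
    """MinHash signature: for each seed, the smallest hash over all tokens."""
    return tuple(
        min(int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16) for t in tokens)
        for seed in range(num_hashes)
    )


class LshBlockIndex:
    """Hypothetical dynamic index: LSH bands act as blocks, each kept sorted."""

    def __init__(self, num_hashes=8, band_size=2):
        self.num_hashes = num_hashes
        self.band_size = band_size
        self.blocks = defaultdict(list)  # band key -> sorted list of records

    def _band_keys(self, text):
        sig = minhash_signature(bigrams(text), self.num_hashes)
        for start in range(0, self.num_hashes, self.band_size):
            yield (start, sig[start:start + self.band_size])

    def insert(self, text):
        # Records can be added at any time, which suits dynamic data.
        for key in self._band_keys(text):
            bisect.insort(self.blocks[key], text)

    def query(self, text, window=2):
        # Inspect only a small sorted neighbourhood in each matching block,
        # so large blocks do not force a full scan.
        candidates = set()
        for key in self._band_keys(text):
            block = self.blocks.get(key, [])
            pos = bisect.bisect_left(block, text)
            candidates.update(block[max(0, pos - window):pos + window])
        return candidates


index = LshBlockIndex()
for name in ["jonathan smith", "maria garcia", "jonathon smith"]:
    index.insert(name)
print(index.query("jonathan smith"))
```

Records that share many bigrams are likely, but not guaranteed, to collide in at least one band; the `window` parameter bounds how many neighbours are compared per block.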
Good approximate quantum LDPC codes from spacetime circuit Hamiltonians
We study approximate quantum low-density parity-check (QLDPC) codes, which
are approximate quantum error-correcting codes specified as the ground space of
a frustration-free local Hamiltonian, whose terms do not necessarily commute.
Such codes generalize stabilizer QLDPC codes, which are exact quantum
error-correcting codes with sparse, low-weight stabilizer generators (i.e. each
stabilizer generator acts on a few qubits, and each qubit participates in a few
stabilizer generators). Our investigation is motivated by an important question
in Hamiltonian complexity and quantum coding theory: do stabilizer QLDPC codes
with constant rate, linear distance, and constant-weight stabilizers exist?
We show that obtaining such optimal scaling of parameters (modulo
polylogarithmic corrections) is possible if we go beyond stabilizer codes: we
prove the existence of a family of approximate QLDPC
codes that encode logical qubits into physical
qubits with distance and approximation infidelity
. The code space is
stabilized by a set of 10-local noncommuting projectors, with each physical
qubit only participating in projectors. We
prove the existence of an efficient encoding map, and we show that arbitrary
Pauli errors can be locally detected by circuits of polylogarithmic depth.
Finally, we show that the spectral gap of the code Hamiltonian is
by analyzing a spacetime circuit-to-Hamiltonian
construction for a bitonic sorting network architecture that is spatially local
in dimensions.
Comment: 51 pages, 13 figures
Feedback Controlled Software Systems
Software systems generally suffer from a certain fragility in the face of disturbances such as bugs, unforeseen user input, unmodeled interactions with other software components, and so on. A single such disturbance can make the machine on which the software is executing hang or crash. We postulate that what is required to address this fragility is a general means of using feedback to stabilize these systems. In this paper we develop a preliminary dynamical systems model of an arbitrary iterative software process, along with the conceptual framework for stabilizing it in the presence of disturbances. To keep the computational requirements of the controllers low, randomization and approximation are used. We describe our initial attempts to apply the model to a faulty list sorter, using feedback to improve its performance. We also examine methods by which software robustness can be enhanced by distributing a task between nodes, each of which is capable of selecting the best input to process, and we consider the particular case of a sorting system consisting of a network of partial sorters, some of which may be buggy or even malicious.
Learning a Complete Image Indexing Pipeline
To work at scale, a complete image indexing system comprises two components:
an inverted file index to restrict the actual search to only a subset that
should contain most of the items relevant to the query; and an approximate
distance computation mechanism to rapidly scan these lists. While supervised deep
learning has recently enabled improvements to the latter, the former continues
to be based on unsupervised clustering in the literature. In this work, we
propose a first system that learns both components within a unifying neural
framework of structured binary encoding.
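The two components named in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the learned system the abstract proposes: the coarse centroids are fixed by hand rather than learned, and the approximate distance is a Hamming distance between sign codes of residuals. The class `TinyIVF` and its methods are hypothetical names.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def sign_code(v):
    """Binary code: one bit per dimension, set when the component is non-negative."""
    return tuple(1 if x >= 0 else 0 for x in v)


def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(p != q for p, q in zip(a, b))


class TinyIVF:
    """Hypothetical inverted file index with binary-coded entries."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = defaultdict(list)  # centroid id -> [(item id, code)]

    def _nearest(self, v):
        return min(range(len(self.centroids)),
                   key=lambda i: sq_dist(v, self.centroids[i]))

    def _residual_code(self, v, c):
        return sign_code([x - m for x, m in zip(v, self.centroids[c])])

    def add(self, item_id, v):
        # Component 1: route the item to the inverted list of its nearest centroid.
        c = self._nearest(v)
        self.lists[c].append((item_id, self._residual_code(v, c)))

    def search(self, q, k=1):
        # Component 2: scan only one list, ranking by cheap approximate distance.
        c = self._nearest(q)
        qcode = self._residual_code(q, c)
        ranked = sorted(self.lists[c], key=lambda e: hamming(qcode, e[1]))
        return [item_id for item_id, _ in ranked[:k]]


ivf = TinyIVF(centroids=[(0.0, 0.0), (10.0, 10.0)])
ivf.add("a", (1.0, 1.0))
ivf.add("b", (-1.0, -1.0))
ivf.add("c", (11.0, 9.0))
print(ivf.search((0.5, 0.5)))
```

Only the list of the nearest centroid is scanned, so items routed to other lists are never compared with the query; that is the trade-off an inverted file index makes for speed.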
On Euclidean Norm Approximations
Euclidean norm calculations arise frequently in scientific and engineering
applications. Several approximations for this norm with differing complexity
and accuracy have been proposed in the literature. Earlier approaches were
based on minimizing the maximum error. Recently, Seol and Cheun proposed an
approximation based on minimizing the average error. In this paper, we first
examine these approximations in detail, show that they fit into a single
mathematical formulation, and compare their average and maximum errors. We then
show that the maximum errors given by Seol and Cheun are significantly
optimistic.
Comment: 9 pages, 1 figure, Pattern Recognition
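To make the distinction between maximum and average error concrete, here is a small sketch using one well-known two-dimensional approximation of this kind, "alpha max plus beta min". The coefficients below are the standard ones that minimize the maximum relative error; they are an assumption chosen for illustration and are not taken from the paper, and the helper `error_stats` is a hypothetical name.

```python
import math

# Coefficients minimizing the maximum relative error of
# alpha * max(|x|, |y|) + beta * min(|x|, |y|) as an estimate of sqrt(x^2 + y^2).
ALPHA = 2 * math.cos(math.pi / 8) / (1 + math.cos(math.pi / 8))
BETA = 2 * math.sin(math.pi / 8) / (1 + math.cos(math.pi / 8))


def approx_norm2(x, y, alpha=ALPHA, beta=BETA):
    """Approximate the Euclidean norm of (x, y) without a square root."""
    a, b = abs(x), abs(y)
    return alpha * max(a, b) + beta * min(a, b)


def error_stats(samples=1000):
    """Maximum and average relative error over unit vectors in the first quadrant."""
    errors = []
    for i in range(samples + 1):
        theta = (math.pi / 2) * i / samples
        # The true norm of (cos(theta), sin(theta)) is 1, so the absolute
        # deviation from 1 is also the relative error.
        errors.append(abs(approx_norm2(math.cos(theta), math.sin(theta)) - 1.0))
    return max(errors), sum(errors) / len(errors)


max_err, avg_err = error_stats()
print(f"max relative error: {max_err:.4f}, average relative error: {avg_err:.4f}")
```

Choosing different `alpha` and `beta` values trades maximum error against average error, which is the comparison the abstract describes.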