Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding, in a
large database, the data items whose distances to a query item are smallest.
Various methods have been developed to address this problem, and recently much
effort has been devoted to approximate search. In this paper, we present a
survey of one of the main solutions, hashing, which has been widely studied
since the pioneering work on locality sensitive hashing. We divide hashing
algorithms into two main categories: locality sensitive hashing, which designs
hash functions without exploring the data distribution, and learning to hash,
which learns hash functions according to the data distribution. We review both
from various aspects, including hash function design, distance measures, and
search schemes in the hash coding space.
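As a concrete illustration of the first category, the following is a minimal sketch (names and parameters are ours, not the survey's) of random-hyperplane locality sensitive hashing for angular similarity: nearby points receive binary codes that agree on most bits, so Hamming distance in code space approximates the angle between vectors.

```python
# A minimal sketch of random-hyperplane LSH for angular similarity;
# dimensions and bit counts are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def make_hash(dim, n_bits):
    """Sample n_bits random hyperplanes; each contributes one code bit."""
    planes = rng.standard_normal((n_bits, dim))
    def h(x):
        # A point's code is the sign pattern of its projections.
        return (planes @ x > 0).astype(np.uint8)
    return h

h = make_hash(dim=64, n_bits=16)
q = rng.standard_normal(64)
near = q + 0.05 * rng.standard_normal(64)  # a point close to the query
far = rng.standard_normal(64)              # an unrelated point

# Similar points collide on most bits: the Hamming gap to `near` is small,
# while the gap to `far` is close to half the bits.
print((h(q) != h(near)).sum(), (h(q) != h(far)).sum())
```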
Rhythmic Representations: Learning Periodic Patterns for Scalable Place Recognition at a Sub-Linear Storage Cost
Robotic and animal mapping systems share many challenges and characteristics:
they must function in a wide variety of environmental conditions, enable the
robot or animal to navigate effectively to find food or shelter, and be
computationally tractable from both a speed and storage perspective. With
regards to map storage, the mammalian brain appears to take a diametrically
opposed approach to all current robotic mapping systems. Where robotic mapping
systems attempt to solve the data association problem to minimise
representational aliasing, neurons in the brain intentionally break data
association by encoding large (potentially unlimited) numbers of places with a
single neuron. In this paper, we propose a novel method based on supervised
learning techniques that seeks out regularly repeating visual patterns in the
environment with mutually complementary co-prime frequencies, and an encoding
scheme that enables storage requirements to grow sub-linearly with the size of
the environment being mapped. To improve robustness in challenging real-world
environments while maintaining storage growth sub-linearity, we incorporate
both multi-exemplar learning and data augmentation techniques. Using large
benchmark robotic mapping datasets, we demonstrate the combined system
achieving high-performance place recognition with sub-linear storage
requirements, and characterize the performance-storage growth trade-off curve.
The work serves as the first robotic mapping system with sub-linear storage
scaling properties, as well as the first large-scale demonstration in
real-world environments of one of the proposed memory benefits of these
neurons.
Comment: Pre-print of article that will appear in the IEEE Robotics and
Automation Letters.
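The storage argument can be made concrete with a toy residue code (our illustration, not the paper's learned system): populations with mutually co-prime periods distinguish a number of places equal to the product of the periods while storing only their sum, and the place can be recovered by Chinese-Remainder-style decoding.

```python
# A toy illustration of why co-prime periodic codes give sub-linear storage:
# with periods 7, 11, 13 we can distinguish 7*11*13 = 1001 places while
# storing only 7+11+13 = 31 one-hot units, and recover the place via the
# Chinese Remainder Theorem. Periods here are arbitrary examples.
from math import prod

periods = [7, 11, 13]
capacity = prod(periods)        # 1001 distinct places
units = sum(periods)            # 31 stored units, sub-linear in capacity

def encode(place):
    """Each periodic population fires at phase (place mod period)."""
    return [place % p for p in periods]

def decode(residues):
    """Brute-force CRT inversion over the map's capacity."""
    for place in range(capacity):
        if all(place % p == r for p, r in zip(periods, residues)):
            return place

assert decode(encode(942)) == 942
print(f"{capacity} places from {units} units")
```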
Distributed query-aware quantization for high-dimensional similarity searches
The concept of similarity is used as the basis for many data exploration and data mining tasks. Nearest Neighbor (NN) queries identify the most similar items or, in terms of distance, the closest points to a query point. Similarity is traditionally characterized using a distance function between multi-dimensional feature vectors. However, when the data is high-dimensional, traditional distance functions fail to significantly distinguish between the closest and furthest points, because a few dissimilar dimensions dominate the distance function. Localized similarity functions, i.e. functions that only consider dimensions close to the query, quantize each dimension independently and only compute similarity for the dimensions where the query and the points fall into the same bin. These quantizations are query-agnostic, so there is potential to improve accuracy when a query-dependent quantization is used. In this paper we propose a Query-dependent Equi-Depth (QED) on-the-fly quantization method to improve high-dimensional similarity searches. The quantization is done for each dimension at query time, and localized scores are generated for the closest p fraction of the points, while a constant penalty is applied to the rest of the points. QED not only improves the quality of the distance metric, but also improves query-time performance by filtering out non-relevant data. We propose a distributed indexing and query algorithm to efficiently compute QED. Our experimental results show improvements in classification accuracy, as well as query performance up to one order of magnitude faster than Manhattan-based sequential-scan NN queries, over datasets with hundreds of dimensions.
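Under one reading of the abstract, the scoring rule can be sketched as follows: for each dimension, a cut is computed at query time, the closest p fraction of points receive a distance-based score, and the rest receive a flat penalty. The single threshold below is a simplification of the paper's equi-depth quantization, and all parameter names are illustrative.

```python
# A minimal sketch of query-dependent, per-dimension scoring: only the
# closest p fraction of points in each dimension get a distance-based score
# (lower = more similar); the rest receive a flat penalty.
import numpy as np

def qed_scores(data, query, p=0.1, penalty=1.0):
    """data: (n, d) array, query: (d,) vector. Lower score = more similar."""
    diffs = np.abs(data - query)              # per-dimension gaps to the query
    scores = np.zeros(len(data))
    for d in range(data.shape[1]):
        # Query-time cut: the p-quantile of this dimension's gaps.
        thr = np.quantile(diffs[:, d], p)
        close = diffs[:, d] <= thr
        scores += np.where(close, diffs[:, d], penalty)
    return scores

rng = np.random.default_rng(0)
X, q = rng.standard_normal((1000, 50)), rng.standard_normal(50)
print(np.argsort(qed_scores(X, q))[:5])       # 5 nearest candidates
```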
Time for dithering: fast and quantized random embeddings via the restricted isometry property
Recently, many works have focused on the characterization of non-linear
dimensionality reduction methods obtained by quantizing linear embeddings,
e.g., to reach fast processing time, efficient data compression procedures,
novel geometry-preserving embeddings or to estimate the information/bits stored
in this reduced data representation. In this work, we prove that many linear
maps known to respect the restricted isometry property (RIP) can induce a
quantized random embedding with controllable multiplicative and additive
distortions with respect to the pairwise distances of the data points being
considered. In other words, linear matrices having fast matrix-vector
multiplication algorithms (e.g., based on partial Fourier ensembles or on the
adjacency matrix of unbalanced expanders) can be readily used in the definition
of fast quantized embeddings with small distortions. This implication is made
possible by applying, right after the linear map, an additive random "dither"
that stabilizes the impact of the uniform scalar quantization operator applied
afterwards. For different categories of RIP matrices, i.e., for different
linear embeddings of a metric space $(\mathcal{K}, \ell_q)$ in
$(\mathbb{R}^m, \ell_p)$ with $p, q \geq 1$, we derive upper bounds on the
additive distortion induced by quantization, showing that it decays either
when the embedding dimension $m$ increases or when the distance of a pair of
embedded vectors in $\mathcal{K}$ decreases. Finally, we develop a novel
"bi-dithered" quantization scheme, which allows for a reduced distortion that
decreases when the embedding dimension $m$ grows, independently of the
considered pair of vectors.
Comment: Keywords: random projections, non-linear embeddings, quantization,
dither, restricted isometry property, dimensionality reduction, compressive
sensing, low-complexity signal models, fast and structured sensing matrices,
quantized rank-one projections. (31 pages)
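A minimal sketch of the core construction, with illustrative sizes and a Gaussian map standing in for the general RIP family: a random linear projection followed by a shared uniform dither and uniform scalar quantization. The dither makes the gap between quantized codes an unbiased proxy for the projected gap, so code distances track true distances up to the distortions bounded in the paper.

```python
# A minimal sketch of a dithered, uniformly quantized random embedding
# y = Q(Ax + u); the Gaussian map and all sizes are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, m, delta = 128, 512, 0.5
A = rng.standard_normal((m, n))      # Gaussian matrices satisfy the RIP w.h.p.
u = rng.uniform(0, delta, size=m)    # one shared dither for all points

def embed(x):
    # Uniform scalar quantizer of bin width delta, applied after dithering.
    return delta * np.floor((A @ x + u) / delta)

x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)

# Thanks to the dither, the mean absolute gap between codes is an unbiased
# estimate of the projected gap, so it tracks the true l2 distance.
d_true = np.linalg.norm(x1 - x2)
d_est = np.sqrt(np.pi / 2) * np.mean(np.abs(embed(x1) - embed(x2)))
print(d_true, d_est)
```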
Aggressive saliency-aware point cloud compression
The increasing demand for accurate representations of 3D scenes, combined
with immersive technologies, has made point clouds extremely popular.
However, high-quality point clouds require a large amount of data, so the
need for compression methods is imperative. In this paper, we present a novel,
geometry-based, end-to-end compression scheme, that combines information on the
geometrical features of the point cloud and the user's position, achieving
remarkable results for aggressive compression schemes demanding very small bit
rates. After separating visible and non-visible points, four saliency maps are
calculated, utilizing the point cloud's geometry and distance from the user,
the visibility information, and the user's focus point. A combination of these
maps results in a final saliency map, indicating the overall significance of
each point and therefore quantizing different regions with a different number
of bits during the encoding process. The decoder reconstructs the point cloud
making use of delta coordinates and solving a sparse linear system. Evaluation
studies and comparisons with the geometry-based point cloud compression (G-PCC)
algorithm by the Moving Picture Experts Group (MPEG), carried out for a variety
of point clouds, demonstrate that the proposed method achieves significantly
better results for small bit rates.
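A minimal sketch, with made-up weights, of the bit-allocation idea: several per-point saliency cues are combined into a single score, and more salient regions are quantized with more bits. The rest of the pipeline (visibility separation, delta coordinates, sparse-system reconstruction) follows the paper, not this toy.

```python
# Toy saliency-driven bit allocation: combine per-point cues into one score
# and spend more bits where saliency is high. Weights and cues are
# illustrative stand-ins for the paper's four saliency maps.
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(-1, 1, size=(10_000, 3))
user, focus = np.array([0., 0., 3.]), np.array([0.2, 0., 0.])

# Two illustrative cues: proximity to the user and to the focus point.
s_dist = 1.0 / (1.0 + np.linalg.norm(points - user, axis=1))
s_focus = 1.0 / (1.0 + np.linalg.norm(points - focus, axis=1))
saliency = 0.5 * s_dist + 0.5 * s_focus   # hypothetical combination weights

# Map saliency to a per-point depth between 4 and 10 bits per coordinate.
bits = np.round(4 + 6 * (saliency - saliency.min())
                / np.ptp(saliency)).astype(int)

def quantize(p, b):
    step = 2.0 / (2 ** b)                 # coordinates live in [-1, 1]
    return np.round(p / step) * step

recon = np.array([quantize(p, b) for p, b in zip(points, bits)])
print("mean bits/coord:", bits.mean(),
      "max err:", np.abs(recon - points).max())
```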
Link and code: Fast indexing with graphs and compact regression codes
Similarity search approaches based on graph walks have recently attained
outstanding speed-accuracy trade-offs, setting aside the memory requirements. In
this paper, we revisit these approaches by considering, additionally, the
memory constraint required to index billions of images on a single server. This
leads us to propose a method based both on graph traversal and compact
representations. We encode the indexed vectors using quantization and exploit
the graph structure to refine the similarity estimation.
In essence, our method takes the best of these two worlds: the search
strategy is based on nested graphs, thereby providing high precision with a
relatively small set of comparisons. At the same time, it offers significant
memory compression. As a result, our approach outperforms the state of the art
at operating points of 64-128 bytes per vector, as demonstrated by our
results on two billion-scale public benchmarks.
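A minimal sketch of the two ingredients in combination (our toy, not the paper's system): a greedy walk over a neighbour graph using distances estimated from compressed vectors, followed by exact re-ranking of the visited candidates. A crude scalar quantizer stands in for the paper's regression codes, and the graph is built by brute force rather than by approximate construction.

```python
# Toy graph-walk search with compact codes: walk greedily using cheap
# quantized distance estimates, then re-rank the visited set exactly.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 32)).astype(np.float32)
codes = np.round(X * 4) / 4                 # crude stand-in for learned codes

# k-NN graph built by brute force; real systems build this approximately.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
graph = np.argsort(D, axis=1)[:, 1:9]       # 8 neighbours per node

def search(q, start=0, steps=50):
    cur, visited = start, {start}
    for _ in range(steps):
        cand = [n for n in graph[cur] if n not in visited]
        if not cand:
            break
        visited.update(cand)
        # Walk using the cheap quantized estimates only.
        nxt = min(cand, key=lambda i: np.linalg.norm(codes[i] - q))
        if np.linalg.norm(codes[nxt] - q) >= np.linalg.norm(codes[cur] - q):
            break
        cur = nxt
    # Refine: exact distances on the small visited set.
    return min(visited, key=lambda i: np.linalg.norm(X[i] - q))

q = rng.standard_normal(32).astype(np.float32)
print(search(q), np.argmin(np.linalg.norm(X - q, axis=1)))  # often agree
```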
Distributed Detection and Estimation in Wireless Sensor Networks
In this article we consider the problems of distributed detection and
estimation in wireless sensor networks. In the first part, we provide a general
framework aimed to show how an efficient design of a sensor network requires a
joint organization of in-network processing and communication. Then, we recall
the basic features of the consensus algorithm, a basic tool for reaching
globally optimal decisions through a distributed approach. The main part of the
paper starts by addressing the distributed estimation problem. We first present an
entirely decentralized approach, where observations and estimations are
performed without the intervention of a fusion center. Then, we consider the
case where the estimation is performed at a fusion center, showing how to
allocate quantization bits and transmit powers in the links between the nodes
and the fusion center, in order to accommodate the requirement on the maximum
estimation variance, under a constraint on the global transmit power. We extend
the approach to the detection problem. Also in this case, we consider the
distributed approach, where every node can achieve a globally optimal decision,
and the case where the decision is taken at a central node. In the latter case,
we show how to allocate coding bits and transmit power in order to maximize the
detection probability, under constraints on the false alarm rate and the global
transmit power. Then, we generalize consensus algorithms illustrating a
distributed procedure that converges to the projection of the observation
vector onto a signal subspace. We then address the issue of energy consumption
in sensor networks, thus showing how to optimize the network topology in order
to minimize the energy necessary to achieve a global consensus. Finally, we
address the problem of matching the topology of the network to the graph
describing the statistical dependencies among the observed variables.
Comment: 92 pages, 24 figures. To appear in E-Reference Signal Processing, R.
Chellapa and S. Theodoridis, Eds., Elsevier, 201
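A minimal sketch of the average-consensus iteration x(k+1) = W x(k) mentioned above, on a hypothetical ring network with Metropolis weights: every node converges to the global mean of the initial measurements using only local exchanges, with no fusion center.

```python
# Average consensus on a small ring network: node i talks to i-1 and i+1.
# Topology and iteration count are illustrative.
import numpy as np

n = 8
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

# Metropolis weights: W[i,j] = 1/(1+max(deg_i, deg_j)) on edges, rows sum to 1.
W = np.zeros((n, n))
for i, nbrs in neighbors.items():
    for j in nbrs:
        W[i, j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
    W[i, i] = 1.0 - W[i].sum()

x = np.random.default_rng(0).standard_normal(n)  # local measurements
target = x.mean()
for _ in range(200):
    x = W @ x                                    # one round of local exchanges
print(np.allclose(x, target), x[0], target)      # all nodes near the mean
```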