Constellation Queries over Big Data
A geometric pattern is a set of points with all pairwise distances (or,
more generally, relative distances) specified. Finding matches to such patterns
has applications to spatial data in seismic, astronomical, and transportation
contexts. For example, a particularly interesting geometric pattern in
astronomy is the Einstein cross, which is an astronomical phenomenon in which a
single quasar is observed as four distinct sky objects (due to gravitational
lensing) when captured by earth telescopes. Finding such crosses, as well as
other geometric patterns, is a challenging problem as the potential number of
sets of elements that compose shapes is exponentially large in the size of the
dataset and the pattern. In this paper, we refer to geometric patterns as
constellation queries and propose algorithms to find them in large data
applications. Our methods combine quadtrees, matrix multiplication, and
unindexed join processing to discover sets of points that match a geometric
pattern within some additive factor on the pairwise distances. Our distributed
experiments show that the choice of composition algorithm (matrix
multiplication or nested loops) depends on the freedom introduced in the query
geometry through the distance additive factor. Three clearly identified blocks
of threshold values guide the choice of the best composition algorithm.
Finally, solving the problem for relative distances requires a novel
continuous-to-discrete transformation. To the best of our knowledge, this paper
is the first to investigate constellation queries at scale.
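
The matching criterion in this abstract (every pairwise distance agrees with the pattern within an additive factor) can be illustrated with a minimal brute-force sketch. This is not the paper's method, which combines quadtrees, matrix multiplication, and join processing; the function names `pairwise_distances` and `match_constellation` and the parameter `eps` are hypothetical, chosen only for illustration.

```python
from itertools import permutations
from math import dist, sqrt


def pairwise_distances(pattern):
    """Return the k x k matrix of Euclidean distances between pattern points."""
    return [[dist(p, q) for q in pattern] for p in pattern]


def match_constellation(points, pattern, eps):
    """Brute-force search for ordered k-tuples of point indices whose pairwise
    distances match those of `pattern` within an additive tolerance `eps`.

    Illustrative only: the number of candidate tuples grows exponentially
    with the pattern size, which is the cost the abstract describes.
    """
    target = pairwise_distances(pattern)
    k = len(pattern)
    matches = []
    for combo in permutations(range(len(points)), k):
        ok = all(
            abs(dist(points[combo[i]], points[combo[j]]) - target[i][j]) <= eps
            for i in range(k)
            for j in range(i + 1, k)
        )
        if ok:
            matches.append(combo)
    return matches


# Hypothetical data: a unit right triangle plus one far-away outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
# The pattern is the same triangle, translated; only distances matter.
pattern = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
result = match_constellation(points, pattern, eps=0.01)
```

Because only distances are compared, the triangle is found regardless of translation, and it is reported once per assignment of pattern vertices to data points that preserves the distances.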
Hallucinating optimal high-dimensional subspaces
Linear subspace representations of appearance variation are pervasive in
computer vision. This paper addresses the problem of robustly matching such
subspaces (computing the similarity between them) when they are used to
describe the scope of variations within sets of images of different (possibly
greatly so) scales. A naive solution of projecting the low-scale subspace into
the high-scale image space is described first and subsequently shown to be
inadequate, especially at large scale discrepancies. A successful approach is
proposed instead. It consists of (i) an interpolated projection of the
low-scale subspace into the high-scale space, which is followed by (ii) a
rotation of this initial estimate within the bounds of the imposed
"downsampling constraint". The optimal rotation, the one that best aligns
the high-scale reconstruction of the low-scale subspace with the reference it
is compared to, is found in closed form. The method is evaluated on the problem of
matching sets of (i) face appearances under varying illumination and (ii)
object appearances under varying viewpoint, using two large data sets. In
comparison to the naive matching, the proposed algorithm is shown to greatly
increase the separation of between-class and within-class similarities, as well
as produce far more meaningful modes of common appearance on which the match
score is based. Comment: Pattern Recognition, 201
JGraphT -- A Java library for graph data structures and algorithms
Mathematical software and graph-theoretical algorithmic packages to
efficiently model, analyze and query graphs are crucial in an era where
large-scale spatial, societal and economic network data are abundantly
available. One such package is JGraphT, a programming library which contains
very efficient and generic graph data-structures along with a large collection
of state-of-the-art algorithms. The library is written in Java with stability,
interoperability and performance in mind. A distinctive feature of this library
is the ability to model vertices and edges as arbitrary objects, thereby
permitting natural representations of many common networks including
transportation, social and biological networks. Besides classic graph
algorithms such as shortest-paths and spanning-tree algorithms, the library
contains numerous advanced algorithms: graph and subgraph isomorphism; matching
and flow problems; approximation algorithms for NP-hard problems such as
independent set and TSP; and several more exotic algorithms such as Berge graph
detection. Due to its versatility and generic design, JGraphT is currently used
in large-scale commercial, non-commercial and academic research projects. In
this work we describe in detail the design and underlying structure of the
library, and discuss its most important features and algorithms. A
computational study is conducted to evaluate the performance of JGraphT versus
a number of similar libraries. Experiments on a large number of graphs over a
variety of popular algorithms show that JGraphT is highly competitive with
other established libraries such as NetworkX or the BGL. Comment: Major Revisio
Neural Distributed Autoassociative Memories: A Survey
Introduction. Neural network models of autoassociative, distributed memory
allow storage and retrieval of many items (vectors) where the number of stored
items can exceed the vector dimension (the number of neurons in the network).
This opens the possibility of a sublinear time search (in the number of stored
items) for approximate nearest neighbors among vectors of high dimension. The
purpose of this paper is to review models of autoassociative, distributed
memory that can be naturally implemented by neural networks (mainly with local
learning rules and iterative dynamics based on information locally available to
neurons). Scope. The survey is focused mainly on the networks of Hopfield,
Willshaw and Potts, which have connections between pairs of neurons and operate
on sparse binary vectors. We discuss not only autoassociative memory, but also
the generalization properties of these networks. We also consider neural
networks with higher-order connections and networks with a bipartite graph
structure for non-binary data with linear constraints. Conclusions. We
discuss the relations to similarity search, the advantages and
drawbacks of these techniques, and topics for further research. An interesting
and still not completely resolved question is whether neural autoassociative
memories can search for approximate nearest neighbors faster than other index
structures for similarity search, in particular for the case of very
high-dimensional vectors. Comment: 31 page
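
To make the idea of autoassociative recall with a local learning rule and iterative dynamics concrete, the following is a minimal sketch of a classic Hopfield-style memory. It is not taken from the survey: it uses dense ±1 vectors rather than the sparse binary vectors the survey focuses on, and the function names `train` and `recall` are hypothetical.

```python
def train(patterns):
    """Build a symmetric weight matrix with the Hebbian outer-product rule.

    Each weight w[i][j] depends only on the activities of neurons i and j,
    which is what makes the rule local. Self-connections are kept at zero.
    """
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, state, max_sweeps=10):
    """Asynchronously update neurons until the state stops changing."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(w[i][j] * s[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:
            break
    return s


# Two hypothetical, mutually orthogonal stored items.
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train([p1, p2])

# Query with a corrupted copy of p1 (first component flipped).
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
restored = recall(w, noisy)
```

Here the query is a corrupted version of a stored item, and the dynamics settle on the stored item itself. That is the sense in which such a memory acts as an approximate nearest-neighbor search.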