Incremental dimension reduction of tensors with random index
We present an incremental, scalable and efficient dimension reduction
technique for tensors that is based on sparse random linear coding. Data is
stored in a compactified representation with fixed size, which makes memory
requirements low and predictable. Component encoding and decoding are performed
on-line without computationally expensive re-analysis of the data set. The
range of tensor indices can be extended dynamically without modifying the
component representation. This idea originates from a mathematical model of
semantic memory and a method known as random indexing in natural language
processing. We generalize the random-indexing algorithm to tensors and present
signal-to-noise-ratio simulations for representations of vectors and matrices.
We present also a mathematical analysis of the approximate orthogonality of
high-dimensional ternary vectors, which is a property that underpins this and
other similar random-coding approaches to dimension reduction. To further
demonstrate the properties of random indexing we present results of a synonym
identification task. The method presented here shares similarities with
random projection and Tucker decomposition, but performs well only at high
dimensionality (n > 10^3). Random indexing is useful for a range of complex
practical problems, e.g., in natural language processing, data mining, pattern
recognition, event detection, graph searching and search engines. Prototype
software is provided. It supports encoding and decoding of tensors of order >=
1 in a unified framework, i.e., vectors, matrices and higher-order tensors.
Comment: 36 pages, 9 figures
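As an illustration of the core idea (not the authors' prototype software), the following minimal sketch encodes a data vector into a fixed-size distributed representation using sparse ternary random index vectors, then decodes individual components by projection; all names and parameter choices (dimension, sparsity k) are illustrative assumptions.

```python
import numpy as np

def ternary_index_vectors(num, dim, k, rng):
    """Generate `num` sparse ternary index vectors of length `dim`,
    each with k nonzeros: half +1 and half -1 (illustrative construction)."""
    R = np.zeros((num, dim))
    for i in range(num):
        pos = rng.choice(dim, size=k, replace=False)
        R[i, pos[: k // 2]] = 1.0
        R[i, pos[k // 2:]] = -1.0
    return R

rng = np.random.default_rng(0)
n, dim, k = 50, 10_000, 10          # high dim makes index vectors nearly orthogonal
R = ternary_index_vectors(n, dim, k, rng)

x = rng.normal(size=n)              # data vector to encode
s = R.T @ x                         # fixed-size distributed representation
x_hat = (R @ s) / k                 # decode: project back; ||r_i||^2 = k
```

Because the index vectors are approximately orthogonal in high dimensions, `x_hat` recovers `x` up to small cross-talk noise, and new components can be encoded into `s` incrementally without re-analyzing previously stored data.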
Efficient Large-scale Approximate Nearest Neighbor Search on the GPU
We present a new approach for efficient approximate nearest neighbor (ANN)
search in high dimensional spaces, extending the idea of Product Quantization.
We propose a two-level product and vector quantization tree that reduces the
number of vector comparisons required during tree traversal. Our approach also
includes a novel highly parallelizable re-ranking method for candidate vectors
by efficiently reusing already computed intermediate values. Due to its small
memory footprint during traversal, the method lends itself to an efficient,
parallel GPU implementation. This Product Quantization Tree (PQT) approach
significantly outperforms recent state-of-the-art methods for high-dimensional
nearest neighbor queries on standard reference datasets. Ours is the first work
that demonstrates GPU performance superior to CPU performance on
high-dimensional, large-scale ANN problems in time-critical real-world
applications, such as loop-closing in videos.
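The full two-level PQT and its GPU re-ranking are beyond a short sketch, but the product-quantization building block the paper extends (split dimensions into subspaces, learn a codebook per subspace, compare a query via per-subspace lookup tables) can be illustrated as follows; the function names, the naive k-means, and all parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def train_pq(X, m, ksub, iters=10, seed=0):
    """Split the d dimensions into m subspaces and run a naive k-means
    (ksub centroids) in each to build the per-subspace codebooks."""
    rng = np.random.default_rng(seed)
    dsub = X.shape[1] // m
    codebooks = []
    for j in range(m):
        sub = X[:, j * dsub:(j + 1) * dsub]
        C = sub[rng.choice(len(sub), ksub, replace=False)].copy()
        for _ in range(iters):
            assign = ((sub[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
            for c in range(ksub):
                pts = sub[assign == c]
                if len(pts):
                    C[c] = pts.mean(0)
        codebooks.append(C)
    return codebooks

def encode(X, codebooks):
    """Quantize each vector to one centroid index per subspace."""
    dsub = codebooks[0].shape[1]
    codes = np.empty((len(X), len(codebooks)), dtype=np.uint8)
    for j, C in enumerate(codebooks):
        sub = X[:, j * dsub:(j + 1) * dsub]
        codes[:, j] = ((sub[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
    return codes

def adc_distances(q, codes, codebooks):
    """Asymmetric distance computation: precompute query-to-centroid
    tables per subspace, then sum table lookups per database code."""
    m = len(codebooks)
    dsub = codebooks[0].shape[1]
    tables = np.stack([((q[j * dsub:(j + 1) * dsub] - C) ** 2).sum(-1)
                       for j, C in enumerate(codebooks)])   # shape (m, ksub)
    return tables[np.arange(m), codes].sum(1)

# Usage on synthetic data (illustrative parameters):
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))
cb = train_pq(X, m=4, ksub=8)
codes = encode(X, cb)
q = rng.normal(size=16)
d_adc = adc_distances(q, codes, cb)       # approximate distances, one per vector
d_true = ((X - q) ** 2).sum(1)            # exact distances, for comparison
```

The table-lookup step is what makes the approach attractive on GPUs: after the small per-query tables are built, scoring every candidate reduces to parallel gathers and sums over precomputed values.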