10,367 research outputs found
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is a problem of pursuing the data
items whose distances to a query item are the smallest from a large database.
Various methods have been developed to address this problem, and recently a lot
of efforts have been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work locality sensitive hashing. We divide the hashing
algorithms two main categories: locality sensitive hashing, which designs hash
functions without exploring the data distribution and learning to hash, which
learns hash functions according the data distribution, and review them from
various aspects, including hash function design and distance measure and search
scheme in the hash coding space
Video data compression using artificial neural network differential vector quantization
An artificial neural network vector quantizer is developed for use in data compression applications such as Digital Video. Differential Vector Quantization is used to preserve edge features, and a new adaptive algorithm, known as Frequency-Sensitive Competitive Learning, is used to develop the vector quantizer codebook. To develop real time performance, a custom Very Large Scale Integration Application Specific Integrated Circuit (VLSI ASIC) is being developed to realize the associative memory functions needed in the vector quantization algorithm. By using vector quantization, the need for Huffman coding can be eliminated, resulting in superior performance against channel bit errors than methods that use variable length codes
Fast training of self organizing maps for the visual exploration of molecular compounds
Visual exploration of scientific data in life science
area is a growing research field due to the large amount of
available data. The Kohonen’s Self Organizing Map (SOM) is
a widely used tool for visualization of multidimensional data.
In this paper we present a fast learning algorithm for SOMs
that uses a simulated annealing method to adapt the learning
parameters. The algorithm has been adopted in a data analysis
framework for the generation of similarity maps. Such maps
provide an effective tool for the visual exploration of large and
multi-dimensional input spaces. The approach has been applied
to data generated during the High Throughput Screening
of molecular compounds; the generated maps allow a visual
exploration of molecules with similar topological properties.
The experimental analysis on real world data from the
National Cancer Institute shows the speed up of the proposed
SOM training process in comparison to a traditional approach.
The resulting visual landscape groups molecules with similar
chemical properties in densely connected regions
S-TREE: Self-Organizing Trees for Data Clustering and Online Vector Quantization
This paper introduces S-TREE (Self-Organizing Tree), a family of models that use unsupervised learning to construct hierarchical representations of data and online tree-structured vector quantizers. The S-TREE1 model, which features a new tree-building algorithm, can be implemented with various cost functions. An alternative implementation, S-TREE2, which uses a new double-path search procedure, is also developed. S-TREE2 implements an online procedure that approximates an optimal (unstructured) clustering solution while imposing a tree-structure constraint. The performance of the S-TREE algorithms is illustrated with data clustering and vector quantization examples, including a Gauss-Markov source benchmark and an image compression application. S-TREE performance on these tasks is compared with the standard tree-structured vector quantizer (TSVQ) and the generalized Lloyd algorithm (GLA). The image reconstruction quality with S-TREE2 approaches that of GLA while taking less than 10% of computer time. S-TREE1 and S-TREE2 also compare favorably with the standard TSVQ in both the time needed to create the codebook and the quality of image reconstruction.Office of Naval Research (N00014-95-10409, N00014-95-0G57
- …