1,263 research outputs found

    Hashing for Similarity Search: A Survey

    Full text link
    Similarity search (nearest neighbor search) is a problem of pursuing the data items whose distances to a query item are the smallest from a large database. Various methods have been developed to address this problem, and recently a lot of efforts have been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work locality sensitive hashing. We divide the hashing algorithms two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution and learning to hash, which learns hash functions according the data distribution, and review them from various aspects, including hash function design and distance measure and search scheme in the hash coding space

    S-TREE: Self-Organizing Trees for Data Clustering and Online Vector Quantization

    Full text link
    This paper introduces S-TREE (Self-Organizing Tree), a family of models that use unsupervised learning to construct hierarchical representations of data and online tree-structured vector quantizers. The S-TREE1 model, which features a new tree-building algorithm, can be implemented with various cost functions. An alternative implementation, S-TREE2, which uses a new double-path search procedure, is also developed. S-TREE2 implements an online procedure that approximates an optimal (unstructured) clustering solution while imposing a tree-structure constraint. The performance of the S-TREE algorithms is illustrated with data clustering and vector quantization examples, including a Gauss-Markov source benchmark and an image compression application. S-TREE performance on these tasks is compared with the standard tree-structured vector quantizer (TSVQ) and the generalized Lloyd algorithm (GLA). The image reconstruction quality with S-TREE2 approaches that of GLA while taking less than 10% of computer time. S-TREE1 and S-TREE2 also compare favorably with the standard TSVQ in both the time needed to create the codebook and the quality of image reconstruction.Office of Naval Research (N00014-95-10409, N00014-95-0G57

    Dimension reduction of image and audio space

    Full text link
    The reduction of data necessary for storage or transmission is a desirable goal in the digital video and audio domain. Compression schemes strive to reduce the amount of storage space or bandwidth necessary to keep or move the data. Data reduction can be accomplished so that visually or audibly unnecessary data is removed or recoded thus aiding the compression phase of the data processing. The characterization and identification of data that can be successfully removed or reduced is the purpose of this work. New philosophy, theory and methods for data processing are presented towards the goal of data reduction. The philosophy and theory developed in this work establish a foundation for high speed data reduction suitable for multi-media applications. The developed methods encompass motion detection and edge detection as features of the systems. The philosophy of energy flow analysis in video processing enables the consideration of noise in digital video data. Research into noise versus motion leads to an efficient and successful method of identifying motion in a sequence. The research of the underlying statistical properties of vector quantization provides an insight into the performance characteristics of vector quantization and leads to successful improvements in application. The underlying statistical properties of the vector quantization process are analyzed and three theorems are developed and proved. The theorems establish the statistical distributions and probability densities of various metrics of the vector quantization process. From these properties, an intelligent and efficient algorithm design is developed and tested. The performance improvements in both time and quality are established through algorithm analysis and empirical testing. The empirical results are presented

    Speech Recognition Using Vector Quantization through Modified K-meansLBG Algorithm

    Get PDF
    In the Vector Quantization, the main task is to generate a good codebook. The distortion measure between the original pattern and the reconstructed pattern should be minimum. In this paper, a proposed algorithm called Modified K-meansLBG algorithm used to obtain a good codebook. The system has shown good performance on limited vocabulary tasks. Keywords: K-means algorithm, LBG algorithm, Vector Quantization, Speech Recognitio

    Optimal Image Reconstruction in Radio Interferometry

    Full text link
    We introduce a method for analyzing radio interferometry data which produces maps which are optimal in the Bayesian sense of maximum posterior probability density, given certain prior assumptions. It is similar to maximum entropy techniques, but with an exact accounting of the multiplicity instead of the usual approximation involving Stirling's formula. It also incorporates an Occam factor, automatically limiting the effective amount of detail in the map to that justified by the data. We use Gibbs sampling to determine, to any desired degree of accuracy, the multi-dimensional posterior density distribution. From this we can construct a mean posterior map and other measures of the posterior density, including confidence limits on any well-defined function of the posterior map.Comment: 41 pages, 11 figures. High resolution figures 8 and 9 available at http://www.astro.uiuc.edu/~bwandelt/SuttonWandelt200
    • …
    corecore