Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding, in a
large database, the data items whose distances to a query item are the smallest.
Various methods have been developed to address this problem, and recently much
effort has been devoted to approximate search. In this paper, we present a
survey of one of the main solutions, hashing, which has been widely studied
since the pioneering work on locality sensitive hashing. We divide the hashing
algorithms into two main categories: locality sensitive hashing, which designs
hash functions without exploring the data distribution, and learning to hash,
which learns hash functions according to the data distribution. We review them
from various aspects, including hash function design, distance measure, and
search scheme in the hash coding space.
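As a rough illustration of the data-independent category described in this abstract, the following minimal sketch uses random-hyperplane hashing (sign random projections), one well-known member of the locality sensitive hashing family. It is not taken from the survey itself; the function names (`make_hyperplanes`, `hash_code`, `hamming`, `build_index`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes; no data is inspected."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(code_a, code_b):
    """Distance measure in the hash coding space: number of differing bits."""
    return sum(a != b for a, b in zip(code_a, code_b))


def build_index(items, hyperplanes):
    """Hash table lookup scheme: group item indices by their binary code."""
    buckets = {}
    for idx, vector in enumerate(items):
        buckets.setdefault(hash_code(vector, hyperplanes), []).append(idx)
    return buckets
```

Under these assumptions, vectors separated by a small angle tend to agree on most bits, so a query can be answered approximately by inspecting only the bucket of its own code, or by ranking candidates by `hamming` distance to that code. A learning-to-hash method would instead fit the hyperplanes to the data.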
Review of the mathematical foundations of data fusion techniques in surface metrology
The recent proliferation of engineered surfaces, including freeform and structured surfaces, is challenging current metrology techniques. Measurement using multiple sensors has been proposed to achieve enhanced benefits, mainly in terms of spatial frequency bandwidth, which a single sensor cannot provide. When using data from different sensors, a process of data fusion is required and there is much active research in this area. In this paper, current data fusion methods and applications are reviewed, with a focus on the mathematical foundations of the subject. Common research questions in the fusion of surface metrology data are raised and potential fusion algorithms are discussed.
Correcting curvature-density effects in the Hamilton-Jacobi skeleton
The Hamilton-Jacobi approach has proven to be a powerful and elegant method for extracting the skeleton of two-dimensional (2-D) shapes. The approach is based on the observation that the normalized flux associated with the inward evolution of the object boundary at nonskeletal points tends to zero as the size of the integration area tends to zero, while the flux is negative at the locations of skeletal points. Nonetheless, the error in calculating the flux on the image lattice is both limited by the pixel resolution and also proportional to the curvature of the boundary evolution front and, hence, unbounded near endpoints. This makes the exact location of endpoints difficult and renders the performance of the skeleton extraction algorithm dependent on a threshold parameter. This problem can be overcome by using interpolation techniques to calculate the flux with subpixel precision. However, here, we develop a method for 2-D skeleton extraction that circumvents the problem by eliminating the curvature contribution to the error. This is done by taking into account variations of density due to boundary curvature. This yields a skeletonization algorithm that gives both better localization and less susceptibility to boundary noise and parameter choice than the Hamilton-Jacobi method.
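The flux observation in this abstract can be sketched on a pixel lattice as follows. This is only the baseline flux measure, assuming a brute-force Euclidean distance map and an 8-neighbour average; it does not implement the curvature-density correction the paper proposes. The function names (`distance_map`, `unit_gradient`, `average_outward_flux`) are hypothetical.

```python
import math


def distance_map(mask):
    """Brute-force Euclidean distance from each foreground pixel
    to the nearest background pixel (background pixels get 0)."""
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    dist = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist


def unit_gradient(dist, r, c):
    """Central-difference gradient of the distance map, scaled to unit length.
    Returns (0, 0) on the image border or where the gradient vanishes."""
    h, w = len(dist), len(dist[0])
    if not (0 < r < h - 1 and 0 < c < w - 1):
        return (0.0, 0.0)
    g_row = (dist[r + 1][c] - dist[r - 1][c]) / 2.0
    g_col = (dist[r][c + 1] - dist[r][c - 1]) / 2.0
    norm = math.hypot(g_row, g_col)
    return (g_row / norm, g_col / norm) if norm > 0 else (0.0, 0.0)


def average_outward_flux(dist, r, c):
    """Mean over the 8 neighbours of <unit gradient, outward unit normal>."""
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            length = math.hypot(dr, dc)
            g_row, g_col = unit_gradient(dist, r + dr, c + dc)
            total += (g_row * dr + g_col * dc) / length
    return total / 8.0
```

On a long horizontal strip, for example, this measure is clearly negative on the centre row, where the gradient vectors on either side point toward each other, and close to zero a few rows away from it. The pixels directly next to the centre row also come out somewhat negative because of the central differences, which hints at the lattice-resolution sensitivity the abstract describes.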
Semantic distillation: a method for clustering objects by their contextual specificity
Techniques for data mining, latent semantic analysis, contextual search of
databases, etc. were developed long ago by computer scientists working on
information retrieval (IR). Experimental scientists from all disciplines,
having to analyse large collections of raw experimental data (astronomical,
physical, biological, etc.), have developed powerful methods for their
statistical analysis and for clustering, categorising, and classifying objects.
Finally, physicists have developed a theory of quantum measurement, unifying
the logical, algebraic, and probabilistic aspects of queries into a single
formalism. The purpose of this paper is twofold: first, to show that when
formulated at an abstract level, problems from IR, from statistical data
analysis, and from physical measurement theories are very similar and hence can
profitably be cross-fertilised, and, secondly, to propose a novel method of
fuzzy hierarchical clustering, termed semantic distillation, strongly inspired
by the theory of quantum measurement, which we developed to analyse raw data
coming from various types of experiments on DNA arrays. We illustrate the
method by analysing DNA array experiments and clustering the genes of the array
according to their specificity.
Comment: Accepted for publication in Studies in Computational Intelligence,
Springer-Verla
k-Nearest Neighbor Search for Large Scale High Dimensional data
Tohoku University徳山