950 research outputs found

    Hashing for Similarity Search: A Survey

    Full text link
    Similarity search (nearest neighbor search) is a problem of pursuing the data items whose distances to a query item are the smallest from a large database. Various methods have been developed to address this problem, and recently a lot of efforts have been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work locality sensitive hashing. We divide the hashing algorithms two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution and learning to hash, which learns hash functions according the data distribution, and review them from various aspects, including hash function design and distance measure and search scheme in the hash coding space

    Linear Dimensionality Reduction for Margin-Based Classification: High-Dimensional Data and Sensor Networks

    Get PDF
    Low-dimensional statistics of measurements play an important role in detection problems, including those encountered in sensor networks. In this work, we focus on learning low-dimensional linear statistics of high-dimensional measurement data along with decision rules defined in the low-dimensional space in the case when the probability density of the measurements and class labels is not given, but a training set of samples from this distribution is given. We pose a joint optimization problem for linear dimensionality reduction and margin-based classification, and develop a coordinate descent algorithm on the Stiefel manifold for its solution. Although the coordinate descent is not guaranteed to find the globally optimal solution, crucially, its alternating structure enables us to extend it for sensor networks with a message-passing approach requiring little communication. Linear dimensionality reduction prevents overfitting when learning from finite training data. In the sensor network setting, dimensionality reduction not only prevents overfitting, but also reduces power consumption due to communication. The learned reduced-dimensional space and decision rule is shown to be consistent and its Rademacher complexity is characterized. Experimental results are presented for a variety of datasets, including those from existing sensor networks, demonstrating the potential of our methodology in comparison with other dimensionality reduction approaches.National Science Foundation (U.S.). Graduate Research Fellowship ProgramUnited States. Army Research Office (MURI funded through ARO Grant W911NF-06-1-0076)United States. Air Force Office of Scientific Research (Award FA9550-06-1-0324)Shell International Exploration and Production B.V

    Hierarchical Quadratic Random Forest Classifier

    Full text link
    In this paper, we proposed a hierarchical quadratic random forest classifier for classifying multiresolution samples extracted from multichannel data. This forest incorporated a penalized multivariate linear discriminant in each of its decision nodes and processed squared features to realize quadratic decision boundaries in the original feature space. The penalized discriminant was based on a multiclass sparse discriminant analysis and the penalization was based on a group Lasso regularizer which was an intermediate between the Lasso and the ridge regularizer. The classification probabilities estimated by this forest and the features learned by its decision nodes could be used standalone or foster graph-based classifiers

    Leaf Venation Networks

    Get PDF
    corecore