Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding, in a large
database, the data items whose distances to a query item are the smallest.
Various methods have been developed to address this problem, and recently much
effort has been devoted to approximate search. In this paper, we present a
survey of one of the main solutions, hashing, which has been widely studied
since the pioneering work on locality sensitive hashing. We divide the hashing
algorithms into two main categories: locality sensitive hashing, which designs hash
functions without exploring the data distribution, and learning to hash, which
learns hash functions according to the data distribution. We review them from
various aspects, including hash function design, distance measure, and search
scheme in the hash coding space.
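To make the data-independent category concrete, the following is a minimal sketch of one well-known locality sensitive hashing family, random-hyperplane (sign) hashing for angular similarity, followed by a Hamming-distance ranking in the code space. It is an illustration only and is not taken from the survey; the function names (`make_hyperplanes`, `hash_code`, `hamming`, `search`) and the parameters `n_bits` and `seed` are hypothetical.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes; no training data is consulted."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def search(query, database, planes, k=1):
    """Return indices of the k items whose codes are closest to the query's code."""
    q = hash_code(query, planes)
    codes = [hash_code(v, planes) for v in database]
    order = sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))
    return order[:k]
```

Because the hyperplanes are drawn at random, vectors separated by a small angle tend to agree on most bits, so Hamming distance between codes serves as a cheap proxy for angular distance. A learning-to-hash method would instead fit the projection vectors to the data.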
Linear Dimensionality Reduction for Margin-Based Classification: High-Dimensional Data and Sensor Networks
Low-dimensional statistics of measurements play an important role in detection problems, including those encountered in sensor networks. In this work, we focus on learning low-dimensional linear statistics of high-dimensional measurement data, along with decision rules defined in the low-dimensional space, in the case when the joint probability density of the measurements and class labels is not given but a training set of samples from this distribution is available. We pose a joint optimization problem for linear dimensionality reduction and margin-based classification, and develop a coordinate descent algorithm on the Stiefel manifold for its solution. Although coordinate descent is not guaranteed to find the globally optimal solution, its alternating structure crucially enables us to extend it to sensor networks with a message-passing approach requiring little communication. Linear dimensionality reduction prevents overfitting when learning from finite training data. In the sensor network setting, dimensionality reduction not only prevents overfitting, but also reduces the power consumed by communication. The learned reduced-dimensional space and decision rule are shown to be consistent, and their Rademacher complexity is characterized. Experimental results are presented for a variety of datasets, including those from existing sensor networks, demonstrating the potential of our methodology in comparison with other dimensionality reduction approaches.
Funding: National Science Foundation (U.S.) Graduate Research Fellowship Program; United States Army Research Office (MURI funded through ARO Grant W911NF-06-1-0076); United States Air Force Office of Scientific Research (Award FA9550-06-1-0324); Shell International Exploration and Production B.V.
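As a rough illustration of the setting in the abstract above, the sketch below shows a projection matrix with orthonormal columns (a point on the Stiefel manifold) and a linear decision rule applied in the reduced space. It does not reproduce the authors' coordinate descent algorithm: the Gram-Schmidt step is only one simple way to restore orthonormality, and the names `orthonormalize`, `reduce_dim`, `decide`, `theta`, and `bias` are hypothetical.

```python
import math

def orthonormalize(columns):
    """Gram-Schmidt: return orthonormal vectors spanning the same subspace."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            coeff = sum(a * b for a, b in zip(q, v))
            v = [a - coeff * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        basis.append([a / norm for a in v])
    return basis

def reduce_dim(x, basis):
    """Low-dimensional statistic z = A^T x, where the columns of A are `basis`."""
    return [sum(a * b for a, b in zip(q, x)) for q in basis]

def decide(x, basis, theta, bias=0.0):
    """Linear margin-based rule evaluated in the reduced space; returns +1 or -1."""
    z = reduce_dim(x, basis)
    margin = sum(t * zi for t, zi in zip(theta, z)) + bias
    return 1 if margin >= 0 else -1
```

In an alternating scheme of the kind the abstract describes, one would presumably update the classifier parameters with the projection held fixed, then update the projection with the classifier held fixed, re-imposing orthonormality after each projection update.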
Hierarchical Quadratic Random Forest Classifier
In this paper, we proposed a hierarchical quadratic random forest classifier
for classifying multiresolution samples extracted from multichannel data. This
forest incorporated a penalized multivariate linear discriminant in each of its
decision nodes and processed squared features to realize quadratic decision
boundaries in the original feature space. The penalized discriminant was based
on a multiclass sparse discriminant analysis, and the penalization was based on
a group Lasso regularizer, which is intermediate between the Lasso and the
ridge regularizer. The classification probabilities estimated by this forest
and the features learned by its decision nodes could be used standalone or
to foster graph-based classifiers.
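Two ideas from this abstract can be sketched briefly: a linear rule on features augmented with their squares is quadratic in the original features, and a group Lasso penalty shrinks a whole group of coefficients together. The code below is a minimal illustration under those assumptions, not the authors' implementation; `augment_with_squares`, `linear_score`, and `group_soft_threshold` are hypothetical names, and the proximal (group soft-thresholding) step is a standard way to handle the group Lasso penalty rather than something the abstract specifies.

```python
import math

def augment_with_squares(x):
    """Append squared features so a linear rule in the new space is quadratic in x."""
    return list(x) + [v * v for v in x]

def linear_score(w, b, z):
    """Linear discriminant score w . z + b."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b

def group_soft_threshold(group, lam):
    """Proximal step for the penalty lam * ||group||_2 on one coefficient group."""
    norm = math.sqrt(sum(g * g for g in group))
    if norm <= lam:
        # The entire group is set to zero (Lasso-like sparsity across groups).
        return [0.0] * len(group)
    # Otherwise every coefficient is shrunk by the same factor (ridge-like within a group).
    scale = 1.0 - lam / norm
    return [scale * g for g in group]
```

For example, with weights `[0, 0, 1, 1]` and bias `-1` on the augmented features of a 2-D point, the zero level set of `linear_score` is the unit circle in the original space: the score is negative inside the circle and positive outside it.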