5,666 research outputs found
Supervised Classification Using Sparse Fisher's LDA
It is well known that in a supervised classification setting when the number
of features is smaller than the number of observations, Fisher's linear
discriminant rule is asymptotically Bayes. However, there are numerous modern
applications where classification is needed in the high-dimensional setting.
Naive implementation of Fisher's rule in this case fails to provide good
results because the sample covariance matrix is singular. Moreover, a
classifier that relies on all features makes the results difficult to
interpret. Our goal is to provide robust classification that
relies only on a small subset of important features and accounts for the
underlying correlation structure. We apply a lasso-type penalty to the
discriminant vector to ensure sparsity of the solution and use a shrinkage-type
estimator for the covariance matrix. The resulting optimization problem is
solved using an iterative coordinate ascent algorithm. Furthermore, we analyze
the effect of nonconvexity on the sparsity level of the solution and highlight
the difference between the penalized and the constrained versions of the
problem. The simulation results show that the proposed method performs
favorably in comparison to alternatives. The method is used to classify
leukemia patients based on DNA methylation features.
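
To make the combination of a lasso-type penalty and a shrinkage covariance estimate concrete, here is a minimal sketch. It is not the paper's algorithm: the abstract describes a nonconvex problem solved by coordinate ascent, whereas this sketch swaps in a convex two-class surrogate, minimizing (1/2) v'Σv - δ'v + λ‖v‖₁ by coordinate descent with soft-thresholding. The function names, the shrinkage target (the diagonal of the sample covariance), and the parameters `alpha` and `lam` are all assumptions made for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def shrink_covariance(cov, alpha):
    """Scale off-diagonal entries by (1 - alpha), keeping the diagonal.

    This is an assumed shrinkage target (the diagonal of cov); for alpha > 0
    and positive variances it keeps the matrix usable when features outnumber
    observations.
    """
    p = len(cov)
    return [
        [cov[i][j] if i == j else (1.0 - alpha) * cov[i][j] for j in range(p)]
        for i in range(p)
    ]


def sparse_direction(cov, delta, lam, sweeps=200, tol=1e-10):
    """Coordinate descent for min_v 0.5 * v'cov v - delta'v + lam * ||v||_1.

    cov   : (shrunken) covariance matrix as a list of rows
    delta : difference between the two class mean vectors
    lam   : lasso penalty weight; larger values zero out more coordinates
    """
    p = len(delta)
    v = [0.0] * p
    for _ in range(sweeps):
        biggest_change = 0.0
        for j in range(p):
            # Residual correlation for coordinate j with the others held fixed.
            partial = delta[j] - sum(cov[j][k] * v[k] for k in range(p) if k != j)
            updated = soft_threshold(partial, lam) / cov[j][j]
            biggest_change = max(biggest_change, abs(updated - v[j]))
            v[j] = updated
        if biggest_change < tol:
            break
    return v
```

A new observation x would then be assigned by the sign of v'(x - midpoint of the class means); features whose coefficient in v is exactly zero play no role in the rule, which is the interpretability benefit the abstract points to.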
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding the data
items in a large database whose distances to a query item are the smallest.
Various methods have been developed to address this problem, and recently
considerable effort has been devoted to approximate search. In this paper, we
present a survey of one of the main solutions, hashing, which has been widely
studied since the pioneering work on locality sensitive hashing. We divide the
hashing algorithms into two main categories: locality sensitive hashing, which
designs hash functions without exploring the data distribution, and learning to
hash, which learns hash functions according to the data distribution. We review
them from various aspects, including hash function design, distance measure,
and search scheme in the hash coding space.
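
As an illustration of the first category, the sketch below shows one common data-independent scheme, random hyperplane hashing for cosine similarity. It is only an example of the idea and is not taken from the survey; the function names, the number of bits, and the Hamming-radius lookup are assumptions chosen for brevity.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian directions; no data is inspected."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(code_a, code_b):
    """Number of bit positions in which two codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def build_index(items, hyperplanes):
    """Group item keys into buckets keyed by their binary code."""
    buckets = {}
    for key, vector in items.items():
        buckets.setdefault(hash_code(vector, hyperplanes), []).append(key)
    return buckets


def query(vector, buckets, hyperplanes, radius=1):
    """Return keys whose bucket code is within `radius` bits of the query code."""
    code = hash_code(vector, hyperplanes)
    return sorted(
        key
        for bucket_code, keys in buckets.items()
        if hamming(bucket_code, code) <= radius
        for key in keys
    )
```

Vectors separated by a small angle disagree on few bits, so comparing codes in Hamming space serves as a cheap proxy for the original distance. A learning-to-hash method would replace `make_hyperplanes` with projections fitted to the data.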
One-bit compressed sensing by linear programming
We give the first computationally tractable and almost optimal solution to
the problem of one-bit compressed sensing, showing how to accurately recover an
s-sparse vector x in R^n from the signs of O(s log^2(n/s)) random linear
measurements of x. The recovery is achieved by a simple linear program. This
result extends to approximately sparse vectors x. Our result is universal in
the sense that with high probability, one measurement scheme will successfully
recover all sparse vectors simultaneously. The argument is based on solving an
equivalent geometric problem on random hyperplane tessellations.
Comment: 15 pages, 1 figure, to appear in CPAM. Small changes based on referee
comment
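
For orientation, a program of the kind the abstract describes can be written as below. The abstract only states that recovery uses a simple linear program over sign measurements; the specific normalization constraint shown here is an assumption and should be checked against the paper. The symbols a_i (measurement vectors), y_i (observed signs), and m (number of measurements) are notation introduced for this sketch.

```latex
y_i = \operatorname{sign}\langle a_i, x \rangle, \qquad i = 1, \dots, m,

\begin{aligned}
\min_{x' \in \mathbb{R}^n} \quad & \|x'\|_1 \\
\text{subject to} \quad & y_i \langle a_i, x' \rangle \ge 0, \qquad i = 1, \dots, m, \\
& \sum_{i=1}^{m} y_i \langle a_i, x' \rangle = m .
\end{aligned}
```

Writing x' = u - v with u, v ≥ 0 and minimizing the sum of the entries of u and v turns this into a standard linear program. Because signs carry no magnitude information, x can be recovered only up to a positive scaling; the equality constraint fixes a scale and rules out the trivial solution x' = 0.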