Feature Selection for Linear SVM with Provable Guarantees
We give two provably accurate feature-selection techniques for the linear
SVM. The algorithms are deterministic and randomized, respectively. Our
algorithms can be used in an unsupervised or supervised setting. The supervised
approach is based on sampling features from support vectors. We prove that the
margin in the feature space is preserved to within ε-relative error of the
margin in the full feature space in the worst case. In the unsupervised
setting, we also provide worst-case guarantees on the radius of the minimum
enclosing ball, thereby ensuring generalization comparable to that in the full
feature space and resolving an open problem posed by Dasgupta et al. We present
extensive experiments on real-world datasets to support our theory and to
demonstrate that our method is competitive with, and often better than, prior
state-of-the-art methods, for which no provable guarantees are known.
Comment: Appearing in Proceedings of the 18th AISTATS, JMLR W&CP, vol 38, 201
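The abstract does not specify the sampling distribution, so the following is only a generic sketch of importance-sampling feature selection for a linear SVM; the label-correlation scores below are a hypothetical stand-in for the support-vector-based scores of the paper, and the rescaling by sampling probability is the standard importance-sampling correction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n points, d features, labels driven by the first three features.
n, d, k = 200, 50, 10
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + X[:, 1] - X[:, 2])

# Hypothetical importance scores: absolute correlation of each feature with
# the labels, standing in for the paper's support-vector-based scores.
scores = np.abs(X.T @ y)
probs = scores / scores.sum()

# Sample k feature indices with probability proportional to their scores and
# rescale each kept column, as in standard importance-sampling sketches.
idx = rng.choice(d, size=k, replace=False, p=probs)
X_sel = X[:, idx] / np.sqrt(k * probs[idx])

print(X_sel.shape)  # reduced data: 200 points, 10 sampled features
```

An SVM trained on `X_sel` then plays the role of the classifier in the reduced feature space.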
Support Vector Machines in Analysis of Top Quark Production
Multivariate data analysis techniques have the potential to improve physics
analyses in many ways. The common classification problem of signal/background
discrimination is one example. The Support Vector Machine learning algorithm is
a relatively new way to solve pattern recognition problems and has several
advantages over methods such as neural networks. The SVM approach is described
and compared to a conventional analysis for the case of identifying top quark
signal events in the dilepton decay channel amidst a large number of background
events.
Comment: 8 pages, 8 figures, to be published in the proceedings of the
"Advanced Statistical Techniques in Particle Physics" conference in Durham,
UK (March, 2002)
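As a minimal illustration of the signal/background discrimination task described above, the sketch below trains a linear SVM by subgradient descent on the hinge loss over two synthetic Gaussian "signal" and "background" populations; this is an assumption-laden toy, not the kernel analysis or physics data of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "signal" vs "background": two Gaussian blobs in two features.
n = 300
X = np.vstack([rng.normal(+1.0, 1.0, size=(n, 2)),
               rng.normal(-1.0, 1.0, size=(n, 2))])
y = np.concatenate([np.ones(n), -np.ones(n)])

# Linear SVM via subgradient descent on the regularized hinge loss
# (1/n) * sum_i max(0, 1 - y_i (w.x_i + b)) + (lam/2) * ||w||^2.
w, b, lam, lr = np.zeros(2), 0.0, 0.01, 0.1
for _ in range(200):
    mask = y * (X @ w + b) < 1                    # margin violators
    grad_w = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / len(X)
    grad_b = -y[mask].sum() / len(X)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = ((X @ w + b > 0) == (y > 0)).mean()
```

In a real analysis the feature vectors would be kinematic event variables rather than Gaussian toys, and a kernelized SVM would typically replace the linear separator.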
Efficient Classification for Metric Data
Recent advances in large-margin classification of data residing in general
metric spaces (rather than Hilbert spaces) enable classification under various
natural metrics, such as string edit and earthmover distance. A general
framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004]
left open the questions of computational efficiency and of providing direct
bounds on generalization error.
We design a new algorithm for classification in general metric spaces, whose
runtime and accuracy depend on the doubling dimension of the data points, and
can thus achieve superior classification performance in many common scenarios.
The algorithmic core of our approach is an approximate (rather than exact)
solution to the classical problems of Lipschitz extension and of Nearest
Neighbor Search. The algorithm's generalization performance is guaranteed via
the fat-shattering dimension of Lipschitz classifiers, and we present
experimental evidence of its superiority to some common kernel methods. As a
by-product, we offer a new perspective on the nearest neighbor classifier,
which yields significantly sharper risk asymptotics than the classic analysis
of Cover and Hart [IEEE Trans. Info. Theory, 1967].
Comment: This is the full version of an extended abstract that appeared in
Proceedings of the 23rd COLT, 201
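To make the "classification under a general metric" setting concrete, here is a toy nearest-neighbor classifier under string edit distance; it illustrates only the ambient metric-space setup, not the authors' Lipschitz-extension algorithm or doubling-dimension machinery.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein (string edit) distance."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,          # delete from a
                         cur[j - 1] + 1,       # insert into a
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitute
        prev = cur
    return prev[n]

def nn_classify(x, train):
    """1-NN: label of the training string closest to x in edit distance."""
    return min(train, key=lambda pair: edit_distance(x, pair[0]))[1]

train = [("cat", 0), ("cart", 0), ("dog", 1), ("dig", 1)]
print(nn_classify("cot", train))  # nearest is "cat" (distance 1) -> label 0
```

The `train` examples and labels are invented for illustration; edit distance is one of the natural metrics the abstract mentions.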
Maximum Margin Clustering for State Decomposition of Metastable Systems
When studying a metastable dynamical system, a prime concern is how to
decompose the phase space into a set of metastable states. Unfortunately, the
metastable state decomposition based on simulation or experimental data is
still a challenge. The simplest and most popular approach is geometric
clustering, which builds on classical clustering techniques.
However, this approach requires that (1) the data come from simulations or
experiments in global equilibrium and (2) the coordinate system is
appropriately chosen. Recently, the kinetic clustering
approach based on phase space discretization and transition probability
estimation has drawn much attention due to its applicability to more general
cases, but the choice of discretization policy is a difficult task. In this
paper, a new decomposition method designated as maximum margin metastable
clustering is proposed, which converts the problem of metastable state
decomposition to a semi-supervised learning problem so that the large margin
technique can be utilized to search for the optimal decomposition without phase
space discretization. Moreover, several simulation examples are given to
illustrate the effectiveness of the proposed method.
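A schematic of the alternating-optimization flavor of maximum margin clustering (fix labels, fit a large-margin separator, relabel by its sign, repeat) is sketched below on two 1-D clusters standing in for metastable states. The initialization from the sign of the centered data and the subgradient SVM inside the loop are simplifying assumptions; the method in the paper, and maximum margin clustering generally, additionally enforces a class-balance constraint to rule out the trivial one-cluster solution.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two well-separated 1-D clusters standing in for two metastable states.
X = np.concatenate([rng.normal(-3, 0.5, 100),
                    rng.normal(+3, 0.5, 100)])[:, None]

def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Large-margin separator via subgradient descent on the hinge loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        mask = y * (X @ w + b) < 1              # margin violators
        w -= lr * (lam * w - (y[mask, None] * X[mask]).sum(axis=0) / len(X))
        b -= lr * (-y[mask].sum() / len(X))
    return w, b

# Alternate: fit a max-margin separator to the current labels, then relabel
# each point by the separator's sign, until the labeling stabilizes.
labels = np.where(X[:, 0] > X[:, 0].mean(), 1.0, -1.0)  # crude initialization
for _ in range(10):
    w, b = fit_linear_svm(X, labels)
    new_labels = np.where(X @ w + b >= 0, 1.0, -1.0)
    if np.array_equal(new_labels, labels):
        break
    labels = new_labels
```

On this toy data the loop recovers the two clusters as the two label groups; the paper's semi-supervised formulation operates on trajectory data without such a hand-picked coordinate.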