34,132 research outputs found

    Feature Selection for Linear SVM with Provable Guarantees

    Full text link
    We give two provably accurate feature-selection techniques for the linear SVM. The algorithms run in deterministic and randomized time respectively. Our algorithms can be used in an unsupervised or supervised setting. The supervised approach is based on sampling features from support vectors. We prove that the margin in the feature space is preserved to within ϵ\epsilon-relative error of the margin in the full feature space in the worst-case. In the unsupervised setting, we also provide worst-case guarantees of the radius of the minimum enclosing ball, thereby ensuring comparable generalization as in the full feature space and resolving an open problem posed in Dasgupta et al. We present extensive experiments on real-world datasets to support our theory and to demonstrate that our method is competitive and often better than prior state-of-the-art, for which there are no known provable guarantees.Comment: Appearing in Proceedings of 18th AISTATS, JMLR W&CP, vol 38, 201

    Support Vector Machines in Analysis of Top Quark Production

    Full text link
    Multivariate data analysis techniques have the potential to improve physics analyses in many ways. The common classification problem of signal/background discrimination is one example. The Support Vector Machine learning algorithm is a relatively new way to solve pattern recognition problems and has several advantages over methods such as neural networks. The SVM approach is described and compared to a conventional analysis for the case of identifying top quark signal events in the dilepton decay channel amidst a large number of background events.Comment: 8 pages, 8 figures, to be published in the proceedings of the "Advanced Statistical Techniques in Particle Physics" conference in Durham, UK (March, 2002

    Efficient Classification for Metric Data

    Full text link
    Recent advances in large-margin classification of data residing in general metric spaces (rather than Hilbert spaces) enable classification under various natural metrics, such as string edit and earthmover distance. A general framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004] left open the questions of computational efficiency and of providing direct bounds on generalization error. We design a new algorithm for classification in general metric spaces, whose runtime and accuracy depend on the doubling dimension of the data points, and can thus achieve superior classification performance in many common scenarios. The algorithmic core of our approach is an approximate (rather than exact) solution to the classical problems of Lipschitz extension and of Nearest Neighbor Search. The algorithm's generalization performance is guaranteed via the fat-shattering dimension of Lipschitz classifiers, and we present experimental evidence of its superiority to some common kernel methods. As a by-product, we offer a new perspective on the nearest neighbor classifier, which yields significantly sharper risk asymptotics than the classic analysis of Cover and Hart [IEEE Trans. Info. Theory, 1967].Comment: This is the full version of an extended abstract that appeared in Proceedings of the 23rd COLT, 201

    Maximum Margin Clustering for State Decomposition of Metastable Systems

    Full text link
    When studying a metastable dynamical system, a prime concern is how to decompose the phase space into a set of metastable states. Unfortunately, the metastable state decomposition based on simulation or experimental data is still a challenge. The most popular and simplest approach is geometric clustering which is developed based on the classical clustering technique. However, the prerequisites of this approach are: (1) data are obtained from simulations or experiments which are in global equilibrium and (2) the coordinate system is appropriately selected. Recently, the kinetic clustering approach based on phase space discretization and transition probability estimation has drawn much attention due to its applicability to more general cases, but the choice of discretization policy is a difficult task. In this paper, a new decomposition method designated as maximum margin metastable clustering is proposed, which converts the problem of metastable state decomposition to a semi-supervised learning problem so that the large margin technique can be utilized to search for the optimal decomposition without phase space discretization. Moreover, several simulation examples are given to illustrate the effectiveness of the proposed method
    corecore