    Technical report : SVM in Krein spaces

    Support vector machines (SVM) and kernel methods have been highly successful in many application areas. However, the requirement that the kernel be symmetric positive semidefinite (Mercer's condition) is not always satisfied in practice. When it is not, the kernel is called indefinite. Various heuristics and specialized methods have been proposed to address indefinite kernels, from simple tricks such as removing negative eigenvalues to advanced methods that de-noise the kernel by treating its negative part as noise. Most approaches aim at correcting an indefinite kernel in order to provide a positive one. We propose a new SVM approach that deals directly with indefinite kernels. In contrast to previous approaches, we embrace the underlying idea that the negative part of an indefinite kernel may contain valuable information. To define such a method, the SVM formulation has to be adapted to an unusual form: stabilization. The hypothesis space, usually a Hilbert space, becomes a Krein space. This work explores this new formulation and proposes two practical algorithms (ESVM and KSVM) that outperform the approaches that modify the kernel. Moreover, the solution depends on the original kernel and thus can be used on any new point without loss of accuracy.
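
The "removing negative eigenvalues" heuristic mentioned in this abstract is often called spectrum clipping. The sketch below illustrates that baseline only, not the ESVM or KSVM algorithms the report proposes; the function name `spectrum_clip` and the 3x3 similarity matrix are hypothetical and chosen purely for illustration.

```python
import numpy as np

def spectrum_clip(K):
    """Zero out the negative eigenvalues of a symmetric matrix K.

    Illustrative baseline only: this is the kind of kernel correction
    the abstract contrasts with, not the Krein-space method itself.
    """
    K = (K + K.T) / 2.0                  # enforce exact symmetry
    w, V = np.linalg.eigh(K)             # eigenvalues w, eigenvectors in columns of V
    w_clipped = np.maximum(w, 0.0)       # discard the negative part of the spectrum
    return (V * w_clipped) @ V.T         # V diag(w_clipped) V^T

# Hypothetical indefinite similarity matrix (symmetric, one negative eigenvalue).
K = np.array([[ 1.0, 0.9, -0.8],
              [ 0.9, 1.0,  0.9],
              [-0.8, 0.9,  1.0]])

K_psd = spectrum_clip(K)
print("smallest eigenvalue before:", np.linalg.eigvalsh(K).min())
print("smallest eigenvalue after: ", np.linalg.eigvalsh(K_psd).min())
```

Note that the clipped matrix is defined only on the training points, so evaluating a new point requires some consistent correction of the kernel as well. This is the drawback the abstract alludes to when it says its solution depends on the original kernel.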

    Supervised classification and mathematical optimization

    Data Mining techniques often ask for the resolution of optimization problems. Supervised Classification, and, in particular, Support Vector Machines, can be seen as a paradigmatic instance. In this paper, some links between Mathematical Optimization methods and Supervised Classification are emphasized. It is shown that many different areas of Mathematical Optimization play a central role in off-the-shelf Supervised Classification methods. Moreover, Mathematical Optimization turns out to be extremely useful to address important issues in Classification, such as identifying relevant variables, improving the interpretability of classifiers or dealing with vagueness/noise in the data. Ministerio de Ciencia e Innovación; Junta de Andalucía

    A D.C. Programming Approach to the Sparse Generalized Eigenvalue Problem

    In this paper, we consider the sparse eigenvalue problem wherein the goal is to obtain a sparse solution to the generalized eigenvalue problem. We achieve this by constraining the cardinality of the solution to the generalized eigenvalue problem and obtain sparse principal component analysis (PCA), sparse canonical correlation analysis (CCA) and sparse Fisher discriminant analysis (FDA) as special cases. Unlike the $\ell_1$-norm approximation to the cardinality constraint, which previous methods have used in the context of sparse PCA, we propose a tighter approximation that is related to the negative log-likelihood of a Student's t-distribution. The problem is then framed as a d.c. (difference of convex functions) program and is solved as a sequence of convex programs by invoking the majorization-minimization method. The resulting algorithm is proved to exhibit global convergence behavior, i.e., for any random initialization, the sequence (subsequence) of iterates generated by the algorithm converges to a stationary point of the d.c. program. The performance of the algorithm is empirically demonstrated on both sparse PCA (finding few relevant genes that explain as much variance as possible in a high-dimensional gene dataset) and sparse CCA (cross-language document retrieval and vocabulary selection for music retrieval) applications. Comment: 40 pages
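
To make the majorization-minimization step concrete, the block below writes out one log-based surrogate for the cardinality and the linear bound that majorizes it. The exact surrogate and the parameter $\varepsilon$ are assumptions made here for illustration; the abstract states only that the approximation is related to the negative log-likelihood of a Student's t-distribution.

```latex
% Assumed surrogate for the cardinality \|x\|_0, with a small \varepsilon > 0:
\|x\|_0 \;\approx\; \sum_{i=1}^{n}
  \frac{\log\!\left(1 + |x_i|/\varepsilon\right)}{\log\!\left(1 + 1/\varepsilon\right)}

% Because \log is concave, its tangent at the current iterate x^{(l)} lies above it:
\log\!\left(\varepsilon + |x_i|\right) \;\le\;
  \log\!\left(\varepsilon + |x_i^{(l)}|\right)
  + \frac{|x_i| - |x_i^{(l)}|}{\varepsilon + |x_i^{(l)}|}
```

Under this assumption, each majorization-minimization step replaces the concave penalty with a weighted $\ell_1$ norm whose weights are $1/(\varepsilon + |x_i^{(l)}|)$. Each subproblem in the sequence then has a convex penalty, and coordinates that are already small are penalized more heavily at the next step.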

    Positive semi-definite embedding for dimensionality reduction and out-of-sample extensions

    In machine learning or statistics, it is often desirable to reduce the dimensionality of a sample of data points in a high dimensional space $\mathbb{R}^d$. This paper introduces a dimensionality reduction method where the embedding coordinates are the eigenvectors of a positive semi-definite kernel obtained as the solution of an infinite dimensional analogue of a semi-definite program. This embedding is adaptive and non-linear. A main feature of our approach is the existence of a non-linear out-of-sample extension formula of the embedding coordinates, called a projected Nyström approximation. This extrapolation formula yields an extension of the kernel matrix to a data-dependent Mercer kernel function. Our empirical results indicate that this embedding method is more robust with respect to the influence of outliers, compared with a spectral embedding method. Comment: 16 pages, 5 figures. Improved presentation
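
For orientation, the sketch below shows the standard Nyström out-of-sample extension for kernel eigenvectors, which the paper's projected variant builds on. It is not the projected Nyström approximation itself. The Gaussian kernel, the `gamma` parameter, and the helper names `rbf_kernel` and `nystrom_extend` are assumptions introduced for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B (assumed kernel)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def nystrom_extend(X_train, X_new, n_components=2, gamma=1.0):
    """Standard Nystrom extension of the top kernel eigenvectors to new points.

    For eigenpairs (w_j, u_j) of the training kernel matrix K, the j-th
    embedding coordinate of a new point x is sum_i k(x, x_i) u_ij / w_j.
    """
    K = rbf_kernel(X_train, X_train, gamma)
    w, U = np.linalg.eigh(K)                   # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:n_components]   # indices of the largest eigenvalues
    w, U = w[top], U[:, top]
    K_new = rbf_kernel(X_new, X_train, gamma)  # kernel between new and training points
    return (K_new @ U) / w, U

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# Sanity check: extending to the training points reproduces the eigenvectors.
coords, U = nystrom_extend(X, X)
print(np.allclose(coords, U))
```

The sanity check follows from $K u_j = w_j u_j$: evaluating the extension formula at a training point returns the corresponding entry of $u_j$.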