
    Sharp analysis of low-rank kernel matrix approximations

    We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinite-dimensional feature spaces, a common practical limiting difficulty is the necessity of computing the kernel matrix, which most frequently leads to algorithms with running time at least quadratic in the number of observations n, i.e., O(n^2). Low-rank approximations of the kernel matrix are often considered as they allow the reduction of running time complexities to O(p^2 n), where p is the rank of the approximation. The practicality of such methods thus depends on the required rank p. In this paper, we show that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods, and is often seen as the implicit number of parameters of non-parametric estimators. This result enables simple algorithms that have sub-quadratic running time complexity, but provably exhibit the same predictive performance as existing algorithms, for any given problem instance, and not only in worst-case situations.
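    The column-sampling approximation described above is essentially the Nyström method. The following minimal sketch (plain NumPy; the RBF kernel, landmark count p, and regularization value are illustrative assumptions, not the paper's experimental setup) shows how kernel ridge regression can be solved against p randomly sampled columns in O(p^2 n) time rather than forming the full n x n kernel matrix.

```python
import numpy as np

def nystrom_krr(X, y, p, gamma=1.0, lam=1e-3, seed=0):
    """Kernel ridge regression with a rank-p Nystrom approximation built from
    a random subset of p columns of the RBF kernel matrix (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=p, replace=False)          # random landmark columns
    # K_np: kernel between all points and landmarks; K_pp: landmark block
    d_nl = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)
    K_np = np.exp(-gamma * d_nl)
    K_pp = K_np[idx]
    # Reduced problem: min_alpha ||K_np alpha - y||^2 + lam * alpha^T K_pp alpha
    A = K_np.T @ K_np + lam * K_pp + 1e-10 * np.eye(p)
    alpha = np.linalg.solve(A, K_np.T @ y)

    def predict(X_test):
        d = ((X_test[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d) @ alpha

    return predict
```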

    Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation

    Volterra and polynomial regression models play a major role in nonlinear system identification and inference tasks. Exciting applications ranging from neuroscience to genome-wide association analysis build on these models with the additional requirement of parsimony. This requirement has high interpretative value, but unfortunately cannot be met by least-squares-based or kernel regression methods. To this end, compressed sampling (CS) approaches, already successful in linear regression settings, can offer a viable alternative. The viability of CS for sparse Volterra and polynomial models is the core theme of this work. A common sparse regression task is initially posed for the two models. Building on (weighted) Lasso-based schemes, an adaptive RLS-type algorithm is developed for sparse polynomial regressions. The identifiability of polynomial models is critically challenged by dimensionality. However, following the CS principle, when these models are sparse, they can be recovered from far fewer measurements. To quantify the sufficient number of measurements for a given level of sparsity, restricted isometry properties (RIP) are investigated in commonly met polynomial regression settings, generalizing known results for their linear counterparts. The merits of the novel (weighted) adaptive CS algorithms for sparse polynomial modeling are verified through synthetic as well as real data tests for genotype-phenotype analysis. Comment: 20 pages, to appear in IEEE Trans. on Signal Processing.
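    As a rough illustration of the sparse polynomial regression task (not the weighted-Lasso RLS algorithm of the paper), the sketch below fits a degree-2 polynomial expansion with a plain Lasso; the data, sparsity pattern, and regularization weight are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 200, 8
X = rng.standard_normal((n, d))
# Sparse ground truth: only a few monomials are active
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] * X[:, 2] + 0.5 * X[:, 3] ** 2 \
    + 0.1 * rng.standard_normal(n)

# Expand into all monomials up to degree 2, then select few of them via l1
Phi = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
model = Lasso(alpha=0.05).fit(Phi, y)
print("non-zero coefficients:", np.count_nonzero(model.coef_), "of", Phi.shape[1])
```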

    A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining

    Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever-increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls into within the bigness taxonomy. Large p, small n data sets, for instance, require a different set of tools from the large n, small p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, and Sequentialization. It is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress that simplicity, in the sense of Ockham's razor and its non-plurality principle of parsimony, tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets. Comment: 18 pages, 2 figures, 3 tables.

    A Comparative Study between Fixed-size Kernel Logistic Regression and Support Vector Machines Methods for beta-turns Prediction in Proteins

    Beta-turn is an important element of protein structure; it plays a significant role in protein configuration and function. Several methods have been developed for prediction of beta-turns from protein sequences, the best of which are based on Neural Networks (NNs) or Support Vector Machines (SVMs). Although Kernel Logistic Regression (KLR) is a powerful classification technique that has been applied successfully to many classification problems, it is rarely used for beta-turn classification, mainly because it is computationally expensive. Fixed-Size Kernel Logistic Regression (FS-KLR) is a fast and accurate approximate implementation of KLR for large-scale data sets. It uses a trust-region Newton method for large-scale Logistic Regression (LR) to solve the approximate problem, and the Nyström method to approximate the feature matrix. In this paper we use FS-KLR for beta-turn prediction and compare the results to those obtained with SVMs. Secondary structure information and Position Specific Scoring Matrices (PSSMs) are utilized as input features. The performance achieved using FS-KLR is found to be comparable to that of the SVM method. FS-KLR has the advantages of yielding probabilistic outputs directly and of a well-defined extension to the multi-class case; in addition, its evaluation time is lower than that of the SVM method.
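    A minimal approximation of the FS-KLR pipeline can be assembled from off-the-shelf components: a Nyström feature map followed by logistic regression with a truncated-Newton solver on the approximate features. The sketch below uses scikit-learn; the random data, feature dimensions, and kernel parameters stand in for the PSSM and secondary-structure inputs and are assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for PSSM-derived window features and binary beta-turn labels
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 180))
y = (rng.random(1000) > 0.5).astype(int)

clf = make_pipeline(
    # Rank-300 Nystrom approximation of the RBF kernel feature map
    Nystroem(kernel="rbf", gamma=0.05, n_components=300, random_state=0),
    # Truncated-Newton logistic regression on the approximate features
    LogisticRegression(solver="newton-cg", max_iter=1000),
)
clf.fit(X, y)
proba = clf.predict_proba(X[:5])  # probabilistic outputs come directly
print(proba)
```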

    A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

    Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting where the elements to be combined are not centrally located but spread over a network. We address the key challenges of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error ε and communication cost that do not depend on the total number of combining elements. We further show that the communication cost of dFW is optimal by deriving a lower bound on the communication cost required to construct an ε-approximate solution. We validate our theoretical analysis with empirical studies on synthetic and real-world data, which demonstrate that dFW outperforms both baselines and competing methods. We also study the performance of dFW when the conditions of our analysis are relaxed, and show that dFW is fairly robust. Comment: Extended version of the SIAM Data Mining 2015 paper.
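    For reference, a centralized Frank-Wolfe iteration for an l1-constrained least-squares problem is sketched below in plain NumPy. It illustrates the linear minimization oracle and step-size schedule that dFW builds on, but not the distributed exchange of atoms across the network; the problem instance and constraint radius are illustrative assumptions.

```python
import numpy as np

def frank_wolfe_l1(A, y, tau, iters=200):
    """Frank-Wolfe for min 0.5*||A w - y||^2 s.t. ||w||_1 <= tau.
    Each iterate is a convex combination of at most t signed coordinate atoms,
    which is what makes the method attractive for sparse learning."""
    n, d = A.shape
    w = np.zeros(d)
    for t in range(iters):
        grad = A.T @ (A @ w - y)
        i = np.argmax(np.abs(grad))        # linear minimization oracle on the l1 ball
        s = np.zeros(d)
        s[i] = -tau * np.sign(grad[i])
        gamma = 2.0 / (t + 2.0)            # standard Frank-Wolfe step size
        w = (1 - gamma) * w + gamma * s
    return w

# Toy usage with synthetic data
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
y = A[:, :3] @ np.array([1.0, -2.0, 0.5])
w_hat = frank_wolfe_l1(A, y, tau=4.0)
```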

    A Fused Elastic Net Logistic Regression Model for Multi-Task Binary Classification

    Multi-task learning has been shown to significantly enhance the performance of multiple related learning tasks in a variety of situations. We present the fused logistic regression, a sparse multi-task learning approach for binary classification. Specifically, we introduce sparsity-inducing penalties over parameter differences of related logistic regression models to encode similarity across related tasks. The resulting joint learning task is cast into a form that lends itself to being efficiently optimized with a recursive variant of the alternating direction method of multipliers. We show results on synthetic data, describe the regime of settings where our multi-task approach achieves significant improvements over the single-task learning approach, and discuss the implications of applying the fused logistic regression in different real-world settings. Comment: 17 pages.
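    To make the joint learning task concrete, the sketch below writes down one plausible form of the fused elastic-net objective: per-task logistic losses, an elastic-net penalty within each task, and an l1 penalty on differences between related tasks' parameters. It only defines the objective under assumed notation (a chain-structured task relation); the paper optimizes such a non-smooth objective with a recursive ADMM variant, which is not implemented here.

```python
import numpy as np

def fused_elastic_net_logreg_objective(W, Xs, ys, lam1=0.01, lam2=0.01, lam_fuse=0.1):
    """Joint objective for K related binary tasks (labels in {-1, +1}).
    W has shape (K, d); Xs, ys are lists of per-task design matrices and labels."""
    loss = 0.0
    for k, (X, y) in enumerate(zip(Xs, ys)):
        margins = y * (X @ W[k])
        loss += np.mean(np.logaddexp(0.0, -margins))         # logistic loss per task
    loss += lam1 * np.abs(W).sum() + lam2 * (W ** 2).sum()   # elastic-net penalty
    loss += lam_fuse * np.abs(np.diff(W, axis=0)).sum()      # fused differences across tasks
    return loss
```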