262 research outputs found

    A New Look at an Old Problem: A Universal Learning Approach to Linear Regression

    Linear regression is a classical paradigm in statistics. A new look at it is provided via the lens of universal learning. In applying universal learning to linear regression, the hypothesis class represents the label y ∈ R as a linear combination x^T θ of the feature vector x ∈ R^M, within a Gaussian error. The Predictive Normalized Maximum Likelihood (pNML) solution for universal learning of individual data can be expressed analytically in this case, as can its associated learnability measure. Interestingly, the situation where the number of parameters M may even be larger than the number of training samples N can be examined. As expected, learnability cannot be attained in every such situation; nevertheless, if the test vector resides mostly in a subspace spanned by the eigenvectors associated with the large eigenvalues of the empirical correlation matrix of the training data, linear regression can generalize despite using an "over-parametrized" model. We demonstrate the results with a simulation of fitting a polynomial of possibly large degree to data.
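    A minimal sketch of the geometric criterion the abstract describes, not the paper's exact pNML expression: build polynomial features, eigendecompose the empirical correlation matrix, and measure how much of a test vector's energy lies in the span of the large eigenvalues (the function name and threshold below are our illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(0)

def poly_features(t, M):
    # Columns 1, t, t^2, ..., t^(M-1) for each scalar input in t.
    return np.vander(t, M, increasing=True)

N, M = 10, 20                          # over-parametrized: M > N
t_train = rng.uniform(-1, 1, N)
X = poly_features(t_train, M)          # N x M design matrix

C = X.T @ X / N                        # empirical correlation matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order

def energy_in_learnable_subspace(x, thresh=1e-10):
    # Fraction of the test vector's energy lying in the span of the
    # large eigenvalues; values near 1 suggest the point is learnable.
    coeffs = eigvecs.T @ x
    big = eigvals > thresh * eigvals.max()
    return np.sum(coeffs[big] ** 2) / np.sum(coeffs ** 2)

x_in = poly_features(np.array([0.3]), M)[0]    # inside the training range
x_out = poly_features(np.array([2.0]), M)[0]   # far outside it
print(energy_in_learnable_subspace(x_in), energy_in_learnable_subspace(x_out))
```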

    Clustering-based Source-aware Assessment of True Robustness for Learning Models

    We introduce a novel validation framework to measure the true robustness of learning models for real-world applications by creating source-inclusive and source-exclusive partitions in a dataset via clustering. We develop a robustness metric derived from source-aware lower and upper bounds of model accuracy, even when data source labels are not readily available. We clearly demonstrate that even on a well-explored dataset like MNIST, challenging training scenarios can be constructed under the proposed assessment framework for two separate yet equally important applications: i) more rigorous learning model comparison and ii) dataset adequacy evaluation. In addition, our findings not only promise a more complete identification of the trade-offs between model complexity, accuracy, and robustness, but can also help researchers optimize their data-collection efforts by identifying the less robust and more challenging class labels.
    Comment: Submitted to UAI 201
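    A minimal sketch of the idea, not the authors' exact protocol: cluster within each class to stand in for unknown sources, then compare a source-inclusive (random) split against a source-exclusive split in which whole pseudo-source clusters are held out. The dataset, classifier, and cluster count are our stand-in choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# Pseudo-sources: cluster *within* each class so that holding a cluster out
# removes a "source" of a digit, not the digit class itself.
held_out = np.zeros(len(y), dtype=bool)
for c in np.unique(y):
    idx = np.where(y == c)[0]
    sub = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
    held_out[idx[sub == 1]] = True

# Source-inclusive split (random): an optimistic upper bound on accuracy.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
upper = LogisticRegression(max_iter=2000).fit(Xtr, ytr).score(Xte, yte)

# Source-exclusive split: test only on the held-out pseudo-sources.
model = LogisticRegression(max_iter=2000).fit(X[~held_out], y[~held_out])
lower = model.score(X[held_out], y[held_out])

print(f"inclusive (upper) = {upper:.3f}, exclusive (lower) = {lower:.3f}")
print(f"robustness gap    = {upper - lower:.3f}")
```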

    Multiclass latent locally linear support vector machines

    Kernelized Support Vector Machines (SVMs) have gained the status of off-the-shelf classifiers, able to deliver state-of-the-art performance on almost any problem. Still, their practical use is constrained by their computational and memory complexity, which grows super-linearly with the number of training samples. In order to retain the low training and testing complexity of linear classifiers and the flexibility of non-linear ones, a growing, promising alternative is represented by methods that learn non-linear classifiers through local combinations of linear ones. In this paper we propose a new multi-class local classifier, based on a latent SVM formulation. The proposed classifier makes use of a set of linear models that are linearly combined using sample- and class-specific weights. Thanks to the latent formulation, the combination coefficients are modeled as latent variables. We allow soft combinations and we provide a closed-form solution for their estimation, resulting in an efficient prediction rule. This novel formulation allows learning, in a principled way, the sample-specific weights and the linear classifiers in a single optimization problem, using a CCCP optimization procedure. Extensive experiments on ten standard UCI machine learning datasets, one large binary dataset, three character and digit recognition databases, and a visual place categorization dataset show the power of the proposed approach.
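    A schematic sketch of the kind of prediction rule described above, under our own reading: each class keeps several linear models whose outputs are softly combined with sample- and class-specific weights. The softmax combination below is one plausible closed-form choice, not necessarily the paper's estimator, and all names are hypothetical.

```python
import numpy as np

def predict(x, Theta, beta=1.0):
    """Theta has shape (C, K, D): C classes, K local linear models each.
    The soft weights are a softmax over the local scores, so they are
    specific to both the sample x and the class being scored."""
    scores = Theta @ x                     # (C, K) local linear scores
    w = np.exp(beta * scores)
    w /= w.sum(axis=1, keepdims=True)      # sample- and class-specific weights
    return np.argmax((w * scores).sum(axis=1))

rng = np.random.default_rng(0)
Theta = rng.normal(size=(3, 4, 5))         # 3 classes, 4 local models, 5 features
print(predict(rng.normal(size=5), Theta))
```

    Because the weights are a smooth function of the local scores, prediction stays a handful of dot products per class, which is the low-complexity behavior the abstract emphasizes.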

    Connections Between Adaptive Control and Optimization in Machine Learning

    This paper demonstrates many immediate connections between adaptive control and optimization methods commonly employed in machine learning. Starting from common output-error formulations, similarities in update-law modifications are examined. Concepts in stability, performance, and learning common to both fields are then discussed. Building on the similarities in update laws and common concepts, new intersections and opportunities for improved algorithm analysis are provided. In particular, a specific problem related to higher-order learning is solved through insights obtained from these intersections.
    Comment: 18 pages
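    A minimal sketch of the shared structure such papers typically start from (our toy example, not the paper's development): for a linear-in-parameters model with output error e = θᵀx − y, an Euler discretization of the adaptive-control gradient update law θ̇ = −γ e x is exactly the stochastic gradient descent step used in machine learning.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([2.0, -1.0])   # unknown "plant" parameters
theta = np.zeros(2)
gamma, dt = 0.1, 1.0                 # dt = 1 recovers plain SGD from the Euler step

for _ in range(200):
    x = rng.normal(size=2)           # regressor / feature vector
    y = theta_star @ x               # plant output / label
    e = theta @ x - y                # output (prediction) error
    theta -= dt * gamma * e * x      # Euler step of theta_dot = -gamma * e * x

print(theta)  # approaches theta_star when the regressors are persistently exciting
```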

    Reliable credence and the foundations of statistics

    If the goal of statistical analysis is to form justified credences based on data, then an account of the foundations of statistics should explain what makes credences justified. I present a new account called statistical reliabilism (SR), on which credences resulting from a statistical analysis are justified (relative to alternatives) when they are, in a sense, closest on average to the corresponding objective probabilities. This places (SR) in the same vein as recent work on the reliabilist justification of credences generally [Dunn, 2015, Tang, 2016, Pettigrew, 2018], but it has the advantage of being action-guiding, in that knowledge of objective probabilities is not required to identify the best-justified available credences. The price is that justification is relativized to a specific class of candidate objective probabilities and to a particular choice of reliability measure. On the other hand, I show that (SR) has welcome implications for frequentist-Bayesian reconciliation, including a clarification of the use of priors; complementarity between probabilist and fallibilist [Gelman and Shalizi, 2013, Mayo, 2018] approaches to statistical foundations; and the justification of credences outside of formal statistical settings. Regarding the latter, I demonstrate how the insights of statistics may be used to amend other reliabilist accounts so as to render them action-guiding. I close by discussing new possible research directions for epistemologists and statisticians (and other applied users of probability) raised by the (SR) framework.
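    An illustrative numerical reading of the action-guiding point, our construction rather than the paper's formalism: fix a candidate class of objective probabilities and a squared-distance reliability measure, then rank credence-forming rules by their average closeness to the candidates, with no need to know which candidate is true.

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = np.linspace(0.1, 0.9, 9)   # candidate objective P(heads)
n = 20                                  # coin flips observed per analysis

def mle(k):   return k / n              # frequentist credence from k heads
def bayes(k): return (k + 1) / (n + 2)  # Laplace-smoothed (Bayesian) credence

def expected_reliability(rule):
    # Mean squared distance of the rule's credence from p, averaged over
    # data drawn under p and then over the whole candidate class.
    losses = []
    for p in candidates:
        k = rng.binomial(n, p, size=5000)
        losses.append(np.mean((rule(k) - p) ** 2))
    return np.mean(losses)

for name, rule in [("MLE", mle), ("Laplace", bayes)]:
    print(name, expected_reliability(rule))
```

    As the abstract warns, the ranking this produces is relative to the chosen candidate class and reliability measure; shrinking or widening `candidates` can change which rule counts as best-justified.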

    Learning from networked examples

    Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption no longer holds when learning from a networked sample, because two or more training examples may share some common objects, and hence share the features of those objects. We show that the classic approach of ignoring this problem can have a harmful effect on the accuracy of statistics, and then consider alternatives. One of these is to use only independent examples, discarding all other information; this is clearly suboptimal. We analyze sample error bounds in this networked setting, providing significantly improved results. An important component of our approach is a set of efficient sample weighting schemes, which lead to novel concentration inequalities.
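    A minimal sketch in the spirit of such weighting schemes, our simplified version rather than the paper's exact construction: give each example a weight in [0, 1], cap the total weight on every shared object at 1, and maximize the effective sample size with a small linear program.

```python
import numpy as np
from scipy.optimize import linprog

# Each example is the set of object ids it is built from (toy network).
examples = [{0, 1}, {1, 2}, {2, 3}, {0, 3}, {4}]
n_obj = 5

# Maximize sum(w) subject to: for every object, the total weight of the
# examples touching it is at most 1, and 0 <= w_i <= 1.
A = np.zeros((n_obj, len(examples)))
for i, objs in enumerate(examples):
    for o in objs:
        A[o, i] = 1.0

res = linprog(c=-np.ones(len(examples)),          # maximize => minimize -sum(w)
              A_ub=A, b_ub=np.ones(n_obj),
              bounds=[(0, 1)] * len(examples))
weights = res.x
print(weights, "effective sample size:", weights.sum())
```

    Keeping only a mutually independent subset corresponds to forcing each weight to 0 or 1, so the fractional solution never has a smaller effective sample size than that discard-based alternative.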