
    Breiman's "Two Cultures" Revisited and Reconciled

    In a landmark paper published in 2001, Leo Breiman described the tense standoff between two cultures of statistical modeling: parametric statistics and algorithmic machine learning. The division between these two statistical learning frameworks has been widening steadily in recent years. What is the way forward? The gap between "the two cultures" cannot be bridged unless we find a way to blend them into a coherent whole. This article presents a solution by establishing a link between the two cultures. Through examples, we describe the challenges and potential gains of this integrated statistical thinking. Comment: This paper celebrates the 70th anniversary of Statistical Machine Learning: how far we've come, and how far we have to go. Keywords: Integrated statistical learning theory, Exploratory machine learning, Uncertainty prediction machine, ML-powered modern applied statistics, Information theory.

    Optimizing for Generalization in Machine Learning with Cross-Validation Gradients

    Cross-validation is the workhorse of modern applied statistics and machine learning, as it provides a principled framework for selecting the model that maximizes generalization performance. In this paper, we show that the cross-validation risk is differentiable with respect to the hyperparameters and training data for many common machine learning algorithms, including logistic regression, elastic-net regression, and support vector machines. Leveraging this differentiability, we propose a cross-validation gradient method (CVGM) for hyperparameter optimization. Our method enables efficient optimization of the cross-validation risk, the best surrogate of the true generalization ability of our learning algorithm, in high-dimensional hyperparameter spaces. Comment: 11 pages.
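    As a concrete illustration of the differentiability the paper exploits, the sketch below differentiates the validation risk of ridge regression with respect to the penalty via implicit differentiation of the closed-form fit, then runs gradient descent on the hyperparameter itself. The data, the single train/validation split (standing in for full K-fold cross-validation), and the step size are all illustrative assumptions, not the paper's CVGM implementation.

```python
# Hedged sketch: gradient of a validation risk w.r.t. a ridge penalty, via
# implicit differentiation. Illustrative only; not the paper's CVGM code.
import numpy as np

def ridge_fit(X, y, lam):
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y), A          # closed-form ridge solution

def risk_and_grad(lam, X_tr, y_tr, X_val, y_val):
    w, A = ridge_fit(X_tr, y_tr, lam)
    resid = X_val @ w - y_val
    risk = np.mean(resid ** 2)                     # held-out risk
    dw_dlam = -np.linalg.solve(A, w)               # from d/dlam [A w] = w + A dw = 0
    grad = 2.0 * (resid @ X_val @ dw_dlam) / len(y_val)
    return risk, grad

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ np.arange(10.0) + 0.5 * rng.normal(size=100)
X_tr, y_tr, X_val, y_val = X[:70], y[:70], X[70:], y[70:]

lam = 1.0
for _ in range(200):                               # gradient descent on the hyperparameter
    risk, g = risk_and_grad(lam, X_tr, y_tr, X_val, y_val)
    lam = max(lam - 0.1 * g, 1e-8)                 # keep the penalty non-negative
print(f"validation MSE {risk:.4f} at lambda {lam:.4f}")
```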

    Minimal Achievable Sufficient Statistic Learning

    We introduce Minimal Achievable Sufficient Statistic (MASS) Learning, a machine learning training objective for which the minima are minimal sufficient statistics with respect to a class of functions being optimized over (e.g., deep networks). In deriving MASS Learning, we also introduce Conserved Differential Information (CDI), an information-theoretic quantity that, unlike standard mutual information, can be usefully applied to deterministically-dependent continuous random variables like the input and output of a deep network. In a series of experiments, we show that deep networks trained with MASS Learning achieve competitive performance on supervised learning, regularization, and uncertainty quantification benchmarks.

    Machine Learning Methods Economists Should Know About

    We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First, we discuss the differences in goals, methods, and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.
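    For readers wanting a concrete entry point, the sketch below shows the kind of off-the-shelf supervised-learning workflow the authors contrast with traditional econometric practice: a cross-validated lasso fit to synthetic high-dimensional data with scikit-learn. The data-generating process and all settings are hypothetical, not taken from the paper.

```python
# Hedged sketch: cross-validated lasso on synthetic sparse-regression data.
# Hypothetical example, not the authors' code or data.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3, -2, 1.5, 1, -1]                    # sparse true signal
y = X @ beta + rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LassoCV(cv=5).fit(X_tr, y_tr)             # penalty chosen by cross-validation
print("selected alpha:", model.alpha_)
print("nonzero coefficients:", np.sum(model.coef_ != 0))
print("held-out R^2:", model.score(X_te, y_te))
```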

    A review of homomorphic encryption and software tools for encrypted statistical machine learning

    Recent advances in cryptography promise to enable secure statistical computation on encrypted data, whereby a limited set of operations can be carried out without the need to first decrypt. We review these homomorphic encryption schemes in a manner accessible to statisticians and machine learners, focusing on pertinent limitations inherent in the current state of the art. These limitations restrict the kind of statistics and machine learning algorithms which can be implemented, and we review those which have been successfully applied in the literature. Finally, we document a high-performance R package implementing a recent homomorphic scheme in a general framework. Comment: 21 pages, technical report.
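    For intuition about what computing on encrypted data means, the toy below uses unpadded ("textbook") RSA, which is multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the plaintexts. It is deliberately insecure (tiny primes, no padding, no semantic security) and is unrelated to the scheme implemented in the R package the paper documents.

```python
# Toy demonstration of a homomorphic property using textbook RSA.
# Insecure by design; for intuition only.
p, q, e = 61, 53, 17                 # tiny primes and public exponent
n = p * q
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)                  # private exponent (modular inverse)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

m1, m2 = 7, 6
c1, c2 = enc(m1), enc(m2)
# Multiplying ciphertexts multiplies the underlying plaintexts (mod n),
# without ever decrypting the operands:
assert dec(c1 * c2 % n) == (m1 * m2) % n
print("Enc(7) * Enc(6) decrypts to", dec(c1 * c2 % n))
```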

    MONEYBaRL: Exploiting pitcher decision-making using Reinforcement Learning

    This manuscript uses machine learning techniques to exploit baseball pitchers' decision making, so-called "Baseball IQ," by modeling at-bat information (pitch selections and counts) as a Markov Decision Process (MDP). Each state of the MDP models the pitcher's current pitch selection in a Markovian fashion, conditional on the information immediately prior to making the current pitch: the count prior to the previous pitch, his ensuing pitch selection, the batter's ensuing action, and the result of the pitch. Comment: Published at http://dx.doi.org/10.1214/13-AOAS712 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
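    To make the count-as-state MDP formulation concrete, the toy below runs value iteration over (balls, strikes) counts with two pitch types. Every transition probability and reward is made up for illustration, and each pitch is reduced to a ball or a strike, so this is a sketch of the formulation, not the paper's fitted model.

```python
# Hedged sketch: value iteration on a toy pitch-count MDP. All numbers are
# assumed for illustration; the real paper fits transitions from pitch data.
import itertools

ACTIONS = ["fastball", "curveball"]

def step(balls, strikes, action):
    # Assumed strike probabilities; every pitch is a ball or a strike here.
    p_strike = 0.65 if action == "fastball" else 0.55
    nb, ns = balls + 1, strikes + 1
    strike_out = "K" if ns == 3 else (balls, ns)    # third strike ends the at-bat
    walk = "BB" if nb == 4 else (nb, strikes)       # fourth ball ends the at-bat
    return [(p_strike, strike_out), (1 - p_strike, walk)]

states = list(itertools.product(range(4), range(3)))  # (balls, strikes) counts
V = {s: 0.0 for s in states}
V["K"], V["BB"] = 1.0, -1.0                            # terminal rewards for the pitcher

for _ in range(100):                                   # value iteration to convergence
    for s in states:
        V[s] = max(sum(p * V[s2] for p, s2 in step(*s, a)) for a in ACTIONS)

policy = {s: max(ACTIONS, key=lambda a: sum(p * V[s2] for p, s2 in step(*s, a)))
          for s in states}
print("best pitch at a 3-2 count:", policy[(3, 2)], "value:", round(V[(3, 2)], 3))
```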

    A Bayesian Model of node interaction in networks

    We are concerned with modeling the strength of links in networks by taking into account how often those links are used. Link usage is a strong indicator of how closely two nodes are related, but existing network models in Bayesian statistics and machine learning can predict only whether a link exists at all. As priors for the latent attributes of network nodes, we explore the Chinese Restaurant Process (CRP) and a multivariate Gaussian with fixed dimensionality. The model is applied to a social network dataset and a word co-occurrence dataset.
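    The Chinese Restaurant Process prior mentioned above can be sampled in a few lines: each node joins an existing latent cluster ("table") with probability proportional to the cluster's size, or starts a new one with probability proportional to a concentration parameter alpha. The sketch below draws one such clustering; the paper's full model layers likelihoods for link usage on top of these latent assignments.

```python
# Minimal sketch of sampling from a Chinese Restaurant Process prior.
import random

def crp(n_nodes, alpha, seed=0):
    random.seed(seed)
    tables = []                        # tables[k] = number of nodes at table k
    assignment = []
    for _ in range(n_nodes):
        weights = tables + [alpha]     # existing table sizes, plus weight for a new table
        k = random.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):           # node opens a new table
            tables.append(0)
        tables[k] += 1
        assignment.append(k)
    return assignment, tables

assignment, tables = crp(n_nodes=20, alpha=1.5)
print("cluster assignments:", assignment)
print("cluster sizes:", tables)
```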

    Semi-Stochastic Frank-Wolfe Algorithms with Away-Steps for Block-Coordinate Structure Problems

    We propose a semi-stochastic Frank-Wolfe algorithm with away-steps for regularized empirical risk minimization and extend it to problems with block-coordinate structure. Our algorithms use an adaptive step size, and we show that they converge linearly in expectation. The proposed algorithms can be applied to many important problems in statistics and machine learning, including regularized generalized linear models, support vector machines, and many others. In preliminary numerical tests on structural SVM and graph-guided fused LASSO, our algorithms outperform competing algorithms in both iteration cost and total number of data passes.
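    For orientation, the sketch below implements the basic deterministic Frank-Wolfe loop with away-steps on a toy problem: minimizing a random quadratic over the probability simplex with exact line search. The paper's actual contributions (semi-stochastic gradients, adaptive step sizes, block-coordinate structure) are not reproduced here.

```python
# Hedged sketch: deterministic Frank-Wolfe with away-steps on the simplex.
# Toy quadratic objective; not the paper's semi-stochastic algorithm.
import numpy as np

rng = np.random.default_rng(0)
d = 20
M = rng.normal(size=(d, d))
Q = M.T @ M + np.eye(d)                            # random positive-definite matrix
b = rng.normal(size=d)
grad = lambda x: Q @ x + b

x = np.zeros(d); x[0] = 1.0                        # start at a simplex vertex
for _ in range(500):
    g = grad(x)
    s = np.zeros(d); s[np.argmin(g)] = 1.0         # Frank-Wolfe vertex (linear oracle)
    active = np.flatnonzero(x > 1e-12)             # vertices currently in the support
    v_idx = active[np.argmax(g[active])]           # away vertex: worst active one
    v = np.zeros(d); v[v_idx] = 1.0
    if g @ (x - s) >= g @ (v - x):                 # FW direction gives more descent
        dvec, gamma_max = s - x, 1.0
    else:                                          # away-step: move mass off v
        dvec, gamma_max = x - v, x[v_idx] / (1.0 - x[v_idx] + 1e-12)
    denom = dvec @ Q @ dvec                        # exact line search for a quadratic
    gamma = min(gamma_max, max(0.0, -(g @ dvec) / denom)) if denom > 0 else gamma_max
    x = x + gamma * dvec
print("final objective:", 0.5 * x @ Q @ x + b @ x)
```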

    Optimal Margin Distribution Machine

    Support vector machine (SVM) has been one of the most popular learning algorithms, with the central idea of maximizing the minimum margin, i.e., the smallest distance from the instances to the classification boundary. Recent theoretical results, however, disclosed that maximizing the minimum margin does not necessarily lead to better generalization performance, and instead, the margin distribution has been proven to be more crucial. Based on this idea, we propose a new method, named Optimal margin Distribution Machine (ODM), which tries to achieve better generalization performance by optimizing the margin distribution. We characterize the margin distribution by its first- and second-order statistics, i.e., the margin mean and variance. The proposed method is a general learning approach which can be used wherever SVM can be applied, and its superiority is verified both theoretically and empirically in this paper. Comment: arXiv admin note: substantial text overlap with arXiv:1311.098
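    The margin statistics ODM optimizes are easy to compute for any linear classifier: the margin of example (x_i, y_i) is y_i (w . x_i + b) / ||w||, and ODM targets the mean and variance of these margins rather than their minimum. The sketch below computes these statistics for a plain linear SVM on synthetic data for contrast; it does not implement ODM's own optimization.

```python
# Hedged sketch: margin mean/variance of a plain linear SVM on synthetic data.
# Shows the statistics ODM optimizes, not ODM itself.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

svm = LinearSVC(C=1.0).fit(X, y)
w, b0 = svm.coef_.ravel(), svm.intercept_[0]
margins = y * (X @ w + b0) / np.linalg.norm(w)     # signed geometric margins

print("min margin :", margins.min())               # what standard SVM maximizes
print("margin mean:", margins.mean())              # first-order statistic ODM uses
print("margin var :", margins.var())               # second-order statistic ODM uses
```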

    How Developers Iterate on Machine Learning Workflows -- A Survey of the Applied Machine Learning Literature

    Machine learning workflow development is anecdotally regarded as an iterative process of trial and error with humans in the loop. However, we are not aware of quantitative evidence corroborating this popular belief. A quantitative characterization of iteration can serve as a benchmark for machine learning workflow development in practice, and can aid the development of human-in-the-loop machine learning systems. To this end, we conduct a small-scale survey of the applied machine learning literature from five distinct application domains. We collect and distill statistics on the role of iteration within machine learning workflow development, and report preliminary trends and insights from our investigation, as a starting point towards this benchmark. Based on our findings, we finally describe desiderata for effective and versatile human-in-the-loop machine learning systems that can cater to users in diverse domains.