
    Concentration of measure, negative association, and machine learning

    In this thesis we consider concentration inequalities and the concentration of measure phenomenon from a variety of angles. Sharp tail bounds on the deviation of Lipschitz functions of independent random variables about their mean are well known. We consider variations on this theme for dependent variables on the Boolean cube. In recent years negatively associated probability distributions have been studied as potential generalizations of independent random variables. Results on this class of distributions have been sparse at best, even when restricting to the Boolean cube. We consider the class of negatively associated distributions topologically, as a subset of the general class of probability measures. Both the weak (distributional) topology and the total variation topology are considered, and the simpler notion of negative correlation is investigated. The concentration of measure phenomenon began with Milman's proof of Dvoretzky's theorem, and is therefore intimately connected to the field of high-dimensional convex geometry. Recently this field has found application in the area of compressed sensing. We consider these applications and in particular analyze the use of Gordon's min-max inequality in various compressed sensing frameworks, including the Dantzig selector and the matrix uncertainty selector. Finally, we consider the use of concentration inequalities in developing a theoretically sound anomaly detection algorithm. Our method uses a ranking procedure based on k-nearest neighbor (KNN) graphs of the given data. We develop a max-margin learning-to-rank framework to train limited-complexity models to imitate these KNN scores. The resulting anomaly detector is shown to be asymptotically optimal in the sense that, for any false alarm rate α, its decision region converges to the α-percentile minimum volume level set of the unknown underlying density.
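
    The "sharp tail bounds for Lipschitz functions of independent variables" mentioned above are exemplified, as a standard reference point rather than a result of the thesis, by McDiarmid's bounded differences inequality: if independent variables X_1, ..., X_n and a function f are such that changing the i-th coordinate changes f by at most c_i, then

    ```latex
    % McDiarmid's bounded differences inequality, a standard instance of the
    % Lipschitz-function tail bounds referred to in the abstract (not a result
    % of the thesis itself).
    \Pr\bigl( \lvert f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \rvert \ge t \bigr)
      \;\le\; 2 \exp\!\left( - \frac{2 t^2}{\sum_{i=1}^{n} c_i^2} \right).
    ```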

    Classic algorithms are fair learners: Classification Analysis of natural weather and wildfire occurrences

    Classic machine learning algorithms have been reviewed and studied mathematically in detail with respect to their performance and properties. This paper reviews the empirical behavior of widely used classical supervised learning algorithms such as Decision Trees, Boosting, Support Vector Machines, k-Nearest Neighbors and a shallow Artificial Neural Network. The paper evaluates these algorithms on sparse tabular data for a classification task and observes the effect of specific hyperparameters when the data is synthetically modified for higher noise. These perturbations were introduced to assess how well the algorithms generalize from sparse data and how their parameters can be used to improve classification accuracy. The paper intends to show that, owing to their inherent properties, these classic algorithms are fair learners even on such limited, noisy, and sparse datasets.
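
    A minimal sketch of the kind of comparison described above, using scikit-learn on a synthetic stand-in; the dataset, noise level, and hyperparameters are illustrative assumptions, not the paper's actual weather/wildfire data or settings:

    ```python
    # Illustrative only: compare classic supervised learners on sparse tabular
    # data, with and without a synthetic noise perturbation.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)

    # Small synthetic table standing in for the paper's data.
    X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                               n_redundant=2, random_state=0)
    X_noisy = X + rng.normal(scale=0.5, size=X.shape)  # synthetic perturbation

    models = {
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
        "boosting": GradientBoostingClassifier(random_state=0),
        "svm_rbf": SVC(C=1.0, kernel="rbf"),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "shallow_mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                                     random_state=0),
    }

    for name, model in models.items():
        clean = cross_val_score(model, X, y, cv=5).mean()
        noisy = cross_val_score(model, X_noisy, y, cv=5).mean()
        print(f"{name:>14}: clean={clean:.3f}  noisy={noisy:.3f}")
    ```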

    Ensembles of wrappers for automated feature selection in fish age classification

    In feature selection, the most important features must be chosen so as to decrease their number while retaining their discriminatory information. Within this context, a novel feature selection method based on an ensemble of wrappers is proposed and applied to automatically select features in fish age classification. The effectiveness of this procedure has been tested on an Atlantic cod database with several powerful statistical learning classifiers. The subsets based on the few selected features, e.g. otolith weight and fish weight, are particularly noteworthy given current biological findings and practices in fishery research, and the classification results obtained with them outperform those of previous studies in which a manual feature selection was performed.
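
    A hedged sketch of the ensemble-of-wrappers idea, as one plausible reading of the abstract rather than the authors' exact procedure: run a wrapper-style selector with several base classifiers and keep the features that a majority of the wrappers vote for. The synthetic data below stands in for the Atlantic cod measurements.

    ```python
    # Illustrative ensemble-of-wrappers feature selection via majority voting.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                               random_state=0)  # stand-in for the fish data

    wrappers = [
        LogisticRegression(max_iter=1000),
        LinearSVC(dual=False),
        RandomForestClassifier(n_estimators=200, random_state=0),
    ]

    votes = np.zeros(X.shape[1], dtype=int)
    for estimator in wrappers:
        selector = RFE(estimator, n_features_to_select=4).fit(X, y)
        votes += selector.support_.astype(int)

    # Keep the features selected by a majority of the wrappers.
    selected = np.flatnonzero(votes >= (len(wrappers) + 1) // 2)
    print("selected feature indices:", selected)
    ```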

    Nonparametric Bayesian Deep Learning for Scientific Data Analysis

    Deep learning (DL) has emerged as the leading paradigm for predictive modeling in a variety of domains, especially those involving large volumes of high-dimensional spatio-temporal data such as images and text. With the rise of big data in scientific and engineering problems, there is now considerable interest in the research and development of DL for scientific applications. The scientific domain, however, poses unique challenges for DL, including special emphasis on interpretability and robustness. In particular, a priority of the Department of Energy (DOE) is the research and development of probabilistic ML methods that are robust to overfitting and offer reliable uncertainty quantification (UQ) on high-dimensional noisy data that is limited in size relative to its complexity. Gaussian processes (GPs) are nonparametric Bayesian models that are naturally robust to overfitting and offer UQ out-of-the-box. Unfortunately, traditional GP methods lack the balance of expressivity and domain-specific inductive bias that is key to the success of DL. Recently, however, a number of approaches have emerged to incorporate the DL paradigm into GP methods, including deep kernel learning (DKL), deep Gaussian processes (DGPs), and neural network Gaussian processes (NNGPs). In this work, we investigate DKL, DGPs, and NNGPs as paradigms for developing robust models for scientific applications. First, we develop DKL for text classification, and apply both DKL and Bayesian neural networks (BNNs) to the problem of classifying cancer pathology reports, with BNNs attaining new state-of-the-art results. Next, we introduce the deep ensemble kernel learning (DEKL) method, which is just as powerful as DKL while admitting easier model parallelism. Finally, we derive a new model called a "bottleneck NNGP" by unifying the DGP and NNGP paradigms, thus laying the groundwork for a new class of methods for future applications.
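
    A minimal sketch of the deep kernel learning idea in plain PyTorch, on toy 1-D data rather than the models or datasets from the work described: a small network maps inputs to features, an RBF kernel acts on those features, and both are trained by maximizing the exact GP marginal likelihood.

    ```python
    # Toy deep kernel learning: neural feature extractor + RBF kernel + exact
    # GP marginal likelihood, trained jointly. Illustrative assumptions only.
    import torch

    torch.manual_seed(0)
    X = torch.linspace(-3, 3, 80).unsqueeze(-1)            # toy 1-D inputs
    y = torch.sin(2 * X).squeeze(-1) + 0.1 * torch.randn(80)

    feature_net = torch.nn.Sequential(                      # "deep" part of the kernel
        torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2))
    log_lengthscale = torch.zeros((), requires_grad=True)
    log_noise = torch.tensor(-2.0, requires_grad=True)

    def rbf_kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-0.5 * d2 / torch.exp(log_lengthscale).pow(2))

    params = list(feature_net.parameters()) + [log_lengthscale, log_noise]
    opt = torch.optim.Adam(params, lr=0.01)

    for step in range(500):
        opt.zero_grad()
        Z = feature_net(X)                                   # learned representation
        K = rbf_kernel(Z, Z) + (torch.exp(log_noise) + 1e-4) * torch.eye(len(X))
        # Negative log marginal likelihood of the GP (up to a constant).
        L = torch.linalg.cholesky(K)
        alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
        nll = 0.5 * (y.unsqueeze(-1) * alpha).sum() + torch.log(torch.diag(L)).sum()
        nll.backward()
        opt.step()
    ```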

    An Information Approach to Regularization Parameter Selection for the Solution of Ill-Posed Inverse Problems Under Model Misspecification

    Engineering problems are often ill-posed, i.e. cannot be solved by conventional data-driven methods such as parametric linear and nonlinear regression or neural networks. A method of regularization that is used for the solution of ill-posed problems requires an a priori choice of the regularization parameter. Several regularization parameter selection methods have been proposed in the literature, yet none is resistant to model misspecification. Since almost all models are incorrectly or approximately specified, misspecification resistance is a valuable option for engineering applications. Each data-driven method is based on a statistical procedure which can perform well on one data set and fail on another. Therefore, another useful feature of a data-driven method is robustness. This dissertation proposes a methodology for developing misspecification-resistant and robust regularization parameter selection methods through the use of the information complexity approach. The original contribution of the dissertation to the field of ill-posed inverse problems in engineering is a new robust regularization parameter selection method. This method is misspecification-resistant, i.e. it works consistently when the model is misspecified. The method also improves upon the information-based regularization parameter selection methods by correcting inadequate penalization of estimation inaccuracy through the use of the information complexity framework. Such an improvement makes the proposed regularization parameter selection method robust and reduces the risk of obtaining grossly underregularized solutions. A method of misspecification detection is proposed based on the discrepancy between the proposed regularization parameter selection method and its correctly specified version. A detected misspecification indicates that the model may be inadequate for the particular problem and should be revised. The superior performance of the proposed regularization parameter selection method is demonstrated by practical examples. Data for the examples are from Carolina Power & Light's Crystal River Nuclear Power Plant and a TVA fossil power plant. The results of applying the proposed regularization parameter selection method to the data demonstrate that the method is robust, i.e. does not produce grossly underregularized solutions, and performs well when the model is misspecified. This enables one to implement the proposed regularization parameter selection method in autonomous diagnostic and monitoring systems.
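
    For context, a short sketch of Tikhonov regularization with the regularization parameter chosen on a grid. The selector used below is generalized cross-validation (GCV), a standard textbook criterion included purely for illustration; it is not the information-complexity method proposed in the dissertation.

    ```python
    # Tikhonov (ridge) solution of an ill-conditioned linear inverse problem,
    # with lambda chosen by GCV over a grid. Synthetic data for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 50
    A = rng.normal(size=(n, p)) @ np.diag(1.0 / np.arange(1, p + 1))  # ill-conditioned operator
    x_true = rng.normal(size=p)
    y = A @ x_true + 0.01 * rng.normal(size=n)

    def tikhonov(A, y, lam):
        p = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)

    def gcv(A, y, lam):
        n, p = A.shape
        H = A @ np.linalg.solve(A.T @ A + lam * np.eye(p), A.T)  # hat matrix
        resid = y - H @ y
        return n * (resid @ resid) / (n - np.trace(H)) ** 2

    lams = np.logspace(-8, 2, 50)
    lam_best = min(lams, key=lambda lam: gcv(A, y, lam))
    x_hat = tikhonov(A, y, lam_best)
    print("selected lambda:", lam_best)
    ```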

    πVAE: Encoding stochastic process priors with variational autoencoders

    Stochastic processes provide a mathematically elegant way to model complex data. In theory, they provide flexible priors over function classes that can encode a wide range of interesting assumptions. In practice, however, efficient inference by optimisation or marginalisation is difficult, a problem further exacerbated with big data and high dimensional input spaces. We propose a novel variational autoencoder (VAE) called the prior encoding variational autoencoder (πVAE). The πVAE is finitely exchangeable and Kolmogorov consistent, and thus is a continuous stochastic process. We use πVAE to learn low dimensional embeddings of function classes. We show that our framework can accurately learn expressive function classes such as Gaussian processes, but also properties of functions to enable statistical inference (such as the integral of a log Gaussian process). For popular tasks, such as spatial interpolation, πVAE achieves state-of-the-art performance both in terms of accuracy and computational efficiency. Perhaps most usefully, we demonstrate that the learnt low dimensional, independently distributed latent space representation provides an elegant and scalable means of performing Bayesian inference for stochastic processes within probabilistic programming languages such as Stan.
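
    A toy illustration of the underlying idea of embedding draws from a function class with a VAE. This is a plain VAE on function values over a fixed grid, not the πVAE architecture, which additionally enforces exchangeability and Kolmogorov consistency; the function class and network sizes are assumptions made for the sketch.

    ```python
    # Plain VAE over draws from a toy function class (random sinusoids on a
    # fixed grid), illustrating low-dimensional embeddings of functions.
    import torch

    torch.manual_seed(0)
    grid = torch.linspace(0, 1, 64)

    def sample_functions(batch):
        # Toy function class: random-frequency, random-phase sinusoids.
        freq = 2 + 6 * torch.rand(batch, 1)
        phase = 2 * torch.pi * torch.rand(batch, 1)
        return torch.sin(freq * 2 * torch.pi * grid + phase)

    latent_dim = 2
    encoder = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(),
                                  torch.nn.Linear(64, 2 * latent_dim))
    decoder = torch.nn.Sequential(torch.nn.Linear(latent_dim, 64), torch.nn.ReLU(),
                                  torch.nn.Linear(64, 64))
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                           lr=1e-3)

    for step in range(1000):
        x = sample_functions(128)
        mu, log_var = encoder(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterization
        recon = decoder(z)
        recon_loss = ((recon - x) ** 2).sum(dim=-1).mean()
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(dim=-1).mean()
        loss = recon_loss + kl
        opt.zero_grad(); loss.backward(); opt.step()
    ```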

    Multi-user receiver structures for direct sequence code division multiple access
