Concentration of measure, negative association, and machine learning
In this thesis we consider concentration inequalities and the concentration
of measure phenomenon from a variety of angles. Sharp tail bounds on the deviation of Lipschitz functions of independent random variables about their mean are well known. We consider variations on this theme for dependent variables on the Boolean cube.
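For independent coordinates, the canonical instance of such a sharp tail bound is McDiarmid's bounded differences inequality (standard material, restated here only for context; the thesis itself treats the dependent case):

```latex
% McDiarmid's bounded differences inequality: if changing the i-th
% coordinate of x changes f(x) by at most c_i, then for independent
% X_1, ..., X_n and every t > 0,
\Pr\bigl(\lvert f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n)\rvert \ge t\bigr)
  \le 2\exp\!\left(\frac{-2t^2}{\sum_{i=1}^{n} c_i^2}\right)
```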
In recent years negatively associated probability distributions have been studied as potential generalizations of independent random variables. Results on this class of distributions have been sparse at best, even when restricting to the Boolean cube. We consider the class of negatively associated distributions topologically, as a subset of the general class of probability measures. Both the weak (distributional) topology and the total variation topology
are considered, and the simpler notion of negative correlation is investigated.
The concentration of measure phenomenon began with Milman's proof of Dvoretzky's theorem, and is therefore intimately connected to the field of high-dimensional convex geometry. Recently this field has found application in the area of compressed sensing. We consider these applications and in particular analyze the use of Gordon's min-max inequality in various compressed sensing frameworks, including the Dantzig selector and the matrix uncertainty selector.
Finally we consider the use of concentration inequalities in developing a theoretically sound anomaly detection algorithm. Our method uses a ranking procedure based on KNN graphs
of given data. We develop a max-margin learning-to-rank framework to train limited complexity models to imitate these KNN scores. The resulting anomaly detector is shown to be asymptotically optimal in that for any false alarm rate α, its decision region converges to the α-percentile minimum volume level set of the unknown
underlying density.
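The detector described above is a max-margin model trained to imitate kNN scores; as a much simpler illustration of the underlying idea (a hedged sketch, not the authors' learning-to-rank algorithm), one can score each point by its mean distance to its k nearest neighbours and flag the top α fraction:

```python
import math
import random

def knn_score(point, data, k=3):
    """Anomaly score of `point`: mean distance to its k nearest
    neighbours in `data` (larger = more anomalous)."""
    dists = sorted(math.dist(point, q) for q in data)
    return sum(dists[:k]) / k

def detect(data, alpha=0.05, k=3):
    """Flag roughly the alpha fraction of points with the highest kNN
    score, mimicking a decision region at false-alarm rate alpha."""
    scores = [knn_score(p, [q for q in data if q is not p], k)
              for p in data]
    threshold = sorted(scores)[math.ceil((1 - alpha) * len(scores)) - 1]
    return [s > threshold for s in scores]
```

With a tight cluster plus one distant point, the distant point receives the highest score and is flagged.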
Classic algorithms are fair learners: Classification Analysis of natural weather and wildfire occurrences
Classic machine learning algorithms have been reviewed and studied
mathematically in detail with respect to their performance and properties. This
paper reviews the empirical behavior of widely used classical supervised
learning algorithms such as Decision Trees, Boosting, Support Vector Machines,
k-nearest Neighbors and a shallow Artificial Neural Network. The paper
evaluates these algorithms on sparse tabular data for a classification task and
observes the effect of specific hyperparameters when the data is synthetically
modified to contain higher noise. These perturbations were introduced to assess
how efficiently these algorithms generalize on sparse data and how their
parameters can be tuned to improve classification accuracy. The paper shows
that, owing to their inherent properties, these classic algorithms remain fair
learners even on such limited, noisy and sparse datasets.
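As a toy illustration of this kind of experiment (not the paper's actual benchmark or models), one can measure how a 1-nearest-neighbour classifier degrades when Gaussian feature noise is injected into the training data:

```python
import random

def nearest_neighbour_predict(train, query):
    """1-NN prediction: label of the closest training point."""
    return min(train,
               key=lambda xy: sum((a - b) ** 2
                                  for a, b in zip(xy[0], query)))[1]

def add_noise(X, sigma, rng):
    """Synthetically perturb features with Gaussian noise of scale sigma."""
    return [tuple(v + rng.gauss(0, sigma) for v in x) for x in X]

def accuracy(train, test):
    """Fraction of test points classified correctly."""
    return sum(nearest_neighbour_predict(train, x) == y
               for x, y in test) / len(test)
```

Comparing `accuracy` on the clean versus noise-perturbed training set shows the generalization gap the paper probes.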
Ensembles of wrappers for automated feature selection in fish age classification
In feature selection, the most important features must be chosen so as to decrease their number while retaining their discriminatory information. Within this context, a novel feature selection method based on an ensemble of wrappers is proposed and applied to automatically select features in fish age classification. The effectiveness of this procedure has been tested on an Atlantic cod database with several powerful statistical learning classifiers. The subsets based on the few features selected, e.g. otolith weight and fish weight, are particularly notable given current biological findings and practices in fishery research, and the classification results obtained with them outperform those of previous studies in which manual feature selection was performed.
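A minimal sketch of the ensemble-of-wrappers idea, assuming a greedy forward wrapper around a nearest-centroid classifier and vote counting over bootstrap resamples (the paper's actual classifiers, database and aggregation rule differ):

```python
import random

def centroid_accuracy(X, y, features):
    """Wrapper criterion: leave-one-out accuracy of a nearest-centroid
    classifier restricted to the chosen feature subset."""
    correct = 0
    for i in range(len(X)):
        groups = {}
        for j, (x, lab) in enumerate(zip(X, y)):
            if j != i:
                groups.setdefault(lab, []).append([x[f] for f in features])
        means = {lab: [sum(col) / len(col) for col in zip(*pts)]
                 for lab, pts in groups.items()}
        xi = [X[i][f] for f in features]
        pred = min(means, key=lambda lab: sum((a - b) ** 2
                                              for a, b in zip(means[lab], xi)))
        correct += pred == y[i]
    return correct / len(X)

def forward_select(X, y, n_features):
    """Greedy forward wrapper selection."""
    chosen, remaining = [], list(range(len(X[0])))
    while len(chosen) < n_features and remaining:
        best = max(remaining,
                   key=lambda f: centroid_accuracy(X, y, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

def ensemble_select(X, y, n_features, n_wrappers=5, seed=0):
    """Ensemble of wrappers: run forward selection on bootstrap
    resamples and keep the most frequently chosen features."""
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_wrappers):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        for f in forward_select([X[i] for i in idx],
                                [y[i] for i in idx], n_features):
            votes[f] = votes.get(f, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)[:n_features]
```

On data where one feature separates the classes and another is pure noise, the ensemble reliably votes for the informative feature.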
Nonparametric Bayesian Deep Learning for Scientific Data Analysis
Deep learning (DL) has emerged as the leading paradigm for predictive modeling in a variety of domains, especially those involving large volumes of high-dimensional spatio-temporal data such as images and text. With the rise of big data in scientific and engineering problems, there is now considerable interest in the research and development of DL for scientific applications. The scientific domain, however, poses unique challenges for DL, including special emphasis on interpretability and robustness. In particular, a priority of the Department of Energy (DOE) is the research and development of probabilistic ML methods that are robust to overfitting and offer reliable uncertainty quantification (UQ) on high-dimensional noisy data that is limited in size relative to its complexity. Gaussian processes (GPs) are nonparametric Bayesian models that are naturally robust to overfitting and offer UQ out-of-the-box. Unfortunately, traditional GP methods lack the balance of expressivity and domain-specific inductive bias that is key to the success of DL. Recently, however, a number of approaches have emerged to incorporate the DL paradigm into GP methods, including deep kernel learning (DKL), deep Gaussian processes (DGPs), and neural network Gaussian processes (NNGPs). In this work, we investigate DKL, DGPs, and NNGPs as paradigms for developing robust models for scientific applications. First, we develop DKL for text classification, and apply both DKL and Bayesian neural networks (BNNs) to the problem of classifying cancer pathology reports, with BNNs attaining new state-of-the-art results. Next, we introduce the deep ensemble kernel learning (DEKL) method, which is just as powerful as DKL while admitting easier model parallelism. Finally, we derive a new model called a "bottleneck NNGP" by unifying the DGP and NNGP paradigms, thus laying the groundwork for a new class of methods for future applications.
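The claim that GPs offer UQ out of the box can be seen in their closed-form predictive equations. Below is a self-contained sketch of exact regression under a zero-mean GP with an RBF kernel (illustrative only; it is not the DKL, DGP or NNGP models studied in this work):

```python
import math

def rbf(a, b, ell=1.0):
    """Squared-exponential (RBF) kernel on scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * ell ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(xs, ys, x_star, noise=1e-6):
    """Predictive mean and variance of a zero-mean GP at x_star."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_star) for a in xs]
    alpha = solve(K, ys)            # K^{-1} y
    weights = solve(K, k_star)      # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, weights))
    return mean, var
```

Near the training points the predictive variance collapses toward zero, while far from the data it returns to the prior variance, which is exactly the built-in UQ behavior the abstract refers to.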
An Information Approach to Regularization Parameter Selection for the Solution of Ill-Posed Inverse Problems Under Model Misspecification
Engineering problems are often ill-posed, i.e. cannot be solved by conventional data-driven methods such as parametric linear and nonlinear regression or neural networks. A method of regularization that is used for the solution of ill-posed problems requires an a priori choice of the regularization parameter. Several regularization parameter selection methods have been proposed in the literature, yet, none is resistant to model misspecification. Since almost all models are incorrectly or approximately specified, misspecification resistance is a valuable option for engineering applications.
Each data-driven method is based on a statistical procedure which can perform well on one data set and fail on another. Therefore, another useful feature of a data-driven method is robustness. This dissertation proposes a methodology for developing misspecification-resistant and robust regularization parameter selection methods through the use of the information complexity approach.
The original contribution of the dissertation to the field of ill-posed inverse problems in engineering is a new robust regularization parameter selection method. This method is misspecification-resistant, i.e. it works consistently when the model is misspecified. The method also improves upon the information-based regularization parameter selection methods by correcting inadequate penalization of estimation inaccuracy through the use of the information complexity framework. Such an improvement makes the proposed regularization parameter selection method robust and reduces the risk of obtaining grossly underregularized solutions.
A method of misspecification detection is proposed based on the discrepancy between the proposed regularization parameter selection method and its correctly specified version. A detected misspecification indicates that the model may be inadequate for the particular problem and should be revised.
The superior performance of the proposed regularization parameter selection method is demonstrated by practical examples. Data for the examples are from Carolina Power & Light's Crystal River Nuclear Power Plant and a TVA fossil power plant. The results of applying the proposed regularization parameter selection method to the data demonstrate that the method is robust, i.e. does not produce grossly underregularized solutions, and performs well when the model is misspecified. This enables one to implement the proposed regularization parameter selection method in autonomous diagnostic and monitoring systems.
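The role of the regularization parameter can be illustrated on a diagonal (SVD-reduced) model. The stand-in selection rule below is the classical discrepancy principle, not the dissertation's information-complexity criterion, and all names and numbers are illustrative:

```python
def tikhonov_diag(s, b, lam):
    """Tikhonov solution for a diagonal operator with singular values s:
    x_i = s_i * b_i / (s_i^2 + lam)."""
    return [si * bi / (si ** 2 + lam) for si, bi in zip(s, b)]

def residual(s, b, x):
    """Norm of the data misfit || diag(s) x - b ||."""
    return sum((si * xi - bi) ** 2
               for si, bi, xi in zip(s, b, x)) ** 0.5

def discrepancy_lambda(s, b, noise_level, grid):
    """Pick the largest lambda on the grid whose residual stays below
    the noise level (a simple discrepancy-principle stand-in)."""
    for lam in sorted(grid, reverse=True):
        if residual(s, b, tikhonov_diag(s, b, lam)) <= noise_level:
            return lam
    return min(grid)
```

With a tiny singular value, the unregularized solution (`lam = 0`) wildly amplifies the noise in that component, while the selected lambda suppresses it; choosing lambda too small is exactly the "grossly underregularized" failure mode the dissertation guards against.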
πVAE: Encoding stochastic process priors with variational autoencoders
Stochastic processes provide a mathematically elegant way to model complex data.
In theory, they provide flexible priors over function classes that can encode a
wide range of interesting assumptions. In practice, however, efficient
inference by optimisation or marginalisation is difficult, a problem further
exacerbated with big data and high dimensional input spaces. We propose a novel
variational autoencoder (VAE) called the prior encoding variational autoencoder
(πVAE). The πVAE is finitely exchangeable and Kolmogorov consistent,
and thus is a continuous stochastic process. We use πVAE to learn low
dimensional embeddings of function classes. We show that our framework can
accurately learn expressive function classes such as Gaussian processes, but
also properties of functions to enable statistical inference (such as the
integral of a log Gaussian process). For popular tasks, such as spatial
interpolation, πVAE achieves state-of-the-art performance both in terms of
accuracy and computational efficiency. Perhaps most usefully, we demonstrate
that the low dimensional independently distributed latent space representation
learnt provides an elegant and scalable means of performing Bayesian inference
for stochastic processes within probabilistic programming languages such as
Stan.
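At the core of every variational autoencoder is the reparameterization trick, which expresses a latent sample as a deterministic function of the variational parameters and an independent noise draw, keeping it differentiable; a minimal sketch (generic, not this paper's architecture):

```python
import random
import statistics

def reparameterize(mu, sigma, rng):
    """Reparameterization trick: sample z = mu + sigma * eps with
    eps ~ N(0, 1), so z is differentiable in mu and sigma while the
    randomness lives entirely in eps."""
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps
```

Averaged over many draws, the samples recover the intended N(mu, sigma^2) distribution.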