10 research outputs found

    Forward Stagewise Naive Bayes

    The naïve Bayes approach is a simple but often satisfactory method for supervised classification. In this paper, we focus on the naïve Bayes model and propose the application of regularization techniques to learn a naïve Bayes classifier. The main contribution of the paper is a stagewise version of the selective naïve Bayes, which can be considered a regularized version of the naïve Bayes model. We call it forward stagewise naïve Bayes. For comparison's sake, we also introduce an explicitly regularized formulation of the naïve Bayes model, where conditional independence (absence of arcs) is promoted via an L1/L2-group penalty on the parameters that define the conditional probability distributions. Although already published in the literature, this idea has only been applied to continuous predictors. We extend this formulation to discrete predictors and propose a modification that yields an adaptive penalization. We show that, whereas the L1/L2-group penalty formulation only discards irrelevant predictors, the forward stagewise naïve Bayes can discard both irrelevant and redundant predictors, which are known to be harmful for the naïve Bayes classifier. Both approaches, however, usually improve the classical naïve Bayes model's accuracy.
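
    As a rough illustration of the selective idea behind this family of models, the sketch below greedily adds, one at a time, the predictor whose inclusion most improves the cross-validated accuracy of a Gaussian naïve Bayes classifier. It is not the authors' forward stagewise procedure (which updates parameters in small steps); the function name and stopping rule are illustrative assumptions.

```python
# Hedged sketch: greedy forward predictor selection for a Gaussian naive Bayes
# classifier. Illustrates the "selective" idea only; the paper's forward
# stagewise method proceeds in small parameter updates rather than selecting
# whole predictors at once.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def forward_select_nb(X, y, cv=5):
    n_features = X.shape[1]
    selected, best_score = [], -np.inf
    improved = True
    while improved and len(selected) < n_features:
        improved = False
        best_candidate = None
        for j in range(n_features):
            if j in selected:
                continue
            cols = selected + [j]
            score = cross_val_score(GaussianNB(), X[:, cols], y, cv=cv).mean()
            if score > best_score:
                best_score, best_candidate = score, j
        if best_candidate is not None:
            selected.append(best_candidate)
            improved = True
    return selected, best_score
```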

    Learning an L1-regularized Gaussian Bayesian Network in the Equivalence Class Space

    Learning the structure of a graphical model from data is a common task in a wide range of practical applications. In this paper, we focus on Gaussian Bayesian networks, i.e., on continuous data and directed acyclic graphs with a joint probability density of all variables given by a Gaussian. We propose to work in an equivalence class search space, specifically using the k-greedy equivalence search algorithm. This, combined with regularization techniques to guide the structure search, can learn sparse networks close to the one that generated the data. We provide results on some synthetic networks and on modeling the gene network of the two biological pathways regulating the biosynthesis of isoprenoids for the Arabidopsis thaliana plant.
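
    Scikit-learn does not implement the k-greedy equivalence search used here; as a loosely related, readily available stand-in, the graphical lasso below estimates a sparse undirected Gaussian network (a sparse precision matrix) from continuous data.

```python
# Hedged stand-in: the paper searches directed equivalence classes with
# regularization; this only shows the related undirected estimate given by the
# graphical lasso, whose nonzero precision entries indicate conditional
# dependencies among Gaussian variables.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))          # placeholder continuous data
model = GraphicalLassoCV().fit(X)
edges = np.abs(model.precision_) > 1e-8     # nonzero entries ~ conditional dependencies
print(edges.astype(int))
```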

    Regularization for sparsity in statistical analysis and machine learning

    Pragmatism is the leading motivation of regularization. We can understand regularization as a modification of the maximum-likelihood estimator so that a reasonable answer can be given in an unstable or ill-posed situation. To mention some typical examples, this happens when fitting parametric or non-parametric models with more parameters than data or when estimating large covariance matrices. Regularization is also commonly used to improve the bias-variance tradeoff of an estimation. The definition of regularization is therefore quite general and, although the introduction of a penalty is probably the most popular type, it is just one out of multiple forms of regularization. In this dissertation, we focus on the applications of regularization for obtaining sparse or parsimonious representations, where only a subset of the inputs is used. A particular form of regularization, L1-regularization, plays a key role in reaching sparsity. Most of the contributions presented here revolve around L1-regularization, although other forms of regularization are explored (also pursuing sparsity in some sense). In addition to presenting a compact review of L1-regularization and its applications in statistics and machine learning, we devise methodology for regression, supervised classification and structure induction of graphical models. Within the regression paradigm, we focus on kernel smoothing learning, proposing techniques for kernel design that are suitable for high-dimensional settings and sparse regression functions. We also present an application of regularized regression techniques for modeling the response of biological neurons. Supervised classification advances deal, on the one hand, with the application of regularization for obtaining a naïve Bayes classifier and, on the other hand, with a novel algorithm for brain-computer interface design that uses group regularization in an efficient manner. Finally, we present a heuristic for inducing structures of Gaussian Bayesian networks using L1-regularization as a filter.

    An L1-regularized naïve Bayes-inspired classifier for discarding redundant and irrelevant predictors

    The naïve Bayes model is a simple but often satisfactory supervised classification method. The original naïve Bayes scheme does, however, have a serious weakness, namely, the harmful effect of redundant predictors. In this paper, we study how to apply a regularization technique to learn a computationally efficient classifier that is inspired by naïve Bayes. The proposed formulation, combined with an L1-penalty, is capable of discarding harmful, redundant predictors. A modification of the LARS algorithm is devised to solve this problem. We tackle both real-valued and discrete predictors, ensuring that our method is applicable to a wide range of data. In the experimental section, we empirically study the effect of redundant and irrelevant predictors. We also test the method on a high-dimensional data set from the neuroscience field, where there are many more predictors than data cases. Finally, we run the method on a real data set that combines categorical with numeric predictors. Our approach is compared with several naïve Bayes variants and other classification algorithms (SVM and kNN), and is shown to be competitive.
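
    The modified LARS algorithm of the paper is not publicly packaged; for orientation only, the sketch below traces the standard LARS lasso path with scikit-learn, showing the order in which predictors enter the model as the L1 constraint is relaxed.

```python
# Hedged sketch: the standard LARS lasso path, not the authors' modified LARS
# for their naive-Bayes-inspired objective. It shows predictors entering the
# model one by one while the L1 constraint is relaxed.
import numpy as np
from sklearn.linear_model import lars_path
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=8, n_informative=3, random_state=0)
alphas, active, coefs = lars_path(X, y, method="lasso")
print(active)          # order in which predictors enter the model
print(coefs[:, -1])    # coefficients at the least-regularized end of the path
```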

    Variable selection in local regression models via an iterative LASSO

    Locally weighted regression is a technique that predicts the response for new cases from their neighbors in the training dataset. In this paper we propose to combine modern regularization approaches with locally weighted regression. Specifically, the LASSO method is able to select relevant variables, leading to sparse models. We present two algorithms that embed the LASSO in an iterative procedure that incrementally discards or adds variables, respectively, in such a way that a LASSO-wise regularization path is obtained locally. The algorithms are tested on two different datasets from the UCI repository, obtaining promising results.
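
    A minimal sketch of one locally weighted LASSO fit is given below, with Gaussian kernel weights around a query point passed as sample weights; the bandwidth, penalty level, and the absence of the paper's iterative add/drop wrapper are illustrative simplifications.

```python
# Hedged sketch: a single locally weighted LASSO fit around one query point.
# The paper wraps such fits in an iterative variable add/drop procedure,
# which is not reproduced here. Requires a scikit-learn version whose
# Lasso.fit accepts sample_weight (0.23 or later).
import numpy as np
from sklearn.linear_model import Lasso

def local_lasso_predict(X, y, x_query, bandwidth=1.0, alpha=0.1):
    # Gaussian kernel weights: nearby training cases count more.
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    model = Lasso(alpha=alpha)
    model.fit(X, y, sample_weight=w)
    return model.predict(x_query.reshape(1, -1))[0], model.coef_
```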

    A Survey of L1 Regression

    L1 regularization, or regularization with an L1 penalty, is a popular idea in statistics and machine learning. This paper reviews the concept and application of L1 regularization for regression. It is not our aim to present a comprehensive list of the utilities of the L1 penalty in the regression setting. Rather, we focus on what we believe is the set of most representative uses of this regularization technique, which we describe in some detail. Thus, we deal with a number of L1-regularized methods for linear regression, generalized linear models, and time series analysis. Although this review targets practice rather than theory, we do give some theoretical details about L1-penalized linear regression, usually referred to as the least absolute shrinkage and selection operator (lasso).
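
    For reference, the lasso estimator mentioned above is the L1-penalized least-squares problem, written here in its standard form (not quoted from the paper):

```latex
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}
  \; \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2
  \; + \; \lambda \lVert \beta \rVert_1 ,
\qquad \lambda \ge 0 .
```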

    Classification of neural signals from sparse autoregressive features

    This paper introduces a signal classification framework that can be used for brain–computer interface design. The actual classification is performed on sparse autoregressive features. The framework can use any well-known classification algorithm, such as discriminant analysis, linear logistic regression and support vector machines. The autoregressive coefficients of all signals and channels are simultaneously estimated by the group lasso, and the estimation is guided by the classification performance. Thanks to the variable selection capability of the group lasso, the framework can drop individual autoregressive coefficients that are useless in the prediction stage. Also, the framework is relatively insensitive to the chosen autoregressive order. We devise an efficient algorithm to solve this problem. We test our approach on Keirn and Aunon's data, used for binary classification of electroencephalogram signals, achieving promising results.
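
    Scikit-learn does not ship a group lasso, so the sketch below implements a generic proximal-gradient group lasso for a least-squares loss, with one coefficient group per channel of autoregressive features; it is not the paper's classification-guided estimation or its BCI pipeline.

```python
# Hedged sketch: proximal-gradient group lasso for a least-squares loss.
# Each group would correspond to the AR coefficients of one channel; the
# grouping below is illustrative only.
import numpy as np

def group_soft_threshold(v, groups, tau):
    # Block soft-thresholding: shrink each group's norm by tau, or zero it out.
    out = np.zeros_like(v)
    for g in groups:                        # g is an index array for one group
        norm = np.linalg.norm(v[g])
        if norm > tau:
            out[g] = (1.0 - tau / norm) * v[g]
    return out

def group_lasso_ls(X, y, groups, lam=0.1, n_iter=500):
    p = X.shape[1]
    beta = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = group_soft_threshold(beta - step * grad, groups, step * lam)
    return beta

# Illustrative grouping: two channels with four AR coefficients each, e.g.
# groups = [np.arange(0, 4), np.arange(4, 8)]
```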

    Learning an L1-regularized Gaussian Bayesian network in the space of equivalence classes

    Learning the structure of a graphical model from data is a common task in a wide range of practical applications. In this paper we focus on Gaussian Bayesian networks (GBNs), that is, on continuous data and directed acyclic graphs. We propose to work in an equivalence class search space that, combined with regularization techniques to guide the structure search, allows us to learn a sparse network close to the one that generated the data.

    Sparse regularized local regression

    The aim is to provide a Bayesian formulation of regularized local linear regression, combined with techniques for optimal bandwidth selection. This approach arises from the idea that only those covariates that are found to be relevant for the regression function should be considered by the kernel function used to define the neighborhood of the point of interest. However, the regression function itself depends on the kernel function. A maximum a posteriori joint estimation of the regression parameters is given. Also, an alternative algorithm based on sampling techniques is developed for finding both the regression parameter distribution and the predictive distribution.

    Bayesian sparse partial least squares

    Partial least squares (PLS) is a class of methods that makes use of a set of latent or unobserved variables to model the relation between (typically) two sets of input and output variables, respectively. Several flavors, depending on how the latent variables or components are computed, have been developed over the years. In this letter, we propose a Bayesian formulation of PLS along with some extensions. In a nutshell, we provide sparsity at the input space level and an automatic estimation of the optimal number of latent components. We follow the variational approach to infer the parameter distributions. We have successfully tested the proposed methods on a synthetic data benchmark and on electrocorticogram data associated with several motor outputs in monkeys.
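
    For orientation, the snippet below runs standard (non-Bayesian, non-sparse) partial least squares with scikit-learn; the letter's variational Bayesian formulation with input-level sparsity is not implemented here.

```python
# Hedged baseline: ordinary PLS regression, shown only to fix ideas about the
# latent-component view that the Bayesian sparse formulation builds on.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 20))              # inputs
Y = X[:, :3] @ rng.standard_normal((3, 2)) + 0.1 * rng.standard_normal((150, 2))
pls = PLSRegression(n_components=2).fit(X, Y)
print(pls.x_weights_.shape)                     # (20, 2): one weight vector per component
```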