    Hunting for Significance: Bayesian Classifiers Under a Mixture Loss Function

    Detecting significance in a high-dimensional sparse data structure has received a great deal of attention in modern statistics. In this paper, we introduce a compound decision rule to simultaneously separate signals from noise. The procedure is a Bayes rule under a mixture loss function that minimizes the number of false discoveries while controlling false nondiscoveries by incorporating signal-strength information: strong signals are penalized more heavily for nondiscovery than weak signals. In constructing this classification rule, we assume a mixture prior for the parameter which adapts to the unknown sparsity. The resulting Bayes rule can be viewed as thresholding the "local fdr" (Efron, 2007) with adaptive thresholds. Both parametric and nonparametric methods are discussed; the nonparametric procedure adapts well to the unknown data structure and outperforms the parametric one. Performance is illustrated by various simulation studies and a real data application.
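    The abstract does not spell out the estimator, so the following is only a minimal Python sketch of two-group local-fdr thresholding under stated assumptions: a standard normal null, a Gaussian kernel density estimate of the marginal, and an illustrative cut that loosens with |z| to mimic penalizing nondiscovery of strong signals more heavily. The function names, the default pi0, and the specific adaptive-threshold form are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

def local_fdr(z, pi0=0.9):
    """Two-group local fdr: pi0 * f0(z) / f(z), with f estimated by a Gaussian KDE."""
    f = gaussian_kde(z)              # marginal density estimate f(z)
    f0 = norm.pdf(z)                 # theoretical null N(0, 1)
    return np.clip(pi0 * f0 / f(z), 0.0, 1.0)

def classify_signals(z, pi0=0.9, base_cut=0.2, slope=0.05):
    """Declare z_i a signal when its local fdr falls below an adaptive cut.

    The cut grows with |z| (an illustrative choice), so strong observations are
    easier to declare, echoing a loss that penalizes nondiscovery of strong
    signals more heavily than that of weak ones.
    """
    fdr = local_fdr(z, pi0)
    cut = np.minimum(base_cut + slope * np.abs(z), 1.0)
    return fdr < cut

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.where(rng.random(2000) < 0.1, rng.normal(0, 3, 2000), 0.0)  # sparse means
    z = theta + rng.normal(size=2000)                                       # observed z-values
    print("declared signals:", classify_signals(z).sum())
```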

    Bayesian Aspects of Classification Procedures

    We consider several statistical approaches to binary classification and multiple hypothesis testing problems. Situations in which a binary choice must be made are common in science. Usually, there is uncertainty involved in making the choice, and a great number of statistical techniques have been put forth to help researchers deal with this uncertainty in separating signal from noise in reasonable ways. For example, in genetic studies, one may want to identify genes that affect a certain biological process from among a larger set of genes. In such examples, costs are attached to making incorrect choices and many choices must be made at the same time. Reasonable ways of modeling the cost structure and choosing the appropriate criteria for evaluating the performance of statistical techniques are needed. The following three chapters propose Bayesian methods for these issues. In the first chapter, we focus on an empirical Bayes approach to a popular binary classification problem formulation. In this framework, observations are treated as independent draws from a hierarchical model with a mixture prior distribution. The mixture prior combines prior distributions for the "noise" and for the "signal" observations. In the literature, parametric assumptions are usually made about the prior distribution from which the "signal" observations come. We suggest a Bayes classification rule which minimizes the expectation of a flexible and easily interpretable mixture loss function that brings together constant penalties for false positive misclassifications and L_2 penalties for false negative misclassifications. Due in part to the form of the loss function, empirical Bayes techniques can then be used to construct the Bayes classification rule without specifying the "signal" part of the mixture prior distribution. The proposed classification technique builds directly on the nonparametric mixture prior approach proposed by Raykar and Zhao (2010, 2011). Many different criteria can be used to judge the success of a classification procedure. A very useful criterion called the False Discovery Rate (FDR) was introduced by Benjamini and Hochberg in a 1995 paper. For many applications, the FDR, which is defined as the expected proportion of false positive results among the observations declared to be "signal", is a reasonable criterion to target. Bayesian versions of the false discovery rate, the so-called positive false discovery rate (pFDR) and local false discovery rate, were proposed by Storey (2002, 2003) and Efron and coauthors (2001), respectively. There is an interesting connection between the local false discovery rate and the nonparametric mixture prior approach for binary classification problems. The second part of the dissertation focuses on this link and provides a comparison of various approaches for estimating Bayesian false discovery rates. The third chapter is an account of a connection between the celebrated Neyman-Pearson lemma and the area under the receiver operating characteristic (ROC) curve (AUC) when the observations that need to be classified come from a pair of normal distributions. Using this connection, it is possible to derive a classification rule which maximizes the AUC for binormal data.
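    To make the mixture-loss rule concrete, here is a small sketch that assumes a fully parametric placeholder for the signal prior (N(0, tau2)), even though the chapter's point is that the signal part need not be specified; the cost c, the defaults for pi1 and tau2, and the function name are illustrative assumptions. The rule declares a signal whenever the expected L_2 cost of a false negative exceeds the constant cost of a false positive.

```python
import numpy as np
from scipy.stats import norm

def mixture_loss_classifier(x, pi1=0.1, tau2=4.0, c=1.0):
    """Bayes rule under a constant false-positive cost c and an L2 (theta^2)
    false-negative cost, with a point-mass-at-zero plus N(0, tau2) prior.

    Declare 'signal' when  P(signal|x) * E[theta^2 | x, signal]  >  c * P(noise|x).
    """
    m1 = norm.pdf(x, scale=np.sqrt(1.0 + tau2))      # marginal of x under the signal component
    m0 = norm.pdf(x, scale=1.0)                      # marginal of x under the noise component
    post1 = pi1 * m1 / (pi1 * m1 + (1 - pi1) * m0)   # P(signal | x)
    shrink = tau2 / (1.0 + tau2)
    post_mean = shrink * x                           # E[theta | x, signal]
    post_var = shrink                                # Var[theta | x, signal] when the noise variance is 1
    e_theta2 = post_mean**2 + post_var               # E[theta^2 | x, signal]
    return post1 * e_theta2 > c * (1 - post1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta = np.where(rng.random(1000) < 0.1, rng.normal(0, 2, 1000), 0.0)
    x = theta + rng.normal(size=1000)
    picks = mixture_loss_classifier(x)
    print("declared:", picks.sum(), "true signals among declared:", (theta[picks] != 0).sum())
```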

    Statistical Essays Motivated by Genome-Wide Association Study

    Genome-wide association studies (GWAS) have been gaining popularity in recent years and have generated a great deal of interest in statistics. In this dissertation, motivated by GWAS, we develop statistical methods to identify significant single-nucleotide polymorphisms (SNPs) that are associated with phenotype traits of interest. Usually in GWAS, the number of SNPs is much larger than the number of individuals, so identifying significant SNPs and estimating their effects is a high-dimensional selection and estimation problem, often referred to as the large p, small n (p >> n) paradigm. We propose three approaches to estimate the proportion of SNPs that are significantly associated with the trait of interest in GWAS, as well as the distribution of their effects. The first extends earlier work that models the SNP effects as random effects in a linear mixed model. We instead assume a mixture prior on the random effects, consisting of a point mass at zero for the non-significant SNPs plus a normal component for the significant SNPs. We develop a fast Markov chain Monte Carlo (MCMC) algorithm to estimate the model parameters; it reduces the computation time significantly by calculating the posterior conditional on a set of latent variables that indicate whether each SNP is associated with the trait of interest. We further relax the prior to a mixture of a point mass and a nonparametric distribution. Two types of sieve estimators are proposed based on a least squares (LS) method for probability distributions under the framework of measurement error models; the estimators are obtained by minimizing the distance between the empirical and model distribution functions and characteristic functions, respectively. In the last part, we propose an estimator for the normal mean problem that adapts to the sparsity of the mean signals and incorporates correlation among the signals. The proposed estimator decomposes the arbitrary covariance matrix of the observed signals into two parts: principal factors that drive the strong dependence, and weakly dependent error terms. By taking out the largest common factors, the correlation among the signals is substantially weakened. An automatic nonparametric empirical Bayes method is then used to estimate the sparsity and identify the nonzero means.
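    As a rough illustration of the point-mass-plus-normal prior with latent indicators, the sketch below runs a Gibbs sampler for the simpler normal-means version of the model rather than the full linear mixed model of the dissertation; the hyperparameters (a, b, a_tau, b_tau) and the function name are assumptions made for the example.

```python
import numpy as np

def spike_slab_gibbs(x, n_iter=2000, a=1.0, b=1.0, a_tau=2.0, b_tau=2.0, seed=0):
    """Gibbs sampler for x_i ~ N(theta_i, 1) with a spike-and-slab prior:
    theta_i = 0 with prob 1 - pi, theta_i ~ N(0, tau2) with prob pi.
    Latent indicators gamma_i flag the 'signal' coordinates, mirroring the
    latent-variable device used to speed up the posterior computation."""
    rng = np.random.default_rng(seed)
    n = len(x)
    pi, tau2 = 0.5, 1.0
    incl = np.zeros(n)                                    # running inclusion frequencies
    for _ in range(n_iter):
        # 1. update indicators: compare marginal likelihoods under slab vs spike
        log_w1 = np.log(pi) - 0.5 * np.log(1 + tau2) - 0.5 * x**2 / (1 + tau2)
        log_w0 = np.log1p(-pi) - 0.5 * x**2
        p1 = 1.0 / (1.0 + np.exp(log_w0 - log_w1))
        gamma = rng.random(n) < p1
        # 2. update nonzero effects from their normal full conditional
        shrink = tau2 / (1 + tau2)
        theta = np.where(gamma, rng.normal(shrink * x, np.sqrt(shrink)), 0.0)
        # 3. update pi (Beta) and tau2 (inverse-gamma) from conjugate conditionals
        k = gamma.sum()
        pi = rng.beta(a + k, b + n - k)
        tau2 = 1.0 / rng.gamma(a_tau + 0.5 * k, 1.0 / (b_tau + 0.5 * (theta[gamma]**2).sum()))
        incl += gamma
    return incl / n_iter                                  # posterior inclusion probabilities

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    theta_true = np.where(rng.random(500) < 0.05, rng.normal(0, 3, 500), 0.0)
    x = theta_true + rng.normal(size=500)
    print("estimated proportion of signals:", spike_slab_gibbs(x).mean().round(3))
```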

    Sparse Coding with Structured Sparsity Priors and Multilayer Architecture for Image Classification

    Applying sparse coding to large datasets for image classification is a long-standing problem in the field of computer vision. Sparse coding models have been found to perform disappointingly on such large datasets, where variability is broad and anomalies are common. Conversely, deep neural networks thrive on bountiful data, and their success has encouraged researchers to try to augment the learning capacity of traditionally shallow sparse coding methods by adding layers. Multilayer sparse coding networks are expected to combine the best of both sparsity regularization and deep architectures. To date, however, endeavors to marry the two techniques have not achieved significant improvements over their individual counterparts. In this thesis, we first briefly review multiple structured sparsity priors as well as various supervised dictionary learning techniques, with applications to hyperspectral image classification. Building on these structured sparsity priors and dictionary learning techniques, we then develop a novel multilayer sparse coding network that contains thirteen sparse coding layers. The proposed network learns both the dictionaries and the regularization parameters simultaneously using an end-to-end supervised learning scheme, and we show empirical evidence that the regularization parameters adapt to the given training data. We also propose applying dimension reduction within sparse coding networks to dramatically reduce the output dimensionality of the sparse coding layers and mitigate computational costs. Moreover, our sparse coding network is compatible with other powerful deep learning techniques such as dropout, batch normalization, and shortcut connections. Experimental results show that the proposed multilayer sparse coding network produces classification accuracy competitive with deep neural networks while using significantly fewer parameters and layers.
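    For context, a single sparse coding layer can be written as an ISTA solve of an l1-regularized least-squares problem. The sketch below uses a plain l1 prior and a fixed random dictionary rather than the structured priors, learned dictionaries, and thirteen supervised layers of the thesis, so it illustrates only the basic building block, not the proposed network.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(X, D, lam=0.1, n_iter=100):
    """Solve min_Z 0.5 * ||X - D Z||_F^2 + lam * ||Z||_1 by ISTA.

    X: (d, n) input columns, D: (d, k) dictionary, Z: (k, n) sparse codes.
    A multilayer network stacks such layers with learned D and lam per layer;
    here both are fixed and the sparsity prior is plain l1.
    """
    L = np.linalg.norm(D, 2) ** 2             # Lipschitz constant of the data-fit gradient
    Z = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ Z - X)              # gradient of the smooth data-fit term
        Z = soft_threshold(Z - grad / L, lam / L)
    return Z

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    D = rng.normal(size=(64, 128))
    D /= np.linalg.norm(D, axis=0)            # unit-norm dictionary atoms
    X = rng.normal(size=(64, 10))
    Z = sparse_code(X, D)
    print("nonzeros per column:", (np.abs(Z) > 1e-8).sum(axis=0))
```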