
    A nonparametric empirical Bayes approach to covariance matrix estimation

    We propose an empirical Bayes method to estimate high-dimensional covariance matrices. Our procedure centers on vectorizing the covariance matrix and treating matrix estimation as a vector estimation problem. Drawing from the compound decision theory literature, we introduce a new class of decision rules that generalizes several existing procedures. We then use a nonparametric empirical Bayes g-modeling approach to estimate the oracle optimal rule in that class. This allows us to let the data itself determine how best to shrink the estimator, rather than shrinking in a pre-determined direction such as toward a diagonal matrix. Simulation results and a gene expression network analysis show that our approach can outperform a number of state-of-the-art proposals in a wide range of settings, sometimes substantially.
    Comment: 20 pages, 4 figures

    Small area estimation of general parameters with application to poverty indicators: A hierarchical Bayes approach

    Poverty maps are used to aid important political decisions such as allocation of development funds by governments and international organizations. Those decisions should be based on the most accurate poverty figures. However, reliable poverty figures are often not available at fine geographical levels or for particular at-risk population subgroups due to the sample size limitations of current national surveys. These surveys cannot adequately cover all the desired areas or population subgroups and, therefore, models relating the different areas are needed to "borrow strength" from area to area. In particular, the Spanish Survey on Income and Living Conditions (SILC) produces national poverty estimates but cannot provide poverty estimates by Spanish province due to the poor precision of direct estimates, which use only the province-specific data. It also raises the ethical question of whether poverty is more severe for women than for men in a given province. We develop a hierarchical Bayes (HB) approach for poverty mapping in Spanish provinces by gender that overcomes the small province sample size problem of the SILC. The proposed approach has a wide scope of application because it can be used to estimate general nonlinear parameters. We use a Bayesian version of the nested error regression model in which Markov chain Monte Carlo procedures and the convergence monitoring therein are avoided. A simulation study reveals good frequentist properties of the HB approach. The resulting poverty maps indicate that poverty, both in frequency and intensity, is localized mostly in the southern and western provinces and that it is more acute for women than for men in most of the provinces.
    Comment: Published at http://dx.doi.org/10.1214/13-AOAS702 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks

    We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from them. A computer program is available that implements the proposed shrinkage estimator.
    Comment: 18 pages, 3 figures, 1 table
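
The abstract above does not spell out the estimator, so the following is only a minimal sketch of what a James-Stein-type shrinkage entropy estimate can look like. It assumes a uniform shrinkage target over the observed cells and a commonly used closed-form estimate of the shrinkage intensity; the function name `shrinkage_entropy` and the handling of edge cases are hypothetical choices for illustration, not the authors' program.

```python
import math


def shrinkage_entropy(counts):
    """Entropy (in nats) from cell counts via shrinkage toward a uniform target.

    Assumptions (not taken from the abstract): the target is the uniform
    distribution 1/p, and the intensity is estimated in closed form as
    lambda = (1 - sum(f_k^2)) / ((n - 1) * sum((1/p - f_k)^2)),
    clipped to the interval [0, 1].
    Returns the pair (entropy, lambda).
    """
    n = sum(counts)
    p = len(counts)
    if p == 0 or n < 2:
        raise ValueError("need at least one cell and two observations")

    target = 1.0 / p
    freqs = [c / n for c in counts]  # maximum-likelihood cell frequencies

    denom = (n - 1) * sum((target - f) ** 2 for f in freqs)
    if denom == 0.0:
        lam = 1.0  # frequencies already equal the target
    else:
        lam = (1.0 - sum(f * f for f in freqs)) / denom
    lam = min(1.0, max(0.0, lam))

    # Convex combination of the target and the empirical frequencies.
    shrunk = [lam * target + (1.0 - lam) * f for f in freqs]

    # Plug the shrunken frequencies into the Shannon entropy formula.
    entropy = -sum(q * math.log(q) for q in shrunk if q > 0.0)
    return entropy, lam


if __name__ == "__main__":
    # Severely undersampled toy example: 7 observations over 5 cells.
    h, lam = shrinkage_entropy([4, 2, 1, 0, 0])
    print(f"lambda = {lam:.3f}, entropy = {h:.3f} nats")
```

With few observations the estimated intensity moves the frequencies toward the uniform target, which raises the entropy estimate relative to the plain plug-in value; with many observations the intensity tends toward zero and the two estimates agree.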

    Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification

    We propose a high-dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
    Comment: 30 pages, 2 figures

    Structure Learning in Coupled Dynamical Systems and Dynamic Causal Modelling

    Identifying a coupled dynamical system out of many plausible candidates, each of which could serve as the underlying generator of some observed measurements, is a profoundly ill-posed problem that commonly arises when modelling real-world phenomena. In this review, we detail a set of statistical procedures for inferring the structure of nonlinear coupled dynamical systems (structure learning), which has proved useful in neuroscience research. A key focus here is the comparison of competing models of (i.e., hypotheses about) network architectures and implicit coupling functions in terms of their Bayesian model evidence. These methods are collectively referred to as dynamic causal modelling (DCM). We focus on a relatively new approach that is proving remarkably useful; namely, Bayesian model reduction (BMR), which enables rapid evaluation and comparison of models that differ in their network architecture. We illustrate the usefulness of these techniques through modelling neurovascular coupling (cellular pathways linking neuronal and vascular systems), whose function is an active focus of research in neurobiology and the imaging of coupled neuronal systems.

    Kernel Bayes' rule

    A nonparametric kernel-based method for realizing Bayes' rule is proposed, based on representations of probabilities in reproducing kernel Hilbert spaces (RKHS). Probabilities are uniquely characterized by the mean of the canonical map to the RKHS. The prior and conditional probabilities are expressed in terms of RKHS functions of an empirical sample: no explicit parametric model is needed for these quantities. The posterior is likewise an RKHS mean of a weighted sample. The estimator for the expectation of a function of the posterior is derived, and rates of consistency are shown. Some representative applications of the kernel Bayes' rule are presented, including Bayesian computation without likelihood and filtering with a nonparametric state-space model.
    Comment: 27 pages, 5 figures