
    Bayesian Approximate Kernel Regression with Variable Selection

    Nonlinear kernel regression models are often used in statistics and machine learning because they are more accurate than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this paper, we propose a novel framework that provides an effect size analog of each explanatory variable for Bayesian kernel regression models when the kernel is shift-invariant (for example, the Gaussian kernel). We use function analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that: (i) captures nonlinear structure, and (ii) can be projected onto the original explanatory variables. The projection onto the original explanatory variables serves as an analog of effect sizes. The specific function analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. We illustrate the utility of BAKR by examining two important problems in statistical genetics: genomic selection (i.e. phenotypic prediction) and association mapping (i.e. inference of significant variants or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings. Comment: 22 pages, 3 figures, 3 tables; theory added; new simulations presented; references added
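    The random Fourier approximation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' BAKR implementation: it only shows the standard random Fourier feature construction for a Gaussian kernel, and the function names, the bandwidth `sigma`, the feature count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Exact Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def random_fourier_features(X, n_features=2000, sigma=1.0, seed=0):
    """Map X (n x d) to an n x n_features matrix Z with Z @ Z.T close to the Gaussian kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, N(0, sigma^-2 I).
    W = rng.normal(0.0, 1.0 / sigma, size=(d, n_features))
    # Random phases make a single cosine per frequency sufficient.
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Synthetic inputs (assumed, for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))

Z = random_fourier_features(X)
K_exact = gaussian_kernel(X, X)
K_approx = Z @ Z.T
max_abs_error = float(np.max(np.abs(K_exact - K_approx)))
```

    Because the kernel matrix is replaced by an inner product of explicit features, a regression on `Z` is linear in its coefficients, which is the kind of linear space the abstract describes; how BAKR projects those coefficients back onto the original variables is not shown here.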

    Discrete versus continuous domain models for disease mapping

    The main goal of disease mapping is to estimate disease risk and identify high-risk areas. Such analyses are hampered by the limited geographical resolution of the available data. Typically the available data are counts per spatial unit and the common approach is the Besag-York-Mollié (BYM) model. When precise geocodes are available, it is more natural to use Log-Gaussian Cox processes (LGCPs). In a simulation study mimicking childhood leukaemia incidence using actual residential locations of all children in the canton of Zürich, Switzerland, we compare the ability of these models to recover risk surfaces and identify high-risk areas. We then apply both approaches to actual data on childhood leukaemia incidence in the canton of Zürich during 1985-2015. We found that LGCPs outperform BYM models in almost all scenarios considered. Our findings suggest that there are important gains to be made from the use of LGCPs in spatial epidemiology. Comment: 28 pages, 4 figures, 2 tables

    Active inference, evidence accumulation, and the urn task

    Deciding how much evidence to accumulate before making a decision is a problem we and other animals often face, but one that is not completely understood. This issue is particularly important because a tendency to sample less information (often known as reflection impulsivity) is a feature in several psychopathologies, such as psychosis. A formal understanding of information sampling may therefore clarify the computational anatomy of psychopathology. In this theoretical letter, we consider evidence accumulation in terms of active (Bayesian) inference using a generic model of Markov decision processes. Here, agents are equipped with beliefs about their own behavior: in this case, that they will make informed decisions. Normative decision making is then modeled using variational Bayes to minimize surprise about choice outcomes. Under this scheme, different facets of belief updating map naturally onto the functional anatomy of the brain (at least at a heuristic level). Of particular interest is the key role played by the expected precision of beliefs about control, which we have previously suggested may be encoded by dopaminergic neurons in the midbrain. We show that manipulating expected precision strongly affects how much information an agent characteristically samples, and thus provides a possible link between impulsivity and dopaminergic dysfunction. Our study therefore represents a step toward understanding evidence accumulation in terms of neurobiologically plausible Bayesian inference and may cast light on why this process is disordered in psychopathology.

    Approximating Cross-validatory Predictive P-values with Integrated IS for Disease Mapping Models

    An important statistical task in disease mapping problems is to identify outlier/divergent regions with unusually high or low residual risk of disease. Leave-one-out cross-validatory (LOOCV) model assessment is a gold standard for computing predictive p-values that can flag such outliers. However, actual LOOCV is time-consuming because one needs to re-simulate a Markov chain for each posterior distribution in which an observation is held out as a test case. This paper introduces a new method, called iIS, for approximating LOOCV with only Markov chain samples simulated from a posterior based on a full data set. iIS is based on importance sampling (IS). iIS integrates the p-value and the likelihood of the test observation with respect to the distribution of the latent variable without reference to the actual observation. The predictive p-values computed with iIS can be proved to be equivalent to the LOOCV predictive p-values, following the general theory for IS. We compare iIS and three other existing methods in the literature with a lip cancer dataset collected in Scotland. Our empirical results show that iIS provides predictive p-values that are almost identical to the actual LOOCV predictive p-values and outperforms the existing three methods, including the recently proposed ghosting method by Marshall and Spiegelhalter (2007). Comment: 21 pages
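    The general idea of approximating LOOCV from full-data posterior draws can be sketched with plain importance sampling. This is not the iIS method of the abstract, which additionally integrates over the latent variable; it is the basic IS estimator, in which each draw is weighted by the inverse likelihood of the held-out observation. The Poisson likelihood, the gamma draws standing in for Markov chain output, and every name and number below are assumptions made for illustration.

```python
import math
import numpy as np

def poisson_pmf(k, mu):
    """Poisson probability of the integer count k for each mean in the array mu."""
    return np.exp(k * np.log(mu) - mu - math.lgamma(k + 1))

def poisson_upper_tail(y, mu):
    """P(Y >= y) for Y ~ Poisson(mu), evaluated elementwise over mu."""
    lower = sum(poisson_pmf(k, mu) for k in range(y))
    return 1.0 - lower

def is_loo_pvalue(y_i, e_i, risk_draws):
    """Plain IS estimate of the LOO predictive p-value P(Y_rep >= y_i | data without i).

    risk_draws are draws of the relative risk from the full-data posterior.
    """
    mu = e_i * risk_draws
    weights = 1.0 / poisson_pmf(y_i, mu)   # removes observation i's likelihood contribution
    tail = poisson_upper_tail(y_i, mu)
    return float(np.sum(weights * tail) / np.sum(weights))

# Toy conjugate setting (assumed): the posterior without region i is Gamma(a0, rate b0),
# so the full-data posterior is Gamma(a0 + y_i, rate b0 + e_i).
rng = np.random.default_rng(0)
y_i, e_i = 4, 2.0          # hypothetical observed count and expected count
a0, b0 = 20.0, 20.0
full_draws = rng.gamma(shape=a0 + y_i, scale=1.0 / (b0 + e_i), size=20000)
loo_draws = rng.gamma(shape=a0, scale=1.0 / b0, size=20000)

p_is = is_loo_pvalue(y_i, e_i, full_draws)                               # uses full-data draws only
p_direct = float(np.mean(poisson_upper_tail(y_i, e_i * loo_draws)))      # reference: true held-out posterior
```

    In this toy setting the held-out posterior is available in closed form, so `p_direct` serves as a check on `p_is`; in a real disease mapping model that reference would require refitting the model for each region, which is the cost the abstract aims to avoid.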