
    The Bayes Lepski's Method and Credible Bands through Volume of Tubular Neighborhoods

    For a general class of priors based on random series basis expansions, we develop the Bayes Lepski's method to estimate an unknown regression function. In this approach, the series truncation point is determined by a stopping rule that balances the posterior mean bias against the posterior standard deviation. Equipped with this mechanism, we present a method to construct adaptive Bayesian credible bands, in which this statistical task is reformulated as a problem in geometry and the band's radius is computed by finding the volume of a certain tubular neighborhood embedded in a unit sphere. We consider two special cases involving B-splines and wavelets, and discuss some interesting consequences such as the uncertainty principle and self-similarity. Lastly, we show how to implement the Bayes Lepski stopping rule on a computer; numerical simulations, together with our theoretical investigations, indicate that this is a promising Bayesian uncertainty quantification procedure. Comment: 42 pages, 2 figures, 1 table
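    To make the idea of a bias/standard-deviation stopping rule concrete, here is a minimal sketch of a generic Lepski-type truncation rule for a Gaussian random-series prior. Everything specific here is an assumption for illustration, not the paper's rule: the cosine basis, the N(0, tau^2) coefficient prior, the known noise level sigma, the sup-norm comparison, the constant kappa, and the helper names (cosine_basis, posterior_band_stats, lepski_truncation) are all hypothetical, and the tube-volume computation of the band radius is not shown.

```python
import numpy as np

def cosine_basis(x, J):
    """Hypothetical basis choice: the first J cosine basis functions on [0, 1]."""
    cols = [np.ones_like(x)]
    cols += [np.sqrt(2.0) * np.cos(np.pi * j * x) for j in range(1, J)]
    return np.column_stack(cols)

def posterior_band_stats(x, y, grid, J, sigma=0.1, tau=1.0):
    """Posterior mean and pointwise posterior sd of f on `grid` when the J
    coefficients are iid N(0, tau^2) and the noise is N(0, sigma^2) (assumed known)."""
    Phi = cosine_basis(x, J)
    precision = Phi.T @ Phi / sigma**2 + np.eye(J) / tau**2
    cov = np.linalg.inv(precision)
    coef_mean = cov @ Phi.T @ y / sigma**2
    G = cosine_basis(grid, J)
    mean = G @ coef_mean
    # diag(G cov G^T) without forming the full grid-by-grid matrix
    sd = np.sqrt(np.einsum("ij,jk,ik->i", G, cov, G))
    return mean, sd

def lepski_truncation(x, y, grid, candidates, kappa=1.0, **kw):
    """Lepski-type rule: return the smallest J whose posterior mean lies within
    kappa * (max posterior sd) of the posterior mean of every larger candidate."""
    stats = {J: posterior_band_stats(x, y, grid, J, **kw) for J in candidates}
    for i, J in enumerate(candidates):
        m_J, _ = stats[J]
        if all(np.max(np.abs(m_J - stats[K][0])) <= kappa * np.max(stats[K][1])
               for K in candidates[i + 1:]):
            return J  # the largest candidate always passes (empty comparison set)
    return candidates[-1]

# Usage on simulated data (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
grid = np.linspace(0, 1, 101)
cands = [2, 4, 8, 16, 32]
J_hat = lepski_truncation(x, y, grid, cands)
```

    The comparison uses larger truncation points as low-bias references: once adding more basis functions no longer moves the posterior mean by more than the posterior spread, further truncation levels would mostly add variance.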

    Empirical processes indexed by estimated functions

    We consider the convergence of empirical processes indexed by functions that depend on an estimated parameter η and give several alternative conditions under which the "estimated parameter" η_n can be replaced by its natural limit η_0 uniformly in some other indexing set Θ. In particular, we reconsider some examples treated by Ghoudi and Remillard [Asymptotic Methods in Probability and Statistics (1998) 171–197, Fields Inst. Commun. 44 (2004) 381–406]. We recast their examples in terms of empirical process theory and provide an alternative general view that should be of wide applicability. Comment: Published at http://dx.doi.org/10.1214/074921707000000382 in the IMS Lecture Notes Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)
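    The replacement of η_n by η_0 can be illustrated numerically in a toy location model. The sketch below assumes X ~ N(0, 1), η_0 = 0, η_n = sample mean, and the index class f_{t,η}(x) = 1{x − η ≤ t}; all of these, and the helper names, are hypothetical choices, not the paper's conditions or examples. It computes sup_t |G_n f_{t,η_n} − G_n f_{t,η_0}|, where G_n f = √n (P_n − P) f is evaluated at the random function with P f_{t,η} = F(t + η) computed as if η were fixed. Because this class is Donsker and η_n → η_0, the averaged supremum should shrink as n grows.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def normal_cdf(z):
    """Standard normal CDF, F(z)."""
    return 0.5 * (1.0 + _erf(np.asarray(z) / math.sqrt(2.0)))

def empirical_process(sample, t, eta):
    """G_n f_{t,eta} = sqrt(n) * (P_n - P) 1{X - eta <= t} for X ~ N(0, 1)."""
    n = len(sample)
    # number of observations <= t + eta, i.e. the empirical CDF at t + eta
    ecdf = np.searchsorted(np.sort(sample), t + eta, side="right") / n
    return math.sqrt(n) * (ecdf - normal_cdf(t + eta))

def sup_replacement_error(n, rng, grid=np.linspace(-3, 3, 401)):
    """sup over a grid of t of |G_n f_{t,eta_n} - G_n f_{t,eta_0}|, eta_0 = 0."""
    x = rng.standard_normal(n)
    eta_n = x.mean()  # estimated parameter
    diff = empirical_process(x, grid, eta_n) - empirical_process(x, grid, 0.0)
    return np.max(np.abs(diff))

rng = np.random.default_rng(1)
mean_err = {n: np.mean([sup_replacement_error(n, rng) for _ in range(20)])
            for n in (100, 1_000, 10_000)}
for n, err in mean_err.items():
    print(n, round(float(err), 3))
```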

    The Cross-Validated Adaptive Epsilon-Net Estimator

    Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space typically results in ill-defined or overly variable estimators of the parameter of interest (i.e., the risk minimizer for the true data-generating distribution). In this article, we propose a cross-validated epsilon-net estimation methodology that covers a broad class of estimation problems, including multivariate outcome prediction and multivariate density estimation. An epsilon-net sieve of a subspace of the parameter space is defined as a collection of finite sets of points, the epsilon-nets indexed by epsilon, which approximate the subspace up to a resolution of epsilon. Given a collection of subspaces of the parameter space, one constructs an epsilon-net sieve for each of the subspaces. For each choice of subspace and each value of the resolution epsilon, one defines a candidate estimator as the minimizer of the empirical risk over the corresponding epsilon-net. The cross-validated epsilon-net estimator is then defined as the candidate estimator corresponding to the choice of subspace and epsilon-value minimizing the cross-validated empirical risk. We derive a finite sample inequality which proves that the proposed estimator achieves the adaptive optimal minimax rate of convergence, where the adaptivity is achieved by considering epsilon-net sieves for various subspaces.
We also address the implementation of the cross-validated epsilon-net estimation procedure. In the context of a linear regression model, we present results of a preliminary simulation study comparing the cross-validated epsilon-net estimator to the cross-validated L^1-penalized least squares estimator (LASSO) and the least angle regression estimator (LARS). Finally, we discuss generalizations of the proposed estimation methodology to censored data structures.
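    The selection procedure described above (empirical risk minimization over each epsilon-net, then cross-validation over subspace and resolution) can be sketched for a tiny linear regression problem. This is an illustrative toy under stated assumptions, not the authors' implementation: the k-th subspace is assumed to use only the first k predictors, each epsilon-net is assumed to be a regular grid on the box [−2, 2]^k, and the helper names are hypothetical. Because such grids grow exponentially in k, this brute-force version is only feasible for very small k.

```python
import itertools
import numpy as np

def epsilon_net(k, eps, bound=2.0):
    """Hypothetical net: a grid of spacing eps on [-bound, bound]^k.
    Every point of the box is within eps/2 (sup norm) of some grid point."""
    axis = np.arange(-bound, bound + eps / 2, eps)
    return np.array(list(itertools.product(axis, repeat=k)))

def net_erm(X, y, net):
    """Minimizer of the empirical squared-error risk over the finite net."""
    risks = np.mean((y[:, None] - X @ net.T) ** 2, axis=0)
    return net[np.argmin(risks)]

def cv_epsilon_net(X, y, ks, epsilons, n_folds=5, seed=0):
    """Choose (k, eps) by V-fold cross-validated risk, then refit on all data."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds  # balanced folds
    best = None
    for k, eps in itertools.product(ks, epsilons):
        net = epsilon_net(k, eps)
        cv_risk = 0.0
        for v in range(n_folds):
            train, valid = folds != v, folds == v
            beta = net_erm(X[train, :k], y[train], net)
            cv_risk += np.mean((y[valid] - X[valid, :k] @ beta) ** 2) / n_folds
        if best is None or cv_risk < best[0]:
            best = (cv_risk, k, eps)
    _, k, eps = best
    return k, eps, net_erm(X[:, :k], y, epsilon_net(k, eps))

# Usage on simulated data (illustrative only)
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 2))
y = X @ np.array([1.0, -0.5]) + 0.3 * rng.standard_normal(120)
k_hat, eps_hat, beta_hat = cv_epsilon_net(X, y, ks=[1, 2], epsilons=[0.5, 0.25])
```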

    Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines

    DNA copy number and mRNA expression are widely used data types in cancer studies, and combined they provide more insight than either does separately. Whereas the existing literature fixes the form of the relationship between these two types of markers a priori, in this paper we model their association. We employ piecewise linear regression splines (PLRS), which combine good interpretability with sufficient flexibility to identify any plausible type of relationship. The specification of the model leads to estimation and model selection in a constrained, nonstandard setting. We provide methodology for testing the effect of DNA on mRNA and for choosing the appropriate model. Furthermore, we present a novel approach to obtaining reliable confidence bands for constrained PLRS that incorporates model uncertainty. The procedures are applied to colorectal and breast cancer data. Common assumptions are found to be potentially misleading for biologically relevant genes. More flexible models may bring more insight into the interaction between the two markers. Comment: Published at http://dx.doi.org/10.1214/12-AOAS605 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
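    Estimation under a constraint is what makes the PLRS setting nonstandard. As a minimal sketch, assume fixed knots and, purely for illustration, the constraint that every segment slope is nonnegative (a nondecreasing copy-number/expression relationship); the paper's actual constraint set, knot selection, and confidence bands are not reproduced. Writing the spline as f(x) = b0 + Σ_j s_j h_j(x), where h_j(x) is the length of segment j lying below x, makes s_j the slope on segment j, so the constraint becomes s ≥ 0. Profiling out the free intercept by centering leaves a nonnegative least squares problem, solved here by projected gradient descent. The helper names are hypothetical.

```python
import numpy as np

def segment_basis(x, knots):
    """Column j is the length of [t_{j-1}, t_j] lying below x, so that
    f(x) = b0 + sum_j s_j * h_j(x) has slope s_j on segment j."""
    edges = np.concatenate(([x.min()], np.sort(knots), [x.max()]))
    lo, hi = edges[:-1], edges[1:]
    return np.clip(x[:, None] - lo, 0.0, hi - lo)

def fit_monotone_plrs(x, y, knots, n_iter=5000):
    """Least squares PLRS with all segment slopes >= 0 (an assumed constraint),
    solved by projected gradient descent on the intercept-centered problem."""
    H = segment_basis(x, knots)
    Hc, yc = H - H.mean(axis=0), y - y.mean()
    step = 1.0 / np.linalg.norm(Hc, 2) ** 2  # 1 / Lipschitz constant of the gradient
    s = np.zeros(H.shape[1])
    for _ in range(n_iter):
        # gradient step on 0.5 * ||Hc s - yc||^2, then project onto s >= 0
        s = np.maximum(0.0, s - step * Hc.T @ (Hc @ s - yc))
    b0 = y.mean() - H.mean(axis=0) @ s
    return b0, s

# Usage: a flat-then-increasing relationship with one knot at 0 (illustrative)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.where(x < 0, 0.0, 2.0 * x) + 0.1 * rng.standard_normal(200)
b0, slopes = fit_monotone_plrs(x, y, knots=[0.0])
```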

    Semi-supervised empirical Bayes group-regularized factor regression

    The features in high-dimensional biomedical prediction problems are often well described by lower-dimensional manifolds. An example is genes that are organised in smaller functional networks. The outcome can then be described with the factor regression model. A benefit of the factor model is that it allows for straightforward inclusion of unlabeled observations in the estimation of the model, i.e., semi-supervised learning. In addition, the high-dimensional features in biomedical prediction problems are often well characterised. Examples are genes, for which annotation is available, and metabolites, for which p-values from a previous study are available. In this paper, the extra information on the features is included in the prior model for the features. The extra information is weighted and included in the estimation through empirical Bayes, with variational approximations to speed up the computation. The method is demonstrated in simulations and two applications. The first application considers influenza vaccine efficacy prediction based on microarray data. The second application predicts oral cancer metastasis from RNAseq data. Comment: 19 pages, 5 figures, submitted to Biometrical Journal
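    Why unlabeled observations help in a factor model can be seen in a deliberately simplified stand-in: the feature loadings are learned from labeled and unlabeled rows together, and only the outcome regression needs labels. The sketch below swaps in plain principal component regression for the paper's empirical Bayes, group-regularized, variational method, and uses simulated data; the function name and settings are hypothetical.

```python
import numpy as np

def semi_supervised_pcr(X_lab, y, X_unlab, n_factors):
    """Estimate factor loadings by PCA on labeled + unlabeled features,
    then regress y on the labeled observations' factor scores."""
    X_all = np.vstack([X_lab, X_unlab])
    mu = X_all.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_all - mu, full_matrices=False)
    loadings = Vt[:n_factors].T                  # p x r, uses unlabeled rows too
    scores = (X_lab - mu) @ loadings             # n_lab x r
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    def predict(X_new):
        new_scores = (X_new - mu) @ loadings
        return np.column_stack([np.ones(len(X_new)), new_scores]) @ coef

    return predict

# Usage: 2 latent factors, 50 features, few labeled and many unlabeled rows
rng = np.random.default_rng(0)
p, n_lab, n_unlab = 50, 30, 300
W = rng.standard_normal((p, 2))
Z = rng.standard_normal((n_lab + n_unlab, 2))
X = Z @ W.T + 0.5 * rng.standard_normal((n_lab + n_unlab, p))
y = Z[:n_lab] @ np.array([1.0, -1.0]) + 0.1 * rng.standard_normal(n_lab)
predict = semi_supervised_pcr(X[:n_lab], y, X[n_lab:], n_factors=2)
```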

    Individual differences in puberty onset in girls: Bayesian estimation of heritabilities and genetic correlations

    We report heritabilities for individual differences in female pubertal development at the age of 12. Tanner data on breast and pubic hair development in girls, and data on menarche, were obtained from a total of 184 pairs of monozygotic and dizygotic twins. Genetic correlations were estimated to determine to what extent the same genes are involved in different aspects of physical development in puberty. A Bayesian estimation approach was taken, using Markov chain Monte Carlo simulation to estimate model parameters. All three phenotypes were heritable to a significant extent and showed high genetic correlations, suggesting that a common set of genes is involved in the timing of puberty in general. However, gonadarche (menarche and breast development) and adrenarche (pubic hair) are affected by different environmental factors, which argues against regarding the three phenotypes as indicators of a unitary physiological factor. © 2006 Springer Science+Business Media, Inc.
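    The heritabilities and genetic correlations above are typically defined through the classical twin (ACE) decomposition of phenotypic variance into additive genetic (A), shared environmental (C), and unique environmental (E) components. The textbook version is stated below as background under that assumption; the study's exact model (for instance, how ordinal Tanner stages are handled, or the priors used in the MCMC) may differ.

```latex
\begin{aligned}
\operatorname{Var}(P) &= \sigma_A^2 + \sigma_C^2 + \sigma_E^2, \\
\operatorname{Cov}_{\mathrm{MZ}}(P_1, P_2) &= \sigma_A^2 + \sigma_C^2, \\
\operatorname{Cov}_{\mathrm{DZ}}(P_1, P_2) &= \tfrac{1}{2}\sigma_A^2 + \sigma_C^2, \\
h^2 &= \frac{\sigma_A^2}{\sigma_A^2 + \sigma_C^2 + \sigma_E^2},
\qquad
r_G = \frac{\sigma_{A,12}}{\sigma_{A,1}\,\sigma_{A,2}}.
\end{aligned}
```

    Subtracting the two twin covariances gives Cov_MZ − Cov_DZ = ½σ_A², so on a standardized scale (Var(P) = 1) one recovers Falconer's approximation h² = 2(r_MZ − r_DZ). Here σ_{A,12} denotes the additive genetic covariance between two traits, and r_G is their genetic correlation.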