96 research outputs found

Bayesian nonparametric analysis of reversible Markov chains

Author: Bacallado Sergio
Favaro Stefano
Trippa Lorenzo
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2013

We introduce a three-parameter random walk with reinforcement, called the $(\theta,\alpha,\beta)$ scheme, which generalizes the linearly edge reinforced random walk to uncountable spaces. The parameter $\beta$ smoothly tunes the $(\theta,\alpha,\beta)$ scheme between this edge reinforced random walk and the classical exchangeable two-parameter Hoppe urn scheme, while the parameters $\alpha$ and $\theta$ modulate how many states are typically visited. Resorting to de Finetti's theorem for Markov chains, we use the $(\theta,\alpha,\beta)$ scheme to define a nonparametric prior for Bayesian analysis of reversible Markov chains. The prior is applied in Bayesian nonparametric inference for species sampling problems with data generated from a reversible Markov chain with an unknown transition kernel. As a real example, we analyze data from molecular dynamics simulations of protein folding.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/13-AOS1102 by the Institute of Mathematical Statistics (http://www.imstat.org
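
For orientation, the classical two-parameter Hoppe urn named in the abstract above can be sketched in a few lines. This is only the exchangeable urn, not the $(\theta,\alpha,\beta)$ reinforced random walk itself, and it assumes the standard parameterization ($0 \le \alpha < 1$, $\theta > -\alpha$): after $n$ draws with $k$ distinct types, a new type appears with probability $(\theta + k\alpha)/(\theta + n)$ and existing type $j$ with probability $(n_j - \alpha)/(\theta + n)$. The function name and return format are illustrative choices, not taken from the paper.

```python
import random


def two_parameter_urn(n_draws, theta, alpha, seed=None):
    """Simulate the two-parameter Hoppe urn (illustrative sketch).

    Returns (labels, counts): the type drawn at each step and the
    number of times each type has been drawn.
    """
    if not (0 <= alpha < 1) or theta <= -alpha:
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    counts = []  # counts[j] = number of draws of type j so far
    labels = []  # labels[n] = type drawn at step n
    for _ in range(n_draws):
        k = len(counts)
        if k == 0:
            j = 0  # the first draw always creates a new type
        else:
            # Unnormalized weights: n_j - alpha for each existing type,
            # theta + k * alpha for a brand-new type.
            weights = [c - alpha for c in counts] + [theta + k * alpha]
            j = rng.choices(range(k + 1), weights=weights)[0]
        if j == k:
            counts.append(1)
        else:
            counts[j] += 1
        labels.append(j)
    return labels, counts


labels, counts = two_parameter_urn(200, theta=1.0, alpha=0.5, seed=0)
print("distinct types visited:", len(counts))
```

Larger values of `theta` or `alpha` tend to produce more distinct types, which matches the abstract's remark that these two parameters modulate how many states are typically visited.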

arXiv.org e-Print Archive

Institutional Research Information System University of Turin

Recommended from our members

A comparison of bayesian adaptive randomization and multi-stage designs for multi-arm clinical trials

Author: Trippa Lorenzo
Wason James
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/05/2014

Harvard University - DASH

Interpretable Model Summaries Using the Wasserstein Distance

Author: Dunipace Eric
Trippa Lorenzo
Publication date: 02/04/2021

Statistical models often include thousands of parameters. However, large models decrease the investigator's ability to interpret and communicate the estimated parameters. Reducing the dimensionality of the parameter space in the estimation phase is a commonly used approach, but less work has focused on selecting subsets of the parameters for interpreting the estimated model, especially in settings such as Bayesian inference and model averaging. Importantly, many models do not have straightforward interpretations and create another layer of obfuscation. To address this gap, we introduce a new method that uses the Wasserstein distance to identify a low-dimensional interpretable model projection. After the estimation of complex models, users can budget how many parameters they wish to interpret, and the proposed method generates a simplified model of the desired dimension that minimizes the distance to the full model. We provide simulation results to illustrate the method and apply it to cancer datasets.

arXiv.org e-Print Archive

A Class of Normalized Random Measures with an Exact Predictive Sampling Scheme

Author: Favaro Stefano
Trippa Lorenzo
Publication venue: 'Wiley'
Publication date: 01/01/2012

Institutional Research Information System University of Turin

False discovery rates in somatic mutation studies of cancer

Author: Parmigiani Giovanni
Trippa Lorenzo
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 25/07/2011

The purpose of cancer genome sequencing studies is to determine the nature and types of alterations present in a typical cancer and to discover genes mutated at high frequencies. In this article we discuss statistical methods for the analysis of somatic mutation frequency data generated in these studies. We place special emphasis on a two-stage study design introduced by Sjöblom et al. [Science 314 (2006) 268–274]. In this context, we describe and compare statistical methods for constructing scores that can be used to prioritize candidate genes for further investigation and to assess the statistical significance of the candidates thus identified. Controversy has surrounded the reliability of the false discovery rate estimates provided by the approximations used in early cancer genome studies. To address this, we develop a semiparametric Bayesian model that provides an accurate fit to the data. We use this model to generate a large collection of realistic scenarios, and evaluate alternative approaches on this collection. Our assessment is impartial, in that the model used for generating data is not used by any of the approaches compared, and objective, in that the scenarios are generated by a model that fits the data. Our results quantify the conservative control of the false discovery rate with the Benjamini and Hochberg method compared to the empirical Bayes approach and the multiple testing method proposed in Storey [J. R. Stat. Soc. Ser. B Stat. Methodol. 64 (2002) 479–498]. Simulation results also show a negligible departure from the target false discovery rate for the methodology used in Sjöblom et al. [Science 314 (2006) 268–274].

Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS438 by the Institute of Mathematical Statistics (http://www.imstat.org
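
As a point of reference for the Benjamini and Hochberg method mentioned in the abstract above, the standard step-up procedure can be sketched as follows. This is the textbook procedure only, not the semiparametric Bayesian model or the simulation study from the paper; the function name and the default level `q=0.05` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Textbook Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, True where the corresponding hypothesis
    is rejected at target false discovery rate q.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose ordered p-value satisfies p_(k) <= k*q/m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, as long as a larger ordered p-value meets its threshold.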

arXiv.org e-Print Archive

Crossref