A Bayesian Nonparametric Method for Prediction in EST Analysis
In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST (Expressed Sequence Tag) surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt incorporates the available information into prediction in a statistically rigorous way. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. The EST libraries studied with frequentist methods in Susko and Roger (2004) are reanalyzed in detail.
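As a concrete illustration of quantity a), the following minimal Python sketch computes the classical Good-Turing coverage estimate alongside the closed-form coverage estimate available under a two-parameter Poisson-Dirichlet prior. The parameter values sigma and theta are placeholders (in practice they would be fitted to the library, e.g. by empirical Bayes), so this shows the flavour of the approach rather than the paper's exact procedure.

from collections import Counter

def coverage_estimates(reads, sigma=0.5, theta=100.0):
    """Estimate sample coverage: the probability that the next read
    comes from a gene already represented in the sample.

    reads        : list of gene labels, one per EST read
    sigma, theta : assumed Poisson-Dirichlet parameters (placeholders)
    """
    n = len(reads)                                   # sample size
    freqs = Counter(reads)                           # gene -> frequency
    k = len(freqs)                                   # distinct genes seen
    m1 = sum(1 for f in freqs.values() if f == 1)    # singleton genes

    gt = 1.0 - m1 / n            # Good-Turing (frequentist) coverage
    # Under PD(sigma, theta), given k distinct genes in n reads, the
    # probability that read n+1 is new is (theta + sigma*k)/(theta + n);
    # the coverage estimate is its complement.
    bnp = 1.0 - (theta + sigma * k) / (theta + n)
    return gt, bnp

For a toy library such as reads = ['g1', 'g1', 'g2', 'g3'], both numbers answer the same question: how much of the underlying gene population the reads have already covered.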
Affine equivariant rank-weighted L-estimation of multivariate location
In the multivariate one-sample location model, we propose a class of flexible, robust, affine-equivariant L-estimators of location for distributions invoking affine invariance of the Mahalanobis distances of individual observations. The iteration process involved in their computation is numerically illustrated. Comment: 16 pages, 4 figures, 6 tables
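The abstract does not spell out the iteration, but a generic rank-weighted scheme of this kind can be sketched in Python as follows: alternate between computing Mahalanobis distances under the current location and scatter estimates and re-estimating the location as a weighted mean whose weights decrease with the rank of each distance. The specific weight function and stopping rule below are illustrative assumptions, not the authors' algorithm.

import numpy as np

def rank_weighted_location(X, n_iter=50, tol=1e-8):
    """Sketch of an affine-equivariant, rank-weighted L-estimator of
    multivariate location (illustrative weight scheme, not the paper's)."""
    n, p = X.shape
    mu = X.mean(axis=0)                  # initial location
    S = np.cov(X, rowvar=False)          # initial scatter
    for _ in range(n_iter):
        diff = X - mu
        # Squared Mahalanobis distances under the current (mu, S);
        # their ranks are affine-invariant, so the weighted mean below
        # is affine-equivariant.
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
        ranks = np.argsort(np.argsort(d2)) + 1       # 1 = innermost point
        w = 1.0 - (ranks - 1) / n                    # downweight outliers
        w /= w.sum()
        mu_new = w @ X                               # rank-weighted mean
        S = (diff * w[:, None]).T @ diff             # reweighted scatter
        if np.linalg.norm(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu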
Bayesian nonparametric inference for discovery probabilities: credible intervals and large sample asymptotics
Given a sample of size $n$ from a population of individuals belonging to different species with unknown proportions, a popular problem of practical interest consists in making inference on the probability $D_n(l)$ that the $(n+1)$-th draw coincides with a species with frequency $l$ in the sample, for any $l = 0, 1, \ldots, n$. This paper contributes to the methodology of Bayesian nonparametric inference for $D_n(l)$. Specifically, under the general framework of Gibbs-type priors we show how to derive credible intervals for the Bayesian nonparametric estimator of $D_n(l)$, and we investigate the large $n$ asymptotic behaviour of such an estimator. Of particular interest are special cases of our results obtained under the specification of the two parameter Poisson--Dirichlet prior and the normalized generalized Gamma prior, which are two of the most commonly used Gibbs-type priors. With respect to these two prior specifications, the proposed results are illustrated through a simulation study and a benchmark Expressed Sequence Tags dataset. To the best of our knowledge, this illustration provides the first comparative study between the two parameter Poisson--Dirichlet prior and the normalized generalized Gamma prior in the context of Bayesian nonparametric inference for $D_n(l)$.
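Under the two-parameter Poisson--Dirichlet prior the Bayesian nonparametric point estimator of $D_n(l)$ has a well-known closed form, which the following Python sketch computes from a sample's frequency counts; the prior parameters are taken as given here, whereas the paper also quantifies the estimator's uncertainty through credible intervals.

from collections import Counter

def discovery_prob_pd(sample, l, sigma=0.5, theta=1.0):
    """Closed-form estimator of D_n(l) under a PD(sigma, theta) prior:
      l = 0  : (theta + sigma*k) / (theta + n)   -- a new species
      l >= 1 : (l - sigma) * m_l / (theta + n)   -- a species seen l times
    where n is the sample size, k the number of distinct species and
    m_l the number of species with frequency l. The values of sigma
    and theta are assumed known for this sketch."""
    n = len(sample)
    freqs = Counter(sample)
    k = len(freqs)
    if l == 0:
        return (theta + sigma * k) / (theta + n)
    m_l = sum(1 for f in freqs.values() if f == l)
    return (l - sigma) * m_l / (theta + n)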
Bayesian nonparametric estimators derived from conditional Gibbs structures
We consider discrete nonparametric priors which induce Gibbs-type exchangeable random partitions and investigate their posterior behavior in detail. In particular, we deduce conditional distributions and the corresponding Bayesian nonparametric estimators, which can be readily exploited for predicting various features of additional samples. The results provide useful tools for genomic applications where prediction of future outcomes is required.
Keywords: Bayesian nonparametric inference; exchangeable random partitions; generalized factorial coefficients; generalized gamma process; Poisson-Dirichlet process; population genetics.
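The conditional structure underlying such estimators can be made explicit. For a Gibbs-type prior of order $\sigma$ with weights $V_{n,k}$, given a sample of size $n$ exhibiting $k$ distinct species $X^{*}_{1},\ldots,X^{*}_{k}$ with frequencies $n_{1},\ldots,n_{k}$, the predictive distribution of the next observation is the standard characterization below, stated here for orientation:

\[
\Pr\bigl(X_{n+1}=\text{new}\mid X_{1},\ldots,X_{n}\bigr)=\frac{V_{n+1,k+1}}{V_{n,k}},
\qquad
\Pr\bigl(X_{n+1}=X^{*}_{j}\mid X_{1},\ldots,X_{n}\bigr)=(n_{j}-\sigma)\,\frac{V_{n+1,k}}{V_{n,k}},
\quad j=1,\ldots,k.
\]

The two-parameter Poisson-Dirichlet and normalized generalized gamma processes correspond to particular choices of the weights $V_{n,k}$.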
Rediscovery of Good-Turing estimators via Bayesian nonparametrics
The problem of estimating discovery probabilities originated in the context
of statistical ecology, and in recent years it has become popular due to its
frequent appearance in challenging applications arising in genetics,
bioinformatics, linguistics, design of experiments, machine learning, etc. A
full range of statistical approaches, parametric and nonparametric as well as
frequentist and Bayesian, has been proposed for estimating discovery
probabilities. In this paper we investigate the relationships between the
celebrated Good-Turing approach, which is a frequentist nonparametric approach
developed in the 1940s, and a Bayesian nonparametric approach recently
introduced in the literature. Specifically, under the assumption of a two
parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric
estimators of discovery probabilities are asymptotically equivalent, for a
large sample size, to suitably smoothed Good-Turing estimators. As a by-product
of this result, we introduce and investigate a methodology for deriving exact
and asymptotic credible intervals to be associated with the Bayesian
nonparametric estimators of discovery probabilities. The proposed methodology
is illustrated through a comprehensive simulation study and the analysis of
Expressed Sequence Tags data generated by sequencing a benchmark complementary
DNA library.
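The Good-Turing estimator in question admits a one-line implementation; a minimal Python sketch is below. Read together with discovery_prob_pd above, it makes the paper's asymptotic equivalence concrete: for large $n$ the Poisson-Dirichlet estimator $(l-\sigma)\,m_l/(\theta+n)$ behaves like this estimator with $m_{l+1}$ replaced by a suitably smoothed value, in the sense made precise in the paper.

from collections import Counter

def good_turing(sample, l):
    """Good-Turing estimator of the probability that the (n+1)-th draw
    is a species with frequency l in the sample: (l+1) * m_{l+1} / n,
    where m_j is the number of species observed exactly j times."""
    n = len(sample)
    m = Counter(Counter(sample).values())   # j -> m_j
    return (l + 1) * m.get(l + 1, 0) / n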
A probabilistic study of neural complexity
G. Edelman, O. Sporns, and G. Tononi have introduced the neural complexity of
a family of random variables, defining it as a specific average of mutual
information over subfamilies. We show that their choice of weights satisfies
two natural properties, namely exchangeability and additivity, and we call any
functional satisfying these two properties an intricacy. We classify all
intricacies in terms of probability laws on the unit interval and study the
growth rate of maximal intricacies when the size of the system goes to
infinity. For systems of a fixed size, we show that maximizers have small
support and exchangeable systems have small intricacy. In particular,
maximizing intricacy leads to spontaneous symmetry breaking and failure of
uniqueness. Comment: minor edit
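In the paper's terms an intricacy has the form $I_c(X)=\sum_{S} c_{|S|}\,\operatorname{MI}(X_S;X_{S^c})$, a weighted average of mutual informations over nonempty proper subfamilies $S$, with exchangeability meaning the weights depend on $S$ only through its size. The Python sketch below computes such a functional for a small discrete system given its joint probability table; the uniform weighting used by default is just one admissible choice, not necessarily the Edelman-Sporns-Tononi weights.

import itertools
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a pmf stored as an ndarray."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def intricacy(joint, weights=None):
    """Weighted average of MI(X_S; X_{S^c}) over nonempty proper
    subsets S of the variables of a joint pmf (one axis per variable).

    weights : optional dict mapping subset size k to a weight; by
              default all subsets get equal weight (an illustrative
              choice only)."""
    n = joint.ndim
    H_total = entropy(joint)
    total = wsum = 0.0
    for k in range(1, n):
        for S in itertools.combinations(range(n), k):
            comp = tuple(i for i in range(n) if i not in S)
            H_S = entropy(joint.sum(axis=comp))    # marginal of X_S
            H_Sc = entropy(joint.sum(axis=S))      # marginal of X_{S^c}
            w = 1.0 if weights is None else weights[k]
            total += w * (H_S + H_Sc - H_total)    # MI(X_S; X_{S^c})
            wsum += w
    return total / wsum

For a product pmf (independent coordinates) every mutual information vanishes and the intricacy is zero, matching the intuition that complexity requires dependence between subsystems.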
Sparse adaptive Dirichlet-multinomial-like processes
Online estimation and modelling of i.i.d. data for short sequences over large or complex "alphabets" is a ubiquitous (sub)problem in machine learning, information theory, data compression, statistical language processing, and document analysis. The Dirichlet-multinomial distribution (also called the Pólya urn scheme) and extensions thereof are widely applied for online i.i.d. estimation. However, good a priori choices for the parameters in this regime are difficult to obtain. I derive an optimal adaptive choice for the main parameter via tight, data-dependent redundancy bounds for a related model. The one-line recommendation is to set the 'total mass' = 'precision' = 'concentration' parameter to $m/2\ln[(n+1)/m]$, where $n$ is the (past) sample size and $m$ is the number of different symbols observed (so far). The resulting estimator is simple, online, fast, and its experimental performance is superb.
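A minimal sketch of a sequential predictor built on this recommendation is below. The grouping of the abstract's expression is read here as m / (2 ln((n+1)/m)), and the escape mass is spread uniformly over unseen symbols; both are assumptions of this sketch rather than details taken from the paper.

import math
from collections import Counter

def sad_predict(past, alphabet):
    """Dirichlet-multinomial-like sequential predictor with the adaptive
    total-mass choice beta = m / (2*ln((n+1)/m)) (one reading of the
    abstract's formula; an assumption here). Returns a dict mapping
    each symbol to its predictive probability of appearing next."""
    n = len(past)
    counts = Counter(past)
    m = len(counts)                      # distinct symbols seen so far
    if m == 0:                           # nothing observed yet: uniform
        return {a: 1.0 / len(alphabet) for a in alphabet}
    beta = m / (2.0 * math.log((n + 1) / m))
    unseen = [a for a in alphabet if a not in counts]
    if unseen:
        # Seen symbols keep mass proportional to their counts; the
        # escape mass beta/(n+beta) is spread over unseen symbols.
        return {a: counts[a] / (n + beta) if a in counts
                else beta / (n + beta) / len(unseen) for a in alphabet}
    # Whole alphabet already observed: plain Dirichlet-multinomial smoothing.
    return {a: (counts[a] + beta / m) / (n + beta) for a in alphabet}

Under this reading, beta grows with the observed alphabet size m but shrinks as the sample lengthens relative to m, which is what lets the predictor adapt to sparse, large-alphabet regimes.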
A Bernstein-Von Mises Theorem for discrete probability distributions
We investigate the asymptotic normality of the posterior distribution in the discrete setting, when model dimension increases with sample size. We consider a probability mass function $\theta_0$ on $\mathbbm{N}\setminus \{0\}$ and a sequence of truncation levels $(k_n)_n$ satisfying $k_n^3 \leq n \inf_{i \leq k_n} \theta_0(i)$. Let $\hat{\theta}_n$ denote the maximum likelihood estimate of $(\theta_0(i))_{i \leq k_n}$ and let $\Delta_n(\theta_0)$ denote the $k_n$-dimensional vector whose $i$-th coordinate is defined by $\sqrt{n}(\hat{\theta}_n(i)-\theta_0(i))$ for $1 \leq i \leq k_n$. We check that, under mild conditions on $\theta_0$ and on the sequence of prior probabilities on the $k_n$-dimensional simplices, after centering and rescaling, the variation distance between the posterior distribution recentered around $\theta_0$ and rescaled by $\sqrt{n}$ and the $k_n$-dimensional Gaussian distribution $\mathcal{N}(\Delta_n(\theta_0), I^{-1}(\theta_0))$ converges in probability to $0$.
This theorem can be used to prove the asymptotic normality of Bayesian estimators of Shannon and R\'{e}nyi entropies. The proofs are based on concentration inequalities for centered and non-centered chi-square (Pearson) statistics. The latter make it possible to establish posterior concentration rates with respect to the Fisher distance rather than with respect to the Hellinger distance, as is commonplace in nonparametric Bayesian statistics. Comment: Published at http://dx.doi.org/10.1214/08-EJS262 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)
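To see how a theorem of this type feeds into entropy estimation, a heuristic delta-method step (a gloss added here, not a claim quoted from the paper) goes as follows: because the posterior concentrates at rate $\sqrt{n}$, a smooth functional of $\theta$ such as the Shannon entropy inherits asymptotic normality,

\[
H(\theta) = -\sum_{i} \theta(i)\,\ln\theta(i), \qquad
\sqrt{n}\,\bigl(H(\theta) - H(\hat{\theta}_n)\bigr) \;\approx\; \mathcal{N}\bigl(0,\ \operatorname{Var}_{\theta_0}[\ln\theta_0(X)]\bigr),
\]

where the asymptotic variance is the usual delta-method quantity $\nabla H(\theta_0)^{\top} I^{-1}(\theta_0)\,\nabla H(\theta_0)$; the analogous computation applies to R\'{e}nyi entropies.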