25,966 research outputs found

    Posterior Contraction Rates of the Phylogenetic Indian Buffet Processes

    Get PDF
    By expressing prior distributions as general stochastic processes, nonparametric Bayesian methods provide a flexible way to incorporate prior knowledge and constrain the latent structure in statistical inference. The Indian buffet process (IBP) is such an example that can be used to define a prior distribution on infinite binary features, where the exchangeability among subjects is assumed. The phylogenetic Indian buffet process (pIBP), a derivative of IBP, enables the modeling of non-exchangeability among subjects through a stochastic process on a rooted tree, which is similar to that used in phylogenetics, to describe relationships among the subjects. In this paper, we study the theoretical properties of IBP and pIBP under a binary factor model. We establish the posterior contraction rates for both IBP and pIBP and substantiate the theoretical results through simulation studies. This is the first work addressing the frequentist property of the posterior behaviors of IBP and pIBP. We also demonstrated its practical usefulness by applying pIBP prior to a real data example arising in the field of cancer genomics where the exchangeability among subjects is violated

    Network inference and community detection, based on covariance matrices, correlations and test statistics from arbitrary distributions

    Get PDF
    In this paper we propose methodology for inference of binary-valued adjacency matrices from various measures of the strength of association between pairs of network nodes, or more generally pairs of variables. This strength of association can be quantified by sample covariance and correlation matrices, and more generally by test-statistics and hypothesis test p-values from arbitrary distributions. Community detection methods such as block modelling typically require binary-valued adjacency matrices as a starting point. Hence, a main motivation for the methodology we propose is to obtain binary-valued adjacency matrices from such pairwise measures of strength of association between variables. The proposed methodology is applicable to large high-dimensional data-sets and is based on computationally efficient algorithms. We illustrate its utility in a range of contexts and data-sets

    A decision-theoretic approach for segmental classification

    Full text link
    This paper is concerned with statistical methods for the segmental classification of linear sequence data where the task is to segment and classify the data according to an underlying hidden discrete state sequence. Such analysis is commonplace in the empirical sciences including genomics, finance and speech processing. In particular, we are interested in answering the following question: given data yy and a statistical model π(x,y)\pi(x,y) of the hidden states xx, what should we report as the prediction x^\hat{x} under the posterior distribution π(x∣y)\pi (x|y)? That is, how should you make a prediction of the underlying states? We demonstrate that traditional approaches such as reporting the most probable state sequence or most probable set of marginal predictions can give undesirable classification artefacts and offer limited control over the properties of the prediction. We propose a decision theoretic approach using a novel class of Markov loss functions and report x^\hat{x} via the principle of minimum expected loss (maximum expected utility). We demonstrate that the sequence of minimum expected loss under the Markov loss function can be enumerated exactly using dynamic programming methods and that it offers flexibility and performance improvements over existing techniques. The result is generic and applicable to any probabilistic model on a sequence, such as Hidden Markov models, change point or product partition models.Comment: Published in at http://dx.doi.org/10.1214/13-AOAS657 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Confidence Statements for Ordering Quantiles

    Full text link
    This work proposes Quor, a simple yet effective nonparametric method to compare independent samples with respect to corresponding quantiles of their populations. The method is solely based on the order statistics of the samples, and independence is its only requirement. All computations are performed using exact distributions with no need for any asymptotic considerations, and yet can be run using a fast quadratic-time dynamic programming idea. Computational performance is essential in high-dimensional domains, such as gene expression data. We describe the approach and discuss on the most important assumptions, building a parallel with assumptions and properties of widely used techniques for the same problem. Experiments using real data from biomedical studies are performed to empirically compare Quor and other methods in a classification task over a selection of high-dimensional data sets
    • …
    corecore