62 research outputs found

    Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries

    Background: The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interactions among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high-dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species. Results: We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum likelihood estimation of log-linear model coefficients may not be appropriate when the table contains zero cells, so new methods are required. We propose a computationally efficient ℓ1-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries. Conclusion: We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems.
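    The ℓ1-penalized log-linear idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes binary factors, an effect-coded design with main effects and two-way interactions only, an unpenalized intercept, and a plain proximal-gradient solver with backtracking. The names `design_matrix`, `fit_l1_loglinear`, and the example counts are hypothetical.

```python
import itertools
import numpy as np


def design_matrix(n_factors):
    """Effect-coded (+1/-1) design for every cell of a 2^n_factors table.

    Columns: intercept, main effects, then all two-way interactions
    (an assumed model class, chosen only to keep the sketch small).
    """
    cells = np.array(list(itertools.product([-1.0, 1.0], repeat=n_factors)))
    cols = [np.ones(len(cells))]
    cols += [cells[:, j] for j in range(n_factors)]
    cols += [cells[:, j] * cells[:, k]
             for j, k in itertools.combinations(range(n_factors), 2)]
    return np.column_stack(cols)


def poisson_loss(X, y, beta):
    """Negative Poisson log-likelihood, up to a constant in beta."""
    eta = X @ beta
    return np.sum(np.exp(eta) - y * eta)


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_l1_loglinear(X, y, lam, n_iter=2000, tol=1e-10):
    """Proximal gradient for poisson_loss + lam * ||beta[1:]||_1."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(max(y.mean(), 1e-8))      # start at the constant-rate fit
    step = 1.0 / (X.shape[0] * max(y.max(), 1.0))
    for _ in range(n_iter):
        grad = X.T @ (np.exp(X @ beta) - y)
        f_old = poisson_loss(X, y, beta)
        while True:                             # backtracking line search
            z = beta - step * grad
            new = soft_threshold(z, step * lam)
            new[0] = z[0]                       # intercept is not penalized
            d = new - beta
            bound = f_old + grad @ d + (d @ d) / (2.0 * step)
            if poisson_loss(X, y, new) <= bound + 1e-12:
                break
            step *= 0.5
        beta = new
        if np.max(np.abs(d)) < tol:
            break
    return beta


if __name__ == "__main__":
    # Hypothetical sparse 2x2x2 table with three empty cells.
    counts = np.array([30.0, 0.0, 0.0, 25.0, 12.0, 0.0, 3.0, 20.0])
    X = design_matrix(3)
    for lam in (0.5, 5.0, 50.0):
        print(lam, np.round(fit_l1_loglinear(X, counts, lam), 3))
```

    Because the penalty keeps the interaction coefficients bounded, the fit stays finite even when some cells are zero, which is the situation where unpenalized maximum likelihood can break down. Larger values of `lam` set more coefficients exactly to zero, so the penalty also performs model selection.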

    PENALIZED LIKELIHOOD AND BAYESIAN METHODS FOR SPARSE CONTINGENCY TABLES: AN ANALYSIS OF ALTERNATIVE SPLICING IN FULL-LENGTH cDNA LIBRARIES

    We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Datasets arising from so-called full-length cDNA libraries, in the context of alternatively spliced genes, typically lead to such sparse contingency tables. Maximum likelihood estimation of log-linear model coefficients fails because of zero cell entries, so new methods are required to estimate the coefficients and to perform model selection. Our suggestions include computationally efficient penalization (Lasso-type) approaches as well as Bayesian methods using MCMC. We compare these procedures in a simulation study and apply the proposed methods to full-length cDNA libraries, yielding valuable insight into the biological process of alternative splicing.

    A Dynamic Stochastic Block Model for Multi-Layer Networks

    We propose a flexible stochastic block model for multi-layer networks, where layer-specific hidden Markov-chain processes drive the changes in the formation of communities. The changes in block membership of a node in a given layer may be influenced by its own past membership in other layers. This allows for clustering overlap, clustering decoupling, or more complex relationships between layers, including settings of unidirectional or bidirectional block causality. We cope with the overparameterization issue of a saturated specification by assuming a Multi-Laplacian prior distribution within a Bayesian framework. Data augmentation and Gibbs sampling are used to make the inference problem more tractable. Through simulations, we show that standard linear models are not able to detect the block causality under the great majority of scenarios. As an application to trade networks, we show that our model provides a unified framework including community detection and the gravity equation. The model is used to study the causality between trade agreements and trade by looking at the global topological properties of the networks, as opposed to the main existing approaches, which focus on local bilateral relationships. We are able to provide new evidence of unidirectional causality from the free trade agreements network to the non-observable trade barriers network structure for 159 countries in the period 1995-2017.
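    The generative mechanism described in this abstract can be sketched with a small simulation. This is an illustration only, under assumptions not taken from the paper: two layers, two blocks, unidirectional dependence (layer B's transitions depend on the node's past block in layer A, but not the reverse), and made-up transition and connection probabilities. The function name `simulate_two_layer_dsbm` is hypothetical, and the sketch does not cover the Multi-Laplacian prior or the Gibbs sampler.

```python
import numpy as np


def simulate_two_layer_dsbm(n_nodes=30, n_steps=5, seed=0):
    """Simulate memberships and undirected networks for two layers.

    Returns
    -------
    memberships : int array, shape (n_steps, 2, n_nodes)
    networks    : int array, shape (n_steps, 2, n_nodes, n_nodes)
    """
    rng = np.random.default_rng(seed)

    # Layer A: ordinary Markov chain, P_a[prev_a, next_a] (assumed values).
    P_a = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
    # Layer B: transition depends on own past block AND past block in A,
    # P_b[prev_b, prev_a, next_b] (assumed values).
    P_b = np.array([[[0.95, 0.05], [0.40, 0.60]],
                    [[0.60, 0.40], [0.05, 0.95]]])
    # Block connection probabilities, shared by both layers for simplicity.
    B = np.array([[0.60, 0.05],
                  [0.05, 0.50]])

    z_a = rng.integers(0, 2, size=n_nodes)
    z_b = rng.integers(0, 2, size=n_nodes)
    memberships, networks = [], []
    for t in range(n_steps):
        if t > 0:
            # Both updates use the memberships from the previous period.
            new_a = np.array([rng.choice(2, p=P_a[a]) for a in z_a])
            new_b = np.array([rng.choice(2, p=P_b[b, a])
                              for a, b in zip(z_a, z_b)])
            z_a, z_b = new_a, new_b
        layers = []
        for z in (z_a, z_b):
            probs = B[np.ix_(z, z)]             # edge probability per pair
            upper = np.triu(rng.random((n_nodes, n_nodes)) < probs, k=1)
            layers.append((upper | upper.T).astype(int))  # symmetric, no loops
        memberships.append(np.stack([z_a, z_b]))
        networks.append(np.stack(layers))
    return np.array(memberships), np.array(networks)


if __name__ == "__main__":
    z, nets = simulate_two_layer_dsbm()
    print(z.shape, nets.shape)
```

    In this setup a node's block in layer A at one period shifts its transition probabilities in layer B at the next, which is one simple way to encode unidirectional block causality between layers.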