7,167 research outputs found

    Gamma-based clustering via ordered means with application to gene-expression analysis

    Discrete mixture models provide a well-known basis for effective clustering algorithms, although technical challenges have limited their scope. In the context of gene-expression data analysis, a model is presented that mixes over a finite catalog of structures, each one representing equality and inequality constraints among latent expected values. Computations depend on the probability that independent gamma-distributed variables attain each of their possible orderings. Each ordering event is equivalent to an event in independent negative-binomial random variables, and this finding guides a dynamic-programming calculation. The structuring of mixture-model components according to constraints among latent means leads to strict concavity of the mixture log likelihood. In addition to its beneficial numerical properties, the clustering method shows promising results in an empirical study. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOS805
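The link between gamma orderings and negative-binomial events can be illustrated in the simplest case of two variables with integer shape parameters. This is a minimal sketch of that textbook special case, not the paper's dynamic-programming algorithm; the function names and the restriction to integer shapes are assumptions made for illustration. If X is the waiting time to the a-th event of a Poisson process with rate `rate_x` and Y is the waiting time to the b-th event of an independent process with rate `rate_y`, then X < Y exactly when fewer than b events of the second process precede the a-th event of the first, which is a negative-binomial event.

```python
import math
import random


def prob_first_gamma_smaller(a: int, b: int, rate_x: float, rate_y: float) -> float:
    """P(X < Y) for independent X ~ Gamma(shape=a, rate=rate_x) and
    Y ~ Gamma(shape=b, rate=rate_y), with integer shapes a, b >= 1.

    Equals P(N <= b - 1), where N is negative binomial: the number of
    "failures" before the a-th "success", with success probability
    p = rate_x / (rate_x + rate_y).
    """
    p = rate_x / (rate_x + rate_y)
    return sum(
        math.comb(k + a - 1, k) * p**a * (1.0 - p) ** k
        for k in range(b)
    )


def monte_carlo_check(a: int, b: int, rate_x: float, rate_y: float,
                      n_draws: int = 20000, seed: int = 0) -> float:
    """Estimate P(X < Y) by direct simulation of the two gamma variables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # random.gammavariate takes (shape, scale); scale = 1 / rate.
        x = rng.gammavariate(a, 1.0 / rate_x)
        y = rng.gammavariate(b, 1.0 / rate_y)
        hits += x < y
    return hits / n_draws


if __name__ == "__main__":
    exact = prob_first_gamma_smaller(3, 2, 1.5, 1.0)
    approx = monte_carlo_check(3, 2, 1.5, 1.0)
    print(f"negative-binomial formula: {exact:.4f}, simulation: {approx:.4f}")
```

With more than two variables the number of possible orderings grows quickly, which is presumably why the abstract describes a dynamic-programming calculation rather than direct enumeration.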

    Extreme Value Distribution Based Gene Selection Criteria for Discriminant Microarray Data Analysis Using Logistic Regression

    One important issue commonly encountered in the analysis of microarray data is to decide which and how many genes should be selected for further studies. For discriminant microarray data analyses based on statistical models, such as logistic regression models, gene selection can be accomplished by comparing the maximum likelihood of the model given the real data, $\hat{L}(D|M)$, with the expected maximum likelihood of the model given an ensemble of surrogate data with randomly permuted labels, $\hat{L}(D_0|M)$. Typically, the computational burden of obtaining $\hat{L}(D_0|M)$ is immense, often exceeding the limits of available computing resources by orders of magnitude. Here, we propose an approach that circumvents such heavy computations by mapping the simulation problem to an extreme-value problem. We present the derivation of an asymptotic distribution of the extreme value as well as its mean, median, and variance. Using this distribution, we propose two gene selection criteria, and we apply them to two microarray datasets and three classification tasks for illustration. Comment: to be published in Journal of Computational Biology (2004)

    Factorial graphical lasso for dynamic networks

    Dynamic network models describe a growing number of important scientific processes, from cell biology and epidemiology to sociology and finance. There are many aspects of dynamical networks that require statistical considerations. In this paper we focus on determining network structure. Estimating dynamic networks is a difficult task since the number of components involved in the system is very large. As a result, the number of parameters to be estimated is bigger than the number of observations. However, a characteristic of many networks is that they are sparse. For example, the molecular structure of genes makes interactions with other components a highly structured and therefore sparse process. Penalized Gaussian graphical models have been used to estimate sparse networks. However, the literature has focussed on static networks, which lack specific temporal constraints. We propose a structured Gaussian dynamical graphical model, where structures can consist of specific time dynamics, known presence or absence of links and block equality constraints on the parameters. Thus, the number of parameters to be estimated is reduced and the accuracy of the estimates, including the identification of the network, can be improved. Here, we show that the constrained optimization problem can be solved by taking advantage of an efficient solver, logdetPPA, developed in convex optimization. Moreover, model selection methods for checking the sensitivity of the inferred networks are described. Finally, synthetic and real data illustrate the proposed methodologies. Comment: 30 pp, 5 figures
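For orientation, the penalized Gaussian graphical model that the abstract builds on is usually written as the standard graphical lasso problem below. This is the generic static formulation only; the symbols (sample covariance $S$, precision matrix $\Theta$, penalty weight $\lambda$) are conventional notation assumed here, and the abstract does not give the exact form of the paper's additional time-dynamics, known-link, and block-equality constraints.

```latex
\hat{\Theta}
  = \arg\max_{\Theta \succ 0}
    \Bigl\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert \Theta \rVert_1 \Bigr\},
\qquad
\lVert \Theta \rVert_1 = \sum_{i,j} \lvert \theta_{ij} \rvert .
```

A zero entry $\theta_{ij} = 0$ corresponds to a missing edge between nodes $i$ and $j$, so the $\ell_1$ penalty is what produces a sparse network estimate.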

    Mouse p53-deficient cancer models as platforms for obtaining genomic predictors of human cancer clinical outcomes

    Mutations in the TP53 gene are very common in human cancers, and are associated with poor clinical outcome. Transgenic mouse models lacking the Trp53 gene or that express mutant Trp53 transgenes produce tumours with malignant features in many organs. We previously showed the transcriptome of a p53-deficient mouse skin carcinoma model to be similar to those of human cancers with TP53 mutations and associated with poor clinical outcomes. This report shows that much of the 682-gene signature of this murine skin carcinoma transcriptome is also present in breast and lung cancer mouse models in which p53 is inhibited. Further, we report validated gene-expression-based tests for predicting the clinical outcome of human breast and lung adenocarcinoma. It was found that human patients with cancer could be stratified based on the similarity of their transcriptome with the mouse skin carcinoma 682-gene signature. The results also provide new targets for the treatment of p53-defective tumours.

    Analysis of a data matrix and a graph: Metagenomic data and the phylogenetic tree

    In biological experiments researchers often have information in the form of a graph that supplements observed numerical data. Incorporating the knowledge contained in these graphs into an analysis of the numerical data is an important and nontrivial task. We look at the example of metagenomic data, that is, data from a genomic survey of the abundance of different species of bacteria in a sample. Here, the graph of interest is a phylogenetic tree depicting the interspecies relationships among the bacterial species. We illustrate that analysis of the data in a nonstandard inner-product space effectively uses this additional graphical information and produces more meaningful results. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOAS402
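To make the idea of a nonstandard inner-product space concrete, the sketch below replaces the usual dot product with a weighted inner product of the form x^T Q y. The matrix `Q`, the three-taxon example, and the function names are hypothetical and chosen only for illustration; the abstract does not say how the paper derives its inner product from the phylogenetic tree.

```python
import math


def q_inner(x, y, q):
    """Inner product <x, y>_Q = x^T Q y for a symmetric positive-definite Q."""
    return sum(
        x[i] * q[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def q_distance(x, y, q):
    """Distance induced by the Q inner product."""
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(q_inner(diff, diff, q))


# Hypothetical similarity matrix for three taxa: taxa 0 and 1 are close
# relatives (similarity 0.8), taxon 2 is unrelated to both.
Q = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Three samples, each dominated by a single taxon.
sample_a = [1.0, 0.0, 0.0]
sample_b = [0.0, 1.0, 0.0]
sample_c = [0.0, 0.0, 1.0]

if __name__ == "__main__":
    print("Euclidean a-b:", q_distance(sample_a, sample_b, IDENTITY))
    print("Euclidean a-c:", q_distance(sample_a, sample_c, IDENTITY))
    print("Q-weighted a-b:", q_distance(sample_a, sample_b, Q))
    print("Q-weighted a-c:", q_distance(sample_a, sample_c, Q))
```

Under the ordinary Euclidean distance the three samples are equally far apart, whereas under this Q the two samples dominated by closely related taxa are nearer to each other, which is the kind of effect the graph information is meant to supply.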

    Weighted-Lasso for Structured Network Inference from Time Course Data

    We present a weighted-Lasso method to infer the parameters of a first-order vector auto-regressive model that describes time course expression data generated by directed gene-to-gene regulation networks. These networks are assumed to have a prior internal structure of connectivity which drives the inference method. This prior structure can be either derived from prior biological knowledge or inferred by the method itself. We illustrate the performance of this structure-based penalization both on synthetic data and on two canonical regulatory networks: first, the yeast cell cycle regulation network, by analyzing Spellman et al.'s dataset, and second, the E. coli S.O.S. DNA repair network, by analyzing data from U. Alon's lab.
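A generic weighted-Lasso estimator for a first-order vector auto-regressive model can be written as follows. This is a sketch in assumed notation ($x_t$ for the expression vector at time $t$, $A = (a_{ij})$ for the regulation matrix, $w_{ij}$ for the structure-dependent weights, $\lambda$ for the overall penalty level); the abstract does not specify the paper's exact objective or how the weights are constructed.

```latex
x_t = A\,x_{t-1} + \varepsilon_t, \qquad t = 2, \dots, T,
\qquad
\hat{A} = \arg\min_{A}
  \sum_{t=2}^{T} \bigl\lVert x_t - A\,x_{t-1} \bigr\rVert_2^2
  + \lambda \sum_{i,j} w_{ij}\, \lvert a_{ij} \rvert .
```

Smaller weights $w_{ij}$ penalize the corresponding edges less, so a prior structure of connectivity can steer which entries of $A$ are allowed to be nonzero.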