
    Getting started in probabilistic graphical models

    Probabilistic graphical models (PGMs) have become a popular tool for computational analysis of biological data in a variety of domains. But what exactly are they, and how do they work? How can we use PGMs to discover patterns that are biologically relevant? And to what extent can PGMs help us formulate new hypotheses that are testable at the bench? This note sketches out some answers and illustrates the main ideas behind the statistical approach to biological pattern discovery. Comment: 12 pages, 1 figure.

    Mixtures of Regression Models for Time-Course Gene Expression Data: Evaluation of Initialization and Random Effects

    Finite mixture models are routinely applied to time-course microarray data. Due to the complexity and size of this type of data, the choice of good starting values plays an important role. So far, initialization strategies have only been investigated for data from a mixture of multivariate normal distributions. In this work, several initialization procedures are evaluated for mixtures of regression models with and without random effects in an extensive simulation study on different artificial datasets. Finally, these procedures are also applied to a real dataset from E. coli.
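To make the role of starting values concrete, here is a minimal sketch of EM for a mixture of regression models on time-course profiles, run from several random starts with the best log-likelihood kept. It is an illustration under stated assumptions, not one of the procedures evaluated in the paper: the function names `fit_once` and `fit_with_restarts`, the random Dirichlet start, one noise variance per component, and the omission of random effects are all choices made for this example.

```python
import numpy as np

def fit_once(Y, X, k, rng, n_iter=100):
    """One EM run. Y: (genes, timepoints) profiles, X: (timepoints, p) design."""
    n, T = Y.shape
    # Assumed initialization: random soft assignments of genes to components.
    resp = rng.dirichlet(np.ones(k), size=n)
    for _ in range(n_iter):
        # M-step: mixing weights, regression coefficients, noise variances.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        ybar = (resp.T @ Y) / nk[:, None]                  # weighted mean profiles
        betas = np.linalg.lstsq(X, ybar.T, rcond=None)[0]  # shape (p, k)
        fitted = (X @ betas).T                             # shape (k, T)
        sq = ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)
        sigma2 = np.maximum((resp * sq).sum(axis=0) / (T * nk), 1e-6)
        # E-step: posterior probability of each component for each gene.
        log_dens = np.log(pi) - 0.5 * T * np.log(2 * np.pi * sigma2) - 0.5 * sq / sigma2
        log_norm = np.logaddexp.reduce(log_dens, axis=1)
        resp = np.exp(log_dens - log_norm[:, None])
    return log_norm.sum(), betas.T, resp

def fit_with_restarts(Y, X, k, n_starts=10, seed=0):
    """Run EM from several random starts and keep the highest log-likelihood."""
    rng = np.random.default_rng(seed)
    runs = (fit_once(Y, X, k, rng) for _ in range(n_starts))
    return max(runs, key=lambda run: run[0])
```

Because EM only finds a local optimum, different starts can end in different solutions; comparing the returned log-likelihoods across starts is one simple way to see how much the initialization matters on a given dataset.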

    Noise resistant generalized parametric validity index of clustering for gene expression data

    This article has been made available through the Brunel Open Access Publishing Fund. Validity indices have been investigated for decades. However, since the literature contains no study of the noise-resistance performance of these indices, there is no guideline for determining the best clustering in noisy data sets, especially microarray data sets. In this paper, we propose a generalized parametric validity (GPV) index which employs two tunable parameters α and β to control the proportions of objects being considered to calculate the dissimilarities. The greatest advantage of the proposed GPV index is its noise-resistance ability, which results from the flexibility of tuning the parameters. Several rules are set to guide the selection of parameter values. To illustrate the noise-resistance performance of the proposed index, we evaluate the GPV index for assessing five clustering algorithms in two gene expression data simulation models with different noise levels and compare its ability to determine the number of clusters with that of eight existing indices. We also test the GPV in three groups of real gene expression data sets. The experimental results suggest that the proposed GPV index has superior noise-resistance ability and provides fairly accurate judgements.

    On bicluster aggregation and its benefits for enumerative solutions

    Biclustering involves the simultaneous clustering of objects and their attributes, thus defining local two-way clustering models. Recently, efficient algorithms were conceived to enumerate all biclusters in real-valued datasets. In this case, the solution comprises a complete set of maximal and non-redundant biclusters. However, the ability to enumerate biclusters revealed a challenging scenario: in noisy datasets, each true bicluster may become highly fragmented, with the fragments overlapping heavily. This prevents a direct analysis of the obtained results. To reverse the fragmentation, we propose here two approaches for properly aggregating the whole set of enumerated biclusters: one based on single linkage and the other directly exploring the rate of overlapping. Both proposals were compared with each other and with the current state of the art in several experiments, and they not only significantly reduced the number of biclusters but also consistently increased the quality of the solution. Comment: 15 pages, will be published by Springer Verlag in the LNAI Series in the book Advances in Data Mining.
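The aggregation idea can be sketched as follows: treat each enumerated bicluster as a pair (row set, column set), link any two whose overlap rate reaches a threshold, and merge each linked group into one bicluster. This is a rough illustration only. The overlap measure (shared cells divided by the size of the smaller bicluster), the threshold of 0.5, and the names `overlap_rate` and `aggregate` are assumptions made for the example, not the definitions used in the paper.

```python
from itertools import combinations

def overlap_rate(a, b):
    """Shared cells divided by the cell count of the smaller bicluster (assumed measure)."""
    (rows_a, cols_a), (rows_b, cols_b) = a, b
    shared = len(rows_a & rows_b) * len(cols_a & cols_b)
    smaller = min(len(rows_a) * len(cols_a), len(rows_b) * len(cols_b))
    return shared / smaller if smaller else 0.0

def aggregate(biclusters, threshold=0.5):
    """Single-linkage grouping cut at `threshold`, implemented with union-find."""
    parent = list(range(len(biclusters)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of biclusters whose overlap rate reaches the threshold.
    for i, j in combinations(range(len(biclusters)), 2):
        if overlap_rate(biclusters[i], biclusters[j]) >= threshold:
            parent[find(i)] = find(j)

    # Merge each linked group into the union of its rows and columns.
    groups = {}
    for i, (rows, cols) in enumerate(biclusters):
        merged_rows, merged_cols = groups.setdefault(find(i), (set(), set()))
        merged_rows |= rows
        merged_cols |= cols
    return [(frozenset(r), frozenset(c)) for r, c in groups.values()]
```

For example, `aggregate([({1, 2, 3}, {1, 2}), ({2, 3, 4}, {1, 2}), ({10, 11}, {5, 6})])` merges the first two fragments, which share four of their six cells, and leaves the third untouched. Taking the union of rows and columns is the simplest merge rule; it can add cells that belonged to no fragment, so a real pipeline would likely re-check the merged bicluster for quality.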

    Expression profiles of genes regulating dairy cow fertility: recent findings, ongoing activities and future possibilities

    Subfertility has negative effects on dairy farm profitability, animal welfare and sustainability of animal production. Increasing herd sizes and economic pressures restrict the amount of time that farmers can spend on counteractive management. Genetic improvement will become increasingly important to restore reproductive performance. Complementary to traditional breeding value estimation procedures, genomic selection based on genome-wide information will become more widely applied. Functional genomics, including transcriptomics (gene expression profiling), produces the information to understand the consequences of selection, as it helps to unravel physiological mechanisms underlying female fertility traits. Insight into the latter is needed to develop new effective management strategies to combat subfertility. Here, the importance of functional genomics for dairy cow reproduction so far and in the near future is evaluated. Recent gene profiling studies in the field of dairy cow fertility are reviewed, and new data are presented on genes that are expressed in the brains of dairy cows and that are involved in dairy cow oestrus (behaviour). Fast-developing new research areas in the field of functional genomics, such as epigenetics, RNA interference, variable copy numbers and nutrigenomics, are discussed, including their promising future value for dairy cow fertility.

    Do-it-yourself: construction of a custom cDNA macroarray platform with high sensitivity and linear range

    Background: Research involving gene expression profiling and clinical applications, such as diagnostics and prognostics, often requires a DNA array platform that is flexibly customisable and cost-effective, yet highly sensitive and capable of accurately and reproducibly quantifying the transcriptional expression of a vast number of genes over the whole transcriptome dynamic range using low amounts of RNA sample. To this end, a set of easy-to-implement practical optimisations to the design of cDNA-based nylon macroarrays, as well as to sample ³³P-labeling, hybridisation protocols and phosphor screen image processing, were analysed for macroarray performance. Results: The custom macroarray platform proposed here had an absolute sensitivity as low as 50,000 transcripts and a linear range of over 5 log-orders. Its quality of identifying differentially expressed genes was at least comparable to commercially available microchips. Interestingly, the quantitative accuracy was found to correlate significantly with corresponding reverse transcriptase quantitative PCR values, the gold standard gene expression measure (Pearson's correlation test p < 0.0001). Furthermore, the assay has low cost and input RNA requirements (0.5 µg and less) and has sound reproducibility. Conclusions: The results presented here demonstrate for the first time that self-made cDNA-based nylon macroarrays can produce highly reliable gene expression data with high sensitivity, covering the entire mammalian dynamic range of mRNA abundances. Starting from minimal amounts of unamplified total RNA per sample, a reasonable number of samples can be assayed simultaneously for the quantitative expression of hundreds of genes in an easily customisable and cost-effective manner.

    The EM Algorithm and the Rise of Computational Biology

    In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)