
    The Iterative Signature Algorithm for the analysis of large scale gene expression data

    We present a new approach for the analysis of genome-wide expression data. Our method is designed to overcome the limitations of traditional techniques when applied to large-scale data. Rather than allotting each gene to a single cluster, we assign both genes and conditions to context-dependent and potentially overlapping transcription modules. We provide a rigorous definition of a transcription module as the object to be retrieved from the expression data. We establish an efficient algorithm that searches for the modules encoded in the data by iteratively refining sets of genes and conditions until they match this definition. Each iteration involves a linear map, induced by the normalized expression matrix, followed by the application of a threshold function. We argue that our method is in fact a generalization of Singular Value Decomposition, which corresponds to the special case where no threshold is applied. We show analytically that for noisy expression data our approach leads to better classification due to the implementation of the threshold. This result is confirmed by numerical analyses based on in-silico expression data. We briefly discuss results obtained by applying our algorithm to expression data from the yeast S. cerevisiae. Comment: LaTeX, 36 pages, 8 figures
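The iteration described in this abstract (a linear map followed by a threshold, repeated until the gene and condition sets stop changing) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the single pre-normalized matrix `E`, and the "mean plus t standard deviations" threshold rule are all assumptions made for the example.

```python
from statistics import mean, pstdev


def threshold(scores, t):
    """Keep indices whose score exceeds mean + t * std (assumed rule)."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return set()
    return {i for i, s in enumerate(scores) if s > mu + t * sigma}


def isa_sketch(E, seed_genes, t_genes=1.0, t_conds=1.0, max_iter=100):
    """Refine a gene set and a condition set until both stop changing.

    E is a genes x conditions matrix (list of rows), assumed to be
    normalized by the caller. Returns (genes, conditions) as index sets.
    """
    n_genes, n_conds = len(E), len(E[0])
    genes, conds = set(seed_genes), set()
    if not genes:
        return set(), set()
    for _ in range(max_iter):
        # Linear map genes -> conditions: average expression of the
        # current gene set under each condition, then threshold.
        cond_scores = [sum(E[g][c] for g in genes) / len(genes)
                       for c in range(n_conds)]
        new_conds = threshold(cond_scores, t_conds)
        if not new_conds:
            return set(), set()
        # Linear map conditions -> genes: average expression of each
        # gene over the selected conditions, then threshold.
        gene_scores = [sum(E[g][c] for c in new_conds) / len(new_conds)
                       for g in range(n_genes)]
        new_genes = threshold(gene_scores, t_genes)
        if not new_genes:
            return set(), set()
        if new_genes == genes and new_conds == conds:
            break  # fixed point reached
        genes, conds = new_genes, new_conds
    return genes, conds
```

Starting from a single seed gene, the sketch grows the set to every gene that shares the seed's pattern over the selected conditions. Dropping the threshold step would leave only the repeated linear map, which is the special case the abstract relates to Singular Value Decomposition.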

    Visualizing Gene Clusters using Neighborhood Graphs in R

    The visualization of cluster solutions in gene expression data analysis gives practitioners an understanding of the cluster structure of their data and makes it easier to interpret the cluster results. Neighborhood graphs allow for visual assessment of relationships between adjacent clusters. For biological reasons, the number of clusters in gene expression data is rather large. Because a linear projection of the data into two dimensions does not scale well with the number of clusters, there is a need for new visualization techniques that use a non-linear arrangement of the clusters. The new visualization tool is implemented in the open source statistical computing environment R. It is demonstrated on microarray data from yeast.

    Identification of gene expression logical invariants in Arabidopsis.

    Numerous gene expression datasets from diverse tissue samples of the plant species Arabidopsis thaliana have already been deposited in the public domain. There have been several attempts to do large scale meta-analyses of all of these datasets. Most of these analyses summarize pairwise gene expression relationships using correlation, or identify differentially expressed genes between two conditions. We propose here a new large scale meta-analysis of the publicly available Arabidopsis datasets to identify Boolean logical relationships between genes. Boolean logic is a branch of mathematics that deals with two possible values. In the context of gene expression datasets we use qualitative high and low expression values. A strong logical relationship between genes emerges if at least one of the quadrants is sparsely populated. We point out serious issues in the data normalization steps that have been widely accepted and recently published in this context. We have put together a web resource where gene expression relationships can be explored online and which helps visualize the logical relationships between genes. We believe that this website will be useful in identifying important genes in different biological contexts. The web link is http://hegemon.ucsd.edu/plant/
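The quadrant idea in this abstract can be illustrated with a short sketch: each sample is labeled high or low for two genes, the four high/low combinations are counted, and a near-empty quadrant hints at a logical relationship. The function names, the fixed cut-offs, and the 5% sparsity cut-off are assumptions for illustration; the abstract does not specify how thresholds or sparsity are determined.

```python
def quadrant_counts(x, y, x_cut, y_cut):
    """Count samples in each (gene X level, gene Y level) quadrant.

    x and y are expression values of two genes across the same samples;
    x_cut and y_cut are assumed high/low cut-offs chosen by the caller.
    """
    counts = {
        ("low", "low"): 0,
        ("low", "high"): 0,
        ("high", "low"): 0,
        ("high", "high"): 0,
    }
    for a, b in zip(x, y):
        key = ("high" if a > x_cut else "low",
               "high" if b > y_cut else "low")
        counts[key] += 1
    return counts


def sparse_quadrants(counts, max_fraction=0.05):
    """Return quadrants holding at most max_fraction of all samples."""
    total = sum(counts.values())
    return [q for q, n in counts.items() if n <= max_fraction * total]
```

If only the ("high", "low") quadrant is sparse, samples with high X almost always show high Y, which reads as the implication "X high implies Y high".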

    Inferring causal relations from multivariate time series: a fast method for large-scale gene expression data

    Various multivariate time series analysis techniques have been developed with the aim of inferring causal relations between time series. Previously, these techniques have proved their effectiveness on economic and neurophysiological data, which normally consist of hundreds of samples. However, in their applications to gene regulatory inference, the small sample size of gene expression time series poses an obstacle. In this paper, we describe some of the most commonly used multivariate inference techniques and show the potential challenge related to gene expression analysis. In response, we propose a directed partial correlation (DPC) algorithm as an efficient and effective solution to causal/regulatory relations inference on small sample gene expression data. Comparative evaluations of the existing techniques and the proposed method are presented. To draw reliable conclusions, comprehensive benchmarking on data sets with various setups is essential. Three experiments are designed to assess these methods in a coherent manner. Detailed analysis of experimental results not only reveals good accuracy of the proposed DPC method in large-scale prediction, but also gives much insight into all methods under evaluation.

    Analysis of a Gibbs sampler method for model based clustering of gene expression data

    Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model based clustering approaches have emerged as statistically well grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set. We have extended an existing algorithm for model based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for S. cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression. Comment: 8 pages, 7 figures
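One plausible way to summarize several equally good clusterings as a weighted graph on genes is a co-clustering graph, where the weight of an edge is the fraction of clusterings that place both genes in the same cluster. The abstract does not state how its edge weights are defined, so the weighting below and the name `coclustering_graph` are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def coclustering_graph(clusterings):
    """Summarize clusterings as edge weights between gene pairs.

    clusterings is a list of dicts mapping gene name -> cluster label.
    The weight of (g1, g2) is the fraction of clusterings in which the
    two genes share a label (an assumed, commonly used definition).
    Pairs that never share a cluster are omitted.
    """
    together = defaultdict(int)
    for labels in clusterings:
        for g1, g2 in combinations(sorted(labels), 2):
            if labels[g1] == labels[g2]:
                together[(g1, g2)] += 1
    n = len(clusterings)
    return {pair: count / n for pair, count in together.items()}
```

Pairs with weight close to 1 would sit in the tight core of a cluster, while intermediate weights correspond to the partial coexpression that the abstract associates with overlaps.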

    Variation in the Large-Scale Organization of Gene Expression Levels in the Hippocampus Relates to Stable Epigenetic Variability in Behavior

    Despite sharing the same genes, identical twins demonstrate substantial variability in behavioral traits and in their risk for disease. Epigenetic factors (DNA and chromatin modifications that affect levels of gene expression without affecting the DNA sequence) are thought to be important in establishing this variability. Epigenetically-mediated differences in the levels of gene expression that are associated with individual variability traditionally are thought to occur only in a gene-specific manner. We challenge this idea by exploring the large-scale organizational patterns of gene expression in an epigenetic model of behavioral variability. To study the effects of epigenetic influences on behavioral variability, we examine gene expression in genetically identical mice. Using a novel approach to microarray analysis, we show that variability in the large-scale organization of gene expression levels, rather than differences in the expression levels of specific genes, is associated with individual differences in behavior. Specifically, increased activity in the open field is associated with increased variance of log-transformed measures of gene expression in the hippocampus, a brain region involved in open field activity. Early life experience that increases adult activity in the open field also similarly modifies the variance of gene expression levels. The same association of the variance of gene expression levels with behavioral variability is found with levels of gene expression in the hippocampus of genetically heterogeneous outbred populations of mice, suggesting that variation in the large-scale organization of gene expression levels may also be relevant to phenotypic differences in outbred populations such as humans. We find that the increased variance in gene expression levels is attributable to an increasing separation of several large, log-normally distributed families of gene expression levels. We also show that the presence of these multiple log-normal distributions of gene expression levels is a universal characteristic of gene expression in eukaryotes. We use data from the MicroArray Quality Control Project (MAQC) to demonstrate that our method is robust and that it reliably detects biological differences in the large-scale organization of gene expression levels. Our results contrast with the traditional belief that epigenetic effects on gene expression occur only at the level of specific genes and suggest instead that the large-scale organization of gene expression levels provides important insights into the relationship of gene expression with behavioral variability. Understanding the epigenetic, genetic, and environmental factors that regulate the large-scale organization of gene expression levels, and how changes in this large-scale organization influence brain development and behavior, will be a major future challenge in the field of behavioral genomics.
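The per-sample summary this abstract relies on, the variance of log-transformed expression levels, is simple to compute. The sketch below assumes base-2 logarithms and the population variance; neither choice is stated in the abstract, and the function name is hypothetical.

```python
import math
from statistics import pvariance


def log_expression_variance(levels):
    """Variance of log2-transformed expression levels for one sample.

    levels must be strictly positive. Base 2 and population variance
    are assumptions made for this illustration.
    """
    return pvariance([math.log2(v) for v in levels])
```

Comparing this single number across animals, rather than comparing individual genes, reflects the shift in emphasis the abstract argues for.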

    Uncovering regulatory pathways that affect hematopoietic stem cell function using 'genetical genomics'

    We combined large-scale mRNA expression analysis and gene mapping to identify genes and loci that control hematopoietic stem cell (HSC) function. We measured mRNA expression levels in purified HSCs isolated from a panel of densely genotyped recombinant inbred mouse strains. We mapped quantitative trait loci (QTLs) associated with variation in expression of thousands of transcripts. By comparing the physical transcript position with the location of the controlling QTL, we identified polymorphic cis-acting stem cell genes. We also identified multiple trans-acting control loci that modify expression of large numbers of genes. These groups of coregulated transcripts identify pathways that specify variation in stem cells. We illustrate this concept with the identification of candidate genes involved with HSC turnover. We compared expression QTLs in HSCs and brain from the same mice and identified both shared and tissue-specific QTLs. Our data are accessible through WebQTL, a web-based interface that allows custom genetic linkage analysis and identification of coregulated transcripts.
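The comparison of a transcript's physical position with the location of its controlling QTL can be sketched as a simple rule: a QTL on the same chromosome and close to the gene is called cis-acting, anything else trans-acting. The 10 Mb window, the function name, and the argument layout are assumptions for illustration; the abstract does not give the criterion actually used.

```python
def classify_eqtl(gene_chrom, gene_pos_mb, qtl_chrom, qtl_pos_mb,
                  window_mb=10.0):
    """Label an expression QTL as 'cis' or 'trans'.

    A QTL counts as cis when it lies on the gene's chromosome within
    window_mb megabases of the gene (an assumed cut-off).
    """
    same_chrom = gene_chrom == qtl_chrom
    if same_chrom and abs(gene_pos_mb - qtl_pos_mb) <= window_mb:
        return "cis"
    return "trans"
```

A trans-acting locus that is assigned to many transcripts at once would correspond to the control loci the abstract describes as modifying the expression of large numbers of genes.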

    A microfluidic processor for gene expression profiling of single human embryonic stem cells

    The gene expression of human embryonic stem cells (hESC) is a critical aspect for understanding the normal and pathological development of human cells and tissues. Current bulk gene expression assays rely on RNA extracted from cell and tissue samples with varying degrees of cellular heterogeneity. These population-averaged data are difficult to interpret, especially for the purpose of understanding the regulatory relationships of genes in the earliest phases of development and differentiation of individual cells. Here, we report a microfluidic approach that can extract total mRNA from individual cells and synthesize cDNA on the same device with high mRNA-to-cDNA efficiency. This feature makes large-scale single-cell gene expression profiling possible. Using this microfluidic device, we measured the absolute numbers of mRNA molecules of three genes (B2M, Nodal and Fzd4) in a single hESC. Our results indicate that gene expression data measured from cDNA of a cell population are not a good representation of the expression levels in individual cells. Within the G0/G1 phase pluripotent hESC population, some individual cells did not express all three interrogated genes at detectable levels. Consequently, the relative expression levels, which are broadly used in gene expression studies, are very different between measurements from population cDNA and single-cell cDNA. The results underscore the importance of discrete single-cell analysis, and the advantages of a microfluidic approach in stem cell gene expression studies.