2,673 research outputs found

    Information visualization for DNA microarray data analysis: A critical review

    Graphical representation may provide an effective means of making sense of the complexity and sheer volume of data produced by DNA microarray experiments that monitor the expression patterns of thousands of genes simultaneously. The ability to use “abstract” graphical representations to draw attention to areas of interest, and more in-depth visualizations to answer focused questions, would enable biologists to move from a large amount of data to the particular records they are interested in, and therefore gain deeper insight into microarray experiment results. This paper starts by providing some background on microarray experiments, then explains how graphical representation can be applied in general to this problem domain, and then explores the role of visualization in gene expression data analysis. Having set the problem scene, the paper examines various multivariate data visualization techniques that have been applied to microarray data analysis. These techniques are critically reviewed so that the strengths and weaknesses of each technique can be tabulated. Finally, several key problem areas, as well as possible solutions to them, are discussed as a source for future work.

    Visualizing Gene Clusters using Neighborhood Graphs in R

    The visualization of cluster solutions in gene expression data analysis gives practitioners an understanding of the cluster structure of their data and makes the cluster results easier to interpret. Neighborhood graphs allow for visual assessment of relationships between adjacent clusters. For biological reasons, the number of clusters in gene expression data is rather large. Because a linear projection of the data into two dimensions does not scale well with the number of clusters, new visualization techniques that use a non-linear arrangement of the clusters are needed. The new visualization tool is implemented in the open-source statistical computing environment R and is demonstrated on microarray data from yeast.
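
The neighborhood-graph idea can be sketched in a few lines of Python. This is a minimal illustration, not the R tool described above: it assumes that the edge weight between clusters i and j is simply the number of points whose closest centroid is i and whose second-closest is j (counted in both directions), and the function names are hypothetical. The R implementation may define its edge weights differently.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighborhood_graph(points, centroids):
    """Return a symmetric k x k matrix of edge weights between clusters.

    Assumed definition (for illustration only): each point adds 1 to the
    edge between its closest and its second-closest centroid.
    """
    k = len(centroids)
    weights = [[0] * k for _ in range(k)]
    for p in points:
        # Rank centroids by distance to this point
        ranked = sorted(range(k), key=lambda c: sq_dist(p, centroids[c]))
        i, j = ranked[0], ranked[1]
        # Count the pair in both directions so the graph is undirected
        weights[i][j] += 1
        weights[j][i] += 1
    return weights
```

For example, with centroids at 0, 5 and 10 on a line and points at 1, 4, 6 and 9, clusters 0–1 and 1–2 are joined by edges of weight 2, while clusters 0 and 2 are not connected; drawing only non-zero edges gives the kind of adjacency picture the abstract describes.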

    Expression cartography of human tissues using self organizing maps

    Background: The availability of parallel, high-throughput microarray and sequencing experiments poses the challenge of how best to arrange and analyze the resulting mass of multidimensional data in a concerted way. Self-organizing maps (SOMs), a machine learning method, enable parallel sample- and gene-centered views of the data combined with strong visualization and second-level analysis capabilities. The paper addresses aspects of the method with practical impact in the context of expression analysis of complex data sets.
Results: The method was applied to generate a SOM characterizing the whole-genome expression profiles of 67 healthy human tissues selected from ten tissue categories (adipose, endocrine, homeostasis, digestion, exocrine, epithelium, sexual reproduction, muscle, immune system and nervous tissues). SOM mapping reduces the dimension of the expression data from tens of thousands of genes to a few thousand metagenes, where each metagene acts as the representative of a minicluster of co-regulated single genes. Tissue-specific properties, and properties shared between groups of tissues, emerge as a handful of localized spots in the tissue maps, each collecting a group of co-regulated and co-expressed metagenes. The functional context of the spots was discovered using overrepresentation analysis with respect to predefined gene sets of known functional impact. We found that tissue-related spots typically contain enriched populations of gene sets that correspond well to molecular processes in the respective tissues. Analysis techniques normally used at the gene level, such as two-way hierarchical clustering, provide a better signal-to-noise ratio and better representativeness when applied to the metagenes. Metagene-based clustering analyses aggregate the tissues into essentially three clusters: nervous tissues, immune system tissues and the remaining tissues.
Conclusions: The global view on the behavior of a few well-defined modules of correlated and differentially expressed genes is more intuitive and more informative than the separate discovery of the expression levels of hundreds or thousands of individual genes. The metagene approach is less sensitive to a priori selection of genes. It can detect a coordinated expression pattern whose components would not pass single-gene significance thresholds and it is able to extract context-dependent patterns of gene expression in complex data sets.
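
The metagene idea can be illustrated with a minimal self-organizing map in plain Python. This is a sketch under stated assumptions, not the authors' pipeline: the 3 x 3 grid, the learning-rate and radius schedules, and the function names are all hypothetical, and the maps in the study contain far more nodes. Each node's weight vector plays the role of a metagene, and each gene is assigned to its best-matching node.

```python
import math
import random


def best_matching_unit(x, weights):
    """Index of the node whose weight vector is closest to profile x."""
    return min(range(len(weights)),
               key=lambda n: sum((a - b) ** 2 for a, b in zip(x, weights[n])))


def train_som(profiles, rows=3, cols=3, epochs=50, seed=0):
    """Train a small rectangular SOM (hypothetical settings).

    profiles: one expression vector per gene (all the same length).
    Returns one weight vector per node; each acts as a metagene profile.
    """
    rng = random.Random(seed)
    n_samples = len(profiles[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize each node with a randomly drawn gene profile (with replacement)
    weights = [list(rng.choice(profiles)) for _ in grid]
    total = epochs * len(profiles)
    step = 0
    for _ in range(epochs):
        order = list(range(len(profiles)))
        rng.shuffle(order)
        for g in order:
            frac = step / total
            lr = 0.5 * (1 - frac) + 0.01                       # decaying learning rate
            radius = max(rows, cols) / 2 * (1 - frac) + 0.5    # shrinking neighborhood
            x = profiles[g]
            bmu = best_matching_unit(x, weights)
            br, bc = grid[bmu]
            for n, (r, c) in enumerate(grid):
                # Gaussian neighborhood on the map grid around the winning node
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                for s in range(n_samples):
                    weights[n][s] += lr * h * (x[s] - weights[n][s])
            step += 1
    return weights


def assign_metagenes(profiles, weights):
    """Map every gene to the index of its metagene (best-matching node)."""
    return [best_matching_unit(x, weights) for x in profiles]
```

Genes with similar profiles end up on the same or neighboring nodes, so downstream steps such as clustering can operate on the handful of node vectors instead of every individual gene, which is the dimension reduction the abstract describes.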

    Using emergent clustering methods to analyse short time series gene expression data from childhood leukemia treated with glucocorticoids

    Acute lymphoblastic leukemia (ALL) causes the highest number of deaths from cancer in children aged between one and fourteen. The most common treatment for children with ALL is chemotherapy, a cancer treatment that uses drugs to kill cancer cells or stop cell division. The drug and dosage combinations may vary for each child. Unfortunately, chemotherapy treatments may cause serious side effects. Glucocorticoids (GCs) have been used as therapeutic agents for children with ALL for more than 50 years. Common and widely used drugs in this class include prednisolone and dexamethasone. Childhood leukemia now has a survival rate of 80% (Pui, Robison, & Look, 2008). The key clinical question is identifying those children who will not respond well to established therapy strategies. GCs regulate diverse biological processes, for example metabolism, development, differentiation, cell survival and immunity. GCs induce apoptosis and G1 cell cycle arrest in lymphoid cells. However, little is known about the molecular mechanisms of GC sensitivity and resistance or about GC-induced apoptotic signal transduction pathways, and there are many controversial hypotheses about both the genes regulated by GCs and the potential molecular mechanism of GC-induced apoptosis. Therefore, understanding the mechanism of this drug should lead to better prognostic factors (treatment response), more targeted therapies and prevention of side effects. GC-induced apoptosis has been studied using microarray technology in vivo and in vitro on samples consisting of GC-treated ALL cell lines, mouse thymocytes and/or ALL patients. However, time series datasets of GC-treated childhood ALL are currently extremely limited. DNA microarrays are essential tools for analysing the expression of many genes simultaneously. Gene expression data show the level of activity of many genes under experimental conditions. Genes with similar expression patterns could belong to the same pathway or have similar functions.
DNA microarray data analysis has been carried out using statistical analysis as well as machine learning and data mining approaches. Many microarray analysis tools exist; this study aims to combine emergent clustering methods to gain meaningful biological insight into the mechanisms underlying GC-induced apoptosis. In this study, microarray data from childhood ALL samples (B-lineage and T-lineage) treated with prednisolone, a glucocorticoid (Schmidt et al., 2006), and collected at 6 and 24 hours after treatment, are analysed using four methods: Self-organizing maps (SOMs), Emergent self-organizing maps (ESOM) (Ultsch & Morchen, 2005), the Short Time series Expression Miner (STEM) (Ernst & Bar-Joseph, 2006) and Fuzzy clustering by Local Approximation of MEmbership (FLAME) (Fu & Medico, 2007). The results revealed intrinsic biological patterns underlying the GC time series data: there are at least five different gene activities occurring across the three time points; GC-induced apoptotic genes were identified; and genes active at both time points, or only at 6 hours or only at 24 hours, were determined. In addition, interesting gene clusters with membership in already known pathways were found, thereby providing promising candidate genes for further inference of GC-induced apoptotic gene regulatory networks.
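
To illustrate how a STEM-style analysis groups short time series, the sketch below enumerates simple model profiles over three time points (assumed here to be 0, 6 and 24 hours, with time 0 as the baseline) and assigns each gene to the profile it correlates with best. It is a simplified, hypothetical take rather than the STEM software: STEM also selects a representative subset of profiles and assesses the statistical significance of the number of genes assigned to each, steps omitted here, and the `max_change` parameter and all function names are assumptions.

```python
import itertools
import math


def model_profiles(n_points=3, max_change=1):
    """Enumerate profiles that start at 0 and change by at most
    max_change units between consecutive time points."""
    profiles = []
    steps = range(-max_change, max_change + 1)
    for deltas in itertools.product(steps, repeat=n_points - 1):
        profile = [0]
        for d in deltas:
            profile.append(profile[-1] + d)
        # Skip the flat profile: its correlation with any gene is undefined
        if max(profile) > min(profile):
            profiles.append(profile)
    return profiles


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def assign_to_profiles(expression, profiles):
    """For each gene (log ratios vs. time 0, first entry 0), return
    (index of best-correlated profile, its correlation)."""
    result = []
    for gene in expression:
        scores = [pearson(gene, p) for p in profiles]
        best = max(range(len(profiles)), key=scores.__getitem__)
        result.append((best, scores[best]))
    return result
```

With three time points and `max_change=1` there are eight non-flat profiles. A gene such as `[0, 2.0, 2.1]` (up at 6 hours, then steady) is assigned to `[0, 1, 1]`, while `[0, 0.05, -1.5]` (no change at 6 hours, down at 24 hours) is assigned to `[0, 0, -1]`, which is the kind of "active only at 6 hours or only at 24 hours" grouping the abstract reports.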

    Adaptive Double Self-Organizing Map for Clustering Gene Expression Data

    This thesis presents a novel clustering technique, the adaptive double self-organizing map (ADSOM), that addresses the issue of identifying the correct number of clusters. ADSOM has a flexible topology and performs clustering and cluster visualization simultaneously, thereby requiring no a priori knowledge of the number of clusters. ADSOM combines features of the popular self-organizing map with two-dimensional position vectors, which serve as a visualization tool for deciding the number of clusters. It updates its free parameters during training, and its position vectors converge to a fairly consistent number of clusters provided that its initial number of nodes is greater than the expected number of clusters. A novel index is introduced based on hierarchical clustering of the final locations of the position vectors. The index allows automated detection of the number of clusters, thereby reducing the human error that could be incurred from counting clusters visually. The reliability of ADSOM in identifying the number of clusters is demonstrated by applying it to publicly available gene expression data from multiple biological systems, such as yeast, human, mouse, and bacteria.
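
The automated index rests on hierarchical clustering of the final position vectors. A minimal sketch of that counting step follows, assuming single linkage and a hypothetical distance `threshold`; it does not reproduce the thesis's actual index. Cutting a single-linkage hierarchy at a given height is equivalent to counting the connected components of the graph that joins every pair of points no farther apart than that height, so a small union-find structure is enough.

```python
import math


def count_clusters(positions, threshold):
    """Count groups of 2-D position vectors by cutting a single-linkage
    hierarchy at `threshold` (a hypothetical parameter)."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair closer than the threshold: the resulting components
    # are exactly the single-linkage clusters at that cut height
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})
```

If the position vectors have converged into three tight groups, a threshold smaller than the gap between groups but larger than the spread within them returns 3; the choice of threshold is the design decision an automated index has to make.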
