
    Extracting biologically significant patterns from short time series gene expression data

    Background: Time-series gene expression data analysis is widely used to study the dynamics of various cell processes. Most time-series data available today consist of only a few time points, which makes standard clustering techniques difficult to apply. Results: We developed two new algorithms, ASTRO and MiMeSR, that extract biological patterns from short time-series gene expression data. They are inspired by the rank-order-preserving framework and the minimum mean squared residue approach, respectively. However, ASTRO and MiMeSR differ from previous approaches in that they exploit the small number of time points to reduce the problem from NP-hard to linear complexity. Tested on well-defined short time-series expression data, our approaches proved robust to noise and to random patterns, and they correctly detected the temporal expression profiles of relevant functional categories. We evaluated our methods using Gene Ontology (GO) annotations and chromatin immunoprecipitation (ChIP-chip) data. Conclusion: Our approaches generally outperform both standard clustering algorithms and algorithms designed specifically for clustering short time-series gene expression data. Both algorithms are available at http://www.benoslab.pitt.edu/astro/.
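
    The reason a small number of time points helps can be illustrated with a minimal sketch of rank-order grouping. This is not the published ASTRO or MiMeSR code. It assumes each gene's profile is a list of expression values over the same few time points. Each profile then maps to one of at most t! orderings, so genes can be hashed into buckets in a single pass. The names `rank_order`, `group_by_rank_order`, and `min_size` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]


def rank_order(profile: Sequence[float]) -> Pattern:
    """Return time-point indices sorted by increasing expression.

    Ties are broken by time-point index, so each profile maps to exactly
    one of the t! possible orderings of t time points.
    """
    return tuple(sorted(range(len(profile)), key=lambda t: (profile[t], t)))


def group_by_rank_order(
    expression: Dict[str, Sequence[float]], min_size: int = 2
) -> Dict[Pattern, List[str]]:
    """Bucket genes by rank-order pattern in a single pass over the genes.

    For a fixed, small number of time points, computing one pattern costs a
    constant amount of work, so the bucketing pass is linear in the number
    of genes. Buckets with fewer than min_size genes are dropped.
    """
    buckets: Dict[Pattern, List[str]] = defaultdict(list)
    for gene, profile in expression.items():
        buckets[rank_order(profile)].append(gene)
    return {p: genes for p, genes in buckets.items() if len(genes) >= min_size}


if __name__ == "__main__":
    # Toy profiles over three time points (illustrative values only).
    toy = {
        "geneA": [0.1, 0.5, 2.0],
        "geneB": [1.0, 3.0, 7.5],
        "geneC": [4.0, 2.0, 1.0],
        "geneD": [0.2, 0.9, 0.4],
    }
    print(group_by_rank_order(toy))  # {(0, 1, 2): ['geneA', 'geneB']}
```

    Exact pattern matching is used here for simplicity. A practical method would also need to handle noise and near-ties between time points.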

    Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm

    Background: It is now possible to collect the expression levels of a set of genes from a set of biological samples over a series of time points. Such data have three dimensions, gene-sample-time (GST), and are therefore called 3D microarray gene expression data. To take advantage of 3D data and to fully understand the biological knowledge hidden in GST data, novel subspace clustering algorithms are needed that effectively address the biological problem in the corresponding space. Results: We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster) for mining 3D short time-series data. OPTricluster identifies 3D clusters with coherent evolution in a given 3D dataset. It uses a combinatorial approach on the sample dimension and the order-preserving (OP) concept on the time dimension. Fusing the two methodologies makes it possible to study similarities and differences between samples in terms of their temporal expression profiles. OPTricluster has been successfully applied to four case studies: the immune response in mice infected with malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between the inner and outer cotyledons in Brassica napus during seed development, and Brassica napus whole-seed development. These studies showed that OPTricluster is robust to noise and can detect similarities and differences between biological samples. Conclusions: Our analysis showed that OPTricluster generally outperforms other well-known clustering algorithms such as TRICLUSTER, gTRICLUSTER, and K-means. It is robust to noise and can effectively mine the biological knowledge hidden in 3D short time-series gene expression data.
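
    The fusion of a combinatorial search over samples with order preservation over time can be sketched as follows. This is a minimal illustration, not the published OPTricluster algorithm, whose exact cluster criteria are not given here. It assumes `data[gene][sample]` holds an expression profile over time points shared by all genes and samples. For every subset of samples, genes are grouped when they share the same rank-order pattern in each chosen sample. The function names and the `min_genes` and `min_samples` parameters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]
Tricluster = Tuple[Tuple[str, ...], Tuple[str, ...], Tuple[Pattern, ...]]


def rank_order(profile: Sequence[float]) -> Pattern:
    """Time-point indices sorted by increasing expression (ties by index)."""
    return tuple(sorted(range(len(profile)), key=lambda t: (profile[t], t)))


def order_preserving_triclusters(
    data: Dict[str, Dict[str, Sequence[float]]],
    min_genes: int = 2,
    min_samples: int = 2,
) -> List[Tricluster]:
    """Enumerate sample subsets and group genes by their per-sample patterns.

    Returns (genes, samples, patterns) triples. Every gene is assumed to
    have profiles for the same samples.
    """
    if not data:
        return []
    samples = sorted(next(iter(data.values())).keys())
    # Precompute each gene's rank-order pattern in each sample (time dimension).
    patterns = {g: {s: rank_order(prof[s]) for s in samples} for g, prof in data.items()}
    results: List[Tricluster] = []
    # Combinatorial search over the sample dimension.
    for k in range(min_samples, len(samples) + 1):
        for subset in combinations(samples, k):
            buckets: Dict[Tuple[Pattern, ...], List[str]] = defaultdict(list)
            for gene in patterns:
                key = tuple(patterns[gene][s] for s in subset)
                buckets[key].append(gene)
            for key, genes in buckets.items():
                if len(genes) >= min_genes:
                    results.append((tuple(sorted(genes)), subset, key))
    return results
```

    Enumerating every sample subset is exponential in the number of samples, so this sketch is practical only for a handful of samples. It also reports non-maximal triclusters, which a real tool would typically filter out.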

    DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach

    Biclustering algorithms are a distinct class of clustering algorithms that cluster rows and columns simultaneously. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. For DNA microarray experimental data, for example, the goal of biclustering algorithms is to find submatrices, that is, subgroups of genes and subgroups of conditions, in which the genes exhibit highly correlated activity across every condition in the subgroup. In this study, we develop novel biclustering algorithms using basic linear algebra and arithmetic tools. The proposed algorithms can search a data set for all biclusters with constant values, constant values on rows, constant values on columns, and coherent values, and they do so efficiently and without solving any optimization problem. We also show how one of the proposed algorithms can be adapted to identify biclusters with coherent evolution. The algorithms developed in this study discover all valid biclusters of each type, whereas almost all previous biclustering approaches miss some.
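
    The four bicluster types listed above can be made concrete with a small checker. This is not the paper's discovery algorithm, which searches for all biclusters. It only tests whether a given submatrix has one of the four structures, within a tolerance. For coherent values it assumes the additive model a_ij = mu + alpha_i + beta_j. That model is tested with the mean squared residue of Cheng and Church, the quantity behind the minimum mean squared residue approach, which is zero for a perfectly additive submatrix. The names `submatrix`, `mean_squared_residue`, `bicluster_type`, and `tol` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def submatrix(data: Sequence[Sequence[float]], rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Extract the rows x cols submatrix (a candidate bicluster)."""
    return [[data[i][j] for j in cols] for i in rows]


def mean_squared_residue(m: Matrix) -> float:
    """Mean of (a_ij - rowmean_i - colmean_j + overall)^2 over the submatrix."""
    n_rows, n_cols = len(m), len(m[0])
    row_means = [sum(row) / n_cols for row in m]
    col_means = [sum(m[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = m[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


def bicluster_type(m: Matrix, tol: float = 1e-9) -> str:
    """Classify a submatrix, returning the most specific matching type."""

    def all_close(values: Sequence[float]) -> bool:
        return max(values) - min(values) <= tol

    if all_close([x for row in m for x in row]):
        return "constant"
    if all(all_close(row) for row in m):
        return "constant rows"
    if all(all_close(col) for col in zip(*m)):
        return "constant columns"
    if mean_squared_residue(m) <= tol:
        return "coherent (additive)"
    return "none"
```

    The checks run from most to least specific because a constant bicluster also has constant rows and columns, and both of those are special cases of the additive model.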

    An enrichment model for wheat gene annotations using phylogeny, orthology and existing gene ontologies in 9 plant species

    Genome sequencing efforts for Triticum aestivum (wheat) produce massive numbers of contigs, preliminary assemblies, and putative genes/proteins, yet their annotation is still in its infancy. Previously sequenced plant genomes such as Arabidopsis thaliana and Oryza sativa have a much larger percentage of annotated genes. The phylogenetic and orthology relationships among these plant species and their genes are also known. Building on both, we propose an enrichment model that further expands wheat gene annotations. Our sequence and annotation base includes data from Ensembl Plants for 9 plant species: Aegilops tauschii, Arabidopsis thaliana, Brachypodium distachyon, Brassica rapa, Hordeum vulgare, Oryza sativa subsp. japonica, Sorghum bicolor, Triticum urartu and Zea mays. Orthology relationships between wheat genes and each of the 9 plant species are predicted using an in-house software package. Next, ortholog cliques are identified, in which every pair of genes is a pair of orthologs. The phylogenetic distance between wheat and each plant species quantifies the level of confidence for GO assignments within each ortholog clique. New annotations are then assigned to wheat genes, associating them with either novel or more specific GO terms. Overall, considering cliques of size 3 or larger, our model enriched the existing gene-GO term associations for 7,838 (8%) wheat genes, 2,139 of which had no previous annotation. For the particular case of ortholog cliques of size 10 (13 in total), in which all 10 genes are tightly connected via pairwise orthology, 85 new and more specific GO terms were identified. This is a 65% increase over the 130 previously known GO terms. For 4 of the 10 plant species considered in this work, these observations are further supported by experimental evidence using expressologs (Patel et al., Plant J. 2012).
    Peer reviewed: Yes. NRC publication: Yes.
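
    The two steps of this model, identifying ortholog cliques and weighting GO assignments by phylogenetic distance, can be sketched as follows. The abstract does not give the actual confidence formula. The weight 1 / (1 + distance) and the score threshold are assumptions chosen only so that closer species count more. The names `is_ortholog_clique`, `propose_go_terms`, and `threshold` are hypothetical, and ortholog pairs are assumed to be given as unordered gene-ID pairs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def is_ortholog_clique(genes: Iterable[str], ortholog_pairs: Set[FrozenSet[str]]) -> bool:
    """True if every pair of genes in the set is a predicted ortholog pair."""
    return all(frozenset(pair) in ortholog_pairs for pair in combinations(list(genes), 2))


def propose_go_terms(
    wheat_gene: str,
    clique: Dict[str, str],        # other clique members: gene -> species
    go: Dict[str, Set[str]],       # gene -> currently known GO terms
    distance: Dict[str, float],    # species -> phylogenetic distance to wheat
    threshold: float = 1.0,        # hypothetical minimum score
) -> Dict[str, float]:
    """Distance-weighted vote for GO terms the wheat gene does not yet carry."""
    scores: Dict[str, float] = {}
    for gene, species in clique.items():
        # Hypothetical weighting: closer species contribute more confidence.
        weight = 1.0 / (1.0 + distance[species])
        for term in go.get(gene, set()):
            scores[term] = scores.get(term, 0.0) + weight
    known = go.get(wheat_gene, set())
    return {t: s for t, s in sorted(scores.items()) if s >= threshold and t not in known}
```

    Summing weights across clique members rewards terms that are supported by several close relatives, which matches the intuition that larger, tighter cliques give more reliable transfers.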

    GOAL: A software tool for assessing biological significance of gene groups

    Background: Modern high-throughput experimental techniques such as DNA microarrays often produce large lists of genes. Computational biology tools such as clustering are then used to group genes by the similarity of their expression profiles. Genes in the same group are likely to be functionally related. The functional relevance of the genes in each group is usually characterized using biological knowledge from public databases. Typical sources are Gene Ontology (GO), KEGG pathways, associations between a transcription factor (TF) and its target genes, and/or gene networks. Results: We developed GOAL (Gene Ontology AnaLyzer), a software tool specifically designed for the functional evaluation of gene groups. GOAL supports efficient and statistically rigorous functional interpretation of gene groups by integrating GO annotations, TF-gene association data, and KEGG pathways. To allow more specific functional characterization of a gene group, GOAL implements three GO-tree search strategies rather than the single strategy used by most existing GO analysis tools. GOAL is also flexible in deployment: it can be used as a standalone tool, as a plug-in to other computational biology tools, or as a web server application. Conclusion: GOAL performs functional characterization of gene groups. It offers three GO-tree search strategies and combines functional data integration, portability, visualization, and flexible deployment. GOAL can also be used to evaluate and compare gene groups produced by computational biology tools such as clustering algorithms.
    Peer reviewed: Yes. NRC publication: Yes.
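
    The abstract does not say which statistic GOAL uses. A common choice for testing whether a GO term, pathway, or TF target set is over-represented in a gene group is the one-sided hypergeometric test, assumed in the minimal sketch below. The function name `hypergeom_enrichment_p` and its argument names are hypothetical. A full tool would also correct for testing many terms at once.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    k: genes in the group annotated with the term
    n: genes in the group
    K: genes in the background annotated with the term
    N: genes in the background
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of observing k or more annotated genes by chance.
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

    Exact integer arithmetic via `math.comb` avoids the underflow that multiplying small floating-point probabilities can cause on large backgrounds.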

    Towards the reconstruction of Brassica napus seed development FA metabolism dynamic regulatory map

    The increasing demand for canola (Brassica napus) in both food (e.g., vegetable oil) and non-food (e.g., biofuel) applications offers significant socio-economic benefits. Genetic engineering has great potential to speed up canola improvement. Such an effort, however, relies on a good understanding of the molecular mechanisms underlying seed development, fatty acid (FA) metabolism, and oil content. We applied a well-defined algorithm to a time-series gene expression dataset of B. napus during seed development and to a carefully selected dataset of interactions between transcription factors and their target genes. From these, we derive a dynamic regulatory map that recovers many known aspects of these responses. Predictions made in this study were further validated through a literature search. The search suggests potential new roles for LEC1, LEC2, WRI1, FUS3, MYB30, and ABI3 in controlling B. napus seed development and FA metabolism-related genes. These transcription factors are therefore potential targets for genetic improvement of oil production.
    Peer reviewed: Yes. NRC publication: Yes.