8,047 research outputs found

    Evaluation of statistical correlation and validation methods for construction of gene co-expression networks

    High-throughput technologies such as microarrays have led to the rapid accumulation of large-scale genomic data, providing opportunities to systematically infer gene function and co-expression networks. Typical steps of co-expression network analysis using microarray data consist of estimation of pair-wise gene co-expression using some similarity measure, construction of co-expression networks, identification of clusters of co-expressed genes, and post-cluster analyses such as cluster validation. This dissertation is primarily concerned with the development and evaluation of approaches for the first and the last steps: estimation of gene co-expression matrices and validation of network clusters. Since clustering methods are not a focus, only a paraclique clustering algorithm is used in this evaluation. First, a novel Bayesian approach is presented for combining the Pearson correlation with prior biological information from Gene Ontology, yielding a biologically relevant estimate of gene co-expression. The addition of biological information by the Bayesian approach reduced noise in the paraclique gene clusters, as indicated by high silhouette values and increased homogeneity of clusters in terms of molecular function. Standard similarity measures, including Pearson, Spearman, Kendall’s tau, shrinkage and partial correlation coefficients, mutual information, and Euclidean and Manhattan distance measures, were evaluated. Based on quality metrics such as cluster homogeneity and stability with respect to ontological categories, clusters resulting from partial correlation and mutual information were more biologically relevant than those from any of the other measures. Second, the statistical quality of clusters was evaluated using approaches based on permutation tests and Mantel correlation to identify significant and informative clusters that capture most of the covariance in the dataset. Third, the utility of statistical contrasts was studied for classification of temporal patterns of gene expression. Specifically, polynomial and Helmert contrast analyses were shown to provide a means of labeling the co-expressed gene sets because they showed similar temporal profiles.
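
    As a minimal sketch of the first step described in this abstract (pair-wise co-expression estimation followed by network construction), the Python below computes Pearson correlations between toy expression profiles and keeps gene pairs whose absolute correlation passes a cutoff. The gene names, expression values, the 0.9 threshold and the function names are illustrative assumptions, not taken from the dissertation, which additionally combines the correlation with Gene Ontology priors.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def coexpression_edges(expr, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles over four arrays (toy data only).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

for g1, g2, r in coexpression_edges(expr):
    print(f"{g1} - {g2}: r = {r:+.2f}")
```

    Using the absolute value of the correlation treats strongly anti-correlated genes as connected too; whether that is appropriate depends on the analysis, and a signed network would drop the `abs` call.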

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases, and socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

    RESKO: Repositioning drugs by using side effects and knowledge from ontologies

    The objective of drug repositioning is to apply existing drugs to diseases or medical conditions other than the original target, and thus reduce the time and cost expended in drug development. Our system RESKO (REpositioning drugs using Side Effects and Knowledge from Ontologies) identifies drugs with similar side effects as potential candidates for use elsewhere; the supposition is that similar side effects may be caused by drugs targeting similar proteins and pathways. RESKO integrates drug chemical data, protein interaction and ontological knowledge. The novel aspects of our system include a high level of biological knowledge through the integration of pathways and biological ontologies. This provides an explanation facility lacking in most existing methods and improves the repositioning process. We evaluate the shared side effects of the eight conventional Alzheimer drugs, from which sixty-seven candidate drugs were identified on the basis of side-effect commonality. The top 25 drugs on the list were investigated in depth for their suitability to be repositioned; the literature revealed that many of the candidate drugs appear to have been trialed for Alzheimer's disease, which supports the accuracy of our system. We also compare our technique with several competing systems found in the literature.
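
    The idea of ranking candidate drugs by side-effect commonality can be illustrated with a small sketch. The abstract does not say which similarity measure RESKO uses, so the Jaccard index below is an assumption chosen for simplicity, and the drug names and side-effect sets are hypothetical placeholders rather than data from the study.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention for two empty sets
    return len(a & b) / len(a | b)


def rank_candidates(reference_effects, candidates):
    """Sort candidate drugs by side-effect overlap with a reference profile.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    scored = [
        (name, jaccard(reference_effects, effects))
        for name, effects in candidates.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical side-effect profile pooled from a set of reference drugs.
reference = {"nausea", "dizziness", "insomnia", "headache"}

# Hypothetical candidate drugs and their reported side effects.
candidates = {
    "drug_X": {"nausea", "dizziness", "headache"},
    "drug_Y": {"rash", "nausea"},
    "drug_Z": {"fatigue"},
}

for name, score in rank_candidates(reference, candidates):
    print(f"{name}: {score:.2f}")
```

    A set-overlap score alone carries no biological explanation; the abstract attributes that to the pathway and ontology integration, which this sketch does not attempt.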

    Analysis of G-quadruplexes as environmental sensors: Novel statistical models and computational algorithms enable interpretation of complex gene expression patterns for maize under salt stress conditions

    The occurrence of G-quadruplex (G4) structures in both genic and non-genic sequences has been well documented. However, even in genic regions the biological functions of these motifs remain poorly understood, though their potential to act in a regulatory fashion has been hypothesized. With the recent development of next-generation sequencing technology, we have accumulated genomic and transcriptomic sequences from various species and tissues. Coupled with pattern recognition software that can identify putative G4 sequences, the time is right for tackling the question of whether and how G4s are involved in regulating gene expression. Previous studies suggested that G4 conformation can depend on cation type and concentration, along with differences in G4 motif pattern (e.g., number of consecutive guanines). It has also been shown that G4 function may be associated with the location relative to a given gene’s structural elements (transcription start site [TSS], exon/intron boundaries, etc.). My project focused on the expression of G4-containing genes from maize tissues under various abiotic stress conditions, including salt stress, which would be likely to change physiological cation concentrations. I quantified, compared, and visualized expression of G4-containing gene groups by developing and applying novel computational algorithms and statistical models. These methods were packaged into a software program I released on a web server called C-REx (http://c-rex.dill-picl.org/). I found that under salt stress conditions, transcription factors (TFs) with a G4 on the anti-sense strand upstream of the TSS are 455% more likely to be up-regulated than non-G4 genes. Likewise, transcription factors with a G4 on the anti-sense strand just downstream of the TSS are 259% more likely to be up-regulated. In addition, among G4 transcription factors that are up-regulated, heat shock factors are significantly enriched. On the other hand, under salt stress conditions non-TF genes with a G4 on the anti-sense strand upstream of the TSS are 157% more likely to be down-regulated, and those with the G4 on the anti-sense strand downstream of the TSS are 124% more likely to be down-regulated. Through G4 sequence feature analysis, we found that the length of G-runs was significantly associated with whether genes were switched ‘on’ or ‘off’ in salt stress conditions. The shortest G-runs were associated with G4 motifs in TF genes that were switched ‘on’ and the longest G-runs were associated with G4s in non-TF genes that were switched ‘off’. These findings suggest that salt stress resilience could potentially be improved in maize by selecting for natural gene variants with specific G4 constitutions or by introducing specific G4 motifs of varying lengths into TF and non-TF genes involved in response to salt stress.
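
    To make "pattern recognition software that can identify putative G4 sequences" concrete, here is a minimal sketch using a widely used consensus pattern: four runs of at least three guanines separated by loops of one to seven bases. The abstract does not state which motif definition the author's software uses, so this pattern, the function names and the toy sequence are all assumptions. A G4 on the anti-sense strand is found by scanning the reverse complement.

```python
import re

# Assumed consensus motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_g4(seq):
    """Return (start, end, motif) for non-overlapping putative G4 motifs.

    Coordinates are 0-based, end-exclusive, on the strand passed in.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


# Toy sequence with one putative G4 on the given strand.
sense = "TTGGGAGGGTGGGCGGGAA"
print(find_g4(sense))

# The reverse complement carries the motif as C-runs, so a direct scan
# finds nothing; scanning its reverse complement recovers the motif.
other_strand = reverse_complement(sense)
print(find_g4(other_strand))
print(find_g4(reverse_complement(other_strand)))
```

    Positions reported for the reverse complement are in that strand's own coordinates; mapping them back to the original strand requires subtracting from the sequence length.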

    Analysis of transcriptomic differences between NK603 maize and near-isogenic varieties using RNA sequencing and RT-qPCR

    Background: The insertion of a transgene into a plant organism can, in addition to the intended effects, lead to unintended effects in the plants. To uncover such effects, we compared maize grains of two genetically modified varieties containing NK603 (AG8025RR2, AG9045RR2) to their non-transgenic counterparts (AG8025conv, AG9045conv) using high-throughput RNA sequencing. Moreover, in-depth analysis of these data was performed to reveal the biological meaning of detected differences. Results: Uniquely mapped reads corresponded to 29,146 and 33,420 counts in the AG8025 and AG9045 varieties, respectively. An analysis using the R/Bioconductor package edgeR revealed 3534 and 694 DEGs (significant differentially expressed genes) between the varieties AG8025RR2 and AG9045RR2, respectively, and their non-transgenic counterparts. Furthermore, the DESeq2 package revealed 2477 and 440 DEGs between AG8025RR2 and AG9045RR2, respectively, and their counterparts. We were able to confirm the RNA-seq results by the analysis of two randomly selected genes using RT-qPCR (reverse transcription quantitative PCR). PCA and heatmap analysis confirmed a robust data set that differentiates the genotypes even by transgenic event. A detailed analysis of the DEGs was performed by the functional annotation of GO (Gene Ontology), annotation/enrichment analysis of KEGG (Kyoto Encyclopedia of Genes and Genomes) ontologies and functional classification of resulting key genes using the DAVID Bioinformatics Package. Several biological processes and metabolic pathways were found to be significantly different in both variety pairs. Conclusion: Overall, our data clearly demonstrate substantial differences between the analyzed transgenic varieties and their non-transgenic counterparts. These differences indicate that several unintended effects have occurred as a result of NK603 integration. Heatmap data imply that most of the transgenic insert effects are variety-dependent. However, identified key genes involved in affected pathways of both variety pairs show that transgenic independent effects cannot be excluded. Further research of different NK603 varieties is necessary to clarify the role of internal and external influences on gene expression. Nevertheless, our study suggests that RNA-seq analysis can be utilized as a tool to characterize unintended genetic effects in transgenic plants and may also be useful in the safety assessment and authorization of genetically modified (GM) plants.

    Selection for improved energy use efficiency and drought tolerance in canola results in distinct transcriptome and epigenome changes

    To increase both the yield potential and stability of crops, integrated breeding strategies are used that have mostly a direct genetic basis, but the utility of epigenetics to improve complex traits is unclear. A better understanding of the status of the epigenome and its contribution to agronomic performance would help in developing approaches to incorporate the epigenetic component of complex traits into breeding programs. Starting from isogenic canola (Brassica napus) lines, epilines were generated by selecting, repeatedly for three generations, for increased energy use efficiency and drought tolerance. These epilines had enhanced energy use efficiency, drought tolerance, and nitrogen use efficiency. Transcriptome analysis of the epilines and of a line selected solely for its energy use efficiency revealed common differentially expressed genes related to the onset of stress tolerance-regulating signaling events. Genes related to responses to salt, osmotic, abscisic acid, and drought treatments were specifically differentially expressed in the drought-tolerant epilines. The status of the epigenome, scored as differential trimethylation of lysine-4 of histone 3, further supported the phenotype by targeting drought-responsive genes and facilitating the transcription of the differentially expressed genes. From these results, we conclude that the canola epigenome can be shaped by selection to increase energy use efficiency and stress tolerance. Hence, these findings warrant the further development of strategies to incorporate epigenetics into breeding.

    A complex network approach reveals pivotal sub-structure of genes linked to Schizophrenia

    Research on brain disorders with a strong genetic component and complex heritability, like schizophrenia and autism, has promoted the development of brain transcriptomics. This research field seeks a deep understanding of how gene-gene interactions affect the risk for heritable brain disorders. With this perspective, we developed a novel data-driven strategy for characterizing genetic modules, i.e., clusters (also called communities) of strongly interacting genes. The aim is to uncover a pivotal module of genes and gain biological insight into it. Our approach combined network topological properties, to highlight the presence of a pivotal community, with information theory, to assess the informativeness of partitions. Shannon entropy of the complex networks based on average betweenness of the nodes is adopted for this purpose. We analyzed the publicly available BrainCloud dataset, containing post-mortem gene expression data, and we focused on the Dopamine Receptor D2, encoded by the DRD2 gene. To parse the DRD2 community into sub-structure, we applied and compared four different community detection algorithms. A pivotal DRD2 module emerged for all procedures applied, and it represented a considerable reduction compared with the initial network size. A Dice index of 80% for the detected community confirmed the stability of the results across a wide range of tested parameters. The detected community was also the most informative, as it represented an optimization of the Shannon entropy. Lastly, we verified that DRD2 was strongly connected to its neighborhood, more strongly than in any randomly selected community and more than in the Weighted Gene Coexpression Network Analysis (WGCNA) module, commonly considered the standard approach for these studies.
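
    Two quantities named in this abstract have simple standard definitions, sketched below: the Dice index between two gene sets, 2|A ∩ B| / (|A| + |B|), and the Shannon entropy of a normalized set of non-negative scores. How the authors derive the scores from average node betweenness is not specified in the abstract, so the entropy function here is only the generic formula, and the gene identifiers and score values are hypothetical.

```python
from math import log2


def dice_index(a, b):
    """Dice index of two sets: 2 * |A & B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def shannon_entropy(scores):
    """Shannon entropy (in bits) of non-negative scores after normalization."""
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    probs = [s / total for s in scores if s > 0]  # zero terms contribute 0
    return -sum(p * log2(p) for p in probs)


# Hypothetical communities returned by two detection runs (toy gene IDs).
run_1 = {"DRD2", "gene1", "gene2", "gene3", "gene4"}
run_2 = {"DRD2", "gene1", "gene2", "gene3", "gene5"}
print(f"Dice index: {dice_index(run_1, run_2):.2f}")

# Hypothetical per-community scores, e.g. some betweenness-derived summary.
community_scores = [2.0, 1.0, 1.0]
print(f"Entropy: {shannon_entropy(community_scores):.2f} bits")
```

    A Dice index near 1 means two runs recover nearly the same community, which is the sense in which the abstract uses it as a stability check.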

    Functional classification of CATH superfamilies: a domain-based approach for protein function annotation

    Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap, especially since <1.0% of all proteins in UniProtKB have been experimentally characterised. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional subclassification of CATH superfamilies. The superfamilies are subclassified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer.

    High-resolution temporal profiling of transcripts during Arabidopsis leaf senescence reveals a distinct chronology of processes and regulation

    Leaf senescence is an essential developmental process that impacts dramatically on crop yields and involves altered regulation of thousands of genes and many metabolic and signaling pathways, resulting in major changes in the leaf. The regulation of senescence is complex, and although senescence regulatory genes have been characterized, there is little information on how these function in the global control of the process. We used microarray analysis to obtain a high-resolution time-course profile of gene expression during development of a single leaf over a 3-week period to senescence. A complex experimental design approach and a combination of methods were used to extract high-quality replicated data and to identify differentially expressed genes. The multiple time points enable the use of highly informative clustering to reveal distinct time points at which signaling and metabolic pathways change. Analysis of motif enrichment, as well as comparison of transcription factor (TF) families showing altered expression over the time course, identify clear groups of TFs active at different stages of leaf development and senescence. These data enable connection of metabolic processes, signaling pathways, and specific TF activity, which will underpin the development of network models to elucidate the process of senescence.

    Microarray data mining: A novel optimization-based approach to uncover biologically coherent structures

    Background: DNA microarray technology allows for the measurement of genome-wide expression patterns. Within the resultant mass of data lies the problem of analyzing and presenting information on this genomic scale, and a first step towards the rapid and comprehensive interpretation of this data is gene clustering with respect to the expression patterns. Classifying genes into clusters can lead to interesting biological insights. In this study, we describe an iterative clustering approach to uncover biologically coherent structures from DNA microarray data based on a novel clustering algorithm EP_GOS_Clust. Results: We apply our proposed iterative algorithm to three sets of experimental DNA microarray data from experiments with the yeast Saccharomyces cerevisiae and show that the proposed iterative approach improves biological coherence. Comparison with other clustering techniques suggests that our iterative algorithm provides superior performance with regard to biological coherence. An important consequence of our approach is that an increasing proportion of genes find membership in clusters of high biological coherence and that the average cluster specificity improves. Conclusion: The results from these clustering experiments provide a robust basis for extracting motifs and trans-acting factors that determine particular patterns of expression. In addition, the biological coherence of the clusters is iteratively assessed independently of the clustering. Thus, this method will not be severely impacted by functional annotations that are missing, inaccurate, or sparse.