
    Conserved co-expression for candidate disease gene prioritization

    BACKGROUND: Genes that are co-expressed tend to be involved in the same biological process. However, co-expression is not a very reliable predictor of functional links between genes. The evolutionary conservation of co-expression between species can be used to predict protein function more reliably than co-expression in a single species. Here we examine whether co-expression across multiple species is also a better prioritizer of disease genes than is co-expression between human genes alone. RESULTS: We use co-expression data from yeast (S. cerevisiae), nematode worm (C. elegans), fruit fly (D. melanogaster), mouse and human and find that the use of evolutionary conservation can indeed improve the predictive value of co-expression. The effect that genes causing the same disease have higher co-expression than do other genes from their associated disease loci is significantly enhanced when co-expression data are combined across evolutionarily distant species. We also find that performance can vary significantly depending on the co-expression datasets used, and just using more data does not necessarily lead to better prioritization. Instead, we find that dataset quality is more important than quantity, and using a consistent microarray platform per species leads to better performance than using more inclusive datasets pooled from various platforms. CONCLUSION: We find that evolutionarily conserved gene co-expression prioritizes disease candidate genes better than human gene co-expression alone, and provide the integrated data as a new resource for disease gene prioritization tools.

    COXPRESdb: a database to compare gene coexpression in seven model animals

    Publicly available databases of coexpressed gene sets are a valuable resource for a wide variety of experimental studies, including gene targeting for functional identification, and for investigations of regulatory mechanisms or protein–protein interaction networks. Although coexpressed gene databases are becoming more and more popular in the field of plant biology, those with animal data are rather limited, possibly due to the lower reliability of the coexpression data. The original COXPRESdb (coexpressed gene database) (http://coxpresdb.jp) represented the coexpression relationship for human and mouse. Here, we report updates of this database that especially focus on the enhancement of the reliability of gene coexpression data in animals. For this purpose, we implemented a new comparable coexpression measure, Mutual Rank, included five other animal species, rat, chicken, zebrafish, fly and nematoda, to assess the conservation of coexpression, and added different layers of omics data into the integrated network of genes. Comparison of coexpression is a key concept to enhance the reliability of gene coexpression, and the integration of different information can reduce the noise inherent in the information. With the functions for gene network representation, COXPRESdb can help researchers to clarify the functional and regulatory networks of genes in a broad array of animal species
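
The abstract above names Mutual Rank without defining it. As a rough illustration, the sketch below assumes the commonly described definition: the geometric mean of the rank of gene B in gene A's correlation-ordered list and the rank of gene A in gene B's list. The gene names, expression values, and helper functions are hypothetical and are not taken from COXPRESdb.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / (sx * sy)


def rank_of(target, query, profiles):
    """Rank (1 = most correlated) of `target` in `query`'s coexpression list."""
    scores = {
        gene: pearson(profiles[query], values)
        for gene, values in profiles.items()
        if gene != query
    }
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(target) + 1


def mutual_rank(a, b, profiles):
    """Geometric mean of the two directional ranks (assumed definition)."""
    return sqrt(rank_of(b, a, profiles) * rank_of(a, b, profiles))


# Hypothetical expression profiles: gene -> values across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.0, 3.2, 3.9],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

if __name__ == "__main__":
    print(mutual_rank("geneA", "geneB", profiles))
    print(mutual_rank("geneA", "geneD", profiles))
```

Because the measure is built from ranks rather than raw correlation values, it is symmetric in the two genes and can be compared across datasets whose correlation distributions differ.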

    Modeling and analysis of RNA-seq data: a review from a statistical perspective

    Background: Since the invention of next-generation RNA sequencing (RNA-seq) technologies, they have become a powerful tool to study the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date. Results: We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical importance. Conclusion: The development of statistical and computational methods for analyzing RNA-seq data has made significant advances in the past decade. However, methods developed to answer the same biological question often rely on diverse statistical models and exhibit different performance under different scenarios. This review discusses and compares multiple commonly used statistical models regarding their assumptions, in the hope of helping users select appropriate methods as needed, as well as assisting developers in future method development.

    WWOX at the crossroads of cancer, metabolic syndrome related traits and CNS pathologies

    WWOX was cloned as a putative tumor suppressor gene mapping to chromosomal fragile site FRA16D. Deletions affecting WWOX accompanied by loss of expression are frequent in various epithelial cancers. Translocations and deletions affecting WWOX are also common in multiple myeloma and are associated with worse prognosis. Meta-analysis of gene expression datasets demonstrates that low WWOX expression is significantly associated with shorter relapse-free survival in ovarian and breast cancer patients. Although somatic mutations affecting WWOX are not frequent, analysis of TCGA tumor datasets led to identifying 44 novel mutations in various tumor types. The highest frequencies of mutations were found in head and neck cancers and uterine and gastric adenocarcinomas. Mouse models of gene ablation led us to conclude that Wwox does not behave as a highly penetrant, classical tumor suppressor gene, since its deletion is not tumorigenic in most models and its role is more likely to be of relevance in tumor progression rather than in initiation. Analysis of signaling pathways associated with WWOX expression confirmed previous in vivo and in vitro observations linking WWOX function with the TGFβ/SMAD and WNT signaling pathways and with specific metabolic processes. Supporting these conclusions, we recently demonstrated that WWOX indeed behaves as a modulator of TGFβ/SMAD signaling by binding and sequestering SMAD3 in the cytoplasmic compartment. As a consequence, progressive loss of WWOX expression in advanced breast cancer would contribute to the pro-metastatic effects resulting from TGFβ/SMAD3 hyperactive signaling in breast cancer. Recently, GWAS and resequencing studies have linked the WWOX locus with familial dyslipidemias and metabolic syndrome related traits. Indeed, gene expression studies in liver conditional KO mice confirmed an association between WWOX expression and lipid metabolism. Finally, very recently the first human pedigrees with probands carrying homozygous germline loss of function WWOX mutations have been identified. These patients are characterized by severe CNS related pathology that includes epilepsy, ataxia and mental retardation. In summary, WWOX is a highly conserved and tightly regulated gene throughout evolution, and when it is defective or deregulated the consequences are important and deleterious, as demonstrated by its association not only with poor prognosis in cancer but also with other important human pathologies such as metabolic syndrome and CNS related pathologic conditions. Centro de Investigaciones Inmunológicas Básicas y Aplicada

    Meta-coexpression conservation analysis of microarray data: a "subset" approach provides insight into brain-derived neurotrophic factor regulation

    Background: Alterations in brain-derived neurotrophic factor (BDNF) gene expression contribute to serious pathologies such as depression, epilepsy, cancer, Alzheimer's, Huntington's and Parkinson's disease. Therefore, exploring the mechanisms of BDNF regulation is of great clinical importance. Studying BDNF expression remains difficult due to its multiple neural activity-dependent and tissue-specific promoters. Thus, microarray data could provide insight into the regulation of this complex gene. Conventional microarray co-expression analysis is usually carried out by merging the datasets or by confirming the re-occurrence of significant correlations across datasets. However, co-expression patterns can be different under various conditions that are represented by subsets in a dataset. Therefore, assessing co-expression by measuring the correlation coefficient across merged samples of a dataset or by merging datasets might not capture all correlation patterns. Results: In our study, we performed meta-coexpression analysis of publicly available microarray data using BDNF as a "guide-gene", introducing a "subset" approach. The key steps of the analysis included: dividing datasets into subsets with biologically meaningful sample content (e.g. tissue, gender or disease state subsets); analyzing co-expression with the BDNF gene in each subset separately; and confirming co-expression links across subsets. Finally, we analyzed conservation in co-expression with BDNF between human, mouse and rat, and searched for conserved over-represented TFBSs in BDNF and BDNF-correlated genes. Correlated genes discovered in this study regulate nervous system development, and are associated with various types of cancer and neurological disorders. Also, several transcription factors identified here have been reported to regulate BDNF expression in vitro and in vivo. Conclusion: The study demonstrates the potential of the "subset" approach in co-expression conservation analysis for studying the regulation of single genes and proposes novel regulators of BDNF gene expression.
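
The three key steps listed in the abstract above (split a dataset into subsets, correlate each gene with the guide gene within each subset, keep links confirmed across subsets) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the absolute-correlation threshold of 0.6, the requirement of at least two confirming subsets, and all gene names, sample names, and values are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / (sx * sy)


def subset_coexpression(expr, subsets, guide, threshold=0.6, min_subsets=2):
    """Return genes linked to `guide` in at least `min_subsets` subsets.

    expr:    gene -> {sample: expression value}
    subsets: subset name -> list of sample names
    The threshold and min_subsets defaults are illustrative assumptions.
    """
    hits = {}
    for name, samples in subsets.items():
        guide_values = [expr[guide][s] for s in samples]
        for gene, values in expr.items():
            if gene == guide:
                continue
            # Correlate within this subset only, never across merged samples.
            r = pearson(guide_values, [values[s] for s in samples])
            if abs(r) >= threshold:
                hits.setdefault(gene, []).append(name)
    return {g: names for g, names in hits.items() if len(names) >= min_subsets}


# Hypothetical data: six samples split into two tissue subsets.
subsets = {
    "cortex": ["s1", "s2", "s3"],
    "hippocampus": ["s4", "s5", "s6"],
}
expr = {
    "BDNF": {"s1": 1, "s2": 2, "s3": 3, "s4": 5, "s5": 4, "s6": 6},
    "GENE1": {"s1": 2, "s2": 4, "s3": 6, "s4": 10, "s5": 8, "s6": 12},
    "GENE2": {"s1": 3, "s2": 2, "s3": 1, "s4": 1, "s5": 2, "s6": 3},
}

confirmed = subset_coexpression(expr, subsets, guide="BDNF")

if __name__ == "__main__":
    print(confirmed)
```

In this toy data, GENE2 passes the threshold in only one subset and is therefore dropped, which is the kind of condition-specific pattern that a correlation over merged samples could blur.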

    Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network

    An evidence-weighted functional-linkage network of human genes reveals associations among diseases that share no known disease genes and have dissimilar phenotype

    Decoding heterogeneous big data in an integrative way

    Biotechnologies in the post-genomic era, especially those that generate data in high throughput, bring opportunities and challenges that have never been faced before. One of them is how to decode big heterogeneous data for clues that are useful for biological questions. The exponential growth of a variety of data has brought more and more applications of systematic approaches that investigate biological questions in an integrative way. Systematic approaches inherently require integration of heterogeneous information, which urgently calls for much more effort. In this thesis, the effort is mainly devoted to the development of methods and tools that help to integrate big heterogeneous information. In Chapter 2, we employed a heuristic strategy to summarize and integrate, in the form of networks, genes that are essential for the determination of mouse retinal cells. These networks, supported by experimental evidence, could be rediscovered in the analysis of high-throughput data sets and thus would be useful in leveraging high-throughput data. In Chapter 3, we described EnRICH, a tool that we developed to help qualitatively integrate heterogeneous intra-organism information. We also introduced how EnRICH could be applied to the construction of a composite network from different sources, and demonstrated how we used EnRICH to successfully prioritize retinal disease genes. Following the work of Chapter 3 (intra-organism information integration), in Chapter 4 we moved on to the development of a method and tool that can help deal with inter-organism information integration. The method we proposed is able to match genes in a one-to-one fashion between any two genomes. In summary, this thesis contributes to integrative analysis of big heterogeneous data by its work on the integration of intra- and inter-organism information.

    Incorporation of Knowledge for Network-based Candidate Gene Prioritization

    In order to identify the genes associated with a given disease, a number of different high-throughput techniques are available such as gene expression profiles. However, these high-throughput approaches often result in hundreds of different candidate genes, and it is thus very difficult for biomedical researchers to narrow their focus to a few candidate genes when studying a given disease. In order to assist in this challenge, a process called gene prioritization can be utilized. Gene prioritization is the process of identifying and ranking new genes as being associated with a given disease. Candidate genes which rank high are deemed more likely to be associated with the disease than those that rank low. This dissertation focuses on a specific kind of gene prioritization method called network-based gene prioritization. Network-based methods utilize a biological network such as a protein-protein interaction network to rank the candidate genes. In a biological network, a node represents a protein (or gene), and a link represents a biological relationship between two proteins such as a physical interaction. The purpose of this dissertation was to investigate if the incorporation of biological knowledge into the network-based gene prioritization process can provide a significant benefit. The biological knowledge consisted of a variety of information about a given gene including gene ontology (GO) functional terms, MEDLINE articles, gene co-expression measurements, and protein domains to name just a few. The biological knowledge was incorporated into the network’s links and nodes as link and node knowledge respectively. An example of link knowledge is the degree of functional similarity between two proteins, and an example of node knowledge is the number of GO terms associated with a given protein. 
Since there were no existing network-based inference algorithms which could incorporate node knowledge, I developed a new network-based inference algorithm to incorporate both link and node knowledge called the Knowledge Network Gene Prioritization (KNGP) algorithm. The results showed that the incorporation of biological knowledge via link and node knowledge can provide a significant benefit for network-based gene prioritization. The KNGP algorithm was utilized to combine the link and node knowledge