42 research outputs found

    Large-scale benchmark of Endeavour using MetaCore maps

    Summary: Endeavour is a tool that detects the most promising genes within large lists of candidates with respect to a biological process of interest and by combining several genomic data sources. We have benchmarked Endeavour using 450 pathway maps and 826 disease marker sets from MetaCore™ of GeneGo, Inc., containing a total of 9911 and 12 432 genes, respectively. We obtained an area under the receiver operating characteristic curves of 0.97 for pathway and of 0.91 for disease gene sets. These results indicate that Endeavour can be used to efficiently prioritize candidate genes for pathways and diseases. Availability: Endeavour is available at http://www.esat.kuleuven.be/endeavour Supplementary information: Supplementary data are available at Bioinformatics online.

    Using Ribosomal Protein Genes as Reference: A Tale of Caution

    Background: Housekeeping genes are needed in every tissue as their expression is required for survival, integrity or duplication of every cell. Housekeeping genes commonly have been used as reference genes to normalize gene expression data, the underlying assumption being that they are expressed in every cell type at approximately the same level. Often, the terms "reference genes" and "housekeeping genes" are used interchangeably. In this paper, we would like to distinguish between these terms. Consensus is growing that housekeeping genes which have traditionally been used to normalize gene expression data are not good reference genes. Recently, ribosomal protein genes have been suggested as reference genes based on a meta-analysis of publicly available microarray data. Methodology/Principal Findings: We have applied several statistical tools on a dataset of 70 microarrays representing 22 different tissues, to assess and visualize expression stability of ribosomal protein genes. We confirmed the housekeeping status of these genes, but further estimated expression stability across tissues in order to assess their potential as reference genes. One- and two-way ANOVA revealed that all ribosomal protein genes have significant expression variation across tissues and exhibit tissue-dependent expression behavior as a group. Via multidimensional unfolding analysis, we visualized this tissue-dependency. In addition, we explored mechanisms that may cause tissue-dependent effects of individual ribosomal protein genes. Conclusions/Significance: Here we provide statistical and biological evidence that ribosomal protein genes exhibit important tissue-dependent variation in mRNA expression. Though these genes are the most stably expressed of all investigated genes in a meta-analysis, they cannot be considered true reference genes.

    Network Analysis of Differential Expression for the Identification of Disease-Causing Genes

    Genetic studies (in particular linkage and association studies) identify chromosomal regions involved in a disease or phenotype of interest, but those regions often contain many candidate genes, only a few of which can be followed up for biological validation. Recently, computational methods to identify (prioritize) the most promising candidates within a region have been proposed, but they are usually not applicable to cases where little is known about the phenotype (no or few confirmed disease genes, fragmentary understanding of the biological cascades involved). We seek to overcome this limitation by replacing knowledge about the biological process by experimental data on differential gene expression between affected and healthy individuals. Considering the problem from the perspective of a gene/protein network, we assess a candidate gene by considering the level of differential expression in its neighborhood under the assumption that strong candidates will tend to be surrounded by differentially expressed neighbors. We define a notion of soft neighborhood where each gene is given a contributing weight, which decreases with the distance from the candidate gene on the protein network. To account for multiple paths between genes, we define the distance using the Laplacian exponential diffusion kernel. We score candidates by aggregating the differential expression of neighbors weighted as a function of distance. Through a randomization procedure, we rank candidates by p-values. We illustrate our approach on four monogenic diseases and successfully prioritize the known disease-causing genes.
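The scoring idea in this abstract (soft neighborhoods from a Laplacian exponential diffusion kernel, distance-weighted aggregation of differential expression, and ranking by randomization p-values) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the diffusion parameter `beta`, the choice to exclude a candidate's own expression value, the one-sided permutation test, and the toy five-gene path network.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """Laplacian exponential diffusion kernel K = exp(-beta * L).

    `adjacency` is a symmetric 0/1 (or weighted) matrix of an undirected
    network; `beta` is a hypothetical diffusion parameter.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # L is symmetric, so its matrix exponential follows from the
    # eigendecomposition: V * diag(exp(-beta * lambda)) * V^T.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


def neighborhood_scores(kernel, diff_expr):
    """Kernel-weighted sum of differential expression around each gene."""
    weights = kernel.copy()
    # Assumption: a candidate's own expression does not count towards its score.
    np.fill_diagonal(weights, 0.0)
    return weights @ diff_expr


def permutation_pvalues(kernel, diff_expr, n_perm=1000, seed=0):
    """One-sided p-values obtained by shuffling expression values over genes."""
    rng = np.random.default_rng(seed)
    observed = neighborhood_scores(kernel, diff_expr)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        shuffled = rng.permutation(diff_expr)
        exceed += neighborhood_scores(kernel, shuffled) >= observed
    # Add-one correction keeps every p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Toy network: five genes on a path 0-1-2-3-4, with only gene 2
# differentially expressed (values are made up).
adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
diff_expr = np.array([0.0, 0.0, 3.0, 0.0, 0.0])

kernel = diffusion_kernel(adjacency, beta=0.5)
scores = neighborhood_scores(kernel, diff_expr)
pvalues = permutation_pvalues(kernel, diff_expr, n_perm=1000, seed=0)
print("scores:", np.round(scores, 3))
print("p-values:", np.round(pvalues, 3))
```

In this toy setting the genes adjacent to the differentially expressed gene receive the highest scores, and the kernel weight falls off with network distance, which is the behavior the abstract describes. The statistic and randomization scheme actually used by the authors may differ.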

    An expanded evaluation of protein function prediction methods shows an improvement in accuracy

    Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent. Keywords: Protein function prediction, Disease gene prioritization

    Gene Prioritization Through Genomic Data Fusion: Methods and Applications in Human Genetics (Gen prioritizatie via genomische data fusie: Methodes en toepassingen in menselijke genetica)

    Unravelling the molecular basis underlying genetic disorders is crucial in order to develop effective treatments to tackle these diseases. For many years, scientists have explored which genetic factors were associated with several human traits and diseases. After the completion of the human genome project, several high-throughput technologies have been designed and widely used, therefore producing large amounts of genomic data. At the same time, computational tools have been developed and used in conjunction with wet-lab tools to analyze this data in order to enrich our knowledge of genetics and biology. The main focus of this thesis is gene prioritization, which can be defined as the identification of the most promising genes among a list of candidate genes with respect to a biological process of interest. It is a problem for which large quantities of data have to be manipulated, which typically means that it has to be done in silico. This thesis describes two gene prioritization methods from their theoretical development to their applications to real biological questions. The first part of this thesis describes the development of two data fusion algorithms for gene prioritization, based on order statistics and kernel methods respectively. These algorithms have been developed for human and also for reference organisms. Ultimately, a cross-species version of these algorithms has been developed and implemented. Integrating genomic data among closely related organisms is relevant since many researchers are studying human indirectly through the study of reference organisms such as mouse or rat, and are therefore producing mouse/rat specific data that is still relevant in human biology. Our method can integrate more than 20 distinct genomic data sources for five organisms and is therefore one of the first cross-species gene prioritization methods of that scale. Only a fraction of all the computational tools developed each year specifically for biology are still maintained after three years, and even fewer are used by independent researchers. The second part of this thesis focuses on the benchmarks of the proposed methods, the development of the corresponding web-based software, and on their application to real biological questions. By making our methods publicly available, we make sure that interested users can apply them to their own problems. In addition, benchmarking is needed to prove that the approach is theoretically valid and to estimate how accurate the predictions are. Ultimately, the inclusion of our computational method within wet-lab workflows shows the real usefulness of the approach. Pages: 215. Status: published.

    Systems level analysis of sex-dependent gene expression changes in Parkinson’s disease

    Parkinson’s disease (PD) is a heterogeneous disorder, and among the factors which influence the symptom profile, biological sex has been reported to play a significant role. While males have a higher age-adjusted disease incidence and are more frequently affected by muscle rigidity, females present more often with disabling tremors. The molecular mechanisms involved in these differences are still largely unknown, and an improved understanding of the relevant factors may open new avenues for pharmacological disease modification. To help address this challenge, we conducted a meta-analysis of disease-associated molecular sex differences in brain transcriptomics data from case/control studies. Both sex-specific (alteration in only one sex) and sex-dimorphic changes (changes in both sexes, but with opposite direction) were identified. Using further systems level pathway and network analyses, coordinated sex-related alterations were studied. These analyses revealed significant disease-associated sex differences in mitochondrial pathways and highlight specific regulatory factors whose activity changes can explain downstream network alterations, propagated through gene regulatory cascades. Single-cell expression data analyses confirmed the main pathway-level changes observed in bulk transcriptomics data. Overall, our analyses revealed significant sex disparities in PD-associated transcriptomic changes, resulting in coordinated modulations of molecular processes. Among the regulatory factors involved, NR4A2 has already been reported to harbor rare mutations in familial PD and its pharmacological activation confers neuroprotective effects in toxin-induced models of Parkinsonism. Our observations suggest that NR4A2 may warrant further research as a potential adjuvant therapeutic target to address a subset of pathological molecular features of PD that display sex-associated profiles.

    Large-scale benchmark of Endeavour using MetaCore maps

    1. Endeavour is a tool that detects the most promising genes within large lists of candidates with respect to a biological process of interest and by combining several genomic data sources. 2. We have benchmarked Endeavour using 454 pathway maps and 833 disease marker sets from MetaCore™ of GeneGo, Inc., containing a total of 10,053 and 12,699 genes, respectively. We obtain an AUC of 0.97 for pathway and of 0.77 for disease gene sets. 3. The results indicate that Endeavour can be used to efficiently prioritize candidate genes for pathways and diseases.