49 research outputs found

    Integration of multiple data sources to prioritize candidate genes using discounted rating system

    Background: Identifying disease genes within a list of candidate genes is an important task in bioinformatics. The main strategy is to prioritize candidate genes by their similarity to known disease genes. Most existing gene prioritization methods use only one genomic data source, which is noisy and incomplete, so there is a need to integrate multiple data sources that carry different information. Results: In this paper, we propose a combination strategy called the discounted rating system (DRS). Using leave-one-out cross-validation, we compared it with the N-dimensional order statistics (NDOS) used in Endeavour. The AUC (area under the curve) values achieved by DRS were comparable with those of NDOS on most disease families, but DRS ran much faster than NDOS, especially as the number of data sources increased: with 100 candidate genes and 20 data sources, DRS was more than 180 times faster. The DRS framework also allows different data sources to be given different weights, and the weighted DRS achieved significantly higher AUC values than NDOS. Conclusion: The proposed DRS algorithm is a powerful and effective framework for candidate gene prioritization, and it performs better when the weights of the different data sources are set appropriately.
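
The abstract compares DRS against the N-dimensional order statistics (NDOS) used in Endeavour but gives neither formula. As a reference point for the baseline only, here is a minimal Python sketch of the order-statistic recursion commonly associated with Endeavour-style rank fusion. It assumes each data source yields a rank ratio per gene (rank divided by the number of candidates). The function name `ndos_q` and the example genes are hypothetical, and this is not the DRS algorithm. Q is the probability that N independent uniform rank ratios, once sorted, all lie at or below the gene's sorted ratios, so a small Q means consistently high ranks.

```python
from math import factorial


def ndos_q(rank_ratios):
    """Order-statistic Q for one candidate gene (smaller = better).

    rank_ratios: the gene's rank in each data source divided by the number
    of candidates, so every value lies in (0, 1]. Hypothetical helper.
    """
    r = sorted(rank_ratios)  # r[0] <= r[1] <= ... <= r[n-1]
    n = len(r)
    v = [1.0]  # V_0 = 1
    for k in range(1, n + 1):
        # V_k = sum_{i=1..k} (-1)^(i-1) * V_{k-i} / i! * r_{N-k+1}^i (1-based r)
        v_k = 0.0
        for i in range(1, k + 1):
            v_k += (-1) ** (i - 1) * v[k - i] / factorial(i) * r[n - k] ** i
        v.append(v_k)
    return factorial(n) * v[n]


# Hypothetical rank ratios from three data sources.
candidates = {
    "GENE_A": [0.02, 0.10, 0.05],
    "GENE_B": [0.50, 0.40, 0.90],
}
ranking = sorted(candidates, key=lambda g: ndos_q(candidates[g]))
print(ranking)  # ['GENE_A', 'GENE_B']
```

For two sources the recursion reduces to Q = 2·r1·r2 - r1², the joint probability that the smaller of two uniform draws is at most r1 and the larger is at most r2.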

    Integrating Computational Biology and Forward Genetics in Drosophila

    Genetic screens are powerful methods for the discovery of gene–phenotype associations. However, a systems biology approach to genetics must leverage the massive amount of “omics” data to enhance the power and speed of functional gene discovery in vivo. Thus far, few computational methods for gene function prediction have been rigorously tested for their performance on a genome-wide scale in vivo. In this work, we demonstrate that integrating genome-wide computational gene prioritization with large-scale genetic screening is a powerful tool for functional gene discovery. To discover genes involved in neural development in Drosophila, we extend our strategy for the prioritization of human candidate disease genes to functional prioritization in Drosophila. We then integrate this prioritization strategy with a large-scale genetic screen for interactors of the proneural transcription factor Atonal using genomic deficiencies and mutant and RNAi collections. Using the prioritized genes validated in our genetic screen, we describe a novel genetic interaction network for Atonal. Lastly, we prioritize the whole Drosophila genome and identify candidate gene associations for ten receptor-signaling pathways. This novel database of prioritized pathway candidates, a web application for functional prioritization in Drosophila called Endeavour-HighFly, and the Atonal network are all publicly available resources. A systems genetics approach that combines the power of computational predictions with in vivo genetic screens strongly enhances the discovery of gene functions and gene–gene associations.

    ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples

    Background: Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, identifying the disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to prioritize genes within the list of candidates by exploiting the wealth of information available about the genes in various databases. Results: We propose ProDiGe, a novel algorithm for Prioritization of Disease Genes. ProDiGe implements a novel machine learning strategy based on learning from positive and unlabeled examples, which makes it possible to integrate various sources of information about the genes, to share information about known disease genes across diseases, and to perform genome-wide searches for new disease genes. Experiments on real data show that ProDiGe outperforms state-of-the-art methods for the prioritization of genes in human diseases. Conclusions: ProDiGe implements a new machine learning paradigm for gene prioritization, which could help identify new disease genes. It is freely available at http://cbio.ensmp.fr/prodige.

    Candidate gene prioritization by network analysis of differential expression using machine learning approaches

    Background: Discovering novel disease genes is still challenging for diseases for which no prior knowledge, such as known disease genes or disease-related pathways, is available. Genetic studies frequently yield large lists of candidate genes, only a few of which can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data on differential gene expression between affected and healthy individuals. To improve the performance of this prioritization strategy, we extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. Results: We propose three strategies for scoring disease candidate genes that rely on network-based machine learning approaches such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison, a local measure based on the expression of the direct neighbors is also computed. We benchmarked these strategies on 40 publicly available knockout experiments in mice and assessed performance against a standard procedure in genetics that ranks candidate genes solely by their differential expression levels (Simple Expression Ranking). Our results showed that all four strategies outperformed this standard procedure and that the best results were obtained with Heat Kernel Diffusion Ranking, which placed the knockout gene at an average ranking position of 8 out of 100 genes, with an AUC of 92.3% and an error reduction of 52.8% relative to the standard procedure, which ranked the knockout gene at position 17 on average with an AUC of 83.7%. Conclusion: In this study, we identified promising candidate genes using network-based machine learning approaches even when no knowledge was available about the disease or phenotype.
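
Heat Kernel Diffusion Ranking scores a candidate by how much differential-expression signal diffuses onto it through the network. Below is a toy sketch in pure Python. It assumes an unweighted network, the unnormalised Laplacian L = D - A, the kernel K = exp(-βL), and absolute log fold changes as the signal. The helper names, β = 1 and the example genes are hypothetical, and this is not the authors' implementation. The kernel is computed densely by scaling and squaring a truncated Taylor series, which only suits small networks. The Arnoldi kernel approximation named in the abstract addresses large networks and is not shown. A second helper converts the rank of a single held-out gene among n candidates into an AUC.

```python
def matmul(a, b):
    """Dense matrix product for small list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def heat_kernel(adj, beta, terms=20, squarings=6):
    """K = exp(-beta * L) with L = D - A, via scaled Taylor series + squaring."""
    n = len(adj)
    lap = [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
           for i in range(n)]
    scale = 2 ** squarings
    m = [[-beta * x / scale for x in row] for row in lap]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term = m^k / k!, accumulated into exp(m) ~ I + m + m^2/2! + ...
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(result, term)]
    for _ in range(squarings):
        # exp(M) = exp(M / 2**s) ** (2**s)
        result = matmul(result, result)
    return result


def diffusion_scores(genes, edges, log_fold_change, beta=1.0):
    """Score every gene by the differential-expression signal diffused onto it."""
    index = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[index[u]][index[v]] = adj[index[v]][index[u]] = 1.0
    k = heat_kernel(adj, beta)
    x = [abs(log_fold_change[g]) for g in genes]
    return {g: sum(k[index[g]][j] * x[j] for j in range(n)) for g in genes}


def single_positive_auc(rank, n_candidates):
    """AUC when one true gene is ranked `rank` (1 = best) among n candidates."""
    return (n_candidates - rank) / (n_candidates - 1)


# Hypothetical toy network: C1 and C2 are the candidates being ranked.
genes = ["C1", "N1", "N2", "C2", "N3"]
edges = [("C1", "N1"), ("C1", "N2"), ("C2", "N3")]
lfc = {"C1": 0.2, "N1": 3.0, "N2": -2.5, "C2": 0.5, "N3": 0.1}
scores = diffusion_scores(genes, edges, lfc)
print(sorted(["C1", "C2"], key=scores.get, reverse=True))  # ['C1', 'C2']
```

In the toy network, C1 has weak differential expression of its own but two strongly changed neighbours, so it outranks C2. For the AUC helper, rank 8 of 100 gives 92/99 ≈ 0.929 and rank 17 gives 83/99 ≈ 0.838. These are close to the reported 92.3% and 83.7%, though the reported values summarise 40 experiments, so exact agreement is not expected. The reported 52.8% error reduction is consistent with measuring error as 1 - AUC: 1 - (1 - 0.923)/(1 - 0.837) ≈ 0.528.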

    Coordinated modular functionality and prognostic potential of a heart failure biomarker-driven interaction network

    Background: The identification of potentially relevant biomarkers and a deeper understanding of molecular mechanisms related to heart failure (HF) development can be enhanced by the implementation of biological network-based analyses. To support these efforts, here we report a global network of protein-protein interactions (PPIs) relevant to HF, which was characterized through integrative bioinformatic analyses of multiple sources of "omic" information. Results: We found that the structural and functional architecture of this PPI network is highly modular. These network modules can be assigned to specialized processes and specific cellular regions, and their functional roles tend to partially overlap. Our results suggest that HF biomarkers may be defined as key coordinators of intra- and inter-module communication. Putative biomarkers can, in general, be distinguished as "information traffic" mediators within this network. The top high traffic proteins are encoded by genes that are not highly differentially expressed across HF and non-HF patients. Nevertheless, we present evidence that the integration of expression patterns from high traffic genes may support accurate prediction of HF. We quantitatively demonstrate that intra- and inter-module functional activity may be controlled by a family of transcription factors known to be associated with the prevention of hypertrophy. Conclusion: The systems-driven analysis reported here provides the basis for the identification of potentially novel biomarkers and understanding HF-related mechanisms in a more comprehensive and integrated way.
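
The abstract does not define "information traffic". One common proxy for how much traffic a protein mediates is shortest-path betweenness centrality, the number of shortest paths between other protein pairs that pass through it. Below is a minimal Python sketch of Brandes' algorithm for an unweighted, undirected PPI network. It assumes the network is given as a symmetric adjacency dictionary. This is an illustrative stand-in, not necessarily the measure the authors used, and the function name and protein names are hypothetical.

```python
from collections import deque


def betweenness(adjacency):
    """Brandes' algorithm: shortest-path betweenness for an unweighted,
    undirected graph given as {node: set_of_neighbours} (must be symmetric).
    Each unordered pair of endpoints is counted once; no normalisation."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search counting shortest paths from `source`.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = dict.fromkeys(adjacency, 0)
        dist = dict.fromkeys(adjacency, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair was visited once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical star-shaped PPI module: HUB interacts with four partners.
ppi = {
    "HUB": {"P1", "P2", "P3", "P4"},
    "P1": {"HUB"}, "P2": {"HUB"}, "P3": {"HUB"}, "P4": {"HUB"},
}
traffic = betweenness(ppi)
print(max(traffic, key=traffic.get))  # HUB
```

Sorting proteins by this score in decreasing order would give a "high traffic" list. In the star example, the hub lies on the single shortest path between each of the six leaf pairs, so its score is 6.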

    FAM5C Contributes to Aggressive Periodontitis

    Aggressive periodontitis is characterized by rapid and severe periodontal destruction in young, systemically healthy subjects. A greater prevalence is reported in Africans and groups of African descent than in Caucasians and Hispanics. We first fine-mapped the interval 1q24.2 to 1q31.3, which had been suggested to contain an aggressive periodontitis locus. Three hundred and eighty-nine subjects from 55 pedigrees were studied. Saliva samples were collected from all subjects, and DNA was extracted. Twenty-one single nucleotide polymorphisms were selected and analyzed by standard polymerase chain reaction using TaqMan chemistry. Non-parametric linkage and transmission distortion analyses were performed. Although the linkage results were negative, a statistically significant association (p = 0.03) was found between aggressive periodontitis and two markers in the FAM5C gene, rs1935881 and rs1342913. Haplotype analysis showed an association between aggressive periodontitis and the A-G haplotype (rs1935881-rs1342913; p = 0.009). Sequence analysis of the FAM5C coding regions did not disclose any mutations, but two variants in conserved intronic regions of FAM5C, rs57694932 and rs10494634, were found; however, these two variants are not associated with aggressive periodontitis. Second, we investigated the pattern of FAM5C expression in aggressive periodontitis lesions and its possible correlations with inflammatory/immunological factors and with pathogens commonly associated with periodontal diseases. FAM5C mRNA expression was significantly higher in diseased than in healthy sites and correlated with IL-1β, IL-17A, IL-4 and RANKL mRNA levels. No correlations were found between FAM5C levels and the presence and load of red complex periodontopathogens or Aggregatibacter actinomycetemcomitans. This study provides evidence that FAM5C contributes to aggressive periodontitis.
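
The abstract mentions transmission distortion analyses without naming the test. The classic transmission disequilibrium test (TDT) is one standard way to test for distorted allele transmission in pedigrees. Among heterozygous parents, it compares how often a marker allele is transmitted to affected offspring with how often it is not. Below is a minimal Python sketch. It assumes the transmission and non-transmission counts have already been tallied. The function name and counts are hypothetical, and the authors' software may have used a different family-based or haplotype-aware test.

```python
import math


def tdt(transmitted, untransmitted):
    """Classic allelic TDT for one biallelic marker.

    transmitted / untransmitted: how often heterozygous parents passed the
    tested allele / the other allele to an affected child.
    Returns (McNemar chi-square with 1 df, p-value).
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no transmissions from heterozygous parents")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Upper tail of chi-square(1): P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, not taken from the study.
chi2, p = tdt(30, 10)
print(chi2)  # 10.0
```

With 30 transmissions against 10 non-transmissions, the statistic is (30 - 10)² / 40 = 10, which corresponds to a p-value of roughly 0.0016 on one degree of freedom.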

    An integrated multi-omics approach identifies the landscape of interferon-α-mediated responses of human pancreatic beta cells

    Interferon-α (IFNα), a type I interferon, is expressed in the islets of individuals with type 1 diabetes (T1D), and its expression and signaling are regulated by T1D genetic risk variants and by viral infections associated with T1D. Here we characterize human beta cell responses to IFNα by combining ATAC-seq, RNA-seq and proteomics assays. The initial response to IFNα is characterized by chromatin remodeling, followed by changes in transcriptional and translational regulation. IFNα induces changes in alternative splicing (AS) and first exon usage, increasing the diversity of transcripts expressed by the beta cells. This, combined with the observed changes in protein modification/degradation, ER stress and MHC class I, may expand the antigens that beta cells present to the immune system. Beta cells also up-regulate the checkpoint proteins PDL1 and HLA-E, which may exert a protective role against the autoimmune assault. Data mining of the present multi-omics analysis identifies two compound classes that antagonize IFNα effects on human beta cells. This article is freely available via Open Access. Funding: P30 DK097512 (DK/NIDDK, NIH HHS, United States); UC4 DK104166 (DK/NIDDK, NIH HHS, United States); MR/P010695/1 (MRC, Medical Research Council, United Kingdom).

    Genome-wide analysis identifies 12 loci influencing human reproductive behavior.

    The genetic architecture of human reproductive behavior, namely age at first birth (AFB) and number of children ever born (NEB), has a strong relationship with fitness, human development, infertility and risk of neuropsychiatric disorders. However, very few genetic loci have been identified, and the underlying mechanisms of AFB and NEB are poorly understood. We report a large genome-wide association study of both sexes including 251,151 individuals for AFB and 343,072 individuals for NEB. We identified 12 independent loci that are significantly associated with AFB and/or NEB in a SNP-based genome-wide association study and 4 additional loci associated in a gene-based effort. These loci harbor genes that are likely to have a role in human reproduction and infertility, either directly or by affecting non-local gene expression, thereby increasing understanding of these complex traits.

    An expanded evaluation of protein function prediction methods shows an improvement in accuracy

    Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1 with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 with those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology-specific and that different performance metrics can be used to probe the nature of accurate predictions, and it highlighted the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and the usefulness of individual methods remain context-dependent.
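
The abstract refers to expanded assessment metrics without listing them. CAFA's headline protein-centric measure is Fmax, computed as follows:

- For each score threshold τ, precision is averaged over proteins with at least one prediction at or above τ.
- Recall is averaged over all benchmark proteins.
- Fmax is the best harmonic mean of the two across thresholds.

Below is a minimal Python sketch. It assumes predictions and experimental annotations have already been propagated to ancestor ontology terms, and that every benchmark protein has at least one true term. The function name and toy identifiers are hypothetical, and this is not the official CAFA assessment code.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax.

    predictions: {protein: {term: score in [0, 1]}}
    truth: {protein: non-empty set of experimentally annotated terms}
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]  # 0.01, 0.02, ..., 1.00
    best = 0.0
    for tau in thresholds:
        precisions = []
        recall_total = 0.0
        for protein, true_terms in truth.items():
            predicted = {term for term, score in predictions.get(protein, {}).items()
                         if score >= tau}
            hits = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with a prediction
                precisions.append(hits / len(predicted))
            recall_total += hits / len(true_terms)  # recall counts all proteins
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_total / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical toy benchmark: P2 receives no predictions at all.
preds = {"P1": {"GO:A": 0.8, "GO:B": 0.3}}
truth = {"P1": {"GO:A"}, "P2": {"GO:C"}}
print(round(fmax(preds, truth), 3))  # 0.667
```

In the toy benchmark, the best threshold keeps only GO:A for P1. That gives precision 1 and recall 0.5, because P2 is missed entirely, so Fmax = 2/3. Computing Fmax separately per ontology would expose the ontology-specific rankings the abstract mentions.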