26 research outputs found

    Inference of Functional Relations in Predicted Protein Networks with a Machine Learning Approach

    Background: Molecular biology is currently facing the challenging task of functionally characterizing the proteome. The large number of possible protein-protein interactions and complexes, the variety of environmental conditions and cellular states in which these interactions can be reorganized, and the multiple ways in which a protein can influence the function of others require the development of experimental and computational approaches to analyze and predict functional associations between proteins as part of their activity in the interactome. Methodology/Principal Findings: We have studied the possibility of constructing a classifier to combine the output of several protein interaction prediction methods. The AODE (Averaged One-Dependence Estimators) machine learning algorithm is a suitable choice in this case: it provides better results than the individual prediction methods and performs better than the other alternative methods tested in this experimental setup. To illustrate the potential use of this new AODE-based Predictor of Protein InterActions (APPIA) when analyzing high-throughput experimental data, we show how it helps to filter the results of published high-throughput proteomic studies, ranking functionally related pairs in a significant way. Availability: All the predictions of the individual methods and of the combined APPIA predictor, together with the datasets of functional associations used, are available at http://ecid.bioinfo.cnio.es/. Conclusions: We propose a strategy that integrates the main current computational techniques used to predict functional associations into a unified classifier system, specifically focusing on the evaluation of poorly characterized protein pairs. We selected the AODE classifier as the appropriate tool to perform this task. AODE is particularly useful for extracting valuable information from large, unbalanced and heterogeneous data sets. The combination of the information provided by five interaction prediction methods with some simple sequence features in APPIA is useful in establishing reliability values and helps to prioritize functional interactions that can be further characterized experimentally. This work was funded by the BioSapiens (grant number LSHG-CT-2003-503265) and the Experimental Network for Functional Integration (ENFIN) Networks of Excellence (contract number LSHG-CT-2005-518254), by Consolider BSC (grant number CSD2007-00050) and by the project “Functions for gene sets” from the Spanish Ministry of Education and Science (BIO2007-66855). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    Low-complexity regions within protein sequences have position-dependent roles

    Background: Regions of protein sequences with biased amino acid composition, so-called Low-Complexity Regions (LCRs), are abundant in the protein universe. A number of studies have revealed that i) these regions show significant divergence across protein families, and ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. They have therefore proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. Here we undertake a systematic investigation of LCRs in order to explore their possible functional significance, placed in the particular context of Protein-Protein Interaction (PPI) networks and Gene Ontology (GO)-term analysis. Results: In keeping with previous results, we found that LCR-containing proteins tend to have more binding partners across different PPI networks than proteins that have no LCRs. More specifically, our study suggests i) that LCRs are preferentially positioned towards the protein sequence extremities and, in contrast with centrally-located LCRs, such terminal LCRs show a correlation between their lengths and degrees of connectivity, and ii) that centrally-located LCRs are enriched with transcription-related GO terms, while terminal LCRs are enriched with translation and stress response-related terms. Conclusions: Our results suggest not only that LCRs may be involved in flexible binding associated with specific functions, but also that their positions within a sequence may be important in determining both their binding properties and their biological roles.

    Integrating Computational Biology and Forward Genetics in Drosophila

    Genetic screens are powerful methods for the discovery of gene–phenotype associations. However, a systems biology approach to genetics must leverage the massive amount of “omics” data to enhance the power and speed of functional gene discovery in vivo. Thus far, few computational methods for gene function prediction have been rigorously tested for their performance on a genome-wide scale in vivo. In this work, we demonstrate that integrating genome-wide computational gene prioritization with large-scale genetic screening is a powerful tool for functional gene discovery. To discover genes involved in neural development in Drosophila, we extend our strategy for the prioritization of human candidate disease genes to functional prioritization in Drosophila. We then integrate this prioritization strategy with a large-scale genetic screen for interactors of the proneural transcription factor Atonal using genomic deficiencies and mutant and RNAi collections. Using the prioritized genes validated in our genetic screen, we describe a novel genetic interaction network for Atonal. Lastly, we prioritize the whole Drosophila genome and identify candidate gene associations for ten receptor-signaling pathways. This novel database of prioritized pathway candidates, as well as a web application for functional prioritization in Drosophila, called Endeavour-HighFly, and the Atonal network, are publicly available resources. A systems genetics approach that combines the power of computational predictions with in vivo genetic screens strongly enhances the process of gene function and gene–gene association discovery.

    An Amphioxus Gli Gene Reveals Conservation of Midline Patterning and the Evolution of Hedgehog Signalling Diversity in Chordates

    Background. Hedgehog signalling, interpreted in receiving cells by Gli transcription factors, plays a central role in the development of vertebrate and Drosophila embryos. Many aspects of the signalling pathway are conserved between these lineages; however, vertebrates have diverged in at least one key aspect: they have evolved multiple Gli genes encoding functionally-distinct proteins, increasing the complexity of the hedgehog-dependent transcriptional response. Amphioxus is one of the closest living relatives of the vertebrates, having split from the vertebrate lineage prior to the widespread gene duplication prominent in early vertebrate evolution. Principal findings. We show that amphioxus has a single Gli gene, which is deployed in tissues adjacent to sources of hedgehog signalling derived from the midline and anterior endoderm. This shows that the duplication and divergence of the Gli family, and hence the origin of vertebrate Gli functional diversity, was specific to the vertebrate lineage. However, we also show that the single amphioxus Gli gene produces two distinct transcripts encoding different proteins. We utilise three tests of Gli function to examine the transcription regulatory capacities of these different proteins, demonstrating that one has activating activity similar to Gli2, while the other acts as a weak repressor, similar to Gli3. Conclusions. These data show that the vertebrates and amphioxus have evolved functionally-similar repertoires of Gli proteins using parallel molecular routes: vertebrates via gene duplication and divergence, and amphioxus via alternative splicing of a single gene. Our results demonstrate that similar functional complexity of intercellular signalling can be achieved via different evolutionary pathways.

    Proteomics Characterization of Cytoplasmic and Lipid-Associated Membrane Proteins of Human Pathogen Mycoplasma fermentans M64

    Mycoplasma fermentans is a potent human pathogen which has been implicated in several diseases. Notably, its lipid-associated membrane proteins (LAMPs) play a role in immunomodulation and the development of infection-associated inflammatory diseases. However, systematic protein identification in pathogenic M. fermentans has not been reported. According to our recent sequencing results for M. fermentans M64, isolated from the human respiratory tract, its genome is around 1.1 Mb and encodes 1050 predicted protein-coding genes. In the present study, the soluble proteome of M. fermentans was resolved and analyzed using two-dimensional gel electrophoresis. In addition, Triton X-114 extraction was carried out to enrich amphiphilic proteins, including putative lipoproteins and membrane proteins. Subsequent mass spectrometric analyses of these proteins identified a total of 181 M. fermentans ORFs. Further bioinformatics analysis of these ORFs, which encode proteins with known or so far unknown orthologues among bacteria, revealed that 131 proteins are homologous to known proteins, 11 are conserved hypothetical proteins, and the remaining 39 are likely M. fermentans-specific proteins. Moreover, the Triton X-114-enriched fraction was shown to activate NF-κB in RAW264.7 macrophages, and a total of 21 lipoproteins with predicted signal peptides were identified in it. Together, our work provides the first proteome reference map of M. fermentans as well as several putative virulence-associated proteins as diagnostic markers or vaccine candidates for further functional study of this human pathogen.

    Pan-cancer analysis of whole genomes

    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale(1-3). Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter(4); identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation(5,6); analyses timings and patterns of tumour evolution(7); describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity(8,9); and evaluates a range of more-specialized features of cancer genomes(8,10-18). Peer reviewed.