    Co-complex protein membership evaluation using Maximum Entropy on GO ontology and InterPro annotation.

    MOTIVATION: Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. Experimental findings are increasingly standardized and recorded in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on applying probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and a machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies. RESULTS: PPI annotations are built combinatorially using corresponding GO terms and InterPro annotation. We use a S. cerevisiae high-confidence complex dataset as a positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve high performance, with an area under the ROC curve of ≤0.97, outperforming go2ppi, a previously established tool for predicting PPI from GO annotations. AVAILABILITY AND IMPLEMENTATION: https://github.com/ima23/maxent-ppi. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
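    The RESULTS describe pair features built combinatorially from GO and InterPro annotations, scored with a Maximum Entropy classifier and evaluated by ROC AUC. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: it assumes a pair's features are all unordered (term, term) combinations of the two proteins' annotations, fits a binary maximum-entropy model (equivalent to L2-regularised logistic regression) by plain gradient ascent, and computes AUC as the probability that a random positive outscores a random negative. All protein IDs, term labels and hyperparameters (`epochs`, `lr`, `l2`) are hypothetical.

```python
import math
from itertools import product


def pair_features(annot_a, annot_b):
    """Assumed combinatorial pair features: every unordered (term_a, term_b)
    pairing of the two proteins' annotations (GO terms and/or InterPro IDs)."""
    return {tuple(sorted((ta, tb))) for ta, tb in product(annot_a, annot_b)}


def predict(weights, bias, feats):
    """Probability that a pair is co-complex under a binary maximum-entropy model."""
    z = bias + sum(weights.get(f, 0.0) for f in feats)
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_maxent(examples, labels, epochs=200, lr=0.5, l2=0.01):
    """Fit weights by batch gradient ascent on the L2-regularised log-likelihood."""
    weights, bias, n = {}, 0.0, len(examples)
    for _ in range(epochs):
        grad, grad_b = {}, 0.0
        for feats, y in zip(examples, labels):
            err = y - predict(weights, bias, feats)  # observed minus expected
            grad_b += err
            for f in feats:
                grad[f] = grad.get(f, 0.0) + err
        bias += lr * grad_b / n  # the bias is not regularised
        for f, g in grad.items():
            w = weights.get(f, 0.0)
            weights[f] = w + lr * (g / n - l2 * w)
    return weights, bias


def roc_auc(scores, labels):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical annotations: P1-P3 form a complex, P4-P6 do not.
    annot = {
        "P1": {"T_a", "T_b"}, "P2": {"T_a", "T_c"}, "P3": {"T_b", "T_c"},
        "P4": {"T_x"}, "P5": {"T_y"}, "P6": {"T_x", "T_z"},
    }
    pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
             ("P1", "P4"), ("P2", "P5"), ("P3", "P6")]
    labels = [1, 1, 1, 0, 0, 0]
    X = [pair_features(annot[a], annot[b]) for a, b in pairs]
    w, b = train_maxent(X, labels)
    scores = [predict(w, b, x) for x in X]
    print(roc_auc(scores, labels))  # training AUC on a perfectly separable toy set
```

    Sorting each term pair makes the features symmetric, so (A, B) and (B, A) describe the same interaction; a real evaluation would of course report AUC on held-out pairs rather than the training set.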

    In vivo analysis of proteomes and interactomes using Parallel Affinity Capture (iPAC) coupled to mass spectrometry.

    Affinity purification coupled to mass spectrometry provides a reliable method for identifying proteins and their binding partners. In this study, we used Drosophila melanogaster proteins triple-tagged in vivo with Flag, Strep II and yellow fluorescent protein (YFP) in affinity pull-down experiments, and isolated these proteins in their native complexes from embryos. We describe a pipeline for determining interactomes by Parallel Affinity Capture (iPAC) and demonstrate it by identifying the partners of several protein baits with a range of sizes and subcellular locations. This purification protocol employs the different tags in parallel and involves detailed comparison of the resulting mass spectrometry data sets, ensuring that the interaction lists obtained are of high confidence. By comparing our data with published interaction data sets, we show that this approach identifies known interactors of bait proteins as well as novel interaction partners. The high-confidence in vivo protein data sets presented here add new data to the currently incomplete D. melanogaster interactome. Additionally, we report contaminant proteins that persist in affinity purifications irrespective of the tagged bait. This project is funded by the Wellcome Trust. This is the final version of the article. It was first available from ASBMB via http://dx.doi.org/10.1074/mcp.M110.00238
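    The parallel-tag comparison and the bait-independent contaminant list can be illustrated with simple set logic. This is a hypothetical sketch, not the published iPAC pipeline: it assumes a prey counts as high-confidence when it is identified in at least `min_tags` of the Flag, Strep II and YFP purifications of the same bait, and that a contaminant is any protein detected in at least `min_fraction` of all purifications regardless of bait. Both thresholds and all protein identifiers are illustrative.

```python
from collections import Counter


def persistent_contaminants(all_runs, min_fraction=0.75):
    """Proteins detected in at least `min_fraction` of all purifications,
    irrespective of bait (the threshold is a hypothetical choice)."""
    counts = Counter(p for run in all_runs for p in set(run))
    return {p for p, c in counts.items() if c / len(all_runs) >= min_fraction}


def high_confidence_partners(bait, tag_runs, contaminants, min_tags=2):
    """Preys identified in at least `min_tags` of one bait's parallel
    tag-specific purifications, excluding the bait itself and contaminants."""
    counts = Counter(p for run in tag_runs.values() for p in set(run))
    return {
        p for p, c in counts.items()
        if c >= min_tags and p != bait and p not in contaminants
    }


if __name__ == "__main__":
    # Hypothetical mass-spectrometry identifications per tag for two baits.
    bait_a = {
        "Flag": {"baitA", "p1", "p2", "c1"},
        "StrepII": {"baitA", "p1", "p3", "c1"},
        "YFP": {"baitA", "p2", "c1"},
    }
    bait_b = {
        "Flag": {"baitB", "q1", "c1"},
        "StrepII": {"baitB", "q2", "c1"},
        "YFP": {"baitB", "q1"},
    }
    runs = list(bait_a.values()) + list(bait_b.values())
    contaminants = persistent_contaminants(runs)
    print(sorted(contaminants))  # ['c1']
    print(sorted(high_confidence_partners("baitA", bait_a, contaminants)))  # ['p1', 'p2']
```

    Requiring agreement between independent tags discards preys such as `p3` that appear in only one purification, while the cross-bait frequency filter removes `c1`, which co-purifies with every bait.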

    Analysis of the expression patterns, subcellular localisations and interaction partners of Drosophila proteins using a pigP protein trap library.

    Although we now have a wealth of information on the transcription patterns of all the genes in the Drosophila genome, much less is known about the properties of the encoded proteins. To provide information on the expression patterns and subcellular localisations of many proteins in parallel, we have performed a large-scale protein trap screen using a hybrid piggyBac vector carrying an artificial exon encoding yellow fluorescent protein (YFP) and protein affinity tags. From screening 41 million embryos, we recovered 616 verified independent YFP-positive lines representing protein traps in 374 genes, two-thirds of which had not been tagged in previous P element protein trap screens. Over 20 different research groups then characterized the expression patterns of the tagged proteins in a variety of tissues and at several developmental stages. In parallel, we purified many of the tagged proteins from embryos using the affinity tags and identified co-purifying proteins by mass spectrometry. The fly stocks are publicly available through the Kyoto Drosophila Genetics Resource Center. All our data are available via an open access database (Flannotator), which provides comprehensive information on the expression patterns, subcellular localisations and in vivo interaction partners of the trapped proteins. Our resource substantially increases the number of available protein traps in Drosophila and identifies new markers for cellular organelles and structures. This work was supported by a project grant from the Wellcome Trust [076739], by a Wellcome Trust Principal Research Fellowship to D.StJ. [049818 and 080007], and by core support from the Wellcome Trust [092096] and Cancer Research UK [A14492]. This is the final version of the article. It was first available from The Company of Biologists via http://dx.doi.org/10.1242/dev.11105

    The effect of LRRK2 loss-of-function variants in humans

    Analysis of large genomic datasets, including gnomAD, reveals that partial LRRK2 loss of function is not strongly associated with diseases, serving as an example of how human genetics can be leveraged for target validation in drug discovery. Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes(1,2). Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson's disease(3,4), suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns(5-8), the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here, we systematically analyze pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD)(9), 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2. Experimental validation of three variants, combined with previous work(10), confirmed reduced protein levels in 82.5% of our cohort. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but that these are not strongly associated with any specific phenotype or disease state. Our results demonstrate the value of large-scale genomic databases and phenotyping of human loss-of-function carriers for target validation in drug discovery.

    The ELIXIR Human Copy Number Variations Community:building bioinformatics infrastructure for research

    Copy number variations (CNVs) are major causative contributors both in the genesis of genetic diseases and human neoplasias. While 'High-Throughput' sequencing technologies are increasingly becoming the primary choice for genomic screening analysis, their ability to efficiently detect CNVs is still heterogeneous and remains to be developed. The aim of this white paper is to provide a guiding framework for the future contributions of ELIXIR's recently established human CNV Community, with implications beyond human disease diagnostics and population genomics. This white paper is the direct result of a strategy meeting that took place in September 2018 in Hinxton (UK) and involved representatives of 11 ELIXIR Nodes. The meeting led to the definition of priority objectives and tasks, to address a wide range of CNV-related challenges ranging from detection and interpretation to sharing and training. Here, we provide suggestions on how to align these tasks within the ELIXIR Platforms strategy, and on how to frame the activities of this new ELIXIR Community in the international context.

    Ensembl Genomes 2016: more genomes, more complexity

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.

    Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity

    A major goal of biomedicine is to understand the function of every gene in the human genome. Loss-of-function mutations can disrupt both copies of a given gene in humans and phenotypic analysis of such 'human knockouts' can provide insight into gene function. Consanguineous unions are more likely to result in offspring carrying homozygous loss-of-function mutations. In Pakistan, consanguinity rates are notably high. Here we sequence the protein-coding regions of 10,503 adult participants in the Pakistan Risk of Myocardial Infarction Study (PROMIS), designed to understand the determinants of cardiometabolic diseases in individuals from South Asia. We identified individuals carrying homozygous predicted loss-of-function (pLoF) mutations, and performed phenotypic analysis involving more than 200 biochemical and disease traits. We enumerated 49,138 rare (<1% minor allele frequency) pLoF mutations. These pLoF mutations are estimated to knock out 1,317 genes, each in at least one participant. Homozygosity for pLoF mutations at PLA2G7 was associated with absent enzymatic activity of soluble lipoprotein-associated phospholipase A2; at CYP2F1, with higher plasma interleukin-8 concentrations; at TREH, with lower concentrations of apoB-containing lipoprotein subfractions; at either A3GALT2 or NRG4, with markedly reduced plasma insulin C-peptide concentrations; and at SLC9A3R1, with mediators of calcium and phosphate signalling. Heterozygous deficiency of APOC3 has been shown to protect against coronary heart disease; we identified APOC3 homozygous pLoF carriers in our cohort. We recruited these human knockouts and challenged them with an oral fat load. Compared with family members lacking the mutation, individuals with APOC3 knocked out displayed marked blunting of the usual post-prandial rise in plasma triglycerides. 
Overall, these observations provide a roadmap for a 'human knockout project', a systematic effort to understand the phenotypic consequences of complete disruption of genes in humans. D.S. is supported by grants from the National Institutes of Health, the Fogarty International, the Wellcome Trust, the British Heart Foundation, and Pfizer. P.N. is supported by the John S. LaDue Memorial Fellowship in Cardiology from Harvard Medical School. H.-H.W. is supported by a grant from the Samsung Medical Center, Korea (SMO116163). S.K. is supported by the Ofer and Shelly Nemirovsky MGH Research Scholar Award and by grants from the National Institutes of Health (R01HL107816), the Donovan Family Foundation, and Fondation Leducq. Exome sequencing was supported by a grant from the NHGRI (5U54HG003067-11) to S.G. and E.S.L. D.G.M. is supported by a grant from the National Institutes of Health (R01GM104371). J.D. holds a British Heart Foundation Chair, European Research Council Senior Investigator Award, and NIHR Senior Investigator Award. The Cardiovascular Epidemiology Unit at the University of Cambridge, which supported the field work and genotyping of PROMIS, is funded by the UK Medical Research Council, British Heart Foundation, and NIHR Cambridge Biomedical Research Centre ... Fieldwork in the PROMIS study has been supported through funds available to investigators at the Center for Non-Communicable Diseases, Pakistan and the University of Cambridge, UK.
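    The enumeration of rare pLoF mutations and of genes knocked out in at least one participant reduces to a filter over genotype calls. The sketch below is illustrative only and is not the PROMIS analysis: it assumes variants are already annotated as pLoF, filters on the within-cohort frequency of the pLoF allele (which, for a rare variant, is the minor allele frequency), and counts a gene as knocked out only when a participant is homozygous for a rare variant; compound heterozygotes are ignored. The `Variant` type, gene names and participant IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Variant:
    """A pLoF variant with per-participant pLoF-allele counts (0, 1 or 2)."""
    gene: str
    genotypes: Dict[str, int] = field(default_factory=dict)


def plof_allele_frequency(v: Variant) -> float:
    """Cohort frequency of the pLoF allele, assuming diploid calls for everyone listed."""
    return sum(v.genotypes.values()) / (2 * len(v.genotypes))


def knocked_out_genes(variants: List[Variant], max_freq: float = 0.01) -> Dict[str, Set[str]]:
    """Map each gene to the participants homozygous for a rare (< max_freq) pLoF variant."""
    knockouts: Dict[str, Set[str]] = {}
    for v in variants:
        if plof_allele_frequency(v) >= max_freq:
            continue  # not rare, so skip it
        for pid, n_plof in v.genotypes.items():
            if n_plof == 2:  # both copies disrupted: a 'human knockout'
                knockouts.setdefault(v.gene, set()).add(pid)
    return knockouts


if __name__ == "__main__":
    ids = [f"id{i}" for i in range(200)]
    rare = Variant("GENE1", {pid: 0 for pid in ids})
    rare.genotypes["id7"] = 2  # frequency 2 / 400 = 0.005
    common = Variant("GENE2", {pid: int(i < 100) for i, pid in enumerate(ids)})
    common.genotypes["id0"] = 2  # frequency about 0.25, so filtered out
    het_only = Variant("GENE3", {pid: 0 for pid in ids})
    het_only.genotypes["id3"] = 1  # rare, but nobody is homozygous
    print(knocked_out_genes([rare, common, het_only]))  # {'GENE1': {'id7'}}
```

    Filtering on the pLoF allele's own frequency, rather than a folded minor allele frequency, avoids wrongly treating a pLoF allele that is nearly fixed in the cohort as "rare".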

    The genome of the biting midge Culicoides sonorensis and gene expression analyses of vector competence for bluetongue virus

    BACKGROUND: New genomic technologies have provided novel insights into the genetics of interactions between vectors, viruses and hosts, which are leading to advances in the control of arboviruses of medical importance. However, the development of tools and resources available for vectors of non-zoonotic arboviruses remains neglected. Biting midges of the genus Culicoides transmit some of the most important arboviruses of wildlife and livestock worldwide, with a global impact on economic productivity, health and welfare. The absence of a suitable reference genome has hindered genomic analyses to date in this important genus of vectors. In the present study, the genome of Culicoides sonorensis, a vector of bluetongue virus (BTV) in the USA, has been sequenced to provide the first reference genome for these vectors. In this study, we also report the use of the reference genome to perform initial transcriptomic analyses of vector competence for BTV. RESULTS: Our analyses reveal that the genome is 189 Mb, assembled in 7974 scaffolds. Its annotation using the transcriptomic data generated in this study and in a previous study has identified 15,612 genes. Gene expression analyses of C. sonorensis females infected with BTV performed in this study revealed 165 genes that were differentially expressed between vector competent and refractory females. Two candidate genes, glutathione S-transferase (gst) and the antiviral helicase ski2, previously recognized as involved in vector competence for BTV in C. sonorensis (gst) and repressing dsRNA virus propagation (ski2), were confirmed in this study. CONCLUSIONS: The reference genome of C. sonorensis has enabled preliminary analyses of the gene expression profiles of vector competent and refractory individuals. The genome and transcriptomes generated in this study provide suitable tools for future research on arbovirus transmission.
These provide a valuable resource for this vector lineage, which diverged from other major Dipteran vector families over 200 million years ago. The genome will be a valuable source of comparative data for other important Dipteran vector families, including mosquitoes (Culicidae) and sandflies (Psychodidae), and together with the transcriptomic data can yield potential targets for transgenic modification in vector control and functional studies.

    Comparative evolutionary analyses of eight whitefly Bemisia tabaci sensu lato genomes: cryptic species, agricultural pests and plant-virus vectors

    The genomes, transcriptomes, and predicted protein-coding sequences are available from Ensembl Metazoa (http://metazoa.ensembl.org) and are included within the references. Raw RNA-Seq datasets generated and/or analyzed during the current study are available from the European Nucleotide Archive database repository (https://www.ebi.ac.uk/ena) under the parent project accessions: PRJEB28507, PRJEB36965, PRJEB35304, PRJEB39408. All data generated during the analyses of these datasets are included in this published article, supplementary information files, and figshare repository (https://doi.org/10.6084/m9.figshare.23666799; https://doi.org/10.6084/m9.figshare.23666832.v4; https://doi.org/10.6084/m9.figshare.23666844). Background: The group of > 40 cryptic whitefly species called Bemisia tabaci sensu lato (s.l.) are amongst the world's worst agricultural pests and plant-virus vectors. Outbreaks of B. tabaci s.l. and the associated plant-virus diseases continue to contribute to global food insecurity and social instability, particularly in sub-Saharan Africa and Asia. Published B. tabaci s.l. genomes have limited use for studying the African cassava B. tabaci SSA1 species, due to the high genetic divergences between them. Genomic annotations presented here were performed using the 'Ensembl gene annotation system', to ensure that comparative analyses and conclusions reflect biological differences, as opposed to arising from different methodologies underpinning transcript model identification. Results: We present here six new B. tabaci s.l. genomes from Africa and Asia, and two re-annotated previously published genomes, to provide evolutionary insights into these globally distributed pests. Genome sizes ranged between 616 and 658 Mb and exhibited some of the highest coverage of transposable elements reported within Arthropoda. Far fewer total protein-coding genes (PCG) were recovered compared with the previously published B. tabaci s.l. genomes, and structural annotations generated via the uniform methodology strongly supported a repertoire of between 12.8 × 10³ and 13.2 × 10³ PCG. An integrative systematics approach incorporating phylogenomic analysis of nuclear and mitochondrial markers supported a monophyletic Aleyrodidae and the basal positioning of B. tabaci Uganda-1 to the sub-Saharan group of species. Reciprocal cross-mating data and the co-cladogenesis pattern of the primary obligate endosymbiont 'Candidatus Portiera aleyrodidarum' from 11 Bemisia genomes further supported the phylogenetic reconstruction to show that African cassava B. tabaci populations consist of just three biological species. We include comparative analyses of gene families related to detoxification, sugar metabolism and vector competency, and evaluate the presence and function of horizontally transferred genes, essential for understanding the evolution and unique biology of constituent B. tabaci s.l. species. Conclusions: These genomic resources have provided new and critical insights into the genetics underlying B. tabaci s.l. biology. They also provide a rich foundation for post-genomic research, including the selection of candidate gene-targets for innovative whitefly and virus-control strategies.