
    High throughput profile-profile based fold recognition for the entire human proteome

    BACKGROUND: In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. RESULTS: We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. CONCLUSION: This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE

    Predicting protein structures and structural annotation of proteomes

    Protein structure prediction methods aim to predict the structures of proteins from their amino acid sequences, utilizing various computational algorithms. Structural genome annotation is the process of attaching biological information to every protein encoded within a genome via the production of three-dimensional protein models

    Template-based structure modeling of protein-protein interactions

    The structure of protein-protein complexes can be constructed by using the known structure of other protein complexes as a template. The complex structure templates are generally detected either by homology-based sequence alignments or, given the structure of monomer components, by structure-based comparisons. Critical improvements have been made in recent years by utilizing interface recognition and by recombining monomer and complex template libraries. Encouraging progress has also been witnessed in genome-wide applications of template-based modeling, with modeling accuracy comparable to high-throughput experimental data. Nevertheless, bottlenecks exist due to the incompleteness of the protein-protein complex structure library and the lack of methods for distant homologous template identification and full-length complex structure refinement. © 2013

    Identification of Widespread Adenosine Nucleotide Binding in Mycobacterium tuberculosis

    Summary: Computational prediction of protein function is frequently error-prone and incomplete. In Mycobacterium tuberculosis (Mtb), ∼25% of all genes have no predicted function and are annotated as hypothetical proteins, severely limiting our understanding of Mtb pathogenicity. Here, we utilize a high-throughput quantitative activity-based protein profiling (ABPP) platform to probe, annotate, and validate ATP-binding proteins in Mtb. We experimentally validate prior in silico predictions of >240 proteins and identify 72 hypothetical proteins as ATP binders. ATP interacts with proteins with diverse and unrelated sequences, providing an expanded view of adenosine nucleotide binding in Mtb. Several hypothetical ATP binders are essential or taxonomically limited, suggesting specialized functions in mycobacterial physiology and pathogenicity

    Identification of the Feline Humoral Immune Response to Bartonella henselae Infection by Protein Microarray

    Background: Bartonella henselae is the zoonotic agent of cat scratch disease and causes potentially fatal infections in immunocompromised patients. Understanding the complex interactions between the host's immune system and bacterial pathogens is central to the field of infectious diseases and to the development of effective diagnostics and vaccines. Methodology: We report the development of a microarray comprised of proteins expressed from 96% (1433/1493) of the predicted ORFs encoded by the genome of the zoonotic pathogen Bartonella henselae. The array was probed with a collection of 62 uninfected, 62 infected, and 8 "specific-pathogen free" naïve cat sera, to profile the antibody repertoire elicited during natural Bartonella henselae infection. Conclusions: We found that 7.3% of the B. henselae proteins on the microarray were seroreactive and that seroreactivity was not evenly distributed across predicted protein function or subcellular localization. Membrane proteins were significantly more likely to be seroreactive, although only 23% of the membrane proteins were reactive. Conversely, we found that proteins involved in amino acid transport and metabolism were significantly underrepresented and did not contain any seroreactive antigens. Of all seroreactive antigens, 52 were differentially reactive with sera from infected cats, and 53 were equally reactive with sera from infected and uninfected cats. Thirteen of the seroreactive antigens were found to be differentially seroreactive between B. henselae type I and type II. Based on these results, we developed a classifier algorith

    Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems

    A huge amount of genetic information is available thanks to the recent advances in sequencing technologies and the larger computational capabilities, but the interpretation of such genetic data at phenotypic level remains elusive. One of the reasons is that proteins are not acting alone, but are specifically interacting with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein–protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design, to drug discovery. However, despite the advances of experimental structural determination, the number of interactions for which there is available structural data is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which is usually composed of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring for the identification of the correct orientations. In addition, prediction of interface and hot-spot residues is very useful in order to guide and interpret mutagenesis experiments, as well as to understand functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance, helping to interpret pathological mutations involved in protein–protein interactions, or providing modeled structural data for drug discovery targeting protein–protein interactions.
    Funding: Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the Severo Ochoa program. We are grateful to the Joint BSC-CRG-IRB Programme in Computational Biology.
    Peer reviewed. Postprint (author's final draft)

    Mapping specificity, cleavage entropy, allosteric changes and substrates of blood proteases in a high-throughput screen

    Proteases are among the largest protein families and are critical regulators of biochemical processes like apoptosis and blood coagulation. Knowledge of proteases has been expanded by the development of proteomic approaches; however, technology for multiplexed screening of proteases within native environments is currently lagging behind. Here we introduce a simple method to profile protease activity based on isolation of protease products from native lysates using a 96FASP filter, their analysis in a mass spectrometer and a custom data analysis pipeline. The method is significantly faster, cheaper and technically less demanding, is easy to multiplex, and produces accurate protease fingerprints. Using the blood cascade proteases as a case study, we obtain protease substrate profiles that can be used to map specificity, cleavage entropy and allosteric effects and to design protease probes. The data further show that protease substrate predictions enable the selection of potential physiological substrates for targeted validation in biochemical assays

    MicroRNAs from saliva of anopheline mosquitoes mimic human endogenous miRNAs and may contribute to vector-host-pathogen interactions

    During blood feeding, haematophagous arthropods inject into their hosts a cocktail of salivary proteins whose main role is to counteract host haemostasis, inflammation and immunity. However, animal body fluids are known to also carry miRNAs. To gain insight into the saliva and salivary gland miRNA repertoires of the African malaria vector Anopheles coluzzii, we used small RNA-Seq and identified 214 miRNAs, including tissue-enriched, sex-biased and putative novel anopheline miRNAs. Notably, miRNAs were asymmetrically distributed between saliva and salivary glands, suggesting that selected miRNAs may be preferentially directed toward mosquito saliva. The evolutionary conservation of a subset of saliva miRNAs in Anopheles and Aedes mosquitoes, and in the tick Ixodes ricinus, supports the idea of a non-random occurrence pointing to their possible physiological role in blood feeding by arthropods. Strikingly, eleven of the most abundant An. coluzzii saliva miRNAs mimicked human miRNAs. Prediction analysis and a search for experimentally validated targets indicated that miRNAs from An. coluzzii saliva may act on host mRNAs involved in immune and inflammatory responses. Overall, this study raises the intriguing hypothesis that miRNAs injected into vertebrates with vector saliva may contribute to host manipulation, with possible implications for vector-host interaction and pathogen transmission

    HMMerThread: Detecting Remote, Functional Conserved Domains in Entire Genomes by Combining Relaxed Sequence-Database Searches with Fold Recognition

    Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de