
    A proteogenomic update to Yersinia: enhancing genome annotation

    Background: Modern biomedical research depends on a complete and accurate proteome. With the widespread adoption of new sequencing technologies, genome sequences are generated at a near-exponential rate, diminishing the time and effort that can be invested in genome annotation. The resulting gene sets contain numerous errors in even the most basic form of annotation: the primary structure of the proteins.
    Results: The application of experimental proteomics data to genome annotation, called proteogenomics, can quickly and efficiently discover misannotations, yielding a more accurate and complete genome annotation. We present a comprehensive proteogenomic analysis of the plague bacterium, Yersinia pestis KIM. We discover non-annotated genes, correct protein boundaries, remove spuriously annotated ORFs, and make major advances towards accurate identification of signal peptides. Finally, we apply our data to 21 other Yersinia genomes, correcting and enhancing their annotations.
    Conclusions: In total, 141 gene models were altered and have been updated in RefSeq and GenBank. They can be accessed seamlessly through any NCBI tool (e.g. BLAST) or downloaded directly. Along with the improved gene models, we discover new, more accurate means of identifying signal peptides in proteomics data.
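A basic proteogenomic step is to search peptides identified by mass spectrometry against a six-frame translation of the genome, not only against the annotated proteins. This lets peptides from non-annotated genes or from misplaced start sites still be located. The sketch below is a minimal illustration under stated assumptions: exact string matching, the standard genetic code, and hypothetical helper names (`six_frame_translate`, `map_peptide`). It is not the pipeline used in this study.

```python
# Minimal sketch (assumption): exact-match peptide mapping against a six-frame
# translation using the standard genetic code. All helper names are hypothetical.

BASES = "TCAG"
# Standard genetic code, codons ordered by first/second/third base over T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA string (non-ACGT characters kept as-is)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; ambiguous codons become 'X', stop codons become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translate(genome: str) -> dict[str, str]:
    """Return the three forward (+1..+3) and three reverse (-1..-3) frame translations."""
    genome = genome.upper()
    rev = reverse_complement(genome)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(genome[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


def map_peptide(peptide: str, frames: dict[str, str]) -> list[tuple[str, int]]:
    """Return (frame, amino-acid offset) for every exact occurrence of the peptide."""
    hits = []
    for frame, protein in frames.items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(peptide, start + 1)
    return hits


if __name__ == "__main__":
    toy_genome = "ATGGCTAAAGGTTGA"  # ATG GCT AAA GGT TGA -> M A K G *
    print(map_peptide("MAKG", six_frame_translate(toy_genome)))
```

On the toy genome, the peptide maps once, in frame +1 at offset 0. In a real analysis, a hit in a region with no annotated gene would flag a candidate missed gene. A hit upstream of an annotated start would suggest a boundary correction. Spectrum scoring, false-discovery control, and signal-peptide analysis are outside this sketch.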

    Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes

    Complexes of physically interacting proteins constitute fundamental functional units that drive biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organization of cells. In this review, we discuss the key contributions of computational methods developed to date (roughly between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast and highlight the challenges they face, in particular detecting sparse, small, or sub-complexes and discerning overlapping complexes. We describe methods for integrating diverse information, including expression profiles and 3D structures of proteins, with PPI networks to understand the dynamics of complex formation. Examples include the time-based assembly of complex subunits and the formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable for understanding disease mechanisms and discovering novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advances in this exciting area.
    Comment: 1 Table
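Many of the complex-detection methods surveyed in this review look for densely connected groups of proteins in the PPI network. The sketch below is the simplest possible illustration of that idea, not a reimplementation of any specific method in the review. It enumerates maximal cliques with the basic Bron-Kerbosch recursion and keeps those with at least `min_size` proteins. Both `min_size` and the function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (protein, protein) interaction pairs; self-loops ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bron_kerbosch(r, p, x, adj, out):
    """Basic Bron-Kerbosch recursion (no pivoting): append every maximal clique to out."""
    if not p and not x:
        out.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def candidate_complexes(edges, min_size=3):
    """Maximal cliques with at least min_size members, largest first, then alphabetical."""
    adj = build_adjacency(edges)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    big = [sorted(c) for c in cliques if len(c) >= min_size]
    return sorted(big, key=lambda c: (-len(c), c))


if __name__ == "__main__":
    ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "E"), ("E", "F")]
    print(candidate_complexes(ppi))
```

On this toy network, the sketch reports two candidate complexes, A-B-C and C-D-E, which share protein C. This is a miniature case of the overlapping complexes discussed in the review. The strict all-pairs requirement of a clique also illustrates why sparse complexes are easy to miss. Methods that grow dense regions or cluster by network flow relax it.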

    The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications

    Structural genomics efforts contribute new protein structures that often lack significant sequence and fold similarity to known proteins. Traditional sequence- and structure-based methods may not be sufficient to annotate the molecular functions of these structures, whereas techniques that combine structural and functional modeling can be valuable for functional annotation. FEATURE is a flexible framework for modeling and recognizing functional sites in macromolecular structures. Here, we present an overview of the main components of the FEATURE framework and describe recent developments in its use. These include automating training-set selection to increase functional coverage, coupling FEATURE to methods that generate structural diversity (such as molecular dynamics simulations and loop modeling) to improve performance, and using FEATURE in large-scale modeling and structure-determination efforts.

    The backbone structure of the thermophilic Thermoanaerobacter tengcongensis ribose binding protein is essentially identical to its mesophilic E. coli homolog

    Background: Comparison of experimentally determined mesophilic and thermophilic homologous protein structures is an important tool for understanding the mechanisms that contribute to thermal stability. Of particular interest are pairs of homologous structures that are structurally very similar but differ significantly in thermal stability.
    Results: We report the X-ray crystal structure of a Thermoanaerobacter tengcongensis ribose binding protein (tteRBP) determined to 1.9 Å resolution. We find that tteRBP (apparent Tm ~102°C) is significantly more stable than the mesophilic Escherichia coli ribose binding protein (ecRBP; apparent Tm ~56°C). tteRBP has essentially the same backbone conformation as ecRBP (0.41 Å RMSD over 235 of 271 Cα positions and 0.65 Å RMSD over 270 of 271 Cα positions). Classifying the amino acid substitutions as a function of structure therefore allows identification of amino acids that potentially contribute to the observed thermal stability of tteRBP in the absence of large structural heterogeneities.
    Conclusion: The near identity of the backbone structures of this pair of proteins implies that the significant differences in their thermal stabilities are encoded exclusively by the identity of the amino acid side chains. Furthermore, the degree of sequence divergence is strongly correlated with structure, with a high degree of conservation in the core progressing to increased diversity in the boundary and surface regions. Factors that may contribute to thermal stability appear to be differentially encoded in each of these regions of the protein. The tteRBP/ecRBP pair therefore offers an opportunity to dissect the contributions of side chains alone to thermal stability in the absence of large structural differences.
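The backbone comparison in this abstract is reported as an RMSD over a chosen subset of matched Cα positions, RMSD = sqrt((1/N) Σ |a_i - b_i|²). The sketch below is minimal and assumes the two structures have already been superimposed and their residues paired. The superposition step, the selection of the 235- or 270-residue subsets, and the function names are not from the source and are hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between paired coordinates that are assumed to be already superimposed."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def rmsd_over(a: Sequence[Coord], b: Sequence[Coord], indices: Sequence[int]) -> float:
    """RMSD restricted to a subset of paired positions, e.g. a structurally conserved core."""
    return rmsd([a[i] for i in indices], [b[i] for i in indices])


if __name__ == "__main__":
    ca_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    ca_2 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 3.0)]
    print(rmsd(ca_1, ca_2))               # all three positions
    print(rmsd_over(ca_1, ca_2, [0, 1]))  # excluding the divergent third position
```

In the toy example, the full-set RMSD is sqrt(3), about 1.73 Å. Dropping the single divergent position gives 0 Å. This shows in miniature why an RMSD over a smaller, well-matched subset can be lower than one over nearly all positions.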

    Microarray data mining: A novel optimization-based approach to uncover biologically coherent structures

    Background: DNA microarray technology allows for the measurement of genome-wide expression patterns. Within the resulting mass of data lies the problem of analyzing and presenting information on this genomic scale. A first step towards the rapid and comprehensive interpretation of these data is clustering genes by their expression patterns. Classifying genes into clusters can lead to interesting biological insights. In this study, we describe an iterative clustering approach to uncover biologically coherent structures from DNA microarray data, based on a novel clustering algorithm, EP_GOS_Clust.
    Results: We apply our proposed iterative algorithm to three sets of experimental DNA microarray data from experiments with the yeast Saccharomyces cerevisiae and show that the iterative approach improves biological coherence. Comparison with other clustering techniques suggests that our iterative algorithm provides superior performance with regard to biological coherence. An important consequence of our approach is that an increasing proportion of genes find membership in clusters of high biological coherence and that the average cluster specificity improves.
    Conclusion: The results from these clustering experiments provide a robust basis for extracting motifs and trans-acting factors that determine particular patterns of expression. In addition, the biological coherence of the clusters is iteratively assessed independently of the clustering. Thus, this method will not be severely impacted by functional annotations that are missing, inaccurate, or sparse.

    Bee‑safe peptidomimetic acaricides achieved by comparative genomics

    The devastating Varroa mite (Varroa destructor Anderson and Trueman) is an obligate ectoparasite of the honey bee, contributing to significant colony losses in North America and throughout the world. The few conventional acaricides available to reduce Varroa mites and prevent disease in honey bee colonies are challenged by widespread resistance and low target-site selectivity. Here, we propose a biorational approach using comparative genomics to develop honey bee-safe, selective acaricides. These target the Varroa mite-specific neuropeptidergic system regulated by proctolin, which is lacking in the honey bee. Proctolin is a highly conserved pentapeptide, RYLPT (Arg-Tyr-Leu-Pro-Thr), known to act through a G protein-coupled receptor to elicit myotropic activity in arthropod species. A total of 33 different peptidomimetic and peptide variants were tested on the Varroa mite proctolin receptor. Ligand docking models and mutagenesis studies revealed the importance of the core aromatic residue Tyr2 in the proctolin ligand. Peptidomimetics showed significant oral toxicity, leading to paralysis and death of Varroa mites, while no negative effects were observed in honey bees. We have demonstrated that a taxon-specific physiological target identified through advanced genomic information offers an opportunity to develop Varroa mite-selective acaricides, thereby expediting translational processes.

    Comparative Genomics of Microbial Chemoreceptor Sequence, Structure, and Function

    Microbial chemotaxis receptors (chemoreceptors) are complex proteins that sense the external environment and signal for flagella-mediated motility, serving as the GPS of the cell. To sense a myriad of physicochemical signals and adapt to diverse environmental niches, the sensory regions of chemoreceptors are frenetically duplicated, mutated, or lost. Conversely, the chemoreceptor signaling region is a highly conserved protein domain. Extreme conservation of this domain is necessary because it determines very specific helical secondary, tertiary, and quaternary structures of the protein. At the same time, it choreographs a network of interactions with the adaptor protein CheW and the histidine kinase CheA. This dichotomy has split the chemoreceptor community into two major camps, studying either an organism's sensory capabilities and physiology or the molecular mechanism of signal transduction. Fortunately, the vast wealth of sequencing data now available has enabled comparative study of chemoreceptors. Comparative genomics can serve as a bridge between these communities, connecting sequence, structure, and function through comprehensive studies on scales ranging from the minute and molecular to the global and ecological. Herein are four studies in which comparative genomics illuminates unanswered questions across the broad chemoreceptor landscape. First, we used evolutionary histories to refine chemoreceptor interactions in Thermotoga maritima, pairing phylogenetics with X-ray crystallography. Next, we uncovered the origin of a unique chemoreceptor, isolated only from hypervirulent strains of Campylobacter jejuni, by comparing chemoreceptor signaling and sensory regions from Campylobacter and Helicobacter. We then selected the opportunistic human pathogen Pseudomonas aeruginosa to address the question of assigning multiple chemoreceptors to multiple chemotaxis pathways within the same organism. We assigned all P. aeruginosa receptors to pathways using a novel in silico approach that incorporates sequence information spanning the entire taxonomic order Pseudomonadales and beyond. Finally, we surveyed the chemotaxis systems of all environmental, commensal, laboratory, and pathogenic strains of the ubiquitous Escherichia coli. There, we discovered an ancestral chemoreceptor gene loss event that may have predisposed a well-studied subpopulation to adopt extra-intestinal pathogenic lifestyles. Overall, comparative genomics is a cutting-edge method for comprehensive chemoreceptor study that is poised to promote synergy within, and expand the significance of, the chemoreceptor field.

    Systems Biology Knowledgebase for a New Era in Biology: A Genomics:GTL Report from the May 2008 Workshop

    From gene to function: using new technologies for solving old problems.

    Recent advances in DNA sequencing have changed the field of genomics as well as that of proteomics. It is now possible to generate gigabases of genome and transcriptome sequence data at substantially lower cost than was possible just ten years ago. In recent years, many high-throughput technologies have been developed to interrogate various aspects of cellular processes, including sequence and structural variation and the transcriptome, epigenome, proteome, and interactome. These Next Generation Sequencing (NGS) experimental technologies are more mature and accessible than the computational tools available for individual researchers to move, store, analyse, and present data in a user-friendly and reproducible fashion. My research work is placed in this scenario. It focuses on the analysis of data produced by NGS technologies and on the development of new tools that address the problems arising during NGS data analysis. To this end, my group and I have tackled several open biomedical problems in collaboration with different research groups at Sapienza University. Some of these experiments have already yielded interesting results. Above all, they have provided the starting point for developing new tools that improve crucial steps of the analyses, address problems arising from system complexity, and make the results easier for researchers to understand. Examples include IsomirT, a tool for small RNA-Seq analysis and isomiR identification; Phagotto, a tool for analysing deep-sequencing data derived from phage-displayed libraries; and FIDEA, a web server for the functional interpretation of differential expression analysis. Recent reports have demonstrated that individual microRNAs can be heterogeneous in length and/or sequence, producing multiple mature variants that have been dubbed isomiRs.
    IsomirT is a useful tool to improve and simplify the search for isomiRs directly from the results of a miRNA-sequencing experiment. Using it, we observed the behaviour of isomiRs in different cell types and in different biological replicates. Our results indicate that the distribution of microRNA variants is similar among replicates and different among cells/tissues, suggesting that isomiRs have a functional role in the cell. NGS is becoming increasingly popular for analysing antibody-selected sequences, from both phage-display libraries and in vitro selection processes. Using these technologies, the experimental group headed by Prof. Felici has introduced a new experimental pipeline, named PROFILER, aimed at significantly empowering the analysis of antigen-specific libraries. A key step in exploiting this idea was the development of a new tool, Phagotto, for processing and analysing the sequencing data. PROFILER, in combination with Phagotto, seems ideally suited to streamline and guide rational antigen design, adjuvant selection, and quality control of newly produced vaccines. The publicly available web server FIDEA allows experimentalists to obtain a functional interpretation of the results of differential expression analysis and to test their hypotheses quickly and easily. The tool performs an enrichment analysis, i.e. an analysis of specific properties that are distributed in a non-random fashion among the up-regulated and down-regulated genes, taken both together and separately. It has proved very useful and is heavily used by scientists all over the world: more than 1500 analysis requests were submitted to the server in six months. Furthermore, during the course of my PhD I implemented pipelines to speed up and optimize protocols for NGS data analysis and applied them to biomedical projects.
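FIDEA's enrichment analysis asks whether a property, such as membership in a functional category, is over-represented among the up- or down-regulated genes. A common way to quantify this is the one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test. Its use here is an assumption, since the text does not name FIDEA's statistic. The standard-library sketch below computes the test for the up-regulated, down-regulated, and combined lists. The function names are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(total: int, annotated: int, selected: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(total, annotated, selected).

    total: genes in the universe; annotated: genes carrying the property;
    selected: differentially expressed genes; overlap: selected genes carrying the property.
    """
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(total - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total, selected)


def enrichment_by_direction(universe, property_genes, up, down):
    """One-sided enrichment p-values for up-regulated, down-regulated, and combined lists."""
    universe = set(universe)
    prop = set(property_genes) & universe
    lists = {"up": set(up), "down": set(down), "both": set(up) | set(down)}
    results = {}
    for label, genes in lists.items():
        genes &= universe  # ignore genes outside the measured universe
        results[label] = hypergeom_upper_tail(
            len(universe), len(prop), len(genes), len(genes & prop)
        )
    return results


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(20)]
    category = genes[:5]  # hypothetical functional category with 5 members
    up = ["g0", "g1", "g2", "g10"]
    down = ["g15", "g16", "g17", "g18"]
    print(enrichment_by_direction(genes, category, up, down))
```

In the toy data, three of the four up-regulated genes belong to the five-gene category, giving p = 155/4845 ≈ 0.032. The down-regulated list has no overlap, so p = 1. A real analysis would test many categories at once and correct for multiple testing.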
    Not all proteins have a complete functional annotation, which raises the issue of predicting the function of proteins with partial or no functional annotation. This can be done either by exploiting the 3D structure of the protein or by inferring the function directly from the sequence. A real challenge, however, is assessing the accuracy of existing methods, and here critical assessment experiments are essential. We had the opportunity to be involved, as assessors, in the worldwide CASP experiment (Critical Assessment of protein Structure Prediction). In particular, we are involved in assessing residue-residue contact predictions. In this category, participating groups provide a list of predicted contacts between residues that can ideally be used as constraints to fold the protein. We proposed and implemented new methodologies to determine which methods work best and where future efforts should be focused.
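Residue-residue contact predictions of this kind are commonly scored by the precision of the top-ranked predicted pairs. The sketch below is minimal and rests on several assumptions. It uses the usual CASP contact definition: a Cβ-Cβ distance below 8 Å, with Cα used for glycine. It assumes a minimum sequence separation of 6 residues and scores only the top k predictions (L/5 for a protein of length L is a common choice). The function names are hypothetical, and the new assessment methodologies mentioned in the text are not reproduced here.

```python
import math
from typing import Iterable, Sequence

Coord = tuple[float, float, float]


def native_contacts(
    cb: Sequence[Coord], threshold: float = 8.0, min_sep: int = 6
) -> set[tuple[int, int]]:
    """Pairs (i < j) with j - i >= min_sep whose C-beta atoms are closer than threshold (Å)."""
    n = len(cb)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if math.dist(cb[i], cb[j]) < threshold
    }


def top_k_precision(
    predictions: Iterable[tuple[int, int, float]],
    native: set[tuple[int, int]],
    k: int,
    min_sep: int = 6,
) -> float:
    """Fraction of the k highest-scoring predicted pairs (i, j, score) that are native contacts."""
    ranked = sorted(
        ((min(i, j), max(i, j), score) for i, j, score in predictions if abs(i - j) >= min_sep),
        key=lambda t: t[2],
        reverse=True,
    )
    top = ranked[:k]
    if not top:
        return 0.0
    return sum((i, j) in native for i, j, _ in top) / len(top)


if __name__ == "__main__":
    # Toy model: residue 6 packs against residue 0; all other residues are far apart.
    cb = [(0.0, 0.0, 0.0)] + [(100.0 * i, 100.0, 0.0) for i in range(1, 6)]
    cb += [(5.0, 0.0, 0.0), (700.0, 100.0, 0.0)]
    native = native_contacts(cb)
    preds = [(6, 0, 0.9), (1, 5, 0.99), (0, 7, 0.5)]
    print(native, top_k_precision(preds, native, k=1), top_k_precision(preds, native, k=2))
```

In the toy model, only residues 0 and 6 are in contact. The top-1 prediction therefore scores 1.0 and the top-2 set scores 0.5. The (1, 5) pair is discarded before ranking because its sequence separation is below the assumed minimum.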

    From Mollusks to Medicine: A Venomics Approach for the Discovery and Characterization of Therapeutics from Terebridae Peptide Toxins

    Animal venoms comprise a diversity of peptide toxins that manipulate molecular targets such as ion channels and receptors, making venom peptides attractive candidates for the development of therapeutics to benefit human health. However, identifying bioactive venom peptides remains a significant challenge. In this review we describe our venomics strategy for the discovery, characterization, and optimization of Terebridae venom peptides (teretoxins). Our strategy follows the scientific path from mollusks to medicine in an integrative, sequential approach with the following steps: (1) delimitation of venomous Terebridae lineages through taxonomic and phylogenetic analyses; (2) identification and classification of putative teretoxins through omics methodologies, including genomics, transcriptomics, and proteomics; (3) chemical and recombinant synthesis of promising peptide toxins; (4) structural characterization through experimental and computational methods; (5) determination of teretoxin bioactivity and molecular function through biological assays and computational modeling; (6) optimization of peptide toxin affinity and selectivity for the molecular target; and (7) development of strategies for effective delivery of venom peptide therapeutics. While our research focuses on terebrids, the venomics approach outlined here can be applied to the discovery and characterization of peptide toxins from any venomous taxon.