411 research outputs found

    AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system

    We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be orchestrated in terms of a global annotation strategy; this facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive and incremental annotation, from the time when only a few draft contigs are available to the time when a final finished assembly is produced. The overall architecture employed is modular and extensible, being based on the W3 standard Web services framework. Specialized modules interact with two independent core modules that are used to annotate genomic and protein sequences, respectively. AGMIAL is currently being used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open source license.

    Bioinformatics resources for cancer research with an emphasis on gene function and structure prediction tools

    The immensely popular fields of cancer research and bioinformatics overlap in many different areas, e.g. large data repositories that allow users to analyze data from many experiments (data handling, databases), pattern mining, microarray data analysis, and interpretation of proteomics data. There are many newly available resources in these areas that may be unfamiliar to most cancer researchers wanting to incorporate bioinformatics tools and analyses into their work, and also to bioinformaticians looking for real data to develop and test algorithms. This review reveals the interdependence of cancer research and bioinformatics, and highlights the most appropriate and useful resources available to cancer researchers. These include not only public databases, but also general and specific bioinformatics tools which can be useful to the cancer researcher. The primary foci are function and structure prediction tools for protein genes. The result is a useful reference for both cancer researchers and bioinformaticians studying cancer.

    Structural approaches to protein sequence analysis

    Various protein sequence analysis techniques are described, aimed at improving the prediction of protein structure by means of pattern matching. To investigate the possibility that improvements in amino acid comparison matrices could result in improvements in the sensitivity and accuracy of protein sequence alignments, a method for rapidly calculating amino acid mutation data matrices from large sequence data sets is presented. The method is then applied to the membrane-spanning segments of integral membrane proteins in order to investigate the nature of amino acid mutability in a lipid environment. Whilst purely sequence analytic techniques work well for cases where some residual sequence similarity remains between a newly characterized protein and a protein of known 3-D structure, in the harder cases there is little or no sequence similarity with which to recognize proteins with similar folding patterns. In the light of these limitations, a new approach to protein fold recognition is described, which uses a statistically derived pairwise potential to evaluate the compatibility between a test sequence and a library of structural templates derived from solved crystal structures. The method, which is called optimal sequence threading, proves to be highly successful, and is able to detect the common TIM barrel fold between a number of enzyme sequences, which has not been achieved by any previous sequence analysis technique. Finally, a new method for the prediction of the secondary structure and topology of membrane proteins is described. The method employs a set of statistical tables compiled from well-characterized membrane protein data, and a novel dynamic programming algorithm to recognize membrane topology models by expectation maximization. The statistical tables show definite biases towards certain amino acid species on the inside, middle and outside of a cellular membrane.
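As a rough illustration of what an amino acid comparison matrix encodes, the sketch below derives log-odds substitution scores from pre-aligned sequence pairs. It is a generic observed-versus-expected calculation, not the method of the thesis summarised above; the function name `log_odds_matrix` and the toy alignment are assumptions made for the example.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs):
    """Return {(a, b): log2(observed / expected)} from aligned sequence pairs.

    Hypothetical helper for illustration only: each pair is two equal-length
    strings, with '-' marking a gap. Gapped columns are skipped.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for seq_a, seq_b in aligned_pairs:
        for a, b in zip(seq_a, seq_b):
            if a == "-" or b == "-":
                continue
            # Count both orientations so the resulting matrix is symmetric.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
            residue_counts[a] += 1
            residue_counts[b] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / total_pairs
        expected = (residue_counts[a] / total_residues) * (
            residue_counts[b] / total_residues
        )
        scores[(a, b)] = math.log2(observed / expected)
    return scores


# Toy alignment (made up): two identical columns and one L/V substitution.
scores = log_odds_matrix([("AAL", "AAV")])
```

A positive score means the pair is aligned more often than the residue frequencies alone would predict. Real matrices are built from far larger alignments and usually account for evolutionary distance, which this sketch ignores.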

    Visualisation and graph-theoretic analysis of a large-scale protein structural interactome

    Rights: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
    Background: Large-scale protein interaction maps provide a new, global perspective with which to analyse protein function. PSIMAP, the Protein Structural Interactome Map, is a database of all the structurally observed interactions between superfamilies of protein domains with known three-dimensional structure in the PDB. PSIMAP incorporates both functional and evolutionary information into a single network.
    Results: We present a global analysis of PSIMAP using several distinct network measures relating to centrality, interactivity, fault-tolerance, and taxonomic diversity. We found the following results. Centrality: we show that the center and barycenter of PSIMAP do not coincide, and that the superfamilies forming the barycenter relate to very general functions, while those constituting the center relate to enzymatic activity. Interactivity: we identify the P-loop and immunoglobulin superfamilies as the most highly interactive. We successfully use connectivity and cluster index, which characterise the connectivity of a superfamily's neighbourhood, to discover superfamilies of complex I and II. This is particularly significant as the structure of complex I is not yet solved. Taxonomic diversity: we found that highly interactive superfamilies are in general taxonomically very diverse and are thus amongst the oldest. Fault-tolerance: we found that the network is very robust, because removing most superfamilies does not break up the network.
    Conclusions: Overall, we can single out the P-loop containing nucleotide triphosphate hydrolases superfamily as it is the most highly connected and has the highest taxonomic diversity. In addition, this superfamily has the highest interaction rank, is the barycenter of the network (it has the shortest average path to every other superfamily in the network), and is an articulation vertex, whose removal will disconnect the network. More generally, we conclude that the graph-theoretic and taxonomic analysis of PSIMAP is an important step towards the understanding of protein function and could be an important tool for tracing the evolution of life at the molecular level.
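The network measures named in this abstract can be illustrated on a toy graph. The sketch below is a minimal pure-Python version. It assumes the standard graph-theoretic definition of the center (vertices of minimum eccentricity) and takes the barycenter and articulation vertex definitions from the abstract's own wording. The helper names and the toy network are hypothetical and have nothing to do with the real PSIMAP data.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency dict from (a, b) edge tuples."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def bfs_distances(graph, source):
    """Shortest path length from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def center(graph):
    """Vertices whose greatest distance to any other vertex is smallest."""
    ecc = {v: max(bfs_distances(graph, v).values()) for v in graph}
    best = min(ecc.values())
    return sorted(v for v in ecc if ecc[v] == best)


def barycenter(graph):
    """Vertices with the smallest total (hence average) path length."""
    total = {v: sum(bfs_distances(graph, v).values()) for v in graph}
    best = min(total.values())
    return sorted(v for v in total if total[v] == best)


def articulation_vertices(graph):
    """Vertices whose removal disconnects the graph (brute-force check)."""
    found = []
    for v in graph:
        rest = {u: [w for w in nbs if w != v]
                for u, nbs in graph.items() if u != v}
        if rest:
            start = next(iter(rest))
            if len(bfs_distances(rest, start)) < len(rest):
                found.append(v)
    return sorted(found)


# Toy connected network: a hub "H" with three leaves and a three-vertex tail.
toy = build_graph([("H", "x1"), ("H", "x2"), ("H", "x3"),
                   ("H", "p1"), ("p1", "p2"), ("p2", "p3")])
```

On this toy network `center(toy)` is `['p1']` while `barycenter(toy)` is `['H']`, so the two notions need not coincide, as the abstract reports for PSIMAP. The brute-force articulation check is quadratic and only suited to small examples; a depth-first search algorithm would be the usual choice for a large network.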

    Automated structural annotation of the malaria proteome and identification of candidate proteins for modelling and crystallization studies

    Malaria causes over one million deaths per year, primarily among African children. The parasite responsible for the most virulent form of malaria is Plasmodium falciparum. Protein structure plays a pivotal role in elucidating mechanisms of parasite functioning and resistance to anti-malarial drugs. Protein structure furthermore aids the determination of protein function, which, together with the structure, can be used to identify novel drug targets in the parasite. However, various structural features in P. falciparum proteins complicate the experimental determination of three-dimensional protein structures. Furthermore, the presence of parasite-specific inserts results in reduced similarity of these proteins to orthologous proteins with experimentally determined structures. The lack of solved structures in the malaria parasite, together with limited similarities to proteins in the Protein Data Bank, necessitates genome-scale structural annotation of P. falciparum proteins. Additionally, the annotation of a range of structural features facilitates the identification of suitable targets for structural studies. An integrated structural annotation system was constructed and applied to all the predicted proteins in P. falciparum, Plasmodium vivax and Plasmodium yoelii. Similarity searches against the PDB, Pfam, Superfamily, PROSITE and PRINTS were included. In addition, the following predictions were made for the P. falciparum proteins: secondary structure, transmembrane helices, protein disorder, low complexity, coiled-coils and small molecule interactions. P. falciparum protein-protein interactions and proteins exported to the RBC were annotated from literature. Finally, a selection of proteins was threaded through a library of SCOP folds. All the results are stored in a relational PostgreSQL database and can be viewed through a web interface (http://deepthought.bi.up.ac.za:8080/Annotation).
    In order to select groups of proteins which fulfill certain criteria with regard to structural and functional features, a query tool was constructed. Using this tool, criteria regarding the presence or absence of all the predicted features can be specified. Analysis of the results obtained revealed that P. falciparum protein-interacting proteins contain a higher percentage of predicted disordered residues than non-interacting proteins. Proteins interacting with 10 or more proteins have a disordered content concentrated in the range of 60-100%, while the disorder distribution for proteins having only one interacting partner was more evenly spread. Comparisons of structural and sequence features between the three species revealed that P. falciparum proteins tend to be longer and vary more in length than those of the other two species. P. falciparum proteins also contained more predicted low complexity and disorder content than proteins from P. yoelii and P. vivax. P. falciparum protein targets for experimental structure determination, comparative modeling and in silico docking studies were putatively identified based on structural features. For experimental structure determination, 178 targets were identified. These targets contain limited contents of predicted transmembrane helix, disorder, coiled-coils, low complexity and signal peptide, as these features may complicate steps in the experimental structure determination procedure. In addition, the targets display low similarity to proteins in the PDB. Comparisons of the targets to proteins with crystal structures revealed that the structures and predicted targets had similar sequence properties and predicted structural features. A group of 373 proteins which displayed high levels of similarity to proteins in the PDB was identified as targets for comparative modeling studies.
    Finally, 197 targets for in silico docking were identified based on predicted small molecule interactions and the availability of a 3D structure. Dissertation (MSc)--University of Pretoria, 2008. Biochemistry.

    Applications of Evolutionary Bioinformatics in Basic and Biomedical Research

    With the revolutionary progress in sequencing technologies, computational biology emerged as a game-changing field which is applied in understanding molecular events of life for not only complementary but also exploratory purposes. Bioinformatics resources and tools significantly help in data generation, organization and analysis. However, there is still a need for developing new approaches built from a biologist's point of view. In protein bioinformatics, there are several fundamental problems such as (i) determining protein function; (ii) identifying protein-protein interactions; (iii) predicting the effect of amino acid variants. Here, I present three chapters addressing these problems from an evolutionary perspective. Firstly, I describe a novel search pipeline for protein domain identification. The algorithm chain provides sensitive domain assignments with the highest possible specificity. Secondly, I present a tool enabling large-scale visualization of presences and absences of proteins in hierarchically clustered genomes. This tool visualizes multi-layer information of any kind of genome-linked data with a special focus on domain architectures, enabling identification of coevolving domains/proteins, which can eventually help in identifying functionally interacting proteins. And finally, I propose an approach for distinguishing between benign and damaging missense mutations in a human disease by establishing the precise evolutionary history of the associated gene. This part introduces new criteria on how to determine functional orthologs via phylogenetic analysis. All three parts use comparative genomics and/or sequence analyses. Taken together, this study addresses important problems in protein bioinformatics and as a whole it can be utilized to describe proteins by their domains, coevolving partners and functionally important residues.