441 research outputs found

    Predicting active site residue annotations in the Pfam database

    Background: Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the active site annotations in the Pfam database, we have developed a strict set of rules, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same Pfam family. Description: We have created a large database of predicted active site residues. On comparing our active site predictions to those found in UniProtKB, Catalytic Site Atlas, PROSITE and MEROPS, we find that we make many novel predictions. On investigating the small subset of predictions made by these databases that are not predicted by us, we found that these sequences did not meet our strict criteria for prediction. We assessed the sensitivity and specificity of our methodology and estimate that only 3% of our predicted sequences are false positives. Conclusion: We have predicted 606110 active site residues, of which 94% are not found in UniProtKB, and have increased the active site annotations in Pfam by more than 200-fold. Although implemented for Pfam, the tool we have developed for transferring the data can be applied to any alignment with associated experimental active site data and is available for download. Our active site predictions are re-calculated at each Pfam release to ensure they are comprehensive and up to date. They provide one of the largest available databases of active site annotation.

    PAAR-repeat proteins sharpen and diversify the Type VI secretion system spike

    The bacterial type VI secretion system (T6SS) is a large, multi-component, dynamic macromolecular machine that plays an important role in the ecology of many Gram-negative bacteria. T6SS is responsible for translocation of a wide range of toxic effector molecules, allowing predatory cells to kill both prokaryotic and eukaryotic prey cells [1-5]. The T6SS organelle is functionally analogous to contractile tails of bacteriophages and is thought to attack cells by initially penetrating them with a trimeric protein complex called the VgrG spike [6,7]. Neither the exact protein composition of the T6SS organelle nor the mechanisms of effector selection and delivery are known. Here we report that proteins from the PAAR (Proline-Alanine-Alanine-aRginine) repeat superfamily form a sharp conical extension on the VgrG spike, which is further involved in attaching effector domains to the spike. The crystal structures of two PAAR-repeat proteins bound to VgrG-like partners show that these proteins function to sharpen the tip of the VgrG spike. We demonstrate that PAAR proteins are essential for T6SS-mediated secretion and target cell killing by Vibrio cholerae and Acinetobacter baylyi. Our results suggest a new model of the T6SS organelle in which the VgrG-PAAR spike complex is decorated with multiple effectors that are delivered simultaneously into target cells in a single contraction-driven translocation event.

    Potential conservation of circadian clock proteins in the phylum Nematoda as revealed by bioinformatic searches

    Although several circadian rhythms have been described in C. elegans, its molecular clock remains elusive. In this work we employed a novel bioinformatic approach, applying probabilistic methodologies, to search the proteomes of C. elegans and other members of the phylum Nematoda for circadian clock proteins of several of the best-studied circadian model organisms from different taxa (Mus musculus, Drosophila melanogaster, Neurospora crassa, Arabidopsis thaliana and Synechococcus elongatus). With this approach we found that the Nematoda contain proteins most related to the core and accessory proteins of the insect and mammalian clocks, which provides new insights into the nematode clock and the evolution of the circadian system.
    Affiliation: Romanowski, Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina
    Affiliation: Garavaglia, Matías Javier. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Ing.genética y Biolog.molecular y Celular. Area Virus de Insectos; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Goya, María Eugenia. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Ghiringhelli, Pablo Daniel. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Ing.genética y Biolog.molecular y Celular. Area Virus de Insectos; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Golombek, Diego Andres. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina

    PhenoFam-gene set enrichment analysis through protein structural information

    Background: With the current technological advances in high-throughput biology, the need for tools that help to analyse the massive amount of data being generated is evident. A powerful method of inspecting large-scale data sets is gene set enrichment analysis (GSEA), and investigation of protein structural features can guide the determination of the function of individual genes. However, a convenient tool that combines these two features to aid in high-throughput data analysis has not yet been developed. To fill this niche, we developed the user-friendly, web-based application PhenoFam. Results: PhenoFam performs gene set enrichment analysis by employing structural and functional information on families of protein domains as annotation terms. Our tool is designed to analyse complete sets of results from quantitative high-throughput studies (gene expression microarrays, functional RNAi screens, etc.) without pre-filtering or hit-selection steps. PhenoFam utilizes Ensembl databases to link a list of user-provided identifiers with protein features from the InterPro database, and assesses whether results associated with individual domains differ significantly from the overall population. To demonstrate the utility of PhenoFam we analysed a genome-wide RNA interference screen and discovered a novel function of plexins containing the cytoplasmic RasGAP domain. Furthermore, a PhenoFam analysis of breast cancer gene expression profiles revealed a link between breast carcinoma and altered expression of PX domain-containing proteins. Conclusions: PhenoFam provides a user-friendly, easily accessible web interface to perform GSEA based on high-throughput data sets and structural-functional protein information, and therefore aids in functional annotation of genes.

    TMFoldRec: a statistical potential-based transmembrane protein fold recognition tool.

    BACKGROUND: Transmembrane proteins (TMPs) are the key components of signal transduction, cell-cell adhesion, and energy and material transport into and out of cells. For a deep understanding of these processes, structure determination of transmembrane proteins is indispensable. However, due to technical difficulties, only a few transmembrane protein structures have been determined experimentally. Large-scale genomic sequencing provides increasing amounts of sequence information on the proteins and whole proteomes of living organisms, which poses a challenge for bioinformatics: how structural information should be gained from a sequence. RESULTS: Here, we present a novel method, TMFoldRec, for fold prediction of membrane segments in transmembrane proteins. TMFoldRec, which is based on statistical potentials, was tested on a benchmark set containing 124 TMP chains from the PDBTM database. Using a 10-fold jackknife method, the native folds were correctly identified in 77% of the cases. This accuracy exceeds that of the state-of-the-art methods. In addition, a key feature of the TMFoldRec algorithm is the ability to estimate the reliability of the prediction and to decide, with an accuracy of 70%, whether the obtained lowest-energy structure is the native one. CONCLUSION: These results imply that the membrane-embedded parts of TMPs, rather than the soluble parts, dictate the TM structures. Moreover, the reliability scores attached to the predictions make our algorithm applicable to proteome-wide analyses. AVAILABILITY: The program is available upon request for academic use.

    GHOSTM: A GPU-Accelerated Homology Search Tool for Metagenomics

    A large number of sensitive homology searches are required for mapping DNA sequence fragments to known protein sequences in public and private databases during metagenomic analysis. BLAST is currently used for this purpose, but its calculation speed is insufficient, especially for analyzing the large quantities of sequence data obtained from a next-generation sequencer. However, faster search tools, such as BLAT, do not have sufficient search sensitivity for metagenomic analysis. Thus, a sensitive and efficient homology search tool is in high demand for this type of analysis. We developed a new, highly efficient homology search algorithm suitable for graphics processing unit (GPU) calculations that was implemented as a GPU system that we called GHOSTM. The system first searches for candidate alignment positions for a sequence from the database using pre-calculated indexes and then calculates local alignments around the candidate positions before calculating alignment scores. We implemented both of these processes on GPUs. The system achieved calculation speeds that were 130 and 407 times faster than BLAST with 1 GPU and 4 GPUs, respectively. The system also showed higher search sensitivity and had a calculation speed that was 4 and 15 times faster than BLAT with 1 GPU and 4 GPUs. We developed a GPU-optimized algorithm to perform sensitive sequence homology searches and implemented the system as GHOSTM. Currently, sequencing technology continues to improve, and sequencers are increasingly producing larger and larger quantities of data. This explosion of sequence data makes computational analysis with contemporary tools more difficult. We developed GHOSTM, which is a cost-efficient tool, and offer this tool as a potential solution to this problem.

    The InterPro protein families and domains database: 20 years on

    The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.

    Prediction of potential drug targets based on simple sequence properties

    Background: During the past decades, research and development in drug discovery have attracted much attention and effort. However, only 324 drug targets are known for clinical drugs up to now. Identifying potential drug targets is the first step in the process of modern drug discovery for developing novel therapeutic agents. Therefore, the identification and validation of new and effective drug targets are of great value for drug discovery in both academia and the pharmaceutical industry. If a protein can be predicted in advance to be a potential drug target, the drug discovery process targeting this protein will be greatly accelerated. In the current study, based on the properties of known drug targets, we have developed a sequence-based drug target prediction method for fast identification of novel drug targets. Results: Based on simple physicochemical properties extracted from protein sequences of known drug targets, several support vector machine models have been constructed in this study. The best model can distinguish currently known drug targets from non-drug targets with an accuracy of 84%. Using this model, potential protein drug targets of human origin from Swiss-Prot were predicted, some of which have already attracted much attention as potential drug targets in pharmaceutical research. Conclusion: We have developed a drug target prediction method based solely on protein sequence information, without knowledge of family/domain annotation or the protein 3D structure. This method can be applied in novel drug target identification and validation, as well as genome-scale drug target predictions.

    Identification of a novel Leucine-rich repeat protein and candidate PP1 regulatory subunit expressed in developing spermatids

    Background: Spermatogenesis comprises a series of highly regulated developmental changes that transform the precursor germ cell into a highly specialized spermatozoon. The last phase of spermatogenesis, termed spermiogenesis, involves dramatic morphological change including formation of the acrosome, elongation and condensation of the nucleus, formation of the flagella, and disposal of unnecessary cytoplasm. A prominent cytoskeletal component of the developing spermatid is the manchette, a unique microtubular structure that surrounds the nucleus of the developing spermatid and is thought to assist in both the reshaping of the nucleus and redistribution of spermatid cytoplasm. Although the molecular motor KIFC1 has been shown to associate with the manchette, its precise role in the function of the manchette and the identity of its testis-specific protein partners are unknown. The purpose of this study was to identify proteins in the testis that interact with KIFC1 using a yeast two-hybrid screen of a testis cDNA library. Results: Thirty percent of the interacting clones identified in our screen contain an identical cDNA encoding a 40 kD protein. This interacting protein has 4 leucine-rich repeats in its amino-terminal half and is expressed primarily in the testis; therefore we have named this protein testis leucine-rich repeat protein or TLRR. TLRR was also found to associate tightly with the KIFC1 targeting domain using affinity chromatography. In addition to the leucine-rich repeats, TLRR contains a consensus binding site for protein phosphatase-1 (PP1). Immunocytochemistry using a TLRR-specific antibody demonstrates that this protein is found near the manchette of developing spermatids. Conclusion: We have identified a previously uncharacterized leucine-rich repeat protein that is expressed abundantly in the testis and associates with the manchette of developing spermatids, possibly through its interaction with the KIFC1 molecular motor. TLRR is homologous to a class of regulatory subunits for PP1, a central phosphatase in the reversible phosphorylation of proteins that is key to modulation of many intracellular processes. TLRR may serve to target this important signaling molecule near the nucleus of developing spermatids in order to control the cellular rearrangements of spermiogenesis.

    Elevated Uptake of Plasma Macromolecules by Regions of Arterial Wall Predisposed to Plaque Instability in a Mouse Model

    Atherosclerosis may be triggered by an elevated net transport of lipid-carrying macromolecules from plasma into the arterial wall. We hypothesised that whether lesions are of the thin-cap fibroatheroma (TCFA) type or are less fatty and more fibrous depends on the degree of elevation of transport, with greater uptake leading to the former. We further hypothesised that the degree of elevation can depend on haemodynamic wall shear stress characteristics and nitric oxide synthesis. Placing a tapered cuff around the carotid artery of apolipoprotein E -/- mice modifies patterns of shear stress and eNOS expression, and triggers lesion development at the upstream and downstream cuff margins; upstream but not downstream lesions resemble the TCFA. We measured wall uptake of a macromolecular tracer in the carotid artery of C57bl/6 mice after cuff placement. Uptake was elevated in the regions that develop lesions in hyperlipidaemic mice and was significantly more elevated where plaques of the TCFA type develop. Computational simulations and effects of reversing the cuff orientation indicated a role for solid as well as fluid mechanical stresses. Inhibiting NO synthesis abolished the difference in uptake between the upstream and downstream sites. The data support the hypothesis that excessively elevated wall uptake of plasma macromolecules initiates the development of the TCFA, suggest that such uptake can result from solid and fluid mechanical stresses, and are consistent with a role for NO synthesis. Modification of wall transport properties might form the basis of novel methods for reducing plaque rupture