3,182 research outputs found

    Sequence Similarity Network Reveals Common Ancestry of Multidomain Proteins

    We address the problem of homology identification in complex multidomain families with varied domain architectures. The challenge is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. There are two major obstacles to multidomain homology identification: lack of a formal definition and lack of curated benchmarks for evaluating the performance of new methods. We offer preliminary solutions to both problems: 1) an extension of the traditional model of homology to include domain insertions; and 2) a manually curated benchmark of well-studied families in mouse and human. We further present Neighborhood Correlation, a novel method that exploits the local structure of the sequence similarity network to identify homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison using our curated data, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is well suited for automated, genome-scale analyses. It is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Homolog predictions obtained with our method, as well as our manually curated benchmark and a web-based visualization tool for exploratory analysis of the network neighborhood structure, are available at http://www.neighborhoodcorrelation.org. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. 
    In contrast to current approaches that either focus on the homology of individual domains or consider only families with identical domain architectures, we show that homology can be rationally defined for multidomain families with diverse architectures by considering the genomic context of the genes that encode them. Our study demonstrates the utility of mining network structure for evolutionary information, suggesting this is a fertile approach for investigating evolutionary processes in the post-genomic era.
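    The abstract above does not state how Neighborhood Correlation is computed. As a minimal sketch, assuming the score is the Pearson correlation between the similarity profiles of two sequences across the whole network, the idea might look like the following. The function name, the dictionary layout, and the toy scores are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def neighborhood_correlation(scores, x, y):
    """Pearson correlation between the similarity profiles of x and y.

    ASSUMPTION: `scores` maps each sequence id to {other_id: similarity};
    pairs that are absent are treated as having similarity 0.
    """
    nodes = sorted(scores)
    px = [scores[x].get(n, 0.0) for n in nodes]
    py = [scores[y].get(n, 0.0) for n in nodes]
    mean_x = sum(px) / len(px)
    mean_y = sum(py) / len(py)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(px, py))
    var_x = sum((a - mean_x) ** 2 for a in px)
    var_y = sum((b - mean_y) ** 2 for b in py)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no neighborhood signal
    return cov / sqrt(var_x * var_y)


# Hypothetical toy network: A, B, C form one family; D and E form another.
# A and D share a weak hit, as they might through a single inserted domain.
scores = {
    "A": {"A": 100, "B": 80, "C": 70, "D": 30},
    "B": {"A": 80, "B": 100, "C": 75},
    "C": {"A": 70, "B": 75, "C": 100},
    "D": {"A": 30, "D": 100, "E": 90},
    "E": {"D": 90, "E": 100},
}

same_family = neighborhood_correlation(scores, "A", "B")
shared_domain_only = neighborhood_correlation(scores, "A", "D")
print(same_family, shared_domain_only)
```

    In this toy network, A and B hit largely the same neighbors and receive a high score, while A and D match each other directly but have otherwise disjoint neighborhoods and receive a negative one. This mirrors the distinction the abstract draws between common ancestry and a shared inserted domain.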

    A novel bacterial l-arginine sensor controlling c-di-GMP levels in Pseudomonas aeruginosa

    Nutrients such as amino acids play key roles in shaping the metabolism of microorganisms in natural environments and in host–pathogen interactions. Beyond taking part in cellular metabolism and protein synthesis, amino acids are also signaling molecules able to influence group behavior in microorganisms, such as biofilm formation. This lifestyle switch involves complex metabolic reprogramming controlled by local variation of the second messenger 3′,5′-cyclic diguanylic acid (c-di-GMP). The intracellular levels of this dinucleotide are finely tuned by the opposing activities of dedicated diguanylate cyclases (GGDEF signature) and phosphodiesterases (EAL and HD-GYP signatures), which are usually allosterically controlled by a plethora of environmental and metabolic cues. Among the genes putatively involved in controlling c-di-GMP levels in P. aeruginosa, we found that the multidomain transmembrane protein PA0575, bearing the tandem signature GGDEF-EAL, is an l-arginine sensor able to hydrolyse c-di-GMP. Here, we investigate the basis of arginine recognition by integrating bioinformatics, molecular biophysics and microbiology. Although the role of nutrients such as l-arginine in controlling the cellular fate in P. aeruginosa (including biofilm, pathogenicity and virulence) is already well established, we identified the first l-arginine sensor able to link environment sensing, c-di-GMP signaling and biofilm formation in this bacterium.

    Transcriptomic analysis of polyketide synthases in a highly ciguatoxic dinoflagellate, Gambierdiscus polynesiensis and low toxicity Gambierdiscus pacificus, from French Polynesia.

    Marine dinoflagellates produce a diversity of polyketide toxins that are accumulated in marine food webs and are responsible for a variety of seafood poisonings. Reef-associated dinoflagellates of the genus Gambierdiscus produce toxins responsible for ciguatera poisoning (CP), which causes over 50,000 cases of illness annually worldwide. The biosynthetic machinery for dinoflagellate polyketides remains poorly understood. Recent transcriptomic and genomic sequencing projects have revealed the presence of Type I modular polyketide synthases in dinoflagellates, as well as a plethora of single domain transcripts with Type I sequence homology. The current transcriptome analysis compares polyketide synthase (PKS) gene transcripts expressed in two species of Gambierdiscus from French Polynesia: a highly toxic ciguatoxin producer, G. polynesiensis, versus a non-ciguatoxic species G. pacificus, each assembled from approximately 180 million Illumina 125 nt reads using Trinity, and compares their PKS content with previously published data from other Gambierdiscus species and more distantly related dinoflagellates. Both modular and single-domain PKS transcripts were present. Single domain β-ketoacyl synthase (KS) transcripts were highly amplified in both species (98 in G. polynesiensis, 99 in G. pacificus), with smaller numbers of standalone acyl transferase (AT), ketoacyl reductase (KR), dehydratase (DH), enoyl reductase (ER), and thioesterase (TE) domains. G. polynesiensis expressed both a larger number of multidomain PKSs, and larger numbers of modules per transcript, than the non-ciguatoxic G. pacificus. The largest PKS transcript in G. polynesiensis encoded a 10,516 aa, 7 module protein, predicted to synthesize part of the polyether backbone. Transcripts and gene models representing portions of this PKS are present in other species, suggesting that its function may be performed in those species by multiple interacting proteins. 
    This study contributes to the building consensus that dinoflagellates utilize a combination of Type I modular and single domain PKS proteins, in an as yet undefined manner, to synthesize polyketides.

    First Insights into the Repertoire of Secretory Lectins in Rotifers

    Due to their high biodiversity and adaptation to a mutable and challenging environment, aquatic lophotrochozoan animals are regarded as a virtually unlimited source of bioactive molecules. Among these, lectins, i.e., proteins with remarkable carbohydrate-recognition properties involved in immunity, reproduction, self/nonself recognition and several other biological processes, are particularly attractive targets for biotechnological research. To date, lectin research in the Lophotrochozoa has been restricted to the most widespread phyla, which are the usual targets of comparative immunology studies, such as Mollusca and Annelida. Here we provide the first overview of the repertoire of the secretory lectin-like molecules encoded by the genomes of six target rotifer species: Brachionus calyciflorus, Brachionus plicatilis, Proales similis (class Monogononta), Adineta ricciae, Didymodactylos carnosus and Rotaria sordida (class Bdelloidea). Overall, while rotifer secretory lectins display a high molecular diversity and belong to nine different structural classes, their total number is significantly lower than for other groups of lophotrochozoans, with no evidence of lineage-specific expansion events. Considering the high evolutionary divergence between rotifers and the other major sister phyla, their widespread distribution in aquatic environments and the ease of their collection and rearing in laboratory conditions, these organisms may represent interesting targets for glycobiological studies, which may allow the identification of novel carbohydrate-binding proteins with peculiar biological properties.

    Automated Protein Structure Classification: A Survey

    Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we overview recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions.
    Comment: 14 pages, Technical Report CSRG-589, University of Toronto

    Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding

    Virtually every molecular biologist has searched a protein or DNA sequence database to find sequences that are evolutionarily related to a given query. Pairwise sequence comparison methods (i.e., measures of similarity between query and target sequences) provide the engine for sequence database search and have been the subject of 30 years of computational research. For the difficult problem of detecting remote evolutionary relationships between protein sequences, the most successful pairwise comparison methods involve building local models (e.g., profile hidden Markov models) of protein sequences. However, recent work in massive data domains like web search and natural language processing demonstrates the advantage of exploiting the global structure of the data space. Motivated by this work, we present a large-scale algorithm called ProtEmbed, which learns an embedding of protein sequences into a low-dimensional “semantic space.” Evolutionarily related proteins are embedded in close proximity, and additional pieces of evidence, such as 3D structural similarity or class labels, can be incorporated into the learning process. We find that ProtEmbed achieves superior accuracy to widely used pairwise sequence methods like PSI-BLAST and HHSearch for remote homology detection; it also outperforms our previous RankProp algorithm, which incorporates global structure in the form of a protein similarity network. Finally, the ProtEmbed embedding space can be visualized, both at the global level and local to a given query, yielding intuition about the structure of protein sequence space.

    Plasmodium vivax Tryptophan-Rich Antigen PvTRAg33.5 Contains Alpha Helical Structure and Multidomain Architecture

    Tryptophan-rich proteins have been identified in several malarial parasites, where they play an important role in host-parasite interaction. Structural characterization of these proteins is needed to develop them as therapeutic targets. Here, we describe a novel Plasmodium vivax tryptophan-rich protein named PvTRAg33.5. It is expressed by blood stage(s) of the parasite and its gene contains two exons. Exon 1 encodes a putative signal peptide of 23 amino acids, which is likely to be cleaved off, whereas exon 2 encodes the mature protein of 252 amino acids. The mature protein contains B-cell epitopes that were recognized by the human immune system during P. vivax infection. PvTRAg33.5 contains 24 (9.5%) tryptophan residues and six motifs whose patterns were similar among tryptophan-rich proteins. The modeled structure of PvTRAg33.5 consists of a multidomain architecture that is stabilized by the presence of a large number of tryptophan residues. Recombinant PvTRAg33.5 showed a predominantly α-helical structure and an α-helix to β-sheet transition at pH below 4.5. The protein acquires an irreversible non-native state at temperatures above 50°C at neutral pH. Its secondary and tertiary structures remain stable in the presence of 35% alcohol, but these structures are destabilized at higher alcohol concentrations due to the disturbance of hydrophobic interactions between tryptophanyl residues. These structural changes in the protein might occur during its translocation to interact with other proteins at its final destination for biological function such as erythrocyte invasion.