
    MemBrain: Improving the Accuracy of Predicting Transmembrane Helices

    Prediction of transmembrane helices (TMHs) in α-helical membrane proteins provides valuable information about protein topology when high-resolution structures are not available. Many predictors have been developed based on either amino acid hydrophobicity scales or purely statistical approaches. While these predictors perform reasonably well in identifying the number of TMHs in a protein, they are generally inaccurate in predicting the ends of TMHs or TMHs of unusual length. To improve the accuracy of TMH detection, we developed a machine-learning-based predictor, MemBrain, which integrates a number of modern bioinformatics approaches, including sequence representation by a multiple sequence alignment matrix, the optimized evidence-theoretic K-nearest neighbor prediction algorithm, fusion of multiple prediction window sizes, and classification by a dynamic threshold. MemBrain demonstrates an overall improvement of about 20% in prediction accuracy, particularly in predicting the ends of TMHs and TMHs shorter than 15 residues. It can also detect N-terminal signal peptides. MemBrain is a useful sequence-based analysis tool for the functional and structural characterization of helical membrane proteins; it is freely available at http://chou.med.harvard.edu/bioinf/MemBrain/
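    Two of the ideas listed above, fusing several prediction window sizes and classifying with a threshold that adapts to each protein, can be illustrated with a short sketch. This is not MemBrain's implementation: MemBrain scores residues with an evidence-theoretic K-nearest neighbor classifier over alignment-derived profiles, whereas this sketch substitutes the Kyte-Doolittle hydrophobicity scale as the per-residue score. The window sizes (9, 15, 21), the "mean + k × standard deviation" threshold rule, `k`, `min_len` and the function names are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydrophobicity scale, used here only as a stand-in per-residue score
# (MemBrain itself uses an evidence-theoretic KNN classifier, not this scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq, w):
    """Centered sliding-window mean of per-residue scores (window truncated at the ends)."""
    scores = [KD.get(aa, 0.0) for aa in seq]  # unknown residues score 0
    half = w // 2
    profile = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        profile.append(sum(scores[lo:hi]) / (hi - lo))
    return profile


def fused_profile(seq, windows=(9, 15, 21)):
    """Fuse several window sizes by averaging their profiles residue by residue."""
    profiles = [window_profile(seq, w) for w in windows]
    return [mean(values) for values in zip(*profiles)]


def predict_segments(seq, windows=(9, 15, 21), k=0.5, min_len=7):
    """Return (start, end) 0-based inclusive segments scoring above a per-protein threshold."""
    profile = fused_profile(seq, windows)
    # Hypothetical "dynamic" threshold: derived from this protein's own score distribution.
    threshold = mean(profile) + k * pstdev(profile)
    segments, start = [], None
    for i, value in enumerate(profile + [float("-inf")]):  # sentinel closes a trailing run
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    return segments


# Made-up toy sequence: polar flank, hydrophobic stretch (indices 11-27), polar flank.
seq = "MKDERKSNQDE" + "LLIVAILLVFGALIVAL" + "KRDESNQKDEK"
segments = predict_segments(seq)
print(segments)
```

    Because the threshold is recomputed from each protein's own fused profile, a protein that is hydrophobic overall needs a higher local score before a segment is called; a fixed global cutoff would not adapt in this way.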

    Taxonomic distribution and origins of the extended LHC (light-harvesting complex) antenna protein superfamily

    Background: The extended light-harvesting complex (LHC) protein superfamily is a centerpiece of eukaryotic photosynthesis, comprising the LHC family and several families involved in photoprotection, such as the LHC-like family and the photosystem II subunit S (PSBS). The evolution of this complex superfamily has long remained elusive, partly because some of its families were previously missing.
    Results: In this study we present a meticulous search for LHC-like sequences in public genome and expressed sequence tag databases covering twelve representative photosynthetic eukaryotes from the three primary lineages of plants (Plantae): glaucophytes, red algae and green plants (Viridiplantae). By introducing a coherent classification of the different protein families based on both hidden Markov model analyses and structural predictions, we identified numerous new LHC-like sequences and described several new families, including the red lineage chlorophyll a/b-binding-like protein (RedCAP) family from red algae and diatoms. Tests of alternative topologies for sequences of the highly conserved chlorophyll-binding core structure of LHC and PSBS proteins significantly support independent origins of the LHC and PSBS families via two unrelated internal gene duplication events. This result was confirmed by cluster likelihood mapping.
    Conclusions: The independent evolution of the LHC and PSBS families is supported by strong phylogenetic evidence. In addition, an origin of the LHC and PSBS families from different homologous members of the stress-enhanced protein subfamily, a diverse and anciently paralogous group of two-helix proteins, seems likely. The new hypothesis proposed here for the evolution of the extended LHC protein superfamily agrees with a character evolution analysis that incorporates the distribution of families and subfamilies across taxonomic lineages. Intriguingly, stress-enhanced proteins, which are universally found in the genomes of green plants, red algae and glaucophytes, and in diatoms with complex plastids, could represent an important and previously missing link in the evolution of the extended LHC protein superfamily.

    A Combination of Compositional Index and Genetic Algorithm for Predicting Transmembrane Helical Segments

    Transmembrane helix (TMH) topology prediction is becoming a focal problem in bioinformatics because the structure of TM proteins is difficult to determine experimentally. Methods that can computationally predict the topology of helical membrane proteins are therefore highly desirable. In this paper we introduce TMHindex, a method for detecting TMH segments using only amino acid sequence information. Each amino acid in a protein sequence is represented by a Compositional Index, which is derived from the difference in amino acid occurrence between TMH and non-TMH segments of the training sequences, combined with amino acid composition information. A genetic algorithm was then employed to find the optimal threshold value for separating TMH segments from non-TMH segments. The method successfully predicted 376 of the 378 TMH segments in a dataset of 70 test protein sequences. The per-residue sensitivity and specificity over all protein sequences in the dataset were 0.901 and 0.865, respectively. To assess the generality of TMHindex, we also tested it on another standard dataset, the 73-protein 3D helix dataset, where TMHindex correctly predicted 91.8% of the proteins based on their TM segments. The accuracy achieved by TMHindex, compared with other recent approaches for predicting the topology of TM proteins, is a strong argument in favor of the proposed method. Availability: The datasets and software, together with supplementary materials, are available at: http://faculty.uaeu.ac.ae/nzaki/TMHindex.htm
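    The abstract does not give the exact Compositional Index formula or the genetic algorithm settings, so the following is only a sketch of the general idea under stated assumptions. It assumes a hypothetical index CI(a) = (frequency of a in TMH residues - frequency of a in non-TMH residues) / overall frequency of a, classifies a residue as TMH when its CI exceeds a threshold, and uses a tiny real-valued genetic algorithm (truncation selection, arithmetic crossover, Gaussian mutation, elitism) whose fitness is the mean of residue-level sensitivity and specificity. The function names, population size, generation count and toy data are all made up for illustration and are not the TMHindex implementation.

```python
import random


def compositional_index(seqs, labels):
    """Hypothetical CI: (freq in TMH - freq in non-TMH) / overall freq of each amino acid.

    labels[i][j] is True when residue j of seqs[i] lies in a TMH. Both classes
    must be present in the training data, otherwise the frequencies divide by zero.
    """
    tmh, non, total = {}, {}, {}
    for seq, lab in zip(seqs, labels):
        for aa, is_tmh in zip(seq, lab):
            bucket = tmh if is_tmh else non
            bucket[aa] = bucket.get(aa, 0) + 1
            total[aa] = total.get(aa, 0) + 1
    n_tmh, n_non, n_all = sum(tmh.values()), sum(non.values()), sum(total.values())
    return {
        aa: (tmh.get(aa, 0) / n_tmh - non.get(aa, 0) / n_non) / (count / n_all)
        for aa, count in total.items()
    }


def sens_spec(seqs, labels, ci, threshold):
    """Residue-level sensitivity and specificity of the rule 'CI > threshold means TMH'."""
    tp = fn = tn = fp = 0
    for seq, lab in zip(seqs, labels):
        for aa, is_tmh in zip(seq, lab):
            predicted = ci.get(aa, 0.0) > threshold  # unseen amino acids score 0
            if is_tmh:
                tp += predicted
                fn += not predicted
            else:
                fp += predicted
                tn += not predicted
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def ga_threshold(seqs, labels, ci, pop_size=20, generations=30, seed=0):
    """Tiny real-valued GA searching for the threshold that maximizes (sens + spec) / 2."""
    rng = random.Random(seed)
    lo, hi = min(ci.values()), max(ci.values())

    def fitness(t):
        sens, spec = sens_spec(seqs, labels, ci, t)
        return (sens + spec) / 2

    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]  # elitism: keep the current best threshold
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2  # arithmetic crossover
            child += rng.gauss(0.0, 0.1 * (hi - lo))  # Gaussian mutation
            children.append(min(hi, max(lo, child)))  # clip to the CI range
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)


# Made-up toy training set: hydrophobic core flanked by polar/charged residues.
seqs = ["KDESNQR" + "LLIVAFMLIV" + "RKDENQS"]
labels = [[False] * 7 + [True] * 10 + [False] * 7]
ci = compositional_index(seqs, labels)
threshold, fit = ga_threshold(seqs, labels, ci)
print(f"threshold={threshold:.3f} fitness={fit:.3f}")
```

    The toy set is trivially separable and the threshold is evaluated on its own training data; a realistic evaluation would train on one set of proteins and report sensitivity and specificity on held-out sequences, as the abstract does with its 70-protein test set.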

    The capsule polysaccharide structure and biogenesis for non-O1 Vibrio cholerae NRT36S: genes are embedded in the LPS region

    BACKGROUND: In V. cholerae, the biogenesis of the capsule polysaccharide is poorly understood. Elucidating capsule structure and biogenesis is critical to understanding the evolution of surface polysaccharides and the relationship between the capsule and LPS in this species. V. cholerae serogroup O31 NRT36S, a human pathogen that produces a heat-stable enterotoxin (NAG-ST), is encapsulated. Here, we report the covalent structure of the capsule in V. cholerae NRT36S and studies of its biogenesis. RESULTS: The structure of the capsular polysaccharide (CPS) was determined by high-resolution NMR spectroscopy and shown to be complex, with four residues in the repeating subunit. The capsule biogenesis gene cluster was identified by transposon mutagenesis combined with whole-genome sequencing data (GenBank accession DQ915177). The capsule gene cluster shares the same genetic locus as the O-antigen biogenesis gene cluster of the lipopolysaccharide (LPS). Apart from V. cholerae O139, this is the first V. cholerae CPS whose structure has been fully elucidated and whose biosynthetic genetic locus has been identified. CONCLUSION: The co-location of the CPS and LPS biosynthesis genes was unexpected and could provide a mechanism for the simultaneous emergence of new O and K antigens in a single strain. This, in turn, may be a key element allowing V. cholerae to evolve new strains that escape immunologic detection by host populations.

    Mycobacterium tuberculosis DosR Regulon Gene Rv0079 Encodes a Putative, ‘Dormancy Associated Translation Inhibitor (DATIN)’

    Mycobacterium tuberculosis is a major human pathogen that has evolved survival mechanisms to persist in a dormant state in an immunocompetent host. How M. tuberculosis metabolism is regulated during latent infection is not clearly understood. The dormancy survival regulon (DosR regulon) chiefly encodes the dormancy-related functions of M. tuberculosis. We describe the functional characterization of an important DosR regulon gene, Rv0079, which appears to be involved in regulating translation through the interaction of its product with bacterial ribosomal subunits. Our experiments suggest that the protein encoded by Rv0079 may inhibit protein synthesis. We performed computational modelling and docking simulations of the Rv0079-encoded protein, followed by in vitro translation and growth curve experiments with recombinant E. coli and Bacille Calmette-Guérin (BCG) strains overexpressing Rv0079. Our observations of the interaction of the protein with ribosomes support its role in regulating or inhibiting translation. We propose that the protein encoded by locus Rv0079 is a ‘dormancy associated translation inhibitor’, or DATIN.

    Transcript Expression Analysis of Putative Trypanosoma brucei GPI-Anchored Surface Proteins during Development in the Tsetse and Mammalian Hosts

    Human African Trypanosomiasis is a devastating disease caused by the parasite Trypanosoma brucei. Trypanosomes live extracellularly in both the tsetse fly and the mammal. Trypanosome surface proteins can directly interact with the host environment, allowing parasites to effectively establish and maintain infections. Glycosylphosphatidylinositol (GPI) anchoring is a common posttranslational modification of eukaryotic surface proteins. In T. brucei, three GPI-anchored major surface proteins have been identified: variant surface glycoproteins (VSGs), procyclic acidic repetitive proteins (PARP, or procyclins), and brucei alanine-rich proteins (BARP). The objective of this study was to select genes encoding predicted GPI-anchored proteins of unknown function from the T. brucei genome and to characterize the expression profile of a subset during cyclical development in the tsetse and mammalian hosts. An initial in silico screen of putative T. brucei proteins with the Big PI algorithm identified 163 predicted GPI-anchored proteins, 106 of which had no known function. Applying a second GPI-anchor prediction algorithm (FragAnchor), together with signal peptide and transmembrane domain prediction software, identified 25 putative hypothetical proteins. Eighty-one gene products with hypothetical functions were analyzed for stage-regulated expression using semi-quantitative RT-PCR. Most of these genes were upregulated in trypanosomes infecting tsetse salivary gland and proventriculus tissues, and 38% were expressed only by parasites infecting salivary gland tissues. Transcripts of all the genes specifically expressed in salivary glands were also detected in mammalian-infective metacyclic trypomastigotes, suggesting a possible role for these putative proteins in invasion of and/or establishment in the mammalian host.
    These results represent the first large-scale report of the differential expression of unknown genes encoding predicted T. brucei surface proteins during the complete developmental cycle. This knowledge may form the foundation for developing novel transmission-blocking strategies against metacyclic parasites.

    BB0172, a Borrelia burgdorferi Outer Membrane Protein That Binds Integrin α3β1

    Lyme disease is a multisystemic disorder caused by Borrelia burgdorferi infection. Upon infection, some B. burgdorferi genes are upregulated, including members of the microbial surface components recognizing adhesive matrix molecules (MSCRAMM) protein family, which facilitate B. burgdorferi adherence to host extracellular matrix components. Comparative genome analysis has revealed a new family of B. burgdorferi proteins containing the von Willebrand factor A (vWFA) domain. In the present study, we characterized the expression and membrane association of the vWFA domain-containing protein BB0172 using in vitro transcription/translation systems in the presence of microsomal membranes, as well as detergent phase separation assays. Our results provide evidence that BB0172 localizes to the outer membrane, that its vWFA domain is oriented toward the extracellular environment, and that it functions as a metal ion-dependent integrin-binding protein. This is the first report of a borrelial adhesin with a metal ion-dependent adhesion site (MIDAS) motif similar to those observed in eukaryotic integrins, and with a similar function.

    More Than 1,001 Problems with Protein Domain Databases: Transmembrane Regions, Signal Peptides and the Issue of Sequence Homology

    Large-scale genome sequencing has gained general importance for the life sciences because the theory of biomolecular sequence homology makes functional annotation of otherwise experimentally uncharacterized sequences possible. Historically, the paradigm that similar protein sequences imply common structure, function and ancestry was generalized from studies of globular domains. Sharing the same fold imposes strict conditions on packing in the hydrophobic core, which requires similar hydrophobic patterns. The implications of sequence similarity among non-globular protein segments have not been studied to the same extent; nevertheless, homology considerations are silently extended to them. This appears especially detrimental for transmembrane helices (TMs) and signal peptides (SPs), where sequence similarity is necessarily a consequence of physical requirements rather than common ancestry. Matching SPs/TMs thus creates the illusion of matching hydrophobic cores, and including SPs/TMs in domain models can therefore give rise to wrong annotations. More than 1001 of the 10,340 domain models in Pfam release 23, and 18 of the 809 domains in SMART version 6, contain SP/TM regions. As expected, fragment-mode HMM searches generate promiscuous hits among clearly unrelated proteins that are limited solely to the SP/TM part. More worryingly, we show explicit examples in which the scores of clearly false-positive hits, even in global-mode searches, are elevated into the significance range just by matching the hydrophobic runs. Using conservative criteria for a set of validated domain models, we find that at least 2.1% to 13.6% of the annotated Pfam hits in the PIR iProClass database v3.74 appear unjustified. Thus, false-positive domain hits enforced by SP/TM regions can lead to dramatic annotation errors in which the hit has nothing in common with the problematic domain model except the SP/TM region itself.
    We suggest a workflow for flagging problematic hits arising from SP/TM-containing models for critical reconsideration by annotation users.
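    A flagging step of the kind suggested above could look roughly like the sketch below. The abstract does not specify the authors' criteria, so this assumes a simple hypothetical rule: a hit is flagged when its domain model is known to contain SP/TM regions and at least half of the aligned query span falls inside the query protein's own predicted SP/TM regions. The `DomainHit` record, the 0.5 cutoff, the protein and model identifiers, and the coordinates are all made up; the SP/TM regions are assumed to come from an external signal peptide and transmembrane predictor.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    protein: str  # query protein identifier (hypothetical)
    model: str    # domain model identifier (hypothetical)
    start: int    # first aligned query residue (1-based, inclusive)
    end: int      # last aligned query residue (1-based, inclusive)


def sp_tm_fraction(hit, regions):
    """Fraction of the hit's query span lying inside predicted SP/TM regions."""
    hit_positions = set(range(hit.start, hit.end + 1))
    sp_tm_positions = set()
    for s, e in regions:  # regions may overlap; the set avoids double counting
        sp_tm_positions.update(range(s, e + 1))
    return len(hit_positions & sp_tm_positions) / len(hit_positions)


def flag_hits(hits, sp_tm_regions, models_with_sp_tm, min_fraction=0.5):
    """Flag hits from SP/TM-containing models whose match is dominated by SP/TM residues."""
    flagged = []
    for hit in hits:
        if hit.model not in models_with_sp_tm:
            continue  # model has no SP/TM region: not a candidate for this check
        frac = sp_tm_fraction(hit, sp_tm_regions.get(hit.protein, []))
        if frac >= min_fraction:
            flagged.append((hit, frac))
    return flagged


# Hypothetical toy input: names and coordinates are invented for illustration.
hits = [
    DomainHit("P1", "model_A", 10, 40),
    DomainHit("P2", "model_A", 100, 200),
    DomainHit("P3", "model_B", 5, 30),
]
sp_tm_regions = {"P1": [(12, 32)], "P2": [(150, 170)], "P3": [(5, 25)]}
models_with_sp_tm = {"model_A"}
flagged = flag_hits(hits, sp_tm_regions, models_with_sp_tm)
for hit, frac in flagged:
    print(f"{hit.protein} {hit.model} {hit.start}-{hit.end}: {frac:.0%} SP/TM")
```

    Flagged hits are only marked for reconsideration rather than deleted, which matches the spirit of a workflow aimed at annotation users: a hit that is mostly hydrophobic run may still be genuine and needs a human or a stricter check to decide.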

    Topological Analysis of Small Leucine-Rich Repeat Proteoglycan Nyctalopin

    Nyctalopin is a small leucine-rich repeat proteoglycan (SLRP) whose function is critical for normal vision. The absence of nyctalopin results in the complete form of congenital stationary night blindness. Normally, glutamate released by photoreceptors binds to the metabotropic glutamate receptor type 6 (GRM6), which, through a G-protein cascade, closes the non-specific cation channel TRPM1 on the dendritic tips of depolarizing bipolar cells (DBCs) in the retina. Nyctalopin has been shown to interact with TRPM1, and expression of TRPM1 on the dendritic tips of the DBCs depends on nyctalopin expression. In the current study, we used yeast two-hybrid and biochemical approaches to investigate whether murine nyctalopin is membrane-bound and, if so, by what mechanism, and whether its functional form is a homodimer. Our results show that murine nyctalopin is anchored to the plasma membrane by a single transmembrane domain, with the LRR domain located in the extracellular space.

    Membrane Topology and Predicted RNA-Binding Function of the ‘Early Responsive to Dehydration (ERD4)’ Plant Protein

    Functional annotation of uncharacterized genes is the main focus of computational methods in the post-genomic era. These tools search for similarity between proteins on the premise that proteins sharing sequence or structural motifs usually perform related functions, and they are thus particularly useful for membrane proteins. Early responsive to dehydration (ERD) genes are rapidly induced in response to dehydration stress in a variety of plant species. In the present work we characterized the function of the Brassica juncea ERD4 gene using computational approaches. The ERD4 protein, of unknown function, possesses the ubiquitous DUF221 domain (residues 312–634) and is conserved in all plant species. We suggest that the protein is localized in the chloroplast membrane with at least nine transmembrane helices. We detected a globular domain of 165 amino acid residues (183–347) in plant ERD4 proteins and expect it to be positioned inside the chloroplast. The structural and functional annotation of the globular domain was obtained using fold recognition methods, which suggested that its sequence contains two tandem RNA-recognition motif (RRM) domains, each folded into a βαββαβ topology. A structure-based sequence alignment with known RNA-binding proteins revealed conservation of two non-canonical ribonucleoprotein sub-motifs in both putative RNA-recognition domains of the ERD4 protein. The function of the highly conserved ERD4 protein may thus be associated with its RNA-binding ability during the stress response. This is the first functional annotation of the ERD4 protein family, and it can be useful in designing experiments to unravel crucial aspects of the stress tolerance mechanism.