    Increased incidence of rare codon clusters at 5' and 3' gene termini: implications for function

    Background: The process of translation can be affected by the use of rare versus common codons within the mRNA transcript. Results: Here, we show that rare codons are enriched at the 5' and 3' termini of genes from E. coli and other prokaryotes. Genes predicted to be secreted show significant enrichment in 5' rare codon clusters, but not 3' rare codon clusters. Surprisingly, no correlation between 5' mRNA structure and rare codon usage was observed. Conclusions: Potential functional roles for the enrichment of rare codons at terminal positions are explored.
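
    The kind of positional analysis summarized above can be illustrated with a simple sliding-window count. This sketch is not the method used in the study: it assumes a hypothetical table `usage` of relative synonymous codon frequencies, treats any codon below `rare_cutoff` as rare, flags windows of `window` codons holding at least `min_rare` rare codons, and bins the flagged windows by whether their centre falls in the 5' or 3' fraction (`edge`) of the gene. All names and thresholds are illustrative assumptions.

```python
from typing import Dict, List


def split_codons(seq: str) -> List[str]:
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def rare_codon_windows(seq: str, usage: Dict[str, float], rare_cutoff: float = 0.1,
                       window: int = 10, min_rare: int = 3) -> List[int]:
    """Return start indices (in codons) of windows containing at least `min_rare` rare codons.

    `usage` is a hypothetical codon -> relative-frequency table; codons absent from it
    are treated as common.
    """
    codons = split_codons(seq)
    is_rare = [usage.get(c, 1.0) < rare_cutoff for c in codons]
    hits = []
    for start in range(max(len(codons) - window + 1, 0)):
        if sum(is_rare[start:start + window]) >= min_rare:
            hits.append(start)
    return hits


def cluster_regions(hits: List[int], n_codons: int, window: int,
                    edge: float = 0.2) -> Dict[str, int]:
    """Count flagged windows whose centre lies in the 5' edge, the 3' edge, or the middle."""
    counts = {"5'": 0, "middle": 0, "3'": 0}
    for start in hits:
        centre = (start + window / 2) / n_codons  # relative position along the gene
        if centre < edge:
            counts["5'"] += 1
        elif centre > 1 - edge:
            counts["3'"] += 1
        else:
            counts["middle"] += 1
    return counts
```

    Reading enrichment into such counts would additionally require a null model, for example the same statistic computed on genes with shuffled codon order.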

    GlyGly-CTERM and Rhombosortase: A C-Terminal Protein Processing Signal in a Many-to-One Pairing with a Rhomboid Family Intramembrane Serine Protease

    The rhomboid family of serine proteases occurs in all domains of life. Its members contain at least six hydrophobic membrane-spanning helices, with an active-site serine located deep within the hydrophobic interior of the plasma membrane. The model member GlpG from Escherichia coli has been heavily studied through engineered mutant forms, varied model substrates, and multiple X-ray crystal studies, yet its relationship to endogenous substrates is not well understood. Here we describe an apparent membrane-anchoring C-terminal homology domain that appears in numerous genera, including Shewanella, Vibrio, Acinetobacter, and Ralstonia, but not Escherichia or Haemophilus. Individual genomes encode up to thirteen members, usually homologous to each other only in this C-terminal region. The domain's tripartite architecture consists of a motif, a transmembrane helix, and a cluster of basic residues at the protein C-terminus, as also seen with the LPXTG recognition sequence for sortase A and the PEP-CTERM recognition sequence for exosortase. Partial Phylogenetic Profiling identifies a distinctive rhomboid-like protease subfamily almost perfectly co-distributed with this recognition sequence. This protease subfamily and its putative target domain are hereby renamed rhombosortase and GlyGly-CTERM, respectively. The protease and target are encoded by consecutive genes in most genomes with just a single target, but far apart otherwise. The signature motif of the Rhombo-CTERM domain, often SGGS, only partially resembles known cleavage sites of rhomboid protease family model substrates. Some protein families that have several members with C-terminal GlyGly-CTERM domains also have additional members with LPXTG or PEP-CTERM domains instead, suggesting there may be common themes to the post-translational processing of these proteins by three different membrane protein superfamilies.
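
    The tripartite C-terminal architecture described above (motif, transmembrane helix, basic cluster) lends itself to a rough sequence heuristic. The sketch below is a hypothetical scanner, not the curated models used to define GlyGly-CTERM: it takes the most C-terminal run of at least `min_tm` residues from an assumed hydrophobic alphabet, requires at least `min_basic` Arg/Lys residues in the short tail after that run, and then looks for a `GG` dipeptide within `max_gap` residues upstream of it. The residue set and every threshold are illustrative assumptions.

```python
import re
from typing import Optional, Tuple

# Illustrative hydrophobic alphabet; Gly is deliberately excluded so the GG motif
# is never absorbed into the putative transmembrane run.
TM_RESIDUES = "AILMFVWC"


def glygly_cterm_like(seq: str, min_tm: int = 15, max_gap: int = 12,
                      max_tail: int = 15, min_basic: int = 2) -> Optional[Tuple[int, int, int]]:
    """Heuristic check for motif + hydrophobic helix + basic-cluster C-terminal tail.

    Returns (motif_start, tm_start, tm_end) as 0-based indices, or None.
    """
    seq = seq.upper()
    runs = list(re.finditer(f"[{TM_RESIDUES}]{{{min_tm},}}", seq))
    if not runs:
        return None
    tm = runs[-1]  # the most C-terminal hydrophobic run
    tail = seq[tm.end():]
    if len(tail) > max_tail or sum(aa in "RK" for aa in tail) < min_basic:
        return None
    # The motif must lie fully within the window just upstream of the helix.
    window_start = max(0, tm.start() - max_gap - 2)
    motif = seq.rfind("GG", window_start, tm.start())
    if motif == -1:
        return None
    return motif, tm.start(), tm.end()
```

    A fixed alphabet like this will miss helices containing Gly, Ser or Thr; profile-based models are the more robust choice for genome-scale searches.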

    Evidence for a novel coding sequence overlapping the 5'-terminal ~90 codons of the Gill-associated and Yellow head okavirus envelope glycoprotein gene

    The genus Okavirus (order Nidovirales) includes a number of viruses that infect crustaceans, causing major losses in the shrimp industry. These viruses have a linear positive-sense ssRNA genome of ~26-27 kb, encoding a large replicase polyprotein that is expressed from the genomic RNA, and several additional proteins that are expressed from a nested set of 3'-coterminal subgenomic RNAs. In this brief report, we describe the bioinformatic discovery of a new, apparently coding, ORF that overlaps the 5' end of the envelope glycoprotein encoding sequence, ORF3, in the +2 reading frame. The new ORF has a strong coding signature and, in fact, is more conserved at the amino acid level than the overlapping region of ORF3. We propose that translation of the new ORF initiates at a conserved AUG codon separated by just 2 nt from the ORF3 AUG initiation codon, resulting in a novel 86-amino-acid protein.
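
    The frame arithmetic behind the proposed start site can be checked directly: an AUG that begins 5 nt after the ORF3 AUG (the 3 nt of the AUG itself plus the 2-nt spacer) sits at an offset of 5 mod 3 = 2, that is, in the +2 frame relative to ORF3. The sketch below works on a toy sequence in the DNA alphabet (U is converted to T); `frame_offset` and `orf_from` are hypothetical helper names, and the example sequence is invented, not okavirus sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}


def frame_offset(ref_start: int, other_start: int) -> int:
    """Reading frame (0, +1 or +2) of other_start relative to ref_start."""
    return (other_start - ref_start) % 3


def orf_from(seq: str, start: int) -> str:
    """Return the ORF beginning at `start`, through the first in-frame stop codon.

    If no stop is found, the ORF runs to the last complete codon of the sequence.
    """
    seq = seq.upper().replace("U", "T")
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return seq[start:i + 3]
    return seq[start:len(seq) - (len(seq) - start) % 3]
```

    On the toy sequence `ATGCCATGGCTGCTTAGCC`, the reference AUG is at position 0 and the second AUG at position 5, so `frame_offset(0, 5)` gives 2; the two starts read through different codons and terminate at different points.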

    A comprehensive assessment of N-terminal signal peptides prediction methods

    Background: Amino-terminal signal peptides (SPs) are short regions that guide the targeting of secretory proteins to the correct subcellular compartments in the cell. They are cleaved off once the passenger protein reaches its destination. The explosive growth in sequencing technologies has led to the deposition of vast numbers of protein sequences, necessitating rapid functional annotation techniques, with subcellular localization being a key feature. Of the myriad software tools developed to automate the assignment of SP cleavage sites in these new sequences, we review here the performance and reliability of the most commonly used. Results: The available signal peptide data have been manually curated and organized into three datasets representing eukaryotes, Gram-positive bacteria and Gram-negative bacteria. These datasets are used to evaluate thirteen publicly available prediction tools. SignalP (both the HMM and ANN versions) maintains consistency and achieves the best overall accuracy in all three benchmarking experiments, ranging from 0.872 to 0.914, although other prediction tools are narrowing the performance gap. Conclusion: Most of the tools evaluated in this study have no difficulty discriminating between secretory and non-secretory proteins. The challenge clearly remains pinpointing the correct SP cleavage site. The composite scoring schemes employed by SignalP may help to explain its accuracy: the prediction task is divided into a number of separate steps, allowing each score to tackle a particular aspect of the prediction. (12 pages)
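
    The conclusion above separates two tasks: deciding whether a protein carries a signal peptide at all, and locating its cleavage site. A minimal sketch of scoring them separately, assuming each benchmark protein is reduced to a hypothetical `(true_site, predicted_site)` pair with `None` meaning "no signal peptide"; these are not necessarily the accuracy definitions used in the assessment.

```python
from typing import List, Optional, Tuple

# (true cleavage site, predicted cleavage site); None means "no signal peptide".
Record = Tuple[Optional[int], Optional[int]]


def discrimination_accuracy(records: List[Record]) -> float:
    """Fraction of proteins whose secretory / non-secretory status is predicted correctly."""
    if not records:
        return 0.0
    correct = sum((true is None) == (pred is None) for true, pred in records)
    return correct / len(records)


def cleavage_site_accuracy(records: List[Record], tolerance: int = 0) -> float:
    """Among proteins that truly have an SP, fraction with a predicted site within `tolerance` residues."""
    with_sp = [(t, p) for t, p in records if t is not None]
    if not with_sp:
        return 0.0
    hits = sum(p is not None and abs(p - t) <= tolerance for t, p in with_sp)
    return hits / len(with_sp)
```

    With these definitions a predictor can score well on discrimination while still missing exact cleavage positions; relaxing `tolerance` shows how many misses are near-misses.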

    Prediction of Extracellular Proteases of the Human Pathogen Helicobacter pylori Reveals Proteolytic Activity of the Hp1018/19 Protein HtrA

    Exported proteases of Helicobacter pylori (H. pylori) are potentially involved in pathogen-associated disorders leading to gastric inflammation and neoplasia. By comprehensive sequence screening of the H. pylori proteome for predicted secreted proteases, we retrieved several candidate genes. We detected caseinolytic activities of several such proteases, which are released independently of the H. pylori type IV secretion system encoded by the cag pathogenicity island (cagPAI). Among these, we found the predicted serine protease HtrA (Hp1019), which was previously identified in the bacterial secretome of H. pylori. Importantly, we further found that the H. pylori genes hp1018 and hp1019 represent a single gene likely coding for an exported protein. Here, we directly verified the proteolytic activity of HtrA in vitro and identified the HtrA protease in zymograms by mass spectrometry. Overexpressed and purified HtrA exhibited pronounced proteolytic activity, which was abolished by mutation of Ser205 to alanine in the predicted active center of HtrA. These data demonstrate that H. pylori secretes HtrA as an active protease, which might represent a novel candidate target for therapeutic intervention strategies.

    A four-helix bundle stores copper for methane oxidation

    Methane-oxidising bacteria (methanotrophs) require large quantities of copper for the membrane-bound (particulate) methane monooxygenase (pMMO). Certain methanotrophs are also able to switch to using the iron-containing soluble MMO (sMMO) to catalyse methane oxidation, with this switchover regulated by copper. MMOs are Nature’s primary biological mechanism for suppressing atmospheric levels of methane, a potent greenhouse gas. Furthermore, methanotrophs and MMOs have enormous potential in bioremediation, in biotransformations producing bulk and fine chemicals, and in bioenergy, particularly considering increased methane availability from renewable sources and hydraulic fracturing of shale rock. We have discovered and characterised a novel copper storage protein (Csp1) from the methanotroph Methylosinus trichosporium OB3b that is exported from the cytosol and stores copper for pMMO. Csp1 is a tetramer of 4-helix bundles, with each monomer binding up to 13 Cu(I) ions in a previously unseen manner, mainly via Cys residues that point into the core of the bundle. Csp1 is the first example of a protein that stores a metal within an established protein-folding motif. This work provides a detailed insight into how methanotrophs accumulate copper for the oxidation of methane. Understanding this process is essential if the wide-ranging biotechnological applications of methanotrophs are to be realised. Cytosolic homologues of Csp1 are present in diverse bacteria, thus challenging the dogma that such organisms do not use copper in this location.

    BayesMotif: de novo protein sorting motif discovery from impure datasets

    Background: Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals consists of amino acid subsequences usually located at the N- or C-termini of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. Methods: We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier-based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with a less conserved motif region. A false positive removal procedure was developed to iteratively remove sequences that are unlikely to contain true motifs, so that the algorithm can identify motifs from impure input sequences. Results: Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also showed that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. Conclusion: We proposed BayesMotif, a novel Bayesian classification-based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short, highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features, such as hydrophobicity or charge of the motifs, which may help to overcome the limitations of the PWM (position weight matrix) motif model.
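
    The iterative false positive removal idea can be sketched independently of the Bayesian classifier itself. The stand-in below is not BayesMotif: it scores each aligned, fixed-length candidate window with a log-odds position weight matrix built from the other windows (leave-one-out, so a sequence cannot vote for itself), drops windows scoring at or below `threshold`, and repeats until the set stops shrinking. The uniform background, the pseudocount and the threshold are assumptions, and inputs must use the 20 standard amino acid letters.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(seqs: List[str], pseudo: float = 1.0) -> List[Dict[str, float]]:
    """Natural-log-odds PWM against a uniform background, from equal-length aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for i in range(len(seqs[0])):
        counts = {aa: pseudo for aa in AMINO_ACIDS}
        for s in seqs:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({aa: math.log(c / total / background) for aa, c in counts.items()})
    return pwm


def score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Sum of per-position log-odds for one aligned window."""
    return sum(column[aa] for column, aa in zip(pwm, seq))


def remove_false_positives(seqs: List[str], threshold: float = 0.0,
                           max_iter: int = 10) -> List[str]:
    """Iteratively drop windows that score poorly against a PWM built from the remaining windows."""
    kept = list(seqs)
    for _ in range(max_iter):
        if len(kept) < 2:
            break
        survivors = []
        for i, s in enumerate(kept):
            others = kept[:i] + kept[i + 1:]  # leave-one-out: s does not contribute to its own PWM
            if score(build_pwm(others), s) > threshold:
                survivors.append(s)
        if len(survivors) == len(kept) or len(survivors) < 2:
            break  # converged, or removal would leave too little to build a PWM
        kept = survivors
    return kept
```

    For the input `["LPETG", "LPKTG", "LPQTG", "LPATG", "DKHNQ", "QWRSY"]`, the conserved L, P, T and G positions act as the anchor while the third position varies, and the two unrelated windows are discarded in the first pass.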

    Comparative study of the extracellular proteome of Sulfolobus species reveals limited secretion

    Although a large number of potentially secreted proteins can be predicted on the basis of the genomic distribution of signal sequence-bearing proteins, protein secretion in Archaea has barely been studied. A proteomic inventory and comparison of the growth medium proteins in three hyperthermoacidophiles, i.e., Sulfolobus solfataricus, S. acidocaldarius and S. tokodaii, indicates that only a few proteins are freely secreted into the growth medium and that the majority originate from cell envelope-bound forms. In S. acidocaldarius, both cell-associated and secreted α-amylase activities are detected. Inactivation of the amyA gene resulted in a complete loss of activity, suggesting that the same protein is responsible for the α-amylase activity at both locations. It is concluded that protein secretion in Sulfolobus is a limited process, and it is suggested that the S-layer may act as a barrier to the free diffusion of folded proteins into the medium.

    An extracellular steric seeding mechanism for Eph-ephrin signaling platform assembly

    Erythropoietin-producing hepatoma (Eph) receptors are cell-surface protein tyrosine kinases mediating cell-cell communication. Upon activation, they form signaling clusters. We report crystal structures of the full ectodomain of human EphA2 (eEphA2) both alone and in complex with the receptor-binding domain of the ligand ephrinA5 (ephrinA5 RBD). Unliganded eEphA2 forms linear arrays of staggered parallel receptors involving two patches of residues conserved across A-class Ephs. The eEphA2-ephrinA5 RBD complex forms a more elaborate assembly, whose interfaces include the same conserved regions on eEphA2, but rearranged to accommodate ephrinA5 RBD. Cell-surface expression of mutant EphA2s showed that these interfaces are critical for localization at cell-cell contacts and activation-dependent degradation. Our results suggest a 'nucleation' mechanism whereby a limited number of ligand-receptor interactions 'seed' an arrangement of receptors which can propagate into extended signaling arrays.

    A proteogenomic update to Yersinia: enhancing genome annotation

    Background: Modern biomedical research depends on a complete and accurate proteome. With the widespread adoption of new sequencing technologies, genome sequences are generated at a near-exponential rate, diminishing the time and effort that can be invested in genome annotation. The resulting gene set contains numerous errors in even the most basic form of annotation: the primary structure of the proteins. Results: The application of experimental proteomics data to genome annotation, called proteogenomics, can quickly and efficiently discover misannotations, yielding a more accurate and complete genome annotation. We present a comprehensive proteogenomic analysis of the plague bacterium, Yersinia pestis KIM. We discover non-annotated genes, correct protein boundaries, remove spuriously annotated ORFs, and make major advances towards accurate identification of signal peptides. Finally, we apply our data to 21 other Yersinia genomes, correcting and enhancing their annotations. Conclusions: In total, 141 gene models were altered and have been updated in RefSeq and GenBank, which can be accessed seamlessly through any NCBI tool (e.g. BLAST) or downloaded directly. Along with the improved gene models, we discover new, more accurate means of identifying signal peptides in proteomics data.
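
    The core proteogenomic step, checking identified peptides against both the annotated proteome and the genome itself, can be sketched as follows. This is a simplified, hypothetical illustration rather than the pipeline used for Yersinia pestis KIM: it assumes the standard genetic code, exact substring matching and a six-frame translation, and flags a peptide found in the genome but in no annotated protein as evidence for a missed gene or a wrong boundary. A production pipeline would also need to handle isobaric Leu/Ile, alternative start codons and statistical control of false identifications.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AA_TCAG = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_TCAG)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate frame 0 of a DNA string; stops become '*', unknown codons 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(genome: str) -> Dict[str, str]:
    """Translate the three forward and three reverse-complement frames."""
    genome = genome.upper()
    rc = genome.translate(COMPLEMENT)[::-1]
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(genome[f:])
        frames[f"-{f + 1}"] = translate(rc[f:])
    return frames


def classify_peptides(peptides: List[str], annotated: List[str], genome: str) -> Dict[str, str]:
    """Label each peptide 'annotated', 'novel:<frame>' (genome-only hit) or 'unmapped'."""
    frames = six_frames(genome)
    labels = {}
    for pep in peptides:
        if any(pep in prot for prot in annotated):
            labels[pep] = "annotated"
            continue
        frame = next((name for name, prot in frames.items() if pep in prot), None)
        labels[pep] = f"novel:{frame}" if frame else "unmapped"
    return labels
```

    With the toy genome `ATGGCTAAAGGTTGA` and an annotated proteome of just `["MSTQ"]`, the peptide `AKG` occurs only in the +1 translation (`MAKG*`) and is reported as `novel:+1`, the pattern that would prompt a new or extended gene model.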