ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network
With the development of next-generation sequencing techniques, it is fast and
cheap to determine protein sequences but relatively slow and expensive to
extract useful information from them because of the limitations of
traditional biological experimental techniques. Protein function prediction has
been a long-standing challenge in closing the gap between the huge number of
protein sequences and their known functions. In this paper, we propose a novel
method that converts the protein function prediction problem into a language
translation problem, from the newly proposed protein sequence language "ProLan"
to the protein function language "GOLan", and build a neural machine translation
model based on recurrent neural networks to translate "ProLan" into "GOLan".
We blindly tested our method by participating in the third Critical
Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated its
performance on selected proteins whose functions were released
after the CAFA competition. The good performance on the training and testing
datasets demonstrates that the newly proposed method is a promising direction for
protein function prediction. In summary, we propose, for the first time, a method
that converts the protein function prediction problem into a language translation
problem and applies a neural machine translation model to protein function
prediction.
Comment: 13 pages, 5 figures
Mass spectrometry and ribosome profiling, a perfect combination towards a more comprehensive identification strategy of true in vivo protein forms
An increasing number of studies involve integrative analysis of gene and protein expression data, taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS). Recently, a strategy, termed ribosome profiling, based on deep sequencing of ribosome-protected mRNA fragments, indirectly monitoring protein synthesis, has been described. In contrast to routinely employed protein databases in proteomics searches, RIBO-seq derived data gives a more representative expression state and accounts for sequence variation information and alternative translation initiation.
To verify the potential of ribosome profiling in providing us with a true snapshot of the translational landscape, we devised a proteogenomic approach generating a database of translation products based on ribosome profiling experiments. The raw and untreated RIBO-seq data is analyzed for both splice isoforms and single nucleotide polymorphisms, thereby taking into account transcriptional variation. In addition, RIBO-seq data for translation start site discovery (treated with harringtonine, lactimidomycin or puromycin) is used to obtain a genome-wide blueprint of all possible translation initiation sites, thereby taking into account translational variation. By adding protein-DB annotation to the genomic RIBO-seq derived data, and after in silico translation, a protein database is constructed that reflects the full complexity of the proteome.
Using a first version of our proteogenomic approach on an undifferentiated mouse embryonic stem cell line (E14), we demonstrated an increase in the overall protein identification rate of 2.5% compared to searching UniProtKB-SwissProt alone. Furthermore, analysis of N-terminal COFRADIC data resulted in the detection of 16 alternative start sites giving rise to N-terminally extended protein variants, in addition to the identification of four translated uORFs.
PROTEOFORMER: deep proteome coverage through ribosome profiling and MS integration
An increasing number of studies integrate mRNA sequencing data into MS-based proteomics to complement the translation product search space. However, several factors, including extensive regulation of mRNA translation and the need for three- or six-frame translation, impede the use of mRNA-seq data for the construction of a protein sequence search database. With that in mind, we developed the PROTEOFORMER tool, which automatically processes data from the recently developed ribosome profiling method (sequencing of ribosome-protected mRNA fragments), resulting in genome-wide visualization of ribosome occupancy. Our tool also includes a translation initiation site calling algorithm allowing the delineation of the open reading frames (ORFs) of all translation products. A complete protein synthesis-based sequence database can thus be compiled for mass spectrometry-based identification. This approach increases the overall protein identification rates by 3% and 11% (improved and new identifications) for human and mouse, respectively, and enables proteome-wide detection of 5'-extended proteoforms, upstream ORF translation and near-cognate translation start sites. The PROTEOFORMER tool is available as a stand-alone pipeline and has been implemented in the Galaxy framework for ease of use.
Cellular mRNAs access second ORFs using a novel amino acid sequence-dependent coupled translation termination-reinitiation mechanism
Polycistronic transcripts are considered rare in the human genome. Initiation of translation of internal ORFs of eukaryotic genes has been shown to use either leaky scanning or highly structured IRES regions to access initiation codons. Studies on mammalian viruses identified a mechanism of coupled translation termination-reinitiation that allows translation of an additional ORF. Here, the ribosome terminating translation of ORF-1 translocates upstream to reinitiate translation of ORF-2. We have devised an algorithm to identify mRNAs in the human transcriptome in which the major ORF-1 overlaps a second ORF capable of encoding a product of at least 50 aa in length. This identified 4368 transcripts representing 2214 genes. We investigated 24 transcripts, 22 of which were shown to express a protein from ORF-2, highlighting that 3' UTRs contain protein-coding potential more frequently than previously suspected. Five transcripts accessed ORF-2 using a process of coupled translation termination-reinitiation. Analysis of one transcript, encoding the CASQ2 protein, showed that the mechanism by which the coupling process of the cellular mRNAs was achieved was novel. This process was not directed by the mRNA sequence but required an aspartate-rich repeat region at the carboxyl terminus of the terminating ORF-1 protein. Introduction of wobble mutations for the aspartate codon had no effect, whereas replacing the aspartate repeats with glutamate repeats eliminated translational coupling. This is the first description of a coordinated expression of two proteins from cellular mRNAs using a coupled translation termination-reinitiation process and is the first example of such a process being determined at the amino acid level.
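The abstract above does not spell out the search algorithm, so the following is only a minimal sketch of one way such a scan could look, not the authors' method. It assumes a DNA-alphabet transcript sequence, ATG-only initiation, the standard stop codons, and a caller-supplied ORF-1 interval; the function names `find_orfs` and `overlapping_second_orfs` are hypothetical. A second ORF is reported when it starts inside ORF-1 in a different frame, runs past the ORF-1 stop codon, and encodes at least 50 amino acids.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq, min_aa=50):
    """Return (start, end) pairs for ATG-initiated ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon. Only the first ATG
    before each stop is used, so nested in-frame starts are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # codons before the stop
                if n_aa >= min_aa:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def overlapping_second_orfs(seq, orf1, min_aa=50):
    """Return candidate ORF-2 intervals that overlap the 3' end of ORF-1.

    `orf1` is the (start, end) interval of the major ORF, end exclusive.
    Assumption for this sketch: ORF-2 must begin inside ORF-1, lie in a
    different reading frame, and terminate downstream of the ORF-1 stop.
    """
    s1, e1 = orf1
    return [
        (s, e)
        for (s, e) in find_orfs(seq, min_aa)
        if (s - s1) % 3 != 0 and s1 < s < e1 < e
    ]
```

Passing the ORF-1 interval explicitly keeps the sketch independent of how the major ORF is annotated; in practice it would come from the transcript annotation rather than from the scan itself.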
Robust circadian clocks from coupled protein modification and transcription-translation cycles
The cyanobacterium Synechococcus elongatus uses both a protein
phosphorylation cycle and a transcription-translation cycle to generate
circadian rhythms that are highly robust against biochemical noise. We use
stochastic simulations to analyze how these cycles interact to generate stable
rhythms in growing, dividing cells. We find that a protein phosphorylation
cycle by itself is robust when protein turnover is low. For high decay or
dilution rates (and compensating synthesis rate), however, the
phosphorylation-based oscillator loses its integrity. Circadian rhythms thus
cannot be generated with a phosphorylation cycle alone when the growth rate,
and consequently the rate of protein dilution, is high enough; in practice, a
purely post-translational clock ceases to function well when the cell doubling
time drops below the 24 hour clock period. At higher growth rates, a
transcription-translation cycle becomes essential for generating robust
circadian rhythms. Interestingly, while a transcription-translation cycle is
necessary to sustain a phosphorylation cycle at high growth rates, a
phosphorylation cycle can dramatically enhance the robustness of a
transcription-translation cycle at lower protein decay or dilution rates. Our
analysis thus predicts that both cycles are required to generate robust
circadian rhythms over the full range of growth conditions.
Comment: main text: 7 pages including 5 figures, supplementary information: 13
pages including 9 figures
Comment on "Length-dependent translation of messenger RNA by ribosomes"
In the recent paper of Valleriani et al. [Phys. Rev. E 83, 042903
(2011)], a simple model describing the translation of messenger RNA (mRNA)
by ribosomes is presented, and an expression for the translational ratio,
defined as the ratio of the translation rate of protein from mRNA
to the degradation rate of protein, is obtained. The key step in obtaining
this ratio is determining the translation rate. In the study
of Valleriani et al., the translation rate is assumed to be the mean value of
the measured translation rate, i.e. the mean value of the ratio of the translation
number of protein to the lifetime of mRNA. However, different methods might be
used to obtain the translation rate in experiments. Therefore, for the sake of
future application of their model to more experimental data analysis, this
comment provides three methods to obtain the translation rate, and
consequently the translational ratio. Based on one of these
methods, which might be employed in most experiments, we find that the
translational ratio decays exponentially with the length of mRNA in
prokaryotic cells, and decays reciprocally with the length of mRNA in
eukaryotic cells. This result is slightly different from that obtained in
Valleriani's study.
Rocaglates convert DEAD-box protein eIF4A into a sequence-selective translational repressor.
Rocaglamide A (RocA) typifies a class of protein synthesis inhibitors that selectively kill aneuploid tumour cells and repress translation of specific messenger RNAs. RocA targets eukaryotic initiation factor 4A (eIF4A), an ATP-dependent DEAD-box RNA helicase; its messenger RNA selectivity is proposed to reflect highly structured 5' untranslated regions that depend strongly on eIF4A-mediated unwinding. However, rocaglate treatment may not phenocopy the loss of eIF4A activity, as these drugs actually increase the affinity between eIF4A and RNA. Here we show that secondary structure in 5' untranslated regions is only a minor determinant for RocA selectivity and that RocA does not repress translation by reducing eIF4A availability. Rather, in vitro and in cells, RocA specifically clamps eIF4A onto polypurine sequences in an ATP-independent manner. This artificially clamped eIF4A blocks 43S scanning, leading to premature, upstream translation initiation and reducing protein expression from transcripts bearing the RocA-eIF4A target sequence. In elucidating the mechanism of selective translation repression by this lead anti-cancer compound, we provide an example of a drug stabilizing sequence-selective RNA-protein interactions
Evaluation of mTOR-regulated mRNA translation.
mTOR, the mammalian target of rapamycin, regulates protein synthesis (mRNA translation) by affecting the phosphorylation or activity of several translation factors. Here, we describe methods for studying the impact of mTOR signalling on protein synthesis, using inhibitors of mTOR such as rapamycin (which impairs some of its functions) or mTOR kinase inhibitors (which probably block all functions). To assess effects of mTOR inhibition on general protein synthesis in cells, the incorporation of radiolabelled amino acids into protein is measured. This does not yield information on the effects of mTOR on the synthesis of specific proteins. To do this, two methods are described. In one, stable-isotope labelled amino acids are used, and their incorporation into new proteins is determined using mass spectrometric methods. The proportions of labelled vs. unlabelled versions of each peptide from a given protein provide quantitative information about the rate of that protein's synthesis under different conditions. Actively translated mRNAs are associated with ribosomes in polyribosomes (polysomes); thus, examining which mRNAs are found in polysomes under different conditions provides information on the translation of specific mRNAs under different conditions. A method for the separation of polysomes from non-polysomal mRNAs is described.
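The labelled-versus-unlabelled proportion mentioned above can be illustrated with a small sketch. It assumes each peptide is summarised by a pair of (labelled, unlabelled) intensities and that the protein-level value is taken as the median over its peptides; both the data layout and the function names are hypothetical and are not taken from the described protocol.

```python
from statistics import median


def labelled_fraction(labelled, unlabelled):
    """Fraction of a peptide's total signal carried by the labelled form."""
    total = labelled + unlabelled
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return labelled / total


def protein_labelled_fraction(peptides):
    """Median labelled fraction over (labelled, unlabelled) peptide pairs.

    The median is an assumption of this sketch, chosen to limit the
    influence of a single outlying peptide.
    """
    return median(labelled_fraction(h, l) for h, l in peptides)
```

Comparing this fraction for the same protein with and without an mTOR inhibitor, at a fixed labelling time, gives a relative indication of how its synthesis changes between conditions.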