Control of Gene Expression by RNA Binding Protein Action on Alternative Translation Initiation Sites
<div><p>Transcript levels do not faithfully predict protein levels, due to post-transcriptional regulation of gene expression mediated by RNA binding proteins (RBPs) and non-coding RNAs. We developed a multivariate linear regression model integrating RBP levels and predicted RBP-mRNA regulatory interactions from matched transcript and protein datasets. RBPs significantly improved the accuracy in predicting protein abundance of a portion of the total modeled mRNAs in three panels of tissues and cells and for different methods employed in the detection of mRNA and protein. The presence of upstream translation initiation sites (uTISs) at the mRNA 5’ untranslated regions was strongly associated with improvement in predictive accuracy. On the basis of these observations, we propose that the recently discovered widespread uTISs in the human genome can be a previously unappreciated substrate of translational control mediated by RBPs.</p></div>
Scheme of tests assessing statistical significance of the accuracy of RBP<sup>plus</sup> model to predict protein abundance and association of accuracy with genomic features.
<p>For each gene, 1000 randomized versions of the RBP<sup>plus</sup> model were obtained either by permuting the RBP protein levels across samples (left side), or by randomly sampling a number of protein predictors equal to the number of actual RBPs inferred to bind the mRNA UTRs (right side). The two randomization tests were run in parallel for each gene. Each randomized model was fitted with Ridge penalized linear regression using nested cross-validation (CV). In the nested cross-validation scheme, test samples are held out for accuracy estimation in the outer layer of CV, and penalty parameters are tuned in the inner layer of CV within training samples only. The p-value of the RBP<sup>plus</sup> model of each gene was defined as the probability of sampling an R<sup>2</sup> value from the empirical null distribution higher than the R<sup>2</sup> observed for the actual RBP<sup>plus</sup> model. The False Discovery Rate was estimated by Storey’s q-value method.</p>
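The two randomization schemes and the empirical p-value described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, rows are assumed to be samples, and the p-value is taken as the plain fraction of null R^2 values strictly higher than the observed one, without any add-one correction.

```python
import random


def permute_sample_labels(rbp_rows, rng):
    """Return the RBP rows (one row per sample) in a random sample order."""
    shuffled = list(rbp_rows)
    rng.shuffle(shuffled)
    return shuffled


def sample_random_predictors(all_proteins, n_rbps, rng):
    """Draw as many random protein predictors as there are actual RBPs."""
    return rng.sample(all_proteins, n_rbps)


def empirical_p_value(observed_r2, null_r2):
    """Fraction of null R^2 values strictly higher than the observed R^2."""
    higher = sum(1 for r2 in null_r2 if r2 > observed_r2)
    return higher / len(null_r2)


# Toy usage with made-up numbers (hypothetical protein names).
rng = random.Random(0)
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
permuted_rows = permute_sample_labels(rows, rng)
random_set = sample_random_predictors(["P1", "P2", "P3", "P4"], 2, rng)
p_value = empirical_p_value(0.5, [0.1, 0.6, 0.3, 0.7])
```

In a full analysis each randomized predictor set would be refitted and scored, and the resulting 1000 R^2 values would form the `null_r2` list for that gene.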
Upstream translation initiation as a prominent feature of the improved predictability of protein levels from transcript levels by RBP<sup>plus</sup> models.
<p>(A) Spearman’s correlation coefficient between the improvement in accuracy of predicted protein abundance obtained by the RBP<sup>plus</sup> model relative to the RNA<sup>only</sup> model (R<sup>2</sup><sub>RBPplus</sub> − R<sup>2</sup><sub>RNAonly</sub>) and several mRNA features. Different colours denote features pertaining to the length of annotated mRNA UTRs and CDS, mRNA folding, mRNA stability, translation efficiency and alternative translation by upstream Translation Initiation Sites (uTISs). The analysis is conducted in the panel of human normal tissues. Stars denote statistical significance of correlation (** stands for p < …). (B) Spearman’s correlation coefficient between improvement in accuracy of predicted protein abundance and number of uTISs localized at increasing distance upstream of the annotated TIS. The dashed line indicates the distance at which the correlation becomes statistically significant. (C) uTIS-containing genes are overrepresented among the genes where RBPs improve accuracy of predicted protein abundance relative to the genes where RBPs do not. Shown is the fold enrichment observed for each panel. Stars denote Fisher’s test statistical significance (** stands for p < …). (D) Overrepresentation is robust to the technological platform for mapping uTISs (QTI-seq in HEK293 cells). (E) The association between improvement in predictive accuracy and number of uTISs does not depend on uORFs. uORF-containing genes are not overrepresented among the genes where RBPs are informative relative to the genes where RBPs are not.</p>
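The overrepresentation test in panel (C) can be illustrated with a 2x2 contingency table. The sketch below is an assumption-laden example: the caption does not state whether Fisher's test was one- or two-sided, so a one-sided (enrichment) test is assumed here, and the counts are invented for illustration.

```python
from math import comb


def fold_enrichment(a, b, c, d):
    """Ratio of uTIS-gene fractions between improved and non-improved genes.

    a: uTIS genes among improved genes      b: non-uTIS genes among improved genes
    c: uTIS genes among non-improved genes  d: non-uTIS genes among non-improved genes
    """
    return (a / (a + b)) / (c / (c + d))


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test: P(at least `a` uTIS genes among improved)."""
    n_improved = a + b
    n_utis = a + c
    total = a + b + c + d
    upper = min(n_improved, n_utis)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    favourable = sum(
        comb(n_utis, k) * comb(total - n_utis, n_improved - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, n_improved)


# Invented counts, for illustration only.
enrichment = fold_enrichment(3, 1, 1, 3)
p_enriched = fisher_one_sided(3, 1, 1, 3)
```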
Data modelling workflow.
<p>Primary data consist of three panels of quantitative transcriptome assays matched with proteome assays. Panels differ by cellular state and technological platforms for quantification of transcript and protein abundance. Data modelling is performed in parallel in the three panels. For each mRNA, we compare the accuracy of two models to predict abundance of the corresponding protein: a basic model (RNAonly) that predicts the level of the protein from its mRNA level only, and an RBP-inclusive model (RBPplus) containing additional candidate predictors defined by protein levels of the RBPs which were inferred by sequence specificity to bind the mRNA UTRs. Data used in each type of model are visualized with matrices where samples (S) are shown by row and predictors (mRNA, RBP protein levels) by column. Accuracy of predicted protein abundance was assessed by k-fold (k = 5) cross-validation.</p>
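The comparison between the RNAonly and RBPplus models can be sketched as below, under several assumptions that are not in the source: NumPy is available, the ridge penalty is fixed rather than tuned in an inner cross-validation loop, R^2 is pooled over the out-of-fold predictions, and the data are synthetic. All function and variable names are hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, penalty):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    gram = Xc.T @ Xc + penalty * np.eye(Xc.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ coef + y_mean


def cross_validated_r2(X, y, k=5, penalty=1.0, seed=0):
    """R^2 computed from predictions made on held-out folds only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    predictions = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        predictions[test_idx] = ridge_fit_predict(
            X[train_idx], y[train_idx], X[test_idx], penalty
        )
    ss_res = np.sum((y - predictions) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic example: protein level depends on its mRNA and on one RBP.
rng = np.random.default_rng(1)
n_samples = 60
mrna = rng.normal(size=n_samples)
rbps = rng.normal(size=(n_samples, 3))
protein = mrna + 2.0 * rbps[:, 0] + 0.1 * rng.normal(size=n_samples)

r2_rna_only = cross_validated_r2(mrna[:, None], protein)
r2_rbp_plus = cross_validated_r2(np.column_stack([mrna, rbps]), protein)
gain = r2_rbp_plus - r2_rna_only  # improvement attributed to the RBP predictors
```

Here `gain` plays the role of the per-gene accuracy improvement that the captions relate to mRNA features; with real data the same fold assignment would be reused for both models, as the shared `seed` does in this sketch.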
Post-transcriptional features quantified in modeled genes.
Inferred RBP-mRNA interactions improve accuracy in predicting protein abundance of a portion of the total modeled mRNAs in three panels of tissues and cell lines.
<p>While RBP<sup>plus</sup> models improve accuracy (R<sup>2</sup>) in predicted protein abundance over RNA<sup>only</sup> models, improvements attained by RBPs were not distinguishable from those by randomly sampled proteins, for the majority of genes considered in the three panels. The proportion of genes where actual RBPs produced higher accuracy than random protein predictors (q < 0.05) increases from 0.65% in the NCI-60 panel to 4.2% in the normal tissue panel. <b>(A)</b> Distribution of R<sup>2</sup> coefficients for the actual RNA<sup>only</sup> and RBP<sup>plus</sup> models as well as for the RBP<sup>plus</sup> models randomized either by permuting sample labels (RBP<sup>plus</sup><sub>r.by.sample</sub>) or by randomly sampling proteins in place of actual RBPs (RBP<sup>plus</sup><sub>r.by.RBP</sub>). <b>(B)</b> Histogram of statistical significance estimates for the RBP<sup>plus</sup> models, obtained by randomizing the actual RBP<sup>plus</sup> models through random sampling of proteins. <b>(C)</b> Histogram of statistical significance estimates for the RBP<sup>plus</sup> models, obtained by randomizing the actual RBP<sup>plus</sup> models through permutation of sample labels. The dashed line corresponds to the number of genes expected in each bin under the assumption of a uniform distribution.</p>
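The q < 0.05 threshold above refers to q-values estimated from the per-gene p-values. The following is a simplified sketch in the spirit of Storey's approach, assuming a single fixed tuning value `lam = 0.5` for the null-proportion estimate; the published method normally chooses or smooths over a range of such values, and the p-values below are invented.

```python
def storey_q_values(p_values, lam=0.5):
    """Simplified q-values with a fixed-lambda estimate of the null proportion."""
    m = len(p_values)
    # pi0: estimated proportion of genes for which the null hypothesis holds.
    pi0 = min(1.0, sum(1 for p in p_values if p > lam) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q_values[i] = running_min
    return q_values


# Invented p-values, for illustration only.
p_values = [0.001, 0.01, 0.02, 0.6, 0.8, 0.9, 0.3, 0.7]
q_values = storey_q_values(p_values)
fraction_significant = sum(q < 0.05 for q in q_values) / len(q_values)
```

A flat p-value histogram, like the dashed uniform expectation in panels (B) and (C), corresponds to an estimated null proportion close to 1, in which case these q-values reduce to Benjamini-Hochberg adjusted p-values.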
Prioritization of RNA binding proteins.
<p>Candidate RBPs are identified by analysing, for each RBP, the binding sites situated nearest to the uTISs of the mRNAs where the RBP<sup>plus</sup> model improves the accuracy of predicted protein abundance. <b>(A)</b> The heat map displays the percentage of genes where each RBP showed the minimal distance between an RBP binding site and a uTIS. <b>(B)</b> The inset displays the criterion of minimal distance between RBP binding sites and uTISs used to identify RBPs. RBPs are shown if they were found to recognize the binding sites closest to the uTISs of mRNAs in at least one of the three panels.</p>
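The minimal-distance criterion can be sketched as below. This is an illustrative reading of the caption, not the authors' implementation: binding sites and uTISs are assumed to be single transcript coordinates, ties are credited to every tied RBP, and the RBP names are placeholders.

```python
def nearest_rbps(utis_positions, binding_sites):
    """Return the RBPs whose binding site lies closest to any uTIS of one mRNA.

    binding_sites maps an RBP name to a list of binding-site positions.
    """
    best_distance = None
    best = set()
    for rbp, sites in binding_sites.items():
        for site in sites:
            for utis in utis_positions:
                distance = abs(site - utis)
                if best_distance is None or distance < best_distance:
                    best_distance, best = distance, {rbp}
                elif distance == best_distance:
                    best.add(rbp)
    return best, best_distance


def percent_genes_per_rbp(genes):
    """Percentage of genes in which each RBP holds the minimal distance."""
    counts = {}
    for utis_positions, binding_sites in genes:
        winners, _ = nearest_rbps(utis_positions, binding_sites)
        for rbp in winners:
            counts[rbp] = counts.get(rbp, 0) + 1
    return {rbp: 100.0 * n / len(genes) for rbp, n in counts.items()}


# Two invented genes with placeholder RBP names.
genes = [
    ([100], {"RBP_A": [90, 300], "RBP_B": [150]}),
    ([50, 200], {"RBP_A": [400], "RBP_B": [210]}),
]
percentages = percent_genes_per_rbp(genes)
```

Computing `percentages` once per panel would give the kind of per-RBP values that a heat map such as panel (A) displays.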
Principal Component Analysis of the Conformational Freedom within the EF-Hand Superfamily
A database of nonredundant structures of EF-hand domains (i.e., pairs of helix-loop-helix motifs) has
been assembled, and the six angles among the four helices re-determined. A principal component
analysis of these angles allows us to use two such components (PC1 and PC2) to describe the system
retaining 80% of the total variance. A plot of PC2 against PC1 allows us to represent in a
compact way the full range of structural diversity of EF-hand domains, their grouping into protein
families, and the variation for each family upon calcium and peptide binding.
Keywords: EF-hand • principal component analysis • calcium binding • interhelical angle • structural analysis
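The reduction of six interhelical angles to two principal components can be sketched as follows. The example assumes NumPy, uses synthetic angles rather than the database described in the abstract, and applies PCA to mean-centred, unscaled angles; whether the original analysis standardized the angles is not stated.

```python
import numpy as np


def pca_two_components(angles):
    """Project an (n_domains, 6) angle matrix onto its first two principal components."""
    centered = angles - angles.mean(axis=0)
    # Rows of `components` are the principal axes, ordered by decreasing variance.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    scores = centered @ components[:2].T
    return scores, explained[:2]


# Synthetic angles (degrees) driven by two latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2)) * np.array([30.0, 15.0])
loadings = rng.normal(size=(2, 6))
angles = 90.0 + latent @ loadings + rng.normal(scale=2.0, size=(40, 6))

scores, explained = pca_two_components(angles)
retained_variance = explained.sum()  # fraction of variance kept by PC1 and PC2
```

Plotting `scores[:, 1]` against `scores[:, 0]` gives the PC2 against PC1 representation, and `retained_variance` is the quantity that the abstract reports as 80% for the real structures.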
