
    Quantifying variances in comparative RNA secondary structure prediction

    Background: With the advancement of next-generation sequencing and transcriptomics technologies, regulatory effects involving RNA, in particular RNA structural changes, are being detected. These results often rely on RNA secondary structure predictions. However, current approaches to RNA secondary structure modelling produce predictions with a high variance in predictive accuracy, and we have little quantifiable knowledge about the reasons for these variances. Results: In this paper we explore a number of factors which can contribute to poor RNA secondary structure prediction quality. We establish a quantified relationship between alignment quality and loss of accuracy. Furthermore, we define two new measures to quantify uncertainty in alignment-based structure predictions. One of the measures improves on the “reliability score” reported by PPfold, and considers alignment uncertainty as well as base-pair probabilities. The other measure considers the information entropy for SCFGs over a space of input alignments. Conclusions: Our predictive accuracy improves on the PPfold reliability score. We can successfully characterize many of the underlying reasons for, and variances in, poor prediction. However, there is still variability unaccounted for, which we therefore suggest comes from the RNA secondary structure predictive model itself.

    Maintaining protein localization, structure, and functional interactions via codon usage and coevolution of gene expression: Combining evolutionary bioinformatics with omics-scale data to test hypotheses related to protein function

    A major challenge of the omics-era is identifying how a protein functions, both in terms of its specific function and within the context of the various biological processes necessary for the cell's survival. Key elements necessary for a protein to perform its function are efficient and accurate protein localization, protein folding, and interactions with other proteins. Previous work implicated codon usage as a means to modulate protein localization and folding. Using a mechanistic model rooted in population genetics, I examine potential selective differences in codon usage in signal peptides (localization) and protein secondary structures. Although previous work argued signal peptides were under selection for increased translation inefficiency, I find selection is generally consistent with the 5'-regions of non-secreted proteins. I also find that previous work was likely confounded by biases in signal peptide amino acid usage and gene expression. Although the direction of selection on codon usage is mostly consistent between protein secondary structures, the strength of this selection does vary for certain codons. After successful folding and localization of a protein, it must be able to function within the context of other proteins in the cell, often through protein-protein interactions of metabolic pathways. Previous work suggests proteins which are part of the same functional processes within a cell are co-expressed across time and environmental conditions. Using the concept of guilt-by-association, I combine empirical protein abundances (measured via mass spectrometry) with sequence homology based function prediction tools to identify potential functions of proteins of unknown function in C. thermocellum.
Building upon the concept that functionally-related genes are co-expressed within a species, I demonstrate how phylogenetic comparative methods can be used to detect signals of gene expression coevolution across species while accounting for the shared ancestry of the species in question.

    Distribution of mutational fitness effects and of epistasis in the 5' untranslated region of a plant RNA virus

    [Background] Understanding the causes and consequences of phenotypic variability is a central topic of evolutionary biology. Mutations within non-coding cis-regulatory regions are thought to be of major effect since they affect the expression of downstream genes. To address the evolutionary potential of mutations affecting such regions in RNA viruses, we explored the fitness properties of mutations affecting the 5’-untranslated region (UTR) of a prototypical member of the picorna-like superfamily, tobacco etch virus (TEV). This 5’ UTR acts as an internal ribosomal entry site (IRES) and is essential for expression of all viral genes. [Results] We determined in vitro the folding of the 5’ UTR using the selective 2’-hydroxyl acylation analyzed by primer extension (SHAPE) technique. Then, we created a collection of single-nucleotide substitutions in this region and evaluated the statistical properties of their fitness effects in vivo. We found that, compared to random mutations affecting coding sequences, mutations in the 5’ UTR were of weaker effect. We also created double mutants by combining pairs of these single mutations and found variation in the magnitude and sign of epistatic interactions, with an enrichment of cases of positive epistasis. A correlation exists between the magnitude of fitness effects and the size of the perturbation made in the RNA folding structure, suggesting that the larger the departure from the predicted fold, the more negative the impact on viral fitness. [Conclusions] Evidence has been found that mutational fitness effects on the short 5’ UTR regulatory sequence of TEV are weaker than those affecting its coding sequences. Epistasis among pairs of mutations in the 5’ UTR ranged between the extreme cases of synthetic lethal and compensatory.
A plausible hypothesis to explain all these observations is that the interaction between the 5’ UTR and the host translational machinery was shaped by natural selection to be robust to mutations, thus ensuring the homeostatic expression of viral genes even at high mutation rates. This work was supported by grant BFU2012-30805 from the Spanish Ministry of Economy and Competitiveness (MINECO), grant PROMETEOII/2014/021 from Generalitat Valenciana and the EvoEvo (ICT610427) project from the European Commission 7th Framework Program. Publication fees have been partially paid by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI). Peer reviewed

    Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain

    A distance-constrained secondary structural model of the ≈10 kb RNA genome of HIV-1 has been predicted, but higher-order structures involving long-distance interactions are currently unknown. We present the first global RNA secondary structure model for the HIV-1 genome, which integrates both comparative structure analysis and information from experimental data in a full-length prediction without distance constraints. Besides recovering known structural elements, we predict several novel structural elements that are conserved in HIV-1 evolution. Our results also indicate that the structure of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interestingly, a set of long-distance interactions forms a core organizing structure (COS) that organizes the genome into three major structural domains. Despite overlapping protein-coding regions, the COS is supported by a particularly high frequency of compensatory base changes, suggesting functional importance for this element. This new structural element potentially organizes the whole genome into three major domains protruding from a conserved core structure, with potential roles in replication and evolution for the virus.

    Comparative Transcriptomic Analysis and Structure Prediction of Novel Newt Proteins

    Notophthalmus viridescens (Red-spotted Newt) possesses amazing capabilities to regenerate its organs and other tissues. Previously, using a de novo assembly of the newt transcriptome combined with proteomic validation, our group identified a novel family of five protein members expressed in adult tissues during regeneration in Notophthalmus viridescens. The presence of a putative signal peptide suggests that all these proteins are secretory in nature. Here we employed the iterative threading assembly refinement (I-TASSER) server to generate three-dimensional structures of these novel Newt proteins and predicted their function. Our data suggest that these proteins could act as ion transporters and be involved in redox reaction(s). Due to the absence of transgenic approaches in N. viridescens, and the conservation of genetic machinery across species, we generated transgenic Drosophila melanogaster to misexpress these genes. Expression of 2775 transcripts was compared between these five newly identified Newt genes. We found that genes involved in the developmental process, cell cycle, apoptosis, and immune response are among those that are highly enriched. To validate the RNA-Seq data, expression of six highly regulated genes was verified using real-time quantitative polymerase chain reaction (RT-qPCR). These graded gene expression patterns provide insight into the function of the novel protein family identified in Newt, and lay out a map for future studies in the field.