    Sampled ensemble neutrality as a feature to classify potential structured RNAs

    ExonImpact: prioritizing pathogenic alternative splicing events

    Alternative splicing (AS) is a tightly regulated process that allows a single gene to encode multiple protein isoforms, thereby contributing to the diversity of the proteome. Dysregulation of splicing has been associated with many inherited diseases. However, among the pathogenic AS events, there are numerous “passenger” events whose inclusion or exclusion does not significantly change protein function. In this study, we evaluate the secondary and tertiary structural features of proteins associated with disease-causing and neutral AS events, and show that several structural features are strongly associated with the pathological impact of exon inclusion. We further develop a machine-learning-based computational model, ExonImpact, for prioritizing and evaluating the functional consequences of hitherto uncharacterized AS events. We evaluated our model using several strategies, including cross-validation and data from the Genotype-Tissue Expression (GTEx) and ClinVar databases. ExonImpact is freely available at http://watson.compbio.iupui.edu/ExonImpact

    Insights into Functional Noncoding RNA Elements through the Analysis of Human Genetic Variation

    Most of the human genome is noncoding, but determining how and when genetic variation in noncoding regions can affect biology and disease susceptibility remains challenging. Here, we apply an integrated genomics approach to uncover new patterns of functional genetic variation in the untranslated regions of protein-coding messenger RNAs. G-quadruplex (G4) sequences are abundant in the untranslated regions (UTRs) of human messenger RNAs, but their functional importance remains unclear. In Part 1 of this dissertation, we integrate multiple sources of genetic and genomic data to show that putative G-quadruplex forming sequences (pG4) in 5’ and 3’ UTRs are selectively constrained and enriched for cis-eQTLs and RNA-binding protein (RBP) interactions. Using over 15,000 whole-genome sequences, we find evidence of strong negative selection acting on the central guanines of UTR pG4s. At multiple GWAS-implicated SNPs within UTR pG4 sequences, we find robust allelic imbalance in gene expression across diverse tissue contexts in GTEx, suggesting that variants affecting G4 formation in UTRs may also contribute to phenotypic variation. Our results establish UTR G4s as important cis-regulatory elements and point to a link between disruption of UTR pG4s and disease. In Part 2 of this dissertation, we examine patterns of selective pressure in non-canonical open reading frames (ncORFs) mapped throughout the human genome. Ribosome profiling has uncovered pervasive translation of ncORFs; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’ UTRs. We show that uORF variants that introduce new stop codons, or strengthen existing ones, are under strong negative selection comparable to that acting on protein-coding missense variants. Using these variants, we map and validate new gene-disease associations in two independent biobanks containing exome sequences from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest new mechanisms linking uORF variation to reduced protein expression and demonstrate that translation at uORFs is genetically constrained in 50% of human genes. Together, these studies emphasize the importance of noncoding RNA regulatory elements in the post-transcriptional regulation of gene expression and illuminate new patterns of functional variation in UTRs with relevance to human disease.
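
The pG4 analysis presupposes a rule for calling putative G-quadruplex sequences, which the abstract does not state. A common convention, assumed here rather than taken from the dissertation, is four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The sketch below scans a single strand for that pattern; the function name `find_pg4` and the toy sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed pG4 definition (not taken from the dissertation): four G-tracts of
# at least 3 Gs, separated by loops of 1-7 nucleotides. Loops are lazy, so
# the shortest loops are tried first at each start position.
PG4_REGEX = re.compile(
    r"G{3,}[ACGTUN]{1,7}?G{3,}[ACGTUN]{1,7}?G{3,}[ACGTUN]{1,7}?G{3,}"
)


def find_pg4(sequence: str) -> List[Tuple[int, int, str]]:
    """Return non-overlapping putative G4 motifs as (start, end, motif).

    Coordinates are 0-based and half-open, like Python slices. Only the given
    strand is scanned; accepts DNA (T) or RNA (U) letters in any case.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in PG4_REGEX.finditer(seq)]


if __name__ == "__main__":
    toy_utr = "AAGGGAGGGTGGGCGGGAA"  # toy sequence, not real UTR data
    print(find_pg4(toy_utr))
```

Running the script prints `[(2, 17, 'GGGAGGGTGGGCGGG')]`. Stricter or looser definitions (longer loops, two-tetrad G4s, scanning the reverse complement) would call a different set of sites, so the choice of pattern matters for any downstream constraint estimate.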

    Post-genomic structural analysis of single amino acid polymorphisms

    Inherited genetic variation is critical in defining disease susceptibility. PDs, or pathogenic deviations, are mutations reported to be disease-causing, while SNPs, or single nucleotide polymorphisms, are understood to have a negligible effect on phenotype. With recent developments in biotechnology, most notably the increased reliability and speed of sequencing, a wealth of information on SNPs and PDs has been acquired. Quite apart from the challenge of analysing this information to identify novel therapies and disease targets, simply storing, mapping and processing these data is a significant challenge in itself. This thesis describes the development of a large-scale, automated pipeline that generates hypotheses about the structural effects of these genomic variations. This includes the development of nine new analyses. Eight of these are structural, identifying mutations that disrupt various aspects of protein structure, including interfaces, binding sites, folding mechanics and stability. The ninth is a novel method for identifying highly conserved residues from sequence: the distribution of conservation scores from a multiple sequence alignment (MSA) is analysed to generate an MSA-specific threshold for high conservation. To construct MSAs for this sequence analysis, a novel method for identifying functionally equivalent proteins has been developed. Further, PDs and SNPs are characterised with respect to these structural analyses and to basic sequence and structural features. The findings support trends reported elsewhere in the literature: PDs are more often found in the protein core and at highly conserved sites; they most often affect the stability of protein structures; and they more often involve substitutions between very different amino acids. In addition to their implications for disease therapies, these findings are informative in the more general context of protein structure.
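
The thesis derives an MSA-specific threshold for high conservation from the distribution of per-column conservation scores, but the abstract specifies neither the score nor the cut-off rule. The sketch below assumes one plausible choice: each column is scored as 1 minus its normalised Shannon entropy over the 20 standard amino acids (gaps and unknown residues ignored), and columns scoring at least the alignment's mean plus `k` standard deviations are called highly conserved. The function names and the default `k = 0.5` are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(column: str) -> float:
    """Score one alignment column as 1 - normalised Shannon entropy.

    Gaps and non-standard symbols are ignored; an empty column scores 0.
    A fully invariant column scores 1.0.
    """
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return 1.0 - entropy / math.log2(len(AMINO_ACIDS))


def conservation_scores(msa: List[str]) -> List[float]:
    """Per-column conservation scores for equal-length aligned sequences."""
    if not msa:
        raise ValueError("alignment is empty")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [column_conservation("".join(seq[i] for seq in msa)) for i in range(length)]


def highly_conserved_columns(msa: List[str], k: float = 0.5) -> List[int]:
    """Hypothetical MSA-specific rule: keep columns >= mean + k * SD."""
    scores = conservation_scores(msa)
    threshold = mean(scores) + k * pstdev(scores)
    return [i for i, score in enumerate(scores) if score >= threshold]


if __name__ == "__main__":
    toy_msa = ["MKVL", "MKIL", "MRVL", "MKAL"]  # toy alignment, not real data
    print(highly_conserved_columns(toy_msa))
```

For the toy alignment, only the invariant first and last columns pass, so the script prints `[0, 3]`. Because the cut-off is computed from each alignment's own score distribution, a column must be conserved relative to the rest of its alignment rather than against a fixed global value.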

    5′-UTR RNA G-quadruplexes: translation regulation and targeting

    RNA structures in the untranslated regions (UTRs) of mRNAs influence post-transcriptional regulation of gene expression. Much of the knowledge in this area is based on studies of canonical double-stranded RNA elements. Recently, our understanding of guanine (G)-rich nucleic acid sequences that form four-stranded structures, called G-quadruplexes, has advanced considerably. While much of the research has focused on DNA G-quadruplexes, interest in RNA G-quadruplexes, particularly in the 5′-UTRs of mRNAs, has grown rapidly. Collectively, these studies suggest that RNA G-quadruplexes exist in the 5′-UTRs of many genes, including genes of clinical interest, and that such structural elements can influence translation. This review highlights progress in the study of 5′-UTR RNA G-quadruplex-mediated translational control. It covers computational analyses and cell-free, cell-based and chemical biology studies that have sought to elucidate the roles of RNA G-quadruplexes in both cap-dependent and cap-independent regulation of mRNA translation. We also discuss protein trans-acting factors that have been implicated and the evidence that such RNA motifs have potential as small-molecule targets. Finally, we close the review with a perspective on future challenges in the field of 5′-UTR RNA G-quadruplex-mediated translation regulation.