638 research outputs found
Structure-based whole-genome realignment reveals many novel noncoding RNAs
Recent genome-wide computational screens that search for conservation of RNA secondary structure in whole-genome alignments (WGAs) have predicted thousands of structural noncoding RNAs (ncRNAs). The sensitivity of such approaches, however, is limited, due to their reliance on sequence-based whole-genome aligners, which regularly misalign structural ncRNAs. This suggests that many more structural ncRNAs may remain undetected. Structure-based alignment, which could increase the sensitivity, has been prohibitive for genome-wide screens due to its extreme computational costs. Breaking this barrier, we present the pipeline REAPR (RE-Alignment for Prediction of structural ncRNA), which efficiently realigns whole genomes based on RNA sequence and structure, thus allowing us to boost the performance of de novo ncRNA predictors, such as RNAz. Key to the pipeline's efficiency is the development of a novel banding technique for multiple RNA alignment. REAPR significantly outperforms the widely used predictors RNAz and EvoFold in genome-wide screens; in direct comparison to the most recent RNAz screen on D. melanogaster, REAPR predicts twice as many high-confidence ncRNA candidates. Moreover, modENCODE RNA-seq experiments confirm a substantial number of its predictions as transcripts. REAPR's advancement of de novo structural characterization of ncRNAs complements the identification of transcripts from rapidly accumulating RNA-seq data. National Institutes of Health (U.S.) (Grant RO1GM081871
Non-Coding RNAs Play Significant Roles in Host-Virus Interactions
Previous Dicer immunoprecipitation (IP) experiments identified an RNA polyphosphatase, PIR-1, that interacts with Dicer and may participate in RNAi, but the mechanism has not been revealed. Here we demonstrate that C. elegans PIR-1 is involved in the RNAi-mediated silencing of Orsay virus by promoting the biogenesis of 23-mer RNAs and their loading onto RDE-1. We also show that PIR-1 acts as a de facto RNA phosphatase in vivo to regulate triphosphorylated RNAs (ppp-RNAs). Thus, PIR-1 is a conserved master regulator of ppp-RNAs and plays important roles in silencing viral ppp-RNAs and modifying cellular ppp-RNAs. Next, we apply PIR-1 in our small RNA cloning strategy. High-throughput sequencing has become a standard tool for analyzing RNA and DNA. We have developed a new strategy to clone modified and unmodified small RNAs in an all-liquid-based reaction carried out in a single PCR tube with as little as 16 ng of total RNA. The 7-hour cloning process requires only ~1 hour of labor. Moreover, this method can also clone mRNA, eliminating the need to prepare two separate cloning systems for small RNA and mRNA. Finally, we study the function of non-coding RNA in influenza A virus. The virus uses a special process, cap-snatching, to obtain a host capped small RNA for priming viral mRNA synthesis, generating hybrid capped mRNA for translation. Previous studies have focused on cap-snatching at the 5' end of viral mRNA. Here we report two non-canonical cap-snatching regions: one 300 nt upstream of the 3' end of each mRNA, generating capped mRNA/ncRNA, and the other in the 5' region of vRNA and mapped primarily at the 2-nt, likely generating ncRNA. We also demonstrate that the influenza virus snatches virus-derived capped RNA in addition to host capped RNA. These findings expand our understanding of the cap-snatching mechanism and suggest that the influenza A virus may utilize this process to diversify its mRNA/ncRNA
Computational analysis of noncoding RNAs
Noncoding RNAs have emerged as key players in the cell. Understanding their surprisingly diverse range of functions is challenging for both experimental and computational biology. Here, we review computational methods to analyze noncoding RNAs. The topics covered include basic and advanced techniques to predict RNA structures, annotation of noncoding RNAs in genomic data, mining RNA-seq data for novel transcripts and prediction of transcript structures, computational aspects of microRNAs, and database resources. Austrian Science Fund (Schrodinger Fellowship J2966-B12); German Research Foundation (grant WI 3628/1-1 to SW); National Institutes of Health (U.S.) (NIH award 1RC1CA147187
NcDNAlign: Plausible multiple alignments of non-protein-coding genomic sequences
Genome-wide multiple sequence alignments (MSAs) are a necessary prerequisite for an increasingly diverse collection of comparative genomic approaches. Here we present a versatile method that generates high-quality MSAs for non-protein-coding sequences. The NcDNAlign pipeline combines pairwise BLAST alignments to create initial MSAs, which are then locally improved and trimmed. The program is optimized for speed and hence is particularly well-suited to pilot studies. We demonstrate the practical use of NcDNAlign in three case studies: the search for ncRNAs in gammaproteobacteria and the analysis of conserved noncoding DNA in nematodes and teleost fish, in the latter case focusing on the fate of duplicated ultra-conserved regions. Compared to the currently widely used genome-wide alignment program TBA, our program results in a 20- to 30-fold reduction of CPU time necessary to generate gammaproteobacterial alignments. A showcase application of bacterial ncRNA prediction based on alignments of both algorithms results in similar sensitivity, false discovery rates, and up to 100 putatively novel ncRNA structures. Similar findings hold for our application of NcDNAlign to the identification of ultra-conserved regions in nematodes and teleosts. Both approaches yield conserved sequences of unknown function, result in novel evolutionary insights into conservation patterns among these genomes, and manifest the benefits of an efficient and reliable genome-wide alignment package. The software is available under the GNU Public License at http://www.bioinf.uni-leipzig.de/Software/NcDNAlign/
Discovering cancer-associated transcripts by RNA sequencing
High-throughput sequencing of poly-adenylated RNA (RNA-Seq) in human cancers shows remarkable potential to identify uncharacterized aspects of tumor biology, including gene fusions with therapeutic significance and disease markers such as long non-coding RNA (lncRNA) species. However, the analysis of RNA-Seq data places unprecedented demands upon computational infrastructures and algorithms, requiring novel bioinformatics approaches. To meet these demands, we present two new open-source software packages, ChimeraScan and AssemblyLine, designed to detect gene fusion events and novel lncRNAs, respectively. RNA-Seq studies utilizing ChimeraScan led to discoveries of new families of recurrent gene fusions in breast cancers and solitary fibrous tumors. Further, ChimeraScan was one of the key components of the repertoire of computational tools utilized in data analysis for MI-ONCOSEQ, a clinical sequencing initiative to identify potentially informative and actionable mutations in cancer patients’ tumors. AssemblyLine, by contrast, reassembles RNA sequencing data into full-length transcripts ab initio. In head-to-head analyses, AssemblyLine compared favorably to existing ab initio approaches and unveiled abundant novel lncRNAs, including antisense and intronic lncRNAs disregarded by previous studies. Moreover, we used AssemblyLine to define the prostate cancer transcriptome from a large patient cohort and discovered myriad lncRNAs, including 121 prostate cancer-associated transcripts (PCATs) that could potentially serve as novel disease markers. Functional studies of two PCATs, PCAT-1 and SChLAP1, revealed cancer-promoting roles for these lncRNAs. PCAT-1, a lncRNA expressed from chromosome 8q24, promotes cell proliferation and represses the tumor suppressor BRCA2. SChLAP1, located in a chromosome 2q31 ‘gene desert’, independently predicts poor patient outcomes, including metastasis and cancer-specific mortality. Mechanistically, SChLAP1 antagonizes the genome-wide localization and regulatory functions of the SWI/SNF chromatin-modifying complex. Collectively, this work demonstrates the utility of ChimeraScan and AssemblyLine as open-source bioinformatics tools. Our applications of ChimeraScan and AssemblyLine led to the discovery of new classes of recurrent and clinically informative gene fusions, and established a prominent role for lncRNAs in coordinating aggressive prostate cancer, respectively. We expect that the methods and findings described herein will establish a precedent for RNA-Seq-based studies in cancer biology and assist the research community at large in making similar discoveries. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120814/1/mkiyer_1.pd
Functional study of a novel missense single-nucleotide variant of NUP107 in two daughters of Mexican origin with premature ovarian insufficiency.
Background: Hypergonadotropic hypogonadism (HH) is a genetically heterogeneous disorder that usually presents with amenorrhea, atrophic ovaries, and low estrogen. Most cases of HH are idiopathic and nonsyndromic. Nucleoporin 107 (NUP107), a protein involved in transport between cytoplasm and nucleus with putative roles in meiosis/mitosis progression, was recently implicated as a cause of HH. We identified a NUP107 genetic variant in a nonconsanguineous family with two sisters affected with primary amenorrhea and HH, and generated a mouse model that carried the human variant. Methods: We performed a high-resolution X-chromosome microarray and whole exome sequencing on parents and two sisters with HH to identify pathogenic variants. We generated a mouse model of the candidate NUP107 variant using CRISPR/Cas9. Results: Whole exome sequencing identified a novel and rare missense variant in the NUP107 gene (c.1063C>T, p.R355C) in both sisters with HH. To determine the functional significance of this variant, we used CRISPR/Cas9 to introduce the human variant into the mouse genome. Mice with the homolog of the R355C variant, as well as those with the nine-base-pair deletion in Nup107, had female subfertility. Conclusions: Our findings indicate that the NUP107 R355C variant falls in the category of variant of unknown significance as the cause of HH and infertility
Analysis of anopheline mosquito behavior and identification of vector control targets in the post-genomic era
Thesis advisor: Marc A.T. Muskavitch. The protozoan Plasmodium falciparum, the mosquito-borne pathogen that causes human malaria, remains one of the most difficult infectious parasites to combat and control. Malaria eradication campaigns have succeeded, in most instances, at the level of vector control rather than through initiatives that have attempted to decrease malaria burden by targeting parasites. The rapid evolution and spread of insecticide-resistant mosquitoes is threatening our ability to combat vectors and control malaria. Therefore, the development, procurement and distribution of new methods of vector control are paramount. Two aspects of vector biology that can be exploited toward these ends are vector behaviors and vector-specific insecticide targets. In this thesis, I describe three aspects of vector biology with potential for the development of improved means of vector control: photopreference behavior, long non-coding RNA (lncRNA) targets and epigenetic gene ensemble targets. My studies of photopreference have revealed that specific mosquito species within the genus Anopheles, An. gambiae and An. stephensi, exhibit different photopreference behaviors, and that each gender of mosquito in these species exhibits distinct light-dependent resting behaviors. These inter-specific behavioral differences may be affected by differing numbers of long-wavelength sensing Opsin genes in each species, and my findings regarding species-specific photopreferences suggest that some behavioral interventions may need to be tailored for specific vector mosquito species. Based on the advancement of next-generation sequencing technologies and the generation by others of assembled genomes of many anopheline mosquito species, I have identified a comprehensive set of approximately 3,000 lncRNAs and find that RNA secondary structures are notably conserved within the gambiae species complex. As lncRNAs and epigenetic modifiers cooperate to modulate epigenetic regulation, I have also analyzed the conservation of epigenetic gene ensembles across a number of anopheline species, based on identification of homologous epigenetic ensemble genes in An. gambiae compared to Drosophila melanogaster. Further analyses of these ensembles illustrate that these epigenetic genes are highly stable among many anopheline species, in that I detect only eight gene family expansion or contraction events among 169 epigenetic ensemble genes within a set of 12 anopheline species. My hope is that my findings will enable deeper investigations of many behavioral and epigenetic processes in Anopheles gambiae and other anopheline vector mosquitoes and thereby enable the development of new, more effective means of vector and malaria control. Thesis (PhD), Boston College, 2015. Submitted to: Boston College. Graduate School of Arts and Sciences. Discipline: Biology