638 research outputs found

    Structure-based whole-genome realignment reveals many novel noncoding RNAs

    Recent genome-wide computational screens that search for conservation of RNA secondary structure in whole-genome alignments (WGAs) have predicted thousands of structural noncoding RNAs (ncRNAs). The sensitivity of such approaches, however, is limited, due to their reliance on sequence-based whole-genome aligners, which regularly misalign structural ncRNAs. This suggests that many more structural ncRNAs may remain undetected. Structure-based alignment, which could increase the sensitivity, has been prohibitive for genome-wide screens due to its extreme computational costs. Breaking this barrier, we present the pipeline REAPR (RE-Alignment for Prediction of structural ncRNA), which efficiently realigns whole genomes based on RNA sequence and structure, thus allowing us to boost the performance of de novo ncRNA predictors, such as RNAz. Key to the pipeline's efficiency is the development of a novel banding technique for multiple RNA alignment. REAPR significantly outperforms the widely used predictors RNAz and EvoFold in genome-wide screens; in direct comparison to the most recent RNAz screen on D. melanogaster, REAPR predicts twice as many high-confidence ncRNA candidates. Moreover, modENCODE RNA-seq experiments confirm a substantial number of its predictions as transcripts. REAPR's advancement of de novo structural characterization of ncRNAs complements the identification of transcripts from rapidly accumulating RNA-seq data. Funding: National Institutes of Health (U.S.) (Grant RO1GM081871

    Computational analysis of noncoding RNAs

    Noncoding RNAs have emerged as key players in the cell. Understanding their surprisingly diverse range of functions is challenging for experimental and computational biology. Here, we review computational methods to analyze noncoding RNAs. The topics covered include basic and advanced techniques to predict RNA structures, annotation of noncoding RNAs in genomic data, mining RNA-seq data for novel transcripts and prediction of transcript structures, computational aspects of microRNAs, and database resources. Funding: Austrian Science Fund (Schrodinger Fellowship J2966-B12); German Research Foundation (grant WI 3628/1-1 to SW); National Institutes of Health (U.S.) (NIH award 1RC1CA147187

    NcDNAlign: Plausible multiple alignments of non-protein-coding genomic sequences

    Genome-wide multiple sequence alignments (MSAs) are a necessary prerequisite for an increasingly diverse collection of comparative genomic approaches. Here we present a versatile method that generates high-quality MSAs for non-protein-coding sequences. The NcDNAlign pipeline combines pairwise BLAST alignments to create initial MSAs, which are then locally improved and trimmed. The program is optimized for speed and hence is particularly well-suited to pilot studies. We demonstrate the practical use of NcDNAlign in three case studies: the search for ncRNAs in gammaproteobacteria and the analysis of conserved noncoding DNA in nematodes and teleost fish, in the latter case focusing on the fate of duplicated ultra-conserved regions. Compared to the currently widely used genome-wide alignment program TBA, our program results in a 20- to 30-fold reduction of CPU time necessary to generate gammaproteobacterial alignments. A showcase application of bacterial ncRNA prediction based on alignments of both algorithms results in similar sensitivity, false discovery rates, and up to 100 putatively novel ncRNA structures. Similar findings hold for our application of NcDNAlign to the identification of ultra-conserved regions in nematodes and teleosts. Both approaches yield conserved sequences of unknown function, result in novel evolutionary insights into conservation patterns among these genomes, and manifest the benefits of an efficient and reliable genome-wide alignment package. The software is available under the GNU Public License at http://www.bioinf.uni-leipzig.de/Software/NcDNAlign/

    Discovering cancer-associated transcripts by RNA sequencing

    High-throughput sequencing of poly-adenylated RNA (RNA-Seq) in human cancers shows remarkable potential to identify uncharacterized aspects of tumor biology, including gene fusions with therapeutic significance and disease markers such as long non-coding RNA (lncRNA) species. However, the analysis of RNA-Seq data places unprecedented demands upon computational infrastructures and algorithms, requiring novel bioinformatics approaches. To meet these demands, we present two new open-source software packages - ChimeraScan and AssemblyLine - designed to detect gene fusion events and novel lncRNAs, respectively. RNA-Seq studies utilizing ChimeraScan led to discoveries of new families of recurrent gene fusions in breast cancers and solitary fibrous tumors. Further, ChimeraScan was one of the key components of the repertoire of computational tools utilized in data analysis for MI-ONCOSEQ, a clinical sequencing initiative to identify potentially informative and actionable mutations in cancer patients’ tumors. AssemblyLine, by contrast, reassembles RNA sequencing data into full-length transcripts ab initio. In head-to-head analyses AssemblyLine compared favorably to existing ab initio approaches and unveiled abundant novel lncRNAs, including antisense and intronic lncRNAs disregarded by previous studies. Moreover, we used AssemblyLine to define the prostate cancer transcriptome from a large patient cohort and discovered myriad lncRNAs, including 121 prostate cancer-associated transcripts (PCATs) that could potentially serve as novel disease markers. Functional studies of two PCATs - PCAT-1 and SChLAP1 - revealed cancer-promoting roles for these lncRNAs. PCAT-1, a lncRNA expressed from chromosome 8q24, promotes cell proliferation and represses the tumor suppressor BRCA2. SChLAP1, located in a chromosome 2q31 ‘gene desert’, independently predicts poor patient outcomes, including metastasis and cancer-specific mortality.
Mechanistically, SChLAP1 antagonizes the genome-wide localization and regulatory functions of the SWI/SNF chromatin-modifying complex. Collectively, this work demonstrates the utility of ChimeraScan and AssemblyLine as open-source bioinformatics tools. Our applications of ChimeraScan and AssemblyLine led to the discovery of new classes of recurrent and clinically informative gene fusions, and established a prominent role for lncRNAs in coordinating aggressive prostate cancer, respectively. We expect that the methods and findings described herein will establish a precedent for RNA-Seq-based studies in cancer biology and assist the research community at large in making similar discoveries. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120814/1/mkiyer_1.pd

    Analysis of anopheline mosquito behavior and identification of vector control targets in the post-genomic era

    Thesis advisor: Marc A.T. Muskavitch. The protozoan Plasmodium falciparum, the mosquito-borne pathogen that causes human malaria, remains one of the most difficult infectious parasites to combat and control. Campaigns for malaria eradication have succeeded, in most instances, at the level of vector control, rather than through initiatives that have attempted to decrease malaria burden by targeting parasites. The rapid evolution and spread of insecticide-resistant mosquitoes is threatening our ability to combat vectors and control malaria. Therefore, the development, procurement and distribution of new methods of vector control are paramount. Two aspects of vector biology that can be exploited toward these ends are vector behaviors and vector-specific insecticide targets. In this thesis, I describe three aspects of vector biology with potential for the development of improved means of vector control: photopreference behavior, long non-coding RNA (lncRNA) targets and epigenetic gene ensemble targets. My studies of photopreference have revealed that specific mosquito species within the genus Anopheles, An. gambiae and An. stephensi, exhibit different photopreference behaviors, and that each gender of mosquito in these species exhibits distinct light-dependent resting behaviors. These inter-specific behavioral differences may be affected by differing numbers of long-wavelength sensing Opsin genes in each species, and my findings regarding species-specific photopreferences suggest that some behavioral interventions may need to be tailored for specific vector mosquito species. Based on the advancement of next-generation sequencing technologies and the generation by others of assembled genomes of many anopheline mosquito species, I have identified a comprehensive set of approximately 3,000 lncRNAs and find that RNA secondary structures are notably conserved within the gambiae species complex.
As lncRNAs and epigenetic modifiers cooperate to modulate epigenetic regulation, I have also analyzed the conservation of epigenetic gene ensembles across a number of anopheline species, based on identification of homologous epigenetic ensemble genes in An. gambiae compared to Drosophila melanogaster. Further analyses of these ensembles illustrate that these epigenetic genes are highly stable among many anopheline species, in that I detect only eight gene family expansion or contraction events among 169 epigenetic ensemble genes within a set of 12 anopheline species. My hope is that my findings will enable deeper investigations of many behavioral and epigenetic processes in Anopheles gambiae and other anopheline vector mosquitoes and thereby enable the development of new, more effective means of vector and malaria control. Thesis (PhD), Boston College, 2015. Submitted to: Boston College. Graduate School of Arts and Sciences. Discipline: Biology