    SAW: A Method to Identify Splicing Events from RNA-Seq Data Based on Splicing Fingerprints

    Splicing event identification is one of the most important issues in the comprehensive analysis of transcription profiles. Recent developments in next-generation sequencing technology have generated extensive profiles of alternative splicing. However, while many of these splicing events are between exons that are relatively close on genome sequences, reads generated by RNA-Seq are not limited to alternative splicing between close exons but occur in virtually all splicing events. In this work, a novel method, SAW, is proposed for the identification of all splicing events based on short reads from RNA-Seq. It was observed that short reads not in known gene models are actually absent words from known gene sequences. An efficient method was developed to filter and cluster these short reads by fingerprint fragments of splicing events without aligning short reads to genome sequences. Additionally, the possible splicing sites were determined without alignment against genome sequences. A consensus sequence was then generated for each short read cluster and aligned to the genome sequences. Results demonstrated that this method could identify more than 90% of the known splicing events with a very low false discovery rate, as well as accurately identify a number of novel splicing events between distant exons.

    Transkingdom Networks: A Systems Biology Approach to Identify Causal Members of Host-Microbiota Interactions

    Improvements in sequencing technologies and reduced experimental costs have resulted in a vast number of studies generating high-throughput data. Although the number of methods to analyze these "omics" data has also increased, computational complexity and lack of documentation hinder researchers from analyzing their high-throughput data to its true potential. In this chapter we detail our data-driven, transkingdom network (TransNet) analysis protocol to integrate and interrogate multi-omics data. This systems biology approach has allowed us to successfully identify important causal relationships between different taxonomic kingdoms (e.g., mammals and microbes) using diverse types of data.

    Methods to study splicing from high-throughput RNA Sequencing data

    The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections adde

    Towards the reconstruction of integrated genome-scale models of metabolism and gene expression

    The reconstruction of integrated genome-scale models of metabolism and gene expression has been a challenge for a while now. In fact, various methods that allow integrating reconstructions of Transcriptional Regulatory Networks, gene expression data or both into Genome-Scale Metabolic Models have been proposed. Several of these methods are surveyed in this article, which allowed identifying their strengths and weaknesses concerning the reconstruction of integrated models for multiple prokaryotic organisms. Additionally, the main resources of regulatory information were also surveyed, as the existence of novel sources of regulatory information and gene expression data may contribute to the improvement of the methodologies referred to herein. This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2019 unit and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by the European Regional Development Fund under the scope of Norte2020-Programa Operacional Regional do Norte. Fernando Cruz holds a doctoral fellowship (SFRH/BD/139198/2018) funded by the FCT. The authors thank project SHIKIFACTORY100 - Modular cell factories for the production of 100 compounds from the shikimate pathway (814408) funded by the European Commission.

    Protocol Dependence of Sequencing-Based Gene Expression Measurements

    RNA-Seq provides unparalleled levels of information about the transcriptome, including precise expression levels over a wide dynamic range. It is essential to understand how technical variation impacts the quality and interpretability of results, how potential errors could be introduced by the protocol, how the source of RNA affects transcript detection, and how all of these variations can impact the conclusions drawn. Multiple human RNA samples were used to assess RNA fragmentation, RNA fractionation, cDNA synthesis, and single versus multiple tag counting. Though protocols employing polyA RNA selection generate the highest number of non-ribosomal reads and the most precise measurements for coding transcripts, such protocols were found to detect only a fraction of the non-ribosomal RNA in human cells. PolyA RNA excludes thousands of annotated and even more unannotated transcripts, resulting in an incomplete view of the transcriptome. Ribosomal-depleted RNA provides a more cost-effective method for generating complete transcriptome coverage. Expression measurements using single tag counting provided advantages for assessing gene expression and for detecting short RNAs relative to multi-read protocols. Detection of short RNAs was also hampered by RNA fragmentation. Thus, this work will help researchers choose from among a range of options when analyzing gene expression, each with its own advantages and disadvantages.

    Genomic sequencing in clinical trials

    Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing has rapidly been becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to find its way into clinical trials both nationally and worldwide. We highlight the currently available types of genomic sequencing platforms, outline the advantages and disadvantages of each, and compare first- and next-generation techniques with respect to capabilities, quality, and cost. We describe the current geographical distributions and types of disease conditions in which these technologies are used, and how next-generation sequencing is strategically being incorporated into new and existing studies. Lastly, recent major breakthroughs and the ongoing challenges of using genomic sequencing in clinical research are discussed.

    Second-Generation Sequencing Supply an Effective Way to Screen RNAi Targets in Large Scale for Potential Application in Pest Insect Control

    The success of an RNAi approach to insect pest control depends mainly on careful target selection and a convenient delivery system. We adopted second-generation sequencing technology to screen RNAi targets. Illumina's RNA-seq and digital gene expression tag profile (DGE-tag) technologies were used to screen optimal RNAi targets from Ostrinia furnacalis. A total of 14,690 stage-specific genes were obtained that can be considered potential targets, and 47 were confirmed by qRT-PCR. Ten genes with larval-stage-specific expression were selected for RNAi testing. When 50 ng/µl dsRNAs of the genes DS10 and DS28 were sprayed directly on newly hatched larvae placed on filter paper, larval mortalities were around 40–50%, whereas when the dsRNAs of the ten genes were sprayed on the larvae along with artificial diet, mortalities reached 73% to 100% at 5 d after treatment. The qRT-PCR analysis verified the correlation between larval mortality and the down-regulation of target gene expression. Topically applied fluorescent dsRNA confirmed that dsRNA did penetrate the body wall and circulate in the body cavity. It seems likely that the combination of DGE-tag with RNA-seq is a rapid, high-throughput, low-cost, and easy way to select candidate target genes for RNAi. More importantly, it demonstrated that dsRNAs are able to penetrate the integument and cause larval developmental stunting and/or death in a lepidopteran insect. This finding largely broadens the target selection for RNAi from just gut-specific genes to targets in the whole insect and may lead to new strategies for designing RNAi-based technology against insect damage.

    Transcriptomic landscape of breast cancers through mRNA sequencing

    Breast cancer is a heterogeneous disease with a poorly defined genetic landscape, which poses a major challenge in diagnosis and treatment. By massively parallel mRNA sequencing, we obtained 1.2 billion reads from 17 individual human tissues belonging to TNBC, Non-TNBC, and HER2-positive breast cancers and defined their comprehensive digital transcriptome for the first time. Surprisingly, we identified a high number of novel and unannotated transcripts, revealing the global breast cancer transcriptomic adaptations. Comparative transcriptomic analyses elucidated differentially expressed transcripts between the three breast cancer groups, identifying several new modulators of breast cancer. Our study also identified common transcriptional regulatory elements, such as highly abundant primary transcripts, including osteonectin, RACK1, calnexin, calreticulin, FTL, and B2M, and “genomic hotspots” enriched in primary transcripts between the three groups. Thus, our study opens previously unexplored niches that could enable a better understanding of the disease and the development of potential intervention strategies.