    Optimization Techniques For Next-Generation Sequencing Data Analysis

    High-throughput RNA sequencing (RNA-Seq) is a popular, cost-efficient technology with many medical and biological applications. This technology, however, presents a number of computational challenges in reconstructing full-length transcripts and accurately estimating their abundances across all cell types. Our contributions include (1) transcript and gene expression level estimation methods, (2) methods for genome-guided and annotation-guided transcriptome reconstruction, and (3) de novo assembly and annotation of real data sets. Transcript expression level estimation, also referred to as transcriptome quantification, tackles the problem of estimating the expression level of each transcript. Transcriptome quantification analysis is crucial for determining similar transcripts and for unraveling gene functions and transcription regulation mechanisms. We propose a novel simulated regression-based method for transcriptome frequency estimation from RNA-Seq reads. Transcriptome reconstruction refers to the problem of reconstructing the transcript sequences from the RNA-Seq data. We present genome-guided and annotation-guided transcriptome reconstruction methods. Empirical results on both synthetic and real RNA-Seq datasets show that the proposed methods improve transcriptome quantification and reconstruction accuracy compared to current state-of-the-art methods. We further present the assembly and annotation of the transcriptome of Bugula neritina (a marine colonial animal) and the genome of the Tallapoosa darter (a freshwater fish from a species-rich radiation).

    Cloud Computing for Next-Generation Sequencing Data Analysis

    High-throughput next-generation sequencing (NGS) technologies have evolved rapidly and are reshaping the scope of genomics research. The substantial decrease in the cost of NGS techniques in the past decade has led to their rapid adoption in biological research and drug development. Genomics studies of large populations are producing a huge amount of data, giving rise to computational issues around the storage, transfer, and analysis of the data. Fortunately, cloud computing has recently emerged as a viable option to quickly and easily acquire the computational resources for large-scale NGS data analyses. Some cloud-based applications and resources have been developed specifically to address the computational challenges of working with very large volumes of data generated by NGS technology. In this chapter, we review some cloud-based systems and solutions for NGS data analysis, discuss the practical hurdles and limitations in cloud computing, including data transfer and security, and share the lessons we learned from the implementation of Rainbow, a cloud-based tool for large-scale genome sequencing data analysis.

    Applications and data analysis of next-generation sequencing

    Over the past 6 years, next-generation sequencing (NGS) has been established as a valuable high-throughput method for research in molecular genetics and has successfully been employed in the identification of rare and common genetic variations. Although the high expectations regarding the discovery of new diagnostic targets and an overall reduction of cost have been achieved, technological challenges in instrument handling, robustness of the chemistry, and data analysis still need to be overcome. Each workflow and sequencing platform has its particular problems and caveats, which need to be addressed. For NGS, a variety of enrichment methods, sequencing devices, and technologies, as well as a multitude of analysis software products, are available. In this manuscript, the authors focus on challenges in data analysis when employing different target enrichment methods and the best applications for each of the

    BATCH-GE : batch analysis of next-generation sequencing data for genome editing assessment

    Targeted mutagenesis by the CRISPR/Cas9 system is currently revolutionizing genetics. The ease of this technique has enabled genome engineering in vitro and in a range of model organisms and has pushed experimental dimensions to unprecedented proportions. Owing to its tremendous progress in terms of speed, read length, throughput and cost, next-generation sequencing (NGS) has been increasingly used for the analysis of CRISPR/Cas9 genome editing experiments. However, the current tools for genome editing assessment lack flexibility and fall short in the analysis of large amounts of NGS data. Therefore, we designed BATCH-GE, an easy-to-use bioinformatics tool for batch analysis of NGS-generated genome editing data, available from https://github.com/WouterSteyaert/BATCH-GE.git. BATCH-GE detects and reports indel mutations and other precise genome editing events and calculates the corresponding mutagenesis efficiencies for a large number of samples in parallel. Furthermore, this new tool provides flexibility by allowing the user to adapt a number of input variables. The performance of BATCH-GE was evaluated in two genome editing experiments, aiming to generate knock-out and knock-in zebrafish mutants. This tool will not only contribute to the evaluation of CRISPR/Cas9-based experiments, but will be of use in any genome editing experiment and has the ability to analyze data from every organism with a sequenced genome.
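
    The idea of a mutagenesis efficiency can be illustrated with a minimal sketch. This is not BATCH-GE's implementation: it assumes, for illustration only, that efficiency is the fraction of aligned reads whose CIGAR string contains an insertion (I) or deletion (D). The helper names `has_indel` and `mutagenesis_efficiency` are hypothetical.

```python
import re

# Hypothetical helpers, not part of BATCH-GE. A CIGAR string encodes an
# alignment as (length, operation) pairs, e.g. "48M4D52M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar: str) -> bool:
    """Return True if the alignment contains an insertion or a deletion."""
    return any(op in "ID" for _length, op in CIGAR_OP.findall(cigar))


def mutagenesis_efficiency(cigars) -> float:
    """Fraction of reads carrying at least one indel (assumed definition)."""
    cigars = list(cigars)
    if not cigars:
        raise ValueError("at least one read is required")
    edited = sum(has_indel(c) for c in cigars)
    return edited / len(cigars)


# Four toy reads covering a target site: two unedited, one deletion, one insertion.
reads = ["100M", "48M4D52M", "60M2I38M", "100M"]
print(f"{mutagenesis_efficiency(reads):.0%}")  # prints 50%
```

    A real assessment would also restrict the count to reads overlapping the cut site and separate sequencing errors from true editing events; the sketch ignores both.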

    Single cell transcriptome analysis using next generation sequencing.

    The heterogeneity of tissues, especially in cancer research, is a central issue in transcriptome analysis. In recent years, research has primarily focused on the development of methods for single cell analysis. Single cell analysis aims at gaining novel insights into biological processes of healthy and diseased cells. Some of the challenges in transcriptome analysis concern the low abundance of sample starting material, the necessary sample amplification steps and the subsequent analysis. In this study, two fundamentally different approaches to amplification were compared using next-generation sequencing analysis: (I) exponential amplification using polymerase chain reaction (PCR) and (II) linear amplification. For both approaches, protocols for single cell extraction, cell lysis, cDNA synthesis, cDNA amplification and preparation of next-generation sequencing libraries were developed. We could successfully show that transcriptome analysis of low numbers of cells is feasible with both exponential and linear amplification. Exponential amplification gave the highest amplification rates, of up to 10^6. The reproducibility of results is a strength of the linear amplification method. The analysis of next-generation sequencing data from single cell samples showed detectable expression in at least 16,000 genes. The variance between samples creates a need to work with a greater number of biological replicates. In summary, single cell transcriptome analysis with next-generation sequencing is possible, but improvements leading to a higher yield of transcriptome reads are required. In the near future, comparing single cancer cells with healthy ones, for example, could provide a basis for improved prognosis and diagnosis.
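
    The difference between the two amplification regimes can be made concrete with a short calculation. The following is a simplified textbook model, not the protocol used in the study: exponential amplification multiplies the template by (1 + efficiency) per cycle, while linear amplification adds a fixed number of copies per round. The parameter values are assumptions chosen for illustration.

```python
import math


def exponential_fold(cycles: int, efficiency: float = 1.0) -> float:
    """Fold amplification after PCR cycles; efficiency 1.0 means perfect doubling."""
    return (1.0 + efficiency) ** cycles


def linear_fold(rounds: int, copies_per_round: float) -> float:
    """Fold yield when each round adds a fixed number of copies per template."""
    return 1.0 + copies_per_round * rounds


def cycles_needed(target_fold: float, efficiency: float = 1.0) -> int:
    """Smallest number of PCR cycles that reaches the target fold amplification."""
    return math.ceil(math.log(target_fold) / math.log(1.0 + efficiency))


print(exponential_fold(20))    # 1048576.0
print(linear_fold(20, 50.0))   # 1001.0
print(cycles_needed(1e6))      # 20
```

    Under these assumptions, 20 ideal PCR cycles give roughly a million-fold amplification, whereas 20 linear rounds at 50 copies per round give about a thousand-fold. The model says nothing about bias or reproducibility, which the abstract cites as the strength of the linear method.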

    Algorithms for analysis of next-generation viral sequencing data

    RNA viruses mutate at extremely high rates, forming an intra-host viral population of closely related variants, which allows them to evade the host’s immune system and makes them particularly dangerous. Viral outbreaks pose a significant threat to public health. Progress in sequencing technologies has made it possible to identify and sample intra-host viral populations at great depth. Consequently, the contribution of sequencing technologies to molecular surveillance of viral outbreaks is becoming more and more substantial. Genome sequencing of viral populations reveals similarities between samples, allows viral genetic distance to be measured, and facilitates outbreak identification and isolation. Computational methods can be used to infer transmission characteristics from sequencing data. However, due to the specifics of next-generation sequencing (NGS) approaches and the limited availability of viral data, existing methods lack accuracy and efficiency. In this dissertation, I present novel, flexible methods that allow tackling crucial epidemiological problems, such as identification of transmission clusters, sources of infection, and transmission direction.
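
    Genetic distance and transmission clusters can be illustrated with a naive baseline. This is not the dissertation's method: it assumes each sample is reduced to a single aligned consensus sequence, measures Hamming distance, and links samples whose distance is within a chosen threshold (single-linkage clustering). The function names, sequences and threshold are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def transmission_clusters(samples: dict, threshold: int) -> list:
    """Group samples connected by chains of pairwise distances <= threshold."""
    parent = {name: name for name in samples}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if hamming(samples[a], samples[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in samples:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy consensus sequences (illustrative only).
samples = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGAACGA",
    "s4": "TTTTCCCC",
}
print(transmission_clusters(samples, threshold=1))
# [['s1', 's2', 's3'], ['s4']]
```

    Because a single consensus sequence discards the intra-host variant population the abstract describes, this baseline is only a starting point and cannot infer sources of infection or transmission direction.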

    Streamlined Mutation Analysis for Clinical Next Generation Sequencing Data

    The Laboratory of Molecular Oncology at UMass Medical Center recently implemented a clinical Next Generation Sequencing assay, but lacked an adequate bioinformatics solution for analyzing the data. To streamline the analysis process, a program was developed in Excel Visual Basic that filters raw data and compiles a medical report from the mutational findings. Testing was conducted using data from a variety of tumor profiles to ensure technical accuracy. Feedback on the program’s usability was also collected from the laboratory staff, and appropriate adjustments were made in response. Ultimately, this solution shortens the turnaround time for clinical specimens, reduces the likelihood of errors, and improves patient care for the hospital.

    Next-generation sequencing of vertebrate experimental organisms

    Next-generation sequencing technologies are revolutionizing biology by allowing for genome-wide transcription factor binding-site profiling, transcriptome sequencing, and more recently, whole-genome resequencing. While it is currently not possible to generate complete de novo assemblies of higher-vertebrate genomes using next-generation sequencing, improvements in sequence read lengths and throughput, coupled with new assembly algorithms for large data sets, will soon make this a reality. These developments will in turn spawn a revolution in how genomic data are used to understand genetics and how model organisms are used for disease gene discovery. This review provides an overview of the current next-generation sequencing platforms and the newest computational tools for the analysis of next-generation sequencing data. We also describe how next-generation sequencing may be applied in the context of vertebrate model organism genetics

    ParMap, an Algorithm for the Identification of Complex Genomic Variations in Nextgen Sequencing Data

    Next-generation sequencing produces high-throughput data, albeit with greater error and shorter reads than traditional Sanger sequencing methods. This complicates the detection of genomic variations, especially small insertions and deletions. Here we describe ParMap, a statistical algorithm for the identification of complex genetic variants using partially mapped reads in nextgen sequencing data. We also report ParMap’s successful application to the mutation analysis of chromosome X exome-captured leukemia DNA samples.