8 research outputs found

    Resolving complex structural variants via nanopore sequencing

    The recent development of high-throughput sequencing platforms provided impressive insights into the field of human genetics and contributed to considering structural variants (SVs) as the hallmark of genome instability, leading to the establishment of several pathologic conditions, including neoplasia and neurodegenerative and cognitive disorders. While SV detection is addressed by next-generation sequencing (NGS) technologies, more recent long-read sequencing technologies have already proven invaluable in overcoming the inaccuracy and limitations of NGS technologies when applied to resolve large and structurally complex SVs, limitations that arise from the short length (100–500 bp) of the sequencing reads used. Among the long-read sequencing technologies, Oxford Nanopore Technologies developed a sequencing platform based on a protein nanopore that allows the sequencing of “native” long DNA molecules of virtually unlimited length (typical range 1–100 kb). In this review, we focus on the bioinformatics methods that improve the identification and genotyping of known and novel SVs to investigate human pathological conditions, discussing the possibility of introducing nanopore sequencing technology into routine diagnostics.

    Detecting inosine in nanopore sequencing data with machine learning

    Detecting modifications in DNA has been a long-standing challenge in understanding the workings of the genome, particularly with regard to regulatory function. The currently most widely used sequencing technology, NGS, offers protocols to tackle these challenges, but these are modification-specific and involve convoluted preparation steps. As an alternative, nanopore sequencing offers the direct observation of such modifications. Though inosine has been demonstrated to be distinguishable from adenine in poly(A) RNA using nanopore sequencing, no framework has been proposed for the general detection of inosine presence in nanopore sequence data. In this thesis, I propose a test-based approach to use out-of-the-box classifiers to distinguish between sequences containing inosine and sequences that don’t, based on features present in nanopore sequencing data. The proposed model achieves a high accuracy on this classification task, providing avenues for further development of a self-contained inosine detector, as well as further exploration of the same approach to other modifications. Master's thesis in informatics (INF399, MAMN-INF, MAMN-PRO).

    Quality and clinical utility of genomic variants in complex diseases

    Continuous improvements in high-throughput genomic sequencing over the past two decades have made it exponentially faster and cheaper, enabling its routine use in the clinic and scientific research. Genomic prognostic tools make use of personalised genomic data to aid clinical decision making and inform patients of disease outcomes, allowing enhanced tailoring of treatment beyond traditional prognostic tools, which are insufficient for understanding the nuances of individual complex disease cases. This relies upon accurate sequencing data and effective quality control. We have developed improved genomic prognostic tools for use in the clinic and demonstrate a novel method for quality control of genomic sequencing data with broad applicability. Non-small-cell lung cancer (NSCLC) is the second most common cancer type in both males and females globally. Previous attempts to predict survival time for cancer patients have used genomic prognostic tools based on the burden of tumour mutations and neoantigens, but with limited success. We developed greatly improved classifiers of tumour mutation and neoantigenic burden showing strong 5-year survival differences between early-stage NSCLC patients. By using these together, we showed additional increases in prognostic efficacy, with the best survival group displaying a ~92% decreased risk of death in a 5-year period compared to the worst survival group. To improve the accuracy of sequencing data for uses such as this, we developed the first tool for automatically cataloguing systematic sequencing biases for a sequencing pipeline, and we demonstrated its value in human and SARS-CoV-2 sequencing quality control with Illumina and Oxford Nanopore sequencing. We discovered and blacklisted a range of false positive variants, and investigated the causes of these. Identifying these errors contributed to multiple studies, altering research conclusions. 
    We share these tools to provide continued improvements to genomic prognostics and sequencing accuracy affecting a wide range of fields.

    Computational Analysis of the Transcriptome Using Long-Read RNA Sequencing

    Reconstructing the transcriptome from RNA sequencing reads is a challenging problem, especially when no high-quality reference genome is available. Current transcriptome annotations have largely relied on the short read lengths intrinsic to the most widely used high-throughput cDNA sequencing technologies. For example, in the annotation of the Caenorhabditis elegans transcriptome, more than half of the transcript isoforms lack full-length support and instead rely on inference from short reads that do not span the full length of the isoform. Short-read sequencing technologies, though accurate, cannot reliably reconstruct full-length transcripts due to the highly complex nature of the transcriptome, with large gene families, widespread alternative splicing, and highly variable expression and coverage per transcript. We applied nanopore-based direct RNA sequencing to characterize the developmental polyadenylated transcriptome of C. elegans. Using this approach, we provide support for 23,865 splice isoforms across 14,611 genes, without the need for computational reconstruction of gene models. In addition, we have developed an open-source de novo transcriptome assembly method, CONDUIT, which uses single-molecule long-read RNA sequencing to generate scaffolded splice graphs independent of a reference genome. It then pseudomaps short-read RNA sequencing reads to isoforms extracted from the scaffolded splice graph, polishes these splice graphs using both short- and long-read data, and outputs consensus isoforms extracted from these splice graphs. We show that CONDUIT produces highly accurate consensus isoforms, completely independent of a reference genome, in several model systems and in a novel pathogenic yeast system.

    The Host-Microbiota Axis in Chronic Wound Healing

    Chronic, non-healing skin wounds represent a substantial area of unmet clinical need, leading to debilitating morbidity and mortality in affected individuals. Due to their high prevalence and recurrence, chronic wounds pose a significant economic burden. Wound infection is a major component of healing pathology, with up to 70% of wound-associated lower limb amputations preceded by infection. Despite this, the wound microbiome remains poorly understood. Studies outlined in this thesis aimed to characterise the wound microbiome and explore the complex interactions that occur in the wound environment. Wound samples were analysed using a novel long-read nanopore sequencing-based approach that delivers quantitative species-level taxonomic identification. Clinical wound specimens were collected at both the point of lower-extremity amputation and via a pilot clinical trial evaluating extracorporeal shockwave therapy (ESWT) for wound healing. Combining microbial community composition and host tissue transcriptional (RNAseq) profiling with clinical parameters has provided new insight into healing pathology. Specific commensal and pathogenic organisms appear mechanistically linked to healing, eliciting unique host response signatures. Patient- and site-specific shifts in microbial abundance and community composition were observed in individuals with chronic wounds versus healthy skin. Transcriptional profiling (RNAseq) of the wound tissue revealed important insight into functional elements of the host-microbe interaction. Finally, ESWT was shown to confer beneficial effects on both cellular and microbial aspects of healing. High-resolution long-read sequencing offers clinically important genomic insights, including rapid wide-spectrum pathogen identification and antimicrobial resistance profiling, which are not possible using current culture-based diagnostic approaches.
    Thus, the data presented in this thesis provide important new insight into complex host-microbe interactions within the wound microbiome, opening new and exciting avenues for diagnostic and therapeutic approaches to wound management.