19 research outputs found

    Novel Algorithm Development for ‘Next Generation’ Sequencing Data Analysis

    In recent years, the decreasing cost of ‘Next generation’ sequencing (NGS) has spawned numerous applications for interrogating whole genomes and transcriptomes in research, diagnostic and forensic settings. While the innovations in sequencing have been explosive, the development of scalable and robust bioinformatics software and algorithms for analysing the new types of data generated by these technologies has struggled to keep up. As a result, large volumes of NGS data available in public repositories are severely underutilised, despite providing a rich resource for data mining applications. Indeed, the bottleneck in genome and transcriptome sequencing experiments has shifted from data generation to bioinformatics analysis and interpretation. This thesis focuses on the development of novel bioinformatics software to bridge the gap between data availability and interpretation. The work is split between two core topics – computational prioritisation/identification of disease gene variants and identification of RNA N6-adenosine methylation from sequencing data. The first chapter briefly discusses the emergence and establishment of NGS technology as a core tool in biology and its current applications and perspectives. Chapter 2 introduces the problem of variant prioritisation in the context of Mendelian disease, where tens of thousands of potential candidates are generated by a typical sequencing experiment. Novel software developed for candidate gene prioritisation is described that utilises data mining of tissue-specific gene expression profiles (Chapter 3). The second part of the chapter investigates an alternative approach to candidate variant prioritisation by leveraging functional and phenotypic descriptions of genes and diseases from multiple biomedical domain ontologies (Chapter 4). Chapter 5 discusses N6-adenosine methylation, a recently re-discovered post-transcriptional modification of RNA. The core of the chapter describes novel software developed for transcriptome-wide detection of this epitranscriptomic mark from sequencing data. Chapter 6 presents a case study application of the software, reporting the previously uncharacterised RNA methylome of Kaposi’s Sarcoma Herpes Virus. The chapter further discusses a putative novel N6-methyl-adenosine RNA-binding protein and its possible roles in the progression of viral infection.

    Single cell spatial analysis reveals inflammatory foci of immature neutrophil and CD8 T cells in COVID-19 lungs

    Single cell spatial interrogation of the immune-structural interactions in COVID-19 lungs is challenging, mainly because of the marked cellular infiltrate and architecturally distorted microstructure. To address this, we develop a suite of mathematical tools to search for statistically significant co-locations amongst immune and structural cells identified using 37-plex imaging mass cytometry. This unbiased method reveals a cellular map interleaved with an inflammatory network of immature neutrophils, cytotoxic CD8 T cells, megakaryocytes and monocytes co-located with regenerating alveolar progenitors and endothelium. Of note, a highly active cluster of immature neutrophils and CD8 T cells is found to be linked spatially with alveolar progenitor cells, and temporally with the diffuse alveolar damage stage. These findings offer further insights into how immune cells interact in the lungs in severe COVID-19 disease. We provide our pipeline [Spatial Omics Oxford Pipeline (SpOOx)] and visual-analytical tool, Multi-Dimensional Viewer (MDV) software, as a resource for spatial analysis.
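To make the idea of a "statistically significant co-location" concrete, the following is a minimal sketch of one common approach, a label-permutation test on pairwise distances. It is not the SpOOx implementation; the function names, the `(x, y, cell_type)` tuple layout, the radius and the permutation count are all assumptions chosen for illustration.

```python
import random


def count_close_pairs(cells, type_a, type_b, radius):
    """Count (A, B) cell pairs whose centres lie within `radius` of each other.

    `cells` is a list of (x, y, cell_type) tuples (hypothetical layout).
    Assumes type_a != type_b, so a cell is never paired with itself.
    """
    r2 = radius * radius
    a = [(x, y) for x, y, t in cells if t == type_a]
    b = [(x, y) for x, y, t in cells if t == type_b]
    return sum(
        1
        for ax, ay in a
        for bx, by in b
        if (ax - bx) ** 2 + (ay - by) ** 2 <= r2
    )


def colocation_p_value(cells, type_a, type_b, radius, n_perm=999, seed=0):
    """One-sided permutation p-value for enrichment of close A-B pairs.

    Cell positions stay fixed; only the type labels are shuffled, which
    gives a null distribution under "types are placed at random".
    """
    rng = random.Random(seed)
    observed = count_close_pairs(cells, type_a, type_b, radius)
    labels = [t for _, _, t in cells]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled = [(x, y, t) for (x, y, _), t in zip(cells, labels)]
        if count_close_pairs(shuffled, type_a, type_b, radius) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

The quadratic pair count is adequate for a small region of interest; for whole-slide data a spatial index would be the more realistic choice.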

    Characterization and Genomic Localization of a SMAD4 Processed Pseudogene

    Like many clinical diagnostic laboratories, we undertake routine investigation of cancer-predisposed individuals by high-throughput sequencing of patient DNA that has been target-enriched for genes associated with hereditary cancer. Accurate diagnosis using such reagents requires alertness against rare nonpathogenic variants that may interfere with variant calling. In a cohort of 2042 such cases, we identified five that initially appeared to be carriers of a 95-bp deletion of SMAD4 intron 6. More detailed analysis indicated that these individuals all carried one copy of a SMAD4 processed gene. Because of its interference with diagnostic analysis, we characterized this processed gene in detail. Whole genome sequencing and confirmatory Sanger sequencing of junction PCR products were used to show that in each of the five cases, the SMAD4 processed gene was integrated at the same position on chromosome 9, located within the last intron of the SCAI gene. This rare polymorphic processed gene therefore reflects the occurrence of a single ancestral retrotransposition event. Compared to the reference SMAD4 mRNA sequence NM_005359.5 (https://www.ncbi.nlm.nih.gov/nucleotide/), the 5′ and 3′ UTR regions of the processed gene are both truncated, but its open reading frame is unaltered. Our experience leads us to advocate the use of an RNA-seq aligner as part of diagnostic assay quality assurance, since this allows processed genes of this kind to be recognised in a comparatively facile, automated fashion.

    Enhanced diagnostic yield in Meckel-Gruber and Joubert syndrome through exome sequencing supplemented with split-read mapping

    Background: The widespread adoption of high-throughput sequencing technologies by genetic diagnostic laboratories has enabled significant expansion of their testing portfolios. Rare autosomal recessive conditions have been a particular focus of many new services. Here we report a cohort of 26 patients referred for genetic analysis of Joubert (JBTS) and Meckel-Gruber (MKS) syndromes, two clinically and genetically heterogeneous neurodevelopmental conditions that define a phenotypic spectrum, with MKS at the severe end. Methods: Exome sequencing was performed for all cases, using Agilent SureSelect v5 reagents and Illumina paired-end sequencing. For two cases, medium-coverage (9×) whole genome sequencing was subsequently undertaken. Results: Using a standard analysis pipeline for the detection of single nucleotide and small insertion or deletion variants, molecular diagnoses were confirmed in 12 cases (46 %). Seeking to determine whether our cohort harboured pathogenic copy number variants (CNV) in JBTS- or MKS-associated genes, targeted comparative read-depth analysis was performed using FishingCNV. These analyses identified a putative intragenic AHI1 deletion that included three exons spanning at least 3.4 kb and an intergenic MPP4 to TMEM237 deletion that included exons spanning at least 21.5 kb. Whole genome sequencing enabled confirmation of the deletion-containing alleles and precise characterisation of the mutation breakpoints at nucleotide resolution. These data were validated following development of PCR-based assays that could be subsequently used for “cascade” screening and/or prenatal diagnosis. Conclusions: Our investigations expand the AHI1 and TMEM237 mutation spectrum and highlight the importance of performing CNV screening of disease-associated genes. We demonstrate a robust, increasingly cost-effective CNV detection workflow that is applicable to all MKS/JBTS referrals.
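The principle behind comparative read-depth analysis can be sketched in a few lines. This is an illustration only and does not reproduce FishingCNV's algorithm: the function names, the library-size normalisation and the -0.6 threshold are assumptions made for the example. A heterozygous deletion halves the expected depth, giving a log2 ratio near -1 against controls.

```python
import math
from statistics import median


def log2_depth_ratios(sample, controls):
    """Per-exon log2 ratio of a sample's depth against the control median.

    `sample` is a list of per-exon read depths; `controls` is a list of
    such lists, one per control sample (hypothetical input layout).
    Depths are first scaled by each sample's total so that differences
    in overall sequencing yield cancel out.
    """
    def normalise(depths):
        total = sum(depths)
        return [d / total for d in depths]

    s = normalise(sample)
    c = [normalise(ctrl) for ctrl in controls]
    ratios = []
    for i, value in enumerate(s):
        ref = median(ctrl[i] for ctrl in c)
        if value > 0 and ref > 0:
            ratios.append(math.log2(value / ref))
        else:
            # No reads in the sample (or no usable reference) for this exon.
            ratios.append(float("-inf"))
    return ratios


def flag_deletions(ratios, threshold=-0.6):
    """Return indices of exons whose log2 ratio is at or below `threshold`."""
    return [i for i, r in enumerate(ratios) if r <= threshold]
```

Scaling by total depth slightly inflates the ratios of unaffected exons when a deletion covers a large share of the target, which is one reason dedicated tools use more careful normalisation and statistical testing.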

    A robust deep learning workflow to predict CD8+ T-cell epitopes

    Background: T-cells play a crucial role in the adaptive immune system by triggering responses against cancer cells and pathogens, while maintaining tolerance against self-antigens, which has sparked interest in the development of various T-cell-focused immunotherapies. However, the identification of antigens recognised by T-cells is low-throughput and laborious. To overcome some of these limitations, computational methods for predicting CD8+ T-cell epitopes have emerged. Despite recent developments, most immunogenicity algorithms struggle to learn features of peptide immunogenicity from small datasets, suffer from HLA bias and are unable to reliably predict pathology-specific CD8+ T-cell epitopes. Methods: We developed TRAP (T-cell recognition potential of HLA-I presented peptides), a robust deep learning workflow for predicting CD8+ T-cell epitopes from MHC-I presented pathogenic and self-peptides. TRAP uses transfer learning, a deep learning architecture and MHC binding information to make context-specific predictions of CD8+ T-cell epitopes. TRAP also detects low-confidence predictions for peptides that differ significantly from those in the training datasets, so that it can abstain from making incorrect predictions. To estimate the immunogenicity of pathogenic peptides with low-confidence predictions, we further developed a novel metric, RSAT (relative similarity to autoantigens and tumour-associated antigens), as a complement to ‘dissimilarity to self’ from cancer studies. Results: TRAP was used to identify epitopes from glioblastoma patients as well as SARS-CoV-2 peptides, and it outperformed other algorithms in both cancer and pathogenic settings. TRAP was especially effective at extracting immunogenicity-associated properties from restricted data of emerging pathogens and translating them onto related species, as well as minimising the loss of likely epitopes in imbalanced datasets. We also demonstrated that the novel metric termed RSAT was able to estimate the immunogenicity of pathogenic peptides of various lengths and species. TRAP implementation is available at: https://github.com/ChloeHJ/TRAP. Conclusions: This study presents a novel computational workflow for accurately predicting CD8+ T-cell epitopes to foster a better understanding of antigen-specific T-cell response and the development of effective clinical therapeutics.

    Chromosome 7 ideogram and breakpoint confirmation.

    <p><b>(A)</b> Arrows showing the breakpoint locations. Greek letters facilitate interpretation of the resulting pericentric inversion. Sanger sequencing results for the normal and breakpoint-spanning amplicons for <b>(B)</b> the 7p15 and <b>(C)</b> the 7q21 inversion boundaries. The vertical dashed red line highlights the breakpoint. For ease of comparison, a dashed black line has been drawn onto the normal sequence. (+): sense strand sequence; (-): antisense strand sequence. The inversion has resulted in an AT dinucleotide duplication, which is shown arbitrarily assigned to the 7p15 breakpoint.</p>

    Additional file 2: Figure S1. of Enhanced diagnostic yield in Meckel-Gruber and Joubert syndrome through exome sequencing supplemented with split-read mapping

    Schematic representation of the FishingCNV-defined deletions showing the minimum (red) and maximum (green) possible boundaries of the deletion breakpoints for (A) the intragenic AHI1 deletion and (B) the TMEM237 to MPP4 deletion. Exons are displayed in blue and numbering is in accordance with transcripts [GenBank:NM_001134830.1] (AHI1), [GenBank:NM_001044385.2] and [GenBank:NM_152388.3] (TMEM237) and [GenBank:NM_033066.2] (MPP4). (TIF 19237 kb)