10 research outputs found
Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing in ovarian cancer
Understanding the complex background of cancer requires genotype-phenotype information in single-cell resolution. Here, we perform long-read single-cell RNA sequencing (scRNA-seq) on clinical samples from three ovarian cancer patients presenting with omental metastasis and increase the PacBio sequencing depth to 12,000 reads per cell. Our approach captures 152,000 isoforms, of which over 52,000 were not previously reported. Isoform-level analysis accounting for non-coding isoforms reveals 20% overestimation of protein-coding gene expression on average. We also detect cell type-specific isoform and poly-adenylation site usage in tumor and mesothelial cells, and find that mesothelial cells transition into cancer-associated fibroblasts in the metastasis, partly through the TGF-β/miR-29/Collagen axis. Furthermore, we identify gene fusions, including an experimentally validated IGF2BP2::TESPA1 fusion, which is misclassified as high TESPA1 expression in matched short-read data, and call mutations confirmed by targeted NGS cancer gene panel results. With these findings, we envision long-read scRNA-seq to become increasingly relevant in oncology and personalized medicine.
Investigating cancer heterogeneity through single-cell DNA sequencing
Intratumor heterogeneity (ITH) describes the coexistence of cellular populations with distinct geno- and phenotypes within a tumor, posing a major obstacle to successful cancer treatment. The rapid progress in sequencing technologies over the last decades enabled studying ITH at the single-cell level, the highest possible resolution. Single-cell DNA sequencing (scDNA-seq) accesses the genomic information of individual tumor cells and their joint evolutionary history. This thesis presents three studies investigating genomic ITH through scDNA-seq, preceded by an introduction and concluded with a summary.
Chapter 1 opens with the development of sequencing technologies, particularly DNA and single-cell sequencing, and provides an overview of cancer evolution and ITH.
Chapter 2 presents demoTape, a computational demultiplexing method for targeted scDNA-seq data, leveraging the genomic distance between cells of jointly sequenced patients to separate them. On simulated data, demoTape outperforms competing methods in demultiplexing accuracy. Applied to a sample of three multiplexed lymphoma patients, it successfully demultiplexes the cells, leading to similar downstream analysis results as individually sequenced patients. DemoTape, therefore, allows the joint preparation and sequencing of multiple samples, saving costs and labor.
Chapter 3 describes BnpC, a Bayesian non-parametric clustering method to identify cellular populations and their genotypes from scDNA-seq data. On simulated data, BnpC surpasses competing methods in accuracy and scalability. Applied to published scDNA-seq data, BnpC reproduces results that previously required additional experimental data or manual curation. The ability of BnpC to identify cellular populations and their genotypes holds great potential for personalized cancer therapies.
Chapter 4 introduces the Poisson Tree test for detecting variable evolutionary rates among cell lineages, leveraging the phylogenetic information inherent to scDNA-seq data. When applied to 24 scDNA-seq datasets derived from different cancer types and healthy tissue, the Poisson Tree test rejects a constant rate in over 70% of cancer and in over 50% of healthy tissue datasets, suggesting that variations in the evolutionary rate are predominant in cancer but also frequently occur in healthy tissue.
This thesis concludes with Chapter 5, which discusses the presented studies in a greater context, reflects on their limitations, and suggests directions for future research.
Single-cell phylogenies reveal changes in the evolutionary rate within cancer and healthy tissues
Cell lineages accumulate somatic mutations during organismal development, potentially leading to pathological states. The rate of somatic evolution within a cell population can vary due to multiple factors, including selection, a change in the mutation rate, or differences in the microenvironment. Here, we developed a statistical test called the Poisson Tree (PT) test to detect varying evolutionary rates among cell lineages, leveraging the phylogenetic signal of single-cell DNA sequencing (scDNA-seq) data. We applied the PT test to 24 healthy and cancer samples, rejecting a constant evolutionary rate in 11 out of 15 cancer and five out of nine healthy scDNA-seq datasets. In six cancer datasets, we identified subclonal mutations in known driver genes that could explain the rate accelerations of particular cancer lineages. Our findings demonstrate the efficacy of scDNA-seq for studying somatic evolution and suggest that cell lineages often evolve at different rates within cancer and healthy tissues. ISSN:2666-979
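As a quick arithmetic check, the counts reported for the PT test can be converted into rejection rates. The sketch below is illustrative only: the variable and function names are assumptions introduced here, and only the counts (11 of 15 cancer datasets, 5 of 9 healthy datasets) come from the abstract.

```python
from fractions import Fraction

# Counts as reported in the abstract:
# (datasets in which a constant rate was rejected, total datasets).
reported = {"cancer": (11, 15), "healthy": (5, 9)}


def rejection_rate(rejected, total):
    """Return the share of datasets rejecting a constant rate, as an exact fraction."""
    return Fraction(rejected, total)


rates = {tissue: rejection_rate(*counts) for tissue, counts in reported.items()}
total_datasets = sum(total for _, total in reported.values())

for tissue, rate in rates.items():
    print(f"{tissue}: {float(rate):.1%}")
print(f"datasets in total: {total_datasets}")
```

The two groups add up to the 24 datasets mentioned, and the rates come out at roughly 73% for cancer and 56% for healthy tissue.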
Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing for personalized oncology
Understanding the complex background of cancer requires genotype-phenotype information in single-cell resolution. Long-read single-cell RNA sequencing (scRNA-seq), which captures full-length transcripts, has so far lacked the depth to provide this information. Here, we increased the PacBio sequencing depth to 12,000 reads per cell, leveraging multiple strategies, including artifact removal and transcript concatenation, and applied the technology to samples from three human ovarian cancer patients. Our approach captured 152,000 isoforms, of which over 52,000 were novel, detected cell type- and cell-specific isoform usage, and revealed differential isoform expression in tumor and mesothelial cells. Furthermore, we identified gene fusions, including a novel scDNA sequencing-validated IGF2BP2::TESPA1 fusion, which was misclassified as high TESPA1 expression in matched short-read data, and called somatic and germline mutations, confirming targeted NGS cancer gene panel results. With multiple new opportunities, especially for cancer biology, we envision long-read scRNA-seq to become increasingly relevant in oncology and personalized medicine.
Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing in ovarian cancer
Understanding the complex background of cancer requires genotype-phenotype information in single-cell resolution. Here, we perform long-read single-cell RNA sequencing (scRNA-seq) on clinical samples from three ovarian cancer patients presenting with omental metastasis and increase the PacBio sequencing depth to 12,000 reads per cell. Our approach captures 152,000 isoforms, of which over 52,000 were not previously reported. Isoform-level analysis accounting for non-coding isoforms reveals 20% overestimation of protein-coding gene expression on average. We also detect cell type-specific isoform and poly-adenylation site usage in tumor and mesothelial cells, and find that mesothelial cells transition into cancer-associated fibroblasts in the metastasis, partly through the TGF-β/miR-29/Collagen axis. Furthermore, we identify gene fusions, including an experimentally validated IGF2BP2::TESPA1 fusion, which is misclassified as high TESPA1 expression in matched short-read data, and call mutations confirmed by targeted NGS cancer gene panel results. With these findings, we envision long-read scRNA-seq to become increasingly relevant in oncology and personalized medicine.
SIEVE: joint inference of single-nucleotide variants and cell phylogeny from single-cell DNA sequencing data
We present SIEVE, a statistical method for the joint inference of somatic variants and cell phylogeny under the finite-sites assumption from single-cell DNA sequencing. SIEVE leverages raw read counts for all nucleotides and corrects the acquisition bias of branch lengths. In our simulations, SIEVE outperforms other methods in phylogenetic reconstruction and variant calling accuracy, especially in the inference of homozygous variants. Applying SIEVE to three datasets, one for triple-negative breast cancer (TNBC) and two for colorectal cancer (CRC), we find that double mutant genotypes are rare in CRC but unexpectedly frequent in the TNBC samples.
Machine learning-based classification to improve Gas Chromatography-Mass spectrometry data processing.
Methodological & Technological developments
Introduction: Lack of reliable peak detection impedes automated analysis of large-scale gas chromatography-mass spectrometry (GC-MS) metabolomics datasets. Performance and outcome of individual peak-picking algorithms can differ widely depending on both algorithmic approach and parameters, as well as data acquisition method. Therefore, comparing and contrasting algorithms is difficult.
Technological and methodological innovation: We present part of the work published in [1] and implemented in our workflow for improved peak picking (WiPP), focusing on the use of machine learning-based classification to optimize and improve different steps of the common GC-MS metabolomics data processing workflow. Our approach evaluates the quality of detected peaks using a machine learning-based classification scheme with seven peak classes. The quality information returned by the classifier for each individual peak is merged with results from different peak detection algorithms to create one final high-quality peak set for immediate downstream analysis.
Results and impact: We benchmarked our workflow against standard compound mixes and a complex biological dataset, demonstrating that peak detection is improved. Furthermore, the approach can provide an impartial performance comparison of different peak-picking algorithms. We also discuss the applicability of the approach to liquid chromatography-mass spectrometry data.
References: [1] Gloaguen, Y.; Borgsmüller, N. et al. WiPP: Workflow for Improved Peak Picking for Gas Chromatography-Mass Spectrometry (GC-MS) Data. Metabolites 2019, 9, 171.
V-pipe 3.0: a sustainable pipeline for within-sample viral genetic diversity estimation
The large amount and diversity of viral genomic datasets generated by next-generation sequencing technologies pose a set of challenges for computational data analysis workflows, including rigorous quality control, adaptation to higher sample coverage, and tailored steps for specific applications. Here, we present V-pipe 3.0, a computational pipeline designed for analyzing next-generation sequencing data of short viral genomes. It is developed to enable reproducible, scalable, adaptable, and transparent inference of genetic diversity of viral samples. By presenting two large-scale data analysis projects, we demonstrate the effectiveness of V-pipe 3.0 in supporting sustainable viral genomic data science.
Within-patient genetic diversity of SARS-CoV-2
SARS-CoV-2, the virus responsible for the current COVID-19 pandemic, is evolving into different genetic variants by accumulating mutations as it spreads globally. In addition to this diversity of consensus genomes across patients, RNA viruses can also display genetic diversity within individual hosts, and co-existing viral variants may affect disease progression and the success of medical interventions. To systematically examine the intra-patient genetic diversity of SARS-CoV-2, we processed a large cohort of 3939 publicly available, deeply sequenced genomes with specialised bioinformatics software, along with 749 recently sequenced samples from Switzerland. We found that the distribution of diversity across patients and across genomic loci is very unbalanced, with a minority of hosts and positions accounting for much of the diversity. For example, the D614G variant in the Spike gene, which is present in the consensus sequences of 67.4% of patients, is also highly diverse within hosts, with 29.7% of the public cohort being affected by this coexistence and exhibiting different variants. We also investigated the impact of several technical and epidemiological parameters on genetic heterogeneity and found that age, which is known to be correlated with poor disease outcomes, is a significant predictor of viral genetic diversity.
Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing in ovarian cancer
Understanding the complex background of cancer requires genotype-phenotype information in single-cell resolution. Here, we perform long-read single-cell RNA sequencing (scRNA-seq) on clinical samples from three ovarian cancer patients presenting with omental metastasis and increase the PacBio sequencing depth to 12,000 reads per cell. Our approach captures 152,000 isoforms, of which over 52,000 were not previously reported. Isoform-level analysis accounting for non-coding isoforms reveals 20% overestimation of protein-coding gene expression on average. We also detect cell type-specific isoform and poly-adenylation site usage in tumor and mesothelial cells, and find that mesothelial cells transition into cancer-associated fibroblasts in the metastasis, partly through the TGF-β/miR-29/Collagen axis. Furthermore, we identify gene fusions, including an experimentally validated IGF2BP2::TESPA1 fusion, which is misclassified as high TESPA1 expression in matched short-read data, and call mutations confirmed by targeted NGS cancer gene panel results. With these findings, we envision long-read scRNA-seq to become increasingly relevant in oncology and personalized medicine. ISSN:2041-172