346 research outputs found

    Methods and practice of detecting selection in human cancers

    Cancer development and progression is an evolutionary process, and understanding its evolutionary dynamics is important for diagnosis and treatment because how a cancer evolves determines its prognosis. This thesis focuses on elucidating selective evolutionary pressures in cancers and somatic tissues using population genetics models and cancer genomics data. First, a model for the expected diversity in the absence of selection was developed. This neutral model of evolution predicts that the frequency of subclonal mutations follows a power-law distribution. Surprisingly, more than 30% of cancers across multiple cohorts fitted this model. The next part of the thesis develops models to explore the effects of selection, which should be observable as deviations from the neutral prediction. For this I developed two approaches. The first investigated selection at the level of individual samples and showed that a characteristic pattern of clusters of mutations is observed in deep sequencing experiments. Using a mathematical model, the information encoded within these clusters can be used to measure the relative fitness of subclones and the time at which they emerge during tumour evolution. With this I observed strikingly high fitness advantages for subclones, above 20%. The second approach measures recurrent patterns of selection in cohorts of sequenced cancers using dN/dS, the ratio of non-synonymous to synonymous mutations, a method originally developed for molecular species evolution. This approach demonstrates how selection coefficients can be extracted by combining measurements of dN/dS with the size of mutational lineages. With this approach, selection coefficients were again observed to be strikingly high. Finally, I looked at population dynamics in normal colonic tissue, given that many mutations accumulate in physiologically normal tissue. I found that the current view of stem cell dynamics was unable to explain sequencing data from individual colonic crypts. New models were proposed that introduce evolution on a longer time scale, which suppresses the accumulation of mutations; these models appear consistent with the data.
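
As a rough illustration of the neutral prediction described in the abstract above, the sketch below assumes the cumulative form commonly used for this kind of model, M(f) = k (1/f - 1/f_max), where M(f) is the number of subclonal mutations at frequency f or higher and k is a single fitted constant. The function name `fit_neutral_model`, the default frequency window, and the use of R^2 as a goodness-of-fit measure are assumptions made for illustration; they are not details taken from the thesis.

```python
def fit_neutral_model(vafs, f_min=0.12, f_max=0.24):
    """Fit M(f) = k * (1/f - 1/f_max) through the origin.

    vafs  : iterable of variant allele frequencies (floats in (0, 1])
    f_min : lower edge of the subclonal window (illustrative default)
    f_max : upper edge of the subclonal window (illustrative default)

    Returns (k, r_squared). A high r_squared means the cumulative
    mutation count is close to linear in 1/f, as neutrality predicts.
    """
    # Keep only mutations inside the window, highest frequency first.
    kept = sorted((f for f in vafs if f_min <= f <= f_max), reverse=True)
    if len(kept) < 3:
        raise ValueError("need at least three mutations inside the window")

    # x: inverse frequency, shifted so that x = 0 at f_max.
    xs = [1.0 / f - 1.0 / f_max for f in kept]
    # y: cumulative count of mutations at frequency >= f
    # (approximated by rank, which ignores exact ties).
    ys = [float(rank) for rank in range(1, len(kept) + 1)]

    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("all mutations sit exactly at f_max")

    # Least-squares slope for a line constrained through the origin.
    k = sum(x * y for x, y in zip(xs, ys)) / sxx

    # Coefficient of determination of the constrained fit.
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - k * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return k, 1.0 - ss_res / ss_tot
```

Calling `fit_neutral_model(vafs)` on the variant allele frequencies of one sample returns the fitted constant and R^2; a low R^2 would be one crude sign of the deviations from neutrality that the abstract attributes to selection.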

    Statistical Methods For Genomic And Transcriptomic Sequencing

    Part 1: High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but CNV profiling from whole-exome sequencing (WES) is challenging because of the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for WES data. CODEX includes a Poisson latent factor model with terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based segmentation procedure that explicitly models the count-based WES data. CODEX is compared to existing methods on germline CNV detection in HapMap samples using a microarray-based gold standard and is further evaluated on 222 neuroblastoma samples with matched normals, with a focus on somatic CNVs within the ATRX gene. Part 2: Cancer is a disease driven by evolutionary selection on somatic genetic and epigenetic alterations. We propose Canopy, a method for inferring the evolutionary phylogeny of a tumor using both somatic copy number alterations and single nucleotide alterations from one or more samples derived from a single patient. Canopy is applied to bulk sequencing datasets of both longitudinal and spatial experimental designs and to a transplantable metastasis model derived from the human cancer cell line MDA-MB-231. Canopy successfully identifies cell populations and infers phylogenies that are in concordance with existing knowledge and ground truth. Through simulations, we explore the effects of key parameters on deconvolution accuracy and compare against existing methods. Part 3: Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average expression across cells. Single-cell RNA sequencing (scRNA-seq) allows the comparison of expression distributions between the two alleles of a diploid organism and thus the characterization of allele-specific bursting. We propose SCALE to analyze genome-wide allele-specific bursting, with adjustment for technical variability. SCALE detects genes exhibiting allelic differences in bursting parameters, and genes whose alleles burst non-independently. We apply SCALE to mouse blastocyst and human fibroblast cells and find that, globally, cis control in gene expression overwhelmingly manifests as differences in burst frequency.
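
The Poisson likelihood-based segmentation mentioned in Part 1 can be illustrated with a generic likelihood-ratio statistic for one candidate segment. This is a minimal sketch of the general idea, not CODEX's actual implementation: the function name `poisson_segment_llr` is hypothetical, and the expected read depths are assumed to come from some earlier normalization step that is not shown here.

```python
import math


def poisson_segment_llr(observed, expected):
    """Poisson log-likelihood ratio for one candidate segment.

    observed : read counts per exon in the segment (non-negative ints)
    expected : normalized expected counts for the same exons (positive floats)

    Compares a model in which every exon's mean is scaled by one shared
    copy ratio (its maximum-likelihood estimate) against the null model
    with copy ratio 1. Returns (copy_ratio, llr).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total_obs = float(sum(observed))
    total_exp = float(sum(expected))
    if total_exp <= 0.0:
        raise ValueError("expected counts must sum to a positive value")

    # MLE of the shared copy ratio under the Poisson model.
    ratio = total_obs / total_exp

    # The log-likelihood ratio simplifies to
    #   sum(Y) * log(ratio) - sum(Y) + sum(expected),
    # with the convention 0 * log(0) = 0 when no reads are observed.
    if total_obs == 0.0:
        llr = total_exp
    else:
        llr = total_obs * math.log(ratio) - total_obs + total_exp
    return ratio, llr
```

A copy ratio near 0.5 with a large statistic would be compatible with a heterozygous deletion and a ratio near 1.5 with a duplication; any threshold for calling a segment would have to be chosen separately.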

    Advances in Single Molecule, Real-Time (SMRT) Sequencing

    PacBio’s single-molecule real-time (SMRT) sequencing technology offers important advantages over the short-read DNA sequencing technologies that currently dominate the market. These include exceptionally long read lengths (20 kb or more), unparalleled consensus accuracy, and the ability to sequence native, non-amplified DNA molecules. From fungi to insects to humans, long reads are now used to create highly accurate reference genomes by de novo assembly of genomic DNA and to obtain a comprehensive view of transcriptomes through the sequencing of full-length cDNAs. Besides reducing biases, sequencing native DNA also permits the direct measurement of DNA base modifications. SMRT sequencing has therefore become an attractive technology in many fields, such as agriculture, basic science, and medical research. The boundaries of SMRT sequencing are continuously being pushed by developments in bioinformatics and sample preparation. This book contains a collection of articles showcasing the latest developments and the breadth of applications enabled by SMRT sequencing technology.

    A method for comprehensive observation of DNA methylation using a single-molecule DNA sequencer: application to diploid genomes and centromeric regions

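    Degree type: doctorate by coursework. Examination committee: (chair) Professor Koji Tsuda (津田 宏治), University of Tokyo; Professor Yoshinori Murakami (村上 善則), University of Tokyo; Professor Shinichi Morishita (森下 真一), University of Tokyo; Lecturer Masahiro Kasahara (笠原 雅弘), University of Tokyo; Professor Takashi Ito (伊藤 隆司), Kyushu University. University of Tokyo (東京大学)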

    Genome editing for low acrylamide wheat

    Acrylamide (C3H5NO) is a food processing contaminant that has been classed as a probable (Group 2A) human carcinogen. Acrylamide forms from the reaction of free (non-protein) asparagine with reducing sugars during food processing. All major cereal products are affected, and wheat products represent one of the main sources of dietary acrylamide intake in Europe. Asparagine concentration is the determining factor for acrylamide formation in cereal products. Asparagine biosynthesis is catalysed by a family of enzymes called asparagine synthetases (ASNs). The ASN genes were investigated and five (TaASN1-4, with a double copy of TaASN3) were identified in wheat (Triticum aestivum), with TaASN2 showing grain-specific expression. CRISPR/Cas9 was used to knock out the TaASN2 gene of wheat cv. Cadenza. A polycistronic gene containing four gRNAs, interspaced with tRNAs, was designed and introduced into wheat embryos by particle bombardment. The resulting edits were characterised in the T1 and T2 generations by next-generation sequencing. Triple (A, B, and D genome) nulls were identified, alongside an AD-genome null and an A-genome null. Amino acid concentrations were measured in the T2 and T3 seed, with one triple null line showing a substantial reduction in the free asparagine concentration in the grain (90 % in the T2 seed and 50 % in the T3 seed compared with wildtype). Free asparagine also fell as a proportion of the total free amino acid pool. Significant effects were also seen in glutamate and aspartate concentrations. Free asparagine and total free amino acid concentrations were higher in the T3 than the T2 seeds, probably due to heat stress, but the concentrations in the edited plants remained substantially lower than in wildtype. Some of the edited lines showed poor germination, but this could be overcome by application of exogenous asparagine, and no other phenotype was noted.

    The application of genomic technologies to cancer and companion diagnostics.

    This thesis describes work undertaken by the author between 1996 and 2014. Genomics is the study of the genome, although the term is also often used as a catchall and applied to the transcriptome (study of RNAs) and methylome (study of DNA methylation). As cancer is a disease of the genome, the rapid advances in genomic technology, specifically microarrays and next-generation sequencing, are creating a wave of change in our understanding of its molecular pathology. Molecular pathology and personalised medicine are being driven by discoveries in genomics, and genomics is being driven by the development of faster, better and cheaper genome sequencing. The next decade is likely to see significant changes in the way cancer is managed for individual patients as next-generation sequencing enters the clinic. In chapter 3 I discuss how ERBB2 amplification testing for breast cancer is currently dominated by immunohistochemistry (a single-gene test), and present the development, by the author, of a semi-quantitative PCR test for ERBB2 amplification. I also show that estimating ERBB2 amplification from microarray copy-number analysis of the genome is possible. In chapter 4 I present a review of microarray comparison studies and outline the case for careful and considered comparison of technologies when selecting a platform for use in a research study. Similar, indeed more stringent, care needs to be applied when selecting a platform for use in a clinical test. In chapter 5 I present co-authored work on the development of amplicon and exome methods for the detection and quantitation of somatic mutations in circulating tumour DNA, and demonstrate the impact this can have on understanding tumour heterogeneity and evolution during treatment. I also demonstrate how next-generation sequencing technologies may allow multiple genetic abnormalities to be analysed in a single test, including in low-cellularity tumours and/or heterogeneous cancers.
Keywords: Genome, exome, transcriptome, amplicon, next-generation sequencing, differential gene expression, RNA-seq, ChIP-seq, microarray, ERBB2, companion diagnostic

    DNA Sequencing

    This book illustrates methods of DNA sequencing and their application in plant, animal and medical sciences. It has two distinct sections. The first includes two chapters devoted to DNA sequencing methods, and the second includes six chapters focusing on various applications of this technology. The content of the articles presented in the book is guided by the knowledge and experience of the contributing authors. This book is intended to serve as an important resource and review for researchers in the field of DNA sequencing.

    Investigating Genetic Causes of Mendelian Congenital Myopathies

    This thesis investigates the genetic aetiology of congenital myopathy in families with an unresolved genetic diagnosis. In two families, massively parallel sequencing and functional analyses identified two genetic candidates: a regulatory variant (c.*152G>T) and a multi-exon deletion in a known disease gene (KLHL40), and a homozygous missense variant (c.1339T>C) in HMGCS1, a novel disease gene. This work supports the further investigation of regulatory variants in congenital myopathy screening and highlights the role of the mevalonate pathway in muscle function.