
    Functional consequence of the MET-T1010I polymorphism in breast cancer.

    Major breast cancer predisposition genes account for only approximately 30% of high-risk breast cancer families and explain only 15% of the familial relative risk of breast cancer. The hepatocyte growth factor (HGF) receptor MET is potentially functionally altered by an uncommon germline single nucleotide polymorphism (SNP), MET-T1010I, in many cancer lineages, including breast cancer, where the MET-T1010I SNP is present in 2% of patients with metastatic breast cancer. Expression of MET-T1010I in mammary epithelium increases colony formation, cell migration and invasion in vitro, and tumor growth and invasion in vivo. A selective effect of MET-T1010I, as compared with wild-type MET, on cell invasion both in vitro and in vivo suggests that the MET-T1010I SNP may alter tumor pathophysiology and should be considered as a potential biomarker when implementing MET-targeted clinical trials.


    Biased estimates of clonal evolution and subclonal heterogeneity can arise from PCR duplicates in deep sequencing experiments

    Accurate allele frequencies are important for measuring subclonal heterogeneity and clonal evolution. Deep targeted sequencing data can contain PCR duplicates, which inflate the perceived read depth. Here we adapted the Illumina TruSeq Custom Amplicon kit to include single molecule tagging (SMT) and show that SMT-identified duplicates arise from PCR. We demonstrate that retaining PCR duplicate reads can imply clonal evolution when none exists, while removing them effectively controls the false positive rate. Additionally, PCR duplicates alter estimates of subclonal heterogeneity in tumor samples. Our method simplifies the identification of PCR duplicates and underscores the importance of removing them in studies of tumor heterogeneity and clonal evolution.
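    The way duplicate reads can distort an allele frequency, and the way molecule tags undo the distortion, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the read records, the field names (`amplicon`, `tag`, `allele`) and the majority-vote consensus are assumptions made for illustration only.

```python
from collections import defaultdict


def allele_fraction(reads):
    """Return the fraction of reads that carry the variant ("alt") allele."""
    if not reads:
        raise ValueError("no reads supplied")
    return sum(1 for r in reads if r["allele"] == "alt") / len(reads)


def collapse_by_tag(reads):
    """Collapse reads that share an amplicon and a molecule tag into one read.

    Reads with the same (amplicon, tag) are assumed to be PCR copies of one
    original molecule. The consensus allele is a simple majority vote; ties
    fall back to the reference allele (an arbitrary choice for this sketch).
    """
    families = defaultdict(list)
    for r in reads:
        families[(r["amplicon"], r["tag"])].append(r["allele"])

    collapsed = []
    for (amplicon, tag), alleles in sorted(families.items()):
        alt_count = alleles.count("alt")
        consensus = "alt" if 2 * alt_count > len(alleles) else "ref"
        collapsed.append({"amplicon": amplicon, "tag": tag, "allele": consensus})
    return collapsed


# Hypothetical data: four original molecules, one of which (tag "A", variant
# allele) was amplified into five reads while the others were read once each.
reads = [{"amplicon": "amp1", "tag": "A", "allele": "alt"} for _ in range(5)]
reads += [{"amplicon": "amp1", "tag": t, "allele": "ref"} for t in ("B", "C", "D")]

raw = allele_fraction(reads)                       # 5 of 8 reads -> 0.625
deduplicated = allele_fraction(collapse_by_tag(reads))  # 1 of 4 molecules -> 0.25
```

    In this toy case the uneven amplification of a single variant molecule raises the apparent allele fraction from 0.25 to 0.625, which is the kind of shift that could be misread as clonal expansion.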

    A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments

    BACKGROUND: PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads into the sequence data, and estimating the PCR duplication rate is important for assessing the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from “natural” read duplicates that represent independent DNA fragments and therefore overestimate the PCR duplication rate for DNA-seq and RNA-seq experiments.
    RESULTS: In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets that contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45–50% of read duplicates correspond to natural read duplicates, likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70–95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression, and identified outlier samples with a 2-fold greater PCR duplication rate than other samples.
    CONCLUSIONS: The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates.
    ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1471-9) contains supplementary material, which is available to authorized users.
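    The idea of using heterozygous variants to separate PCR duplicates from natural duplicates can be sketched in Python for the simplest case. This is a simplified illustration and not the implementation in the repository above: it assumes duplicate clusters of exactly two reads that overlap a heterozygous site, and it assumes both alleles are sampled with equal probability. Under those assumptions, two PCR copies of one fragment always show the same allele, whereas two independent fragments disagree half of the time, so the natural-duplicate fraction is roughly twice the observed discordant fraction. The function name and input format are hypothetical.

```python
def pcr_fraction_from_pairs(pairs):
    """Estimate the fraction of two-read duplicate clusters caused by PCR.

    `pairs` is a list of (allele_1, allele_2) tuples, one per duplicate
    cluster of size two that overlaps a heterozygous site.

    Model (an assumption of this sketch):
      P(discordant) = (1 - f_pcr) * 0.5   =>   f_pcr = 1 - 2 * P(discordant)
    """
    if not pairs:
        raise ValueError("no duplicate pairs supplied")
    discordant = sum(1 for a, b in pairs if a != b)
    natural_fraction = 2.0 * discordant / len(pairs)
    # Sampling noise can push the estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, 1.0 - natural_fraction))


# Hypothetical counts: 100 duplicate pairs, 20 of which show different alleles.
pairs = [("A", "G")] * 20 + [("A", "A")] * 40 + [("G", "G")] * 40
estimate = pcr_fraction_from_pairs(pairs)  # natural ~ 0.4, PCR ~ 0.6
```

    A position-only duplicate marker would count all 100 pairs as PCR duplicates, which is the overestimate that the abstract describes.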

    Detection of actionable mutations in ctDNA in advanced breast cancer patients

    Master's thesis, Oncobiology, University of Lisbon, Faculty of Medicine, 2019.
    Background: Breast cancer (BC) is one of the leading causes of cancer death worldwide. Cancer mortality is directly related to the inability to cure advanced disease. Considerable effort has gone into developing new drugs in recent decades. Precision oncology aims to deliver the most adequate treatment to each patient according to the specific characteristics of the disease at each time point. Nevertheless, given tumor heterogeneity, both temporal and spatial, tissue biopsies might be less accurate than emerging techniques such as the analysis of circulating cell-free tumor DNA (ctDNA) in blood, known as liquid biopsy. It is estimated that 80-90% of patients with advanced cancer have genetic alterations that could potentially be targeted with a specific drug, and some studies suggest that patients treated with these targeted drugs might have better outcomes, although this remains controversial. This proof-of-concept study aimed to determine: 1. whether ctDNA can be isolated from plasma samples of patients with metastatic BC; 2. whether specific druggable mutations and amplifications can be detected in ctDNA, namely PIK3CA mutation and amplification, AKT1 mutation, AKT2 amplification, EGFR amplification and FGFR1 amplification; 3. whether there is an association between genetic alterations detectable in plasma and those found in tumor biopsies performed at the same time.
    Methods: This is a single-center prospective observational study with sample collection. We included patients with metastatic BC (MBC), either de novo or after progression or relapse. We also included stage III BC patients with advanced unresectable disease. Only patients with a clinical indication for re-biopsy who gave consent for both biopsy and blood sample collection were included. For each patient, the tumor and blood samples were analyzed within a maximum interval of 8 weeks. DNA was extracted from tissue samples and ctDNA was isolated from plasma. Digital droplet PCR (ddPCR) was used to detect amplifications and massively parallel sequencing (MPS) was used to detect mutations. We extracted germline DNA (gDNA) from leukocytes to screen for mutations in the targeted genes, in order to establish a potential somatic origin for the detected mutations.
    Results: We enrolled 2 patients who had undergone previous lines of treatment and progressed. Patient 001 had MBC (re-biopsy of a lung metastasis), whereas patient 002 had locally advanced, unresectable disease (re-biopsy of the breast). DNA was isolated from the plasma samples and quantified, and was present in both. DNA was also extracted from frozen biopsy samples. Among the genes tested for amplification, we detected an FGFR1 amplification in patient 001, both in tissue (8.5-fold increase in copy number) and in plasma (9.7-fold increase in copy number). We also detected a PIK3CA mutation in exon 10 (coding exon 9) in patient 002 [c.1633G>A p.(E545K)], which is one of the most frequent PIK3CA mutations found in BC. This mutation was detected only in the tissue sample and not in ctDNA; it was shown to be somatic because it was absent from the gDNA.
    Conclusions: We succeeded in isolating ctDNA from plasma samples of both patients, as shown by the detection of somatic variants. We were able to detect one actionable alteration in each patient: an FGFR1 amplification in both tissue and ctDNA of patient 001, and a PIK3CA mutation in patient 002, although only in the tumor tissue sample. We did not find complete concordance between the alterations detected in tumor tissue and in plasma samples. This might be due to several reasons, either technical or biological.
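    For readers unfamiliar with how a ddPCR readout becomes a fold change in copy number, the arithmetic can be sketched as follows. The abstract does not describe the thesis's actual assay analysis, so this is only the standard Poisson correction for droplet counts, applied to hypothetical numbers; the function names and the choice of a single reference gene are assumptions.

```python
import math


def copies_per_droplet(positive, total):
    """Mean target copies per droplet from the fraction of positive droplets.

    A droplet is negative only if it received zero copies, so under a Poisson
    model P(negative) = exp(-lam), which gives lam = -ln(1 - positive / total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copy_number_ratio(target_positive, reference_positive, total):
    """Ratio of target-gene to reference-gene copies measured in one well."""
    reference = copies_per_droplet(reference_positive, total)
    if reference == 0.0:
        raise ValueError("reference gene was not detected")
    return copies_per_droplet(target_positive, total) / reference


# Hypothetical well: 10,000 droplets, 5,000 positive for the target gene and
# 1,000 positive for a reference gene.
ratio = copy_number_ratio(5000, 1000, 10000)
```

    With these made-up counts the ratio is about 6.6 rather than the naive 5.0, because droplets that contain several target copies are still counted only once.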

    Optimising gene expression profiling using RNA-seq


    Dissecting the genetic architecture of cardiac disorders through the use of next generation sequencing

    The overriding goal of this thesis was to further refine our understanding of the genetic architecture of cardiomyopathies, Arrhythmogenic Right Ventricular Cardiomyopathy (ARVC) and Hypertrophic Cardiomyopathy (HCM). 407 patients with ARVC and 957 with HCM had 41 cardiomyopathy and other putative candidate genes sequenced. By comparing these cohorts against each other and against ethnicity and phenotype matched controls, insights were gained into the role of different types of genetic variants in these conditions. This in part involved utilising 4500 Whole Exome Sequences (WES) that are part of the UCLexomes consortium, an in-house dataset that aggregates a diverse set of studies. High throughput DNA sequencing technologies, WES or Whole Genome Sequencing (WGS) are revolutionizing the diagnosis and novel gene discovery for rare disorders. As the field transitions from the early discovery for Mendelian and near Mendelian diseases to more complex and oligo-genic diseases, there is substantial benefit in being able to combine data across studies, performing the type of meta-analysis for cases and controls that have proven to be so successful for Genome-Wide Association Studies (GWAS). However, WGS and WES are substantially more affected by sequencing errors and technical artefacts than genome-wide genotyping arrays. As a consequence, meta-analysis of sequence based association studies are often dominated by spurious associations, which result in technical limitations. Here, we show that it is possible to take advantage of the type of mixed models developed initially to control for population structure in GWAS studies, and apply these ideas to control for technical artefacts. In an attempt to ascertain the role of CNVs in HCM, these data were examined for the presence of rare causative CNVs. 12 CNVs were identified from an initial Read Depth approach. 4 of these were subsequently validated by CoNIFER, a bioinformatics method, and Array Comparative Genomic Hybridisation (aCGH): one large deletion in MYBPC3, one large deletion in PDLIM3, one duplication of the entire TNNT2 gene and one large duplication in LMNA. These results show that the role of CNVs in HCM is small and highlight the efficiency of this two-step strategy.
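    A read-depth screen of the kind mentioned above can be sketched in Python. The abstract gives no details of the thesis's read-depth method, so everything here is an assumption for illustration: the library-size normalisation, the use of the control median as the expected depth, and the log2 thresholds (`loss`, `gain`). It is also unrelated to how CoNIFER works internally.

```python
import math
from statistics import median


def log2_ratios(sample_depths, control_depths):
    """Per-exon log2(sample / median of controls) after library-size scaling.

    `sample_depths` is a list of read depths, one per exon; `control_depths`
    is a list of such lists, one per control sample, in the same exon order.
    """
    def normalise(depths):
        total = sum(depths)
        return [d / total for d in depths]

    sample = normalise(sample_depths)
    controls = [normalise(c) for c in control_depths]

    ratios = []
    for i, value in enumerate(sample):
        expected = median(c[i] for c in controls)
        if expected <= 0:
            ratios.append(float("nan"))   # exon not covered in controls
        elif value == 0:
            ratios.append(float("-inf"))  # no reads at all in the sample
        else:
            ratios.append(math.log2(value / expected))
    return ratios


def call_exon(ratio, loss=-0.6, gain=0.4):
    """Label one exon; the thresholds are illustrative, not validated values."""
    if math.isnan(ratio):
        return "no-call"
    if ratio < loss:
        return "loss"
    if ratio > gain:
        return "gain"
    return "neutral"


# Hypothetical depths for four exons: exon 2 at half depth, exon 4 at 1.5x.
controls = [[100, 100, 100, 100] for _ in range(3)]
sample = [100, 50, 100, 150]
calls = [call_exon(r) for r in log2_ratios(sample, controls)]
```

    For this toy input the calls are neutral, loss, neutral and gain. Candidates from a screen like this would still need orthogonal confirmation, which is the role played by CoNIFER and aCGH in the two-step strategy.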

    Novel Algorithm Development for ‘Next Generation’ Sequencing Data Analysis

    In recent years, the decreasing cost of ‘Next generation’ sequencing (NGS) has spawned numerous applications for interrogating whole genomes and transcriptomes in research, diagnostic and forensic settings. While the innovations in sequencing have been explosive, the development of scalable and robust bioinformatics software and algorithms for analysing the new types of data generated by these technologies has struggled to keep up. As a result, the large volumes of NGS data available in public repositories are severely underutilised, despite providing a rich resource for data mining applications. Indeed, the bottleneck in genome and transcriptome sequencing experiments has shifted from data generation to bioinformatics analysis and interpretation. This thesis focuses on the development of novel bioinformatics software to bridge the gap between data availability and interpretation. The work is split between two core topics: computational prioritisation and identification of disease gene variants, and identification of RNA N6-adenosine methylation from sequencing data. The first chapter briefly discusses the emergence and establishment of NGS technology as a core tool in biology, along with its current applications and perspectives. Chapter 2 introduces the problem of variant prioritisation in the context of Mendelian disease, where a typical sequencing experiment generates tens of thousands of potential candidates. Novel software developed for candidate gene prioritisation, which utilises data mining of tissue-specific gene expression profiles, is then described (Chapter 3). The second part investigates an alternative approach to candidate variant prioritisation that leverages functional and phenotypic descriptions of genes and diseases from multiple biomedical domain ontologies (Chapter 4). Chapter 5 discusses N6-adenosine methylation, a recently re-discovered post-transcriptional modification of RNA. The core of the chapter describes novel software developed for transcriptome-wide detection of this epitranscriptomic mark from sequencing data. Chapter 6 presents a case study application of the software, reporting the previously uncharacterised RNA methylome of Kaposi’s Sarcoma Herpes Virus. The chapter further discusses a putative novel N6-methyladenosine RNA-binding protein and its possible roles in the progression of viral infection.