1,816 research outputs found

    CoNVEX: Copy number variation estimation in exome sequencing data using HMM

    Background: One of the main types of genetic variation in cancer is copy number variation (CNV). Whole exome sequencing (WES) is a popular alternative to whole genome sequencing (WGS) for studying disease-specific genomic variations. However, finding CNVs in cancer samples using WES data has not been fully explored. Results: We present a new method, called CoNVEX, to estimate copy number variation in whole exome sequencing data. It uses the ratio of tumour and matched normal average read depths at each exonic region to predict copy gain or loss. The useful signal produced by WES data is hindered by the intrinsic noise present in the data itself, which limits its use as a highly reliable CNV detection source. Here, we propose a method that uses the discrete wavelet transform (DWT) to reduce noise. The identification of copy number gains/losses of each targeted region is performed by a Hidden Markov Model (HMM). Conclusion: HMMs are frequently used to identify CNVs in data produced by various technologies, including array comparative genomic hybridization (aCGH) and WGS. Here, we propose an HMM to detect CNVs in cancer exome data. We used modified data from the 1000 Genomes Project to evaluate the performance of the proposed method. Using these data, we have shown that CoNVEX significantly outperforms the existing methods in terms of precision. Overall, CoNVEX achieved a sensitivity of more than 92% and a precision of more than 50%.
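
The read-depth-ratio and HMM steps described in the abstract above can be illustrated with a minimal sketch. It is not the CoNVEX implementation: the three states, the emission means, the standard deviation, and the transition probability are assumed values chosen for illustration, and the wavelet denoising step is omitted.

```python
import math

STATES = ("loss", "neutral", "gain")
# Assumed emission means for log2(tumour/normal) per state (illustrative only).
MEANS = {"loss": -1.0, "neutral": 0.0, "gain": 0.58}
SIGMA = 0.3  # assumed emission standard deviation
STAY = 0.9   # assumed probability of remaining in the same state


def log2_ratios(tumour, normal, eps=1e-9):
    """Per-exon log2 ratio of tumour to matched-normal average read depth."""
    return [math.log2((t + eps) / (n + eps)) for t, n in zip(tumour, normal)]


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def viterbi(ratios):
    """Most likely state path (loss/neutral/gain) for a sequence of log2 ratios."""
    if not ratios:
        return []
    switch = (1.0 - STAY) / (len(STATES) - 1)
    log_trans = {(a, b): math.log(STAY if a == b else switch)
                 for a in STATES for b in STATES}
    # Uniform prior over the initial state.
    score = {s: math.log(1 / len(STATES)) + log_gauss(ratios[0], MEANS[s], SIGMA)
             for s in STATES}
    back = []
    for x in ratios[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + log_trans[(p, s)])
            pointers[s] = best
            score[s] = prev[best] + log_trans[(best, s)] + log_gauss(x, MEANS[s], SIGMA)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Calling `viterbi(log2_ratios(tumour, normal))` on per-exon depth lists returns one label per exon. In practice the emission and transition parameters would be estimated from data, not fixed by hand.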

    A Path to Implement Precision Child Health Cardiovascular Medicine.

    Congenital heart defects (CHDs) affect approximately 1% of live births and are a major source of childhood morbidity and mortality, even in countries with advanced healthcare systems. Along with phenotypic heterogeneity, the underlying etiology of CHDs is multifactorial, involving genetic, epigenetic, and/or environmental contributors. Clear dissection of the underlying mechanism is a powerful step toward establishing individualized therapies. However, for the majority of CHDs the underlying genetic and environmental factors have yet to be clearly diagnosed, and even fewer have effective therapies. Although the survival rate for CHDs is steadily improving, there is still a significant unmet need for refining diagnostic precision and establishing targeted therapies to optimize quality of life and minimize future complications. In particular, proper identification of disease-associated genetic variants in humans has been challenging, and this greatly impedes our ability to delineate the gene-environment interactions that contribute to the pathogenesis of CHDs. Implementing a systematic multileveled approach can establish a continuum from phenotypic characterization in the clinic to molecular dissection using combined next-generation sequencing platforms and validation studies in suitable models at the bench. Key elements necessary to advance the field are: first, proper delineation of the phenotypic spectrum of CHDs; second, defining the molecular genotype/phenotype by combining whole-exome sequencing and transcriptome analysis; third, integration of phenotypic, genotypic, and molecular datasets to identify molecular networks contributing to CHDs; fourth, generation of relevant disease models and multileveled experimental investigations. To achieve all these goals, access to high-quality biological specimens from well-defined patient cohorts is crucial. Therefore, establishing a CHD BioCore is an essential piece of infrastructure and a critical step on the path toward precision child health cardiovascular medicine.

    Adaptive Savitzky–Golay Filters for Analysis of Copy Number Variation Peaks from Whole-Exome Sequencing Data

    Copy number variation (CNV) is a form of structural variation in the human genome that provides medical insight into complex human diseases. While whole-genome sequencing is becoming more affordable, whole-exome sequencing (WES) remains an important tool in clinical diagnostics. Because of the discontinuous nature and unique characteristics of sparse target-enrichment-based WES data, the analysis and detection of CNV peaks remain difficult tasks. Savitzky–Golay (SG) smoothing is well known as a fast and efficient smoothing method. However, no study has documented the use of this technique for CNV peak detection. The effectiveness of the classical SG filter depends on the proper selection of the window length and polynomial degree, which should correspond to the scale of the peak; for peaks with a high rate of change, the effectiveness of the filter can be restricted. Based on the Savitzky–Golay algorithm, this paper introduces a novel adaptive method to smooth irregular peak distributions. The proposed method ensures high-precision noise reduction by using the results of the prior smoothing pass to adjust the parameters automatically. Our method offers an additional feature extraction technique based on density and Euclidean distance. The performance evaluation demonstrates that adaptive Savitzky–Golay filtering performs better than classical Savitzky–Golay filtering and other peer filtering methods. According to experimental results, our method effectively detects CNV peaks across all genomic segments for both short and long tags, with minimal peak height fidelity values (i.e., low estimation bias). These results demonstrate how well the adaptive Savitzky–Golay filtering method works and how its use in the detection of CNV peaks can complement the existing techniques used in CNV peak analysis.
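
For orientation, the classical fixed-window filter that the abstract above takes as its starting point can be sketched as follows. This is only the standard 5-point quadratic Savitzky–Golay smoother with its well-known coefficients; the adaptive parameter selection proposed in the paper is not reproduced here, and the function name `savgol5` is a hypothetical one chosen for this example.

```python
# Classical 5-point quadratic Savitzky-Golay coefficients (to be divided by 35).
SG5_COEFFS = (-3, 12, 17, 12, -3)
SG5_NORM = 35


def savgol5(values):
    """Smooth a sequence with a fixed 5-point quadratic Savitzky-Golay filter.

    The two points at each edge are copied unchanged, because a full
    window is not available there.
    """
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5_COEFFS, window)) / SG5_NORM
    return smoothed
```

Because the window length and polynomial degree are fixed, a narrow, fast-changing peak is flattened by this filter, which is the limitation the adaptive variant is meant to address.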

    Identification and interpretation of pathogenic variants following Next Generation Sequencing (NGS) analysis in human Mendelian disorders

    During the PhD program, the focus was on supporting the diagnostic laboratory in validating or discovering unusual variants. This is of utmost importance for understanding molecular etiopathogenic mechanisms, and also for offering the best counselling to families. Accordingly, the following projects were accomplished: I) a puzzling case of a female patient with X-linked chronic granulomatous disease (CGD) with a putative splicing variant (NM_000397:ex9:c.1151+2T>C) in the CYBB gene. II) A novel hemizygous putative splicing variant in the MAGT1 gene (NM_032121:c.627+2T>C), located on the X chromosome. III) Analysis of copy number variations (CNVs) to increase the diagnostic rate of an NGS panel for inborn errors of immunity, as it is well known that CNVs (insertions or deletions between 2 and 50 megabases) account for roughly 12% of genetic abnormalities. Identifying this large variation is still problematic, especially with the Ion Torrent platforms we use for diagnostics, so we performed an extensive in silico analysis using several new software tools. IV) Eight families with a familial or personal history of cancer underwent multigene panel testing or whole exome sequencing. Eight pathogenic variants were found and verified through Sanger sequencing or MLPA and real-time PCR. The use of NGS and CNV detection improved the diagnostic yield in cancer patients. Before genetic testing, some of the Iranian families who met the Amsterdam criteria were enrolled in surveillance programs irrespective of their mutation carrier status; after carrier status was disclosed, only carriers were enrolled, improving compliance and decreasing management costs.

    Statistical Methods For Genomic And Transcriptomic Sequencing

    Part 1: High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but CNV profiling from whole-exome sequencing (WES) is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for WES data. CODEX includes a Poisson latent factor model, which includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based segmentation procedure that explicitly models the count-based WES data. CODEX is compared to existing methods on germline CNV detection in HapMap samples using a microarray-based gold standard and is further evaluated on 222 neuroblastoma samples with matched normal, with a focus on somatic CNVs within the ATRX gene. Part 2: Cancer is a disease driven by evolutionary selection on somatic genetic and epigenetic alterations. We propose Canopy, a method for inferring the evolutionary phylogeny of a tumor using both somatic copy number alterations and single nucleotide alterations from one or more samples derived from a single patient. Canopy is applied to bulk sequencing datasets of both longitudinal and spatial experimental designs and to a transplantable metastasis model derived from human cancer cell line MDA-MB-231. Canopy successfully identifies cell populations and infers phylogenies that are in concordance with existing knowledge and ground truth. Through simulations, we explore the effects of key parameters on deconvolution accuracy, and compare against existing methods. Part 3: Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average expression across cells. Single-cell RNA sequencing (scRNA-seq) allows the comparison of expression distribution between the two alleles of a diploid organism and thus the characterization of allele-specific bursting. We propose SCALE to analyze genome-wide allele-specific bursting, with adjustment of technical variability. SCALE detects genes exhibiting allelic differences in bursting parameters, and genes whose alleles burst non-independently. We apply SCALE to mouse blastocyst and human fibroblast cells and find that, globally, cis control in gene expression overwhelmingly manifests as differences in burst frequency.
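
The idea behind a Poisson likelihood-based test on count data, as mentioned in Part 1 of the abstract above, can be illustrated with a generic sketch. It assumes that normalization has already produced an expected read count per exon, and it scores a candidate segment by comparing the maximum-likelihood copy ratio against a ratio of 1. This is a textbook Poisson log-likelihood ratio, not the CODEX statistic, and the function name is hypothetical.

```python
import math


def poisson_segment_llr(observed, expected):
    """Log-likelihood ratio that a segment's copy ratio differs from 1.

    observed: read counts per exon in the candidate segment.
    expected: normalized expected counts per exon (assumed given).
    Under Y_i ~ Poisson(r * lambda_i), the MLE of r is sum(Y) / sum(lambda),
    and the statistic is log L(r_hat) - log L(1).
    """
    total_obs = sum(observed)
    total_exp = sum(expected)
    if total_exp <= 0:
        raise ValueError("expected counts must sum to a positive value")
    if total_obs == 0:
        # Limit of the formula as the observed total goes to zero.
        return float(total_exp)
    ratio = total_obs / total_exp
    return total_obs * math.log(ratio) - (total_obs - total_exp)
```

A segment whose observed counts match the expected counts scores 0, and the score grows as the segment departs from a copy ratio of 1 in either direction.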

    Biological Role and Disease Impact of Copy Number Variation in Complex Disease

    In the human genome, DNA variants give rise to a variety of complex phenotypes. Ranging from single base mutations to copy number variations (CNVs), many of these variants are neutral in selection and disease etiology, which makes it difficult to detect true disease-causing mutations, whether common or rare. However, allele frequency comparisons in cases, controls, and families may reveal disease associations. Single nucleotide polymorphism (SNP) arrays and exome sequencing are popular assays for genome-wide variant identification. To limit bias between samples, uniform testing is crucial, including standardized platform versions and sample processing. Bases occupy single points while copy variants occupy segments. Bases are bi-allelic while copies are multi-allelic. One genome also encodes many different cell types. In this study, we investigate how CNV impacts different cell types, including heart, brain, and blood cells, all of which serve as models of complex disease. Here, we describe ParseCNV, a systematic algorithm developed as part of this project to perform more accurate disease associations using CNV calls generated from SNP arrays or exome sequencing, with quality tracking of the variants contributing to each significant overlap signal. Red flags of variant quality, genomic region, and overlap profile are assessed in a continuous score and shown to correlate over 90% with independent verification methods. We compared these data with our large internal cohort of 68,000 subjects with carefully mapped CNVs, which gave a robust rare variant frequency in unaffected populations. In these investigations, we uncovered a number of loci in which CNVs are significantly enriched in non-coding RNA (ncRNA), Online Mendelian Inheritance in Man (OMIM), and genome-wide association study (GWAS) regions, impacting complex disease. After thoroughly evaluating the variant frequencies in pediatric individuals, we compared them with the frequencies in geriatric individuals to gain insight into these variants' impact on lifespan. Longevity-associated CNVs enriched in pediatric patients were found to aggregate in alternative splicing genes. Congenital heart disease is the most common birth defect and cause of infant mortality. When comparing congenital heart disease families, with cases and controls genotyped on both SNP arrays and exome sequencing, we uncovered significant and confident loci that provide insight into the molecular basis of disease. Neurodevelopmental disease affects the quality of life and cognitive potential of many children. In neurodevelopmental and psychiatric diseases, the CACNA, GRM, CNTN, and SLIT gene families show multiple significant signals impacting a large number of developmental and psychiatric disease traits, with the potential to inform therapeutic decision-making. Through new tool development and analysis of large disease cohorts genotyped on a variety of assays, I have uncovered an important biological role and disease impact of CNV in complex disease.