
    GATK hard filtering: tunable parameters to improve variant calling for next generation sequencing targeted gene panel data

    BACKGROUND: NGS technology represents a powerful alternative to standard Sanger sequencing in the clinical setting. The proprietary software generally used for variant calling often depends on preset parameters that may not fit different genes satisfactorily. GATK, which is widely used in the academic world, is rich in parameters for variant calling. However, the self-adjusting parameter calibration of GATK requires data from a large number of exomes. When these are not available, which is the standard condition of a diagnostic laboratory, the parameters must be set by the operator (hard filtering). The aim of the present paper was to set up a procedure to assess the best parameters to be used in the hard filtering of GATK. This was pursued by using classification trees on true and false variants from simulated sequences of a real dataset. RESULTS: We simulated two datasets, with different coverages, including all the sequence alterations identified in a real dataset according to their observed frequencies. Simulated sequences were aligned with standard protocols, and regression trees were then built to identify the most reliable parameters and cutoff values to discriminate true from false variant calls. Moreover, we analyzed the flanking sequences of regions presenting a high rate of false positive calls and observed that such sequences have a low-complexity makeup. CONCLUSIONS: Our results showed that GATK hard filtering parameter values can be tailored through a simulation study based on the DNA region of interest to improve the accuracy of variant calling.
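The idea of learning a hard-filter cutoff from labelled simulated calls can be illustrated with a one-level classification tree (a decision stump) on a single annotation. This is only a minimal sketch, not the procedure used in the paper: the function name `best_cutoff`, the choice of GATK's QD (QualByDepth) annotation, and the toy values are all assumptions made for illustration.

```python
from typing import List, Tuple


def best_cutoff(values: List[float], is_true: List[bool]) -> Tuple[float, int]:
    """Return (cutoff, errors) for the rule 'keep the call if value >= cutoff'.

    Candidate cutoffs are midpoints between consecutive distinct values,
    which is what a one-level classification tree would evaluate.
    """
    pairs = list(zip(values, is_true))
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    # Baseline: no filtering at all, so every false call is kept (one error each).
    best = (float("-inf"), sum(1 for _, t in pairs if not t))
    for cutoff in candidates:
        # Errors = true variants filtered out + false variants kept.
        errors = sum(1 for v, t in pairs if (v >= cutoff) != t)
        if errors < best[1]:
            best = (cutoff, errors)
    return best


if __name__ == "__main__":
    # Hypothetical QD values for simulated calls; not data from the paper.
    qd = [1.0, 2.0, 3.0, 8.0, 7.0, 9.0, 12.0, 15.0]
    truth = [False, False, False, False, True, True, True, True]
    print(best_cutoff(qd, truth))  # (5.0, 1)
```

A full tree would repeat this split search recursively across several annotations; in practice a library implementation would normally be used instead of a hand-written stump.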

    Candidate Sequence Variants for Polyautoimmunity and Multiple Autoimmune Syndrome from a Colombian Genetic Isolate: Implications for Population Genetics

    Autoimmunity is an immunological disorder in which patients have lost immunological tolerance to self-antigens. It imposes an extreme financial and socioeconomic burden, with costs of over 100 billion dollars in the USA alone and an estimated prevalence of 9.4%; evidence indicates that this estimate has increased at a rate of 5% per year for the past 3 years. Autoimmune phenotypes can manifest in more severe forms through polyautoimmunity, in which patients carry 2 or more autoimmune conditions. The most extreme phenotype of autoimmunity is the Multiple Autoimmune Syndrome (MAS), in which patients have 3 or more autoimmune diseases. These extreme phenotypes are highly important for genetic research, as elaborated upon in this thesis. For more than 20 years, pedigrees from the world’s largest known genetic isolate, from the Paisa region of Colombia, have been ascertained and thoroughly followed by Dr. Juan-Manuel Anaya and Dr. Mauricio Arcos-Burgos. This population has maintained its status as a genetic isolate since the 16th century, during the early colonization by the Spanish Conquistadors. In this thesis, the identification of candidate variants potentially underpinning the genetic etiology of autoimmune conditions in this population is facilitated by the fact that families are derived from individuals carrying extreme phenotypes, from familial cohorts where genetic homogeneity is maximized. Candidates are identified in both sporadic and familial cases. This is primarily achieved through a combination of linkage analysis and association tests for both rare and common variants, which were derived from variant-calling pipelines and subjected to quality control, filtering and functional annotation via bioinformatic analyses.
Genes harbouring variants with significant evidence of linkage and association were primarily involved in negative regulation of apoptosis, phagocytosis, regulation of endopeptidase activity, response to lipopolysaccharides and plasminogen urokinase receptor activity. These findings, obtained by combining statistical and network-based analyses, have potential implications for autoimmunity and can be further supported by additional studies.

    Identification of putative second genetic hits in schizophrenia carriers of high-risk copy number variants and resequencing in additional samples

    Copy number variants (CNVs) conferring risk of schizophrenia present incomplete penetrance, suggesting the existence of second genetic hits. Identification of second hits may help to find genes with rare variants of susceptibility to schizophrenia. The aim of this work was to search for second hits of moderate/high risk in schizophrenia carriers of risk CNVs and to resequence the relevant genes in additional samples. To this end, ten patients with risk CNVs at cytobands 15q11.2, 15q11.2-13.1, 16p11.2, or 16p13.11 were subjected to whole-exome sequencing. Rare single nucleotide variants, defined as those absent from the main public databases, were classified according to bioinformatic prediction of pathogenicity by CADD scores. The average number of rare predicted pathogenic variants per sample was 13.6 (SD 2.01). Two genes, BFAR and SYNJ1, presented rare predicted pathogenic variants in more than one sample. Follow-up resequencing of these genes in 432 additional cases and 432 controls identified a significant excess of rare predicted pathogenic variants in case samples at SYNJ1. Taking into account its function in clathrin-mediated synaptic vesicle endocytosis at presynaptic terminals, our results suggest an impairment of this process in schizophrenia.

    Investigation of rare and low-frequency variants using high-throughput sequencing with pooled DNA samples

    High-throughput sequencing using pooled DNA samples can facilitate genome-wide studies on rare and low-frequency variants in a large population. Two major questions concerning the pooled sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether estimated minor allele frequencies (MAFs) can represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates using three variant detection tools with two sets of pooled whole exome sequencing (WES) data and one set of pooled whole genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. For the WGS study, 56% of the low-frequency variants on the Illumina array have identical MAFs and 26% differ by one allele between the sequencing and individual genotyping data. The MAF estimates from WGS correlated well (r = 0.94) with those from Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, the MAFs estimated from pooled DNA sequencing data reflect the MAFs in individually genotyped samples well. The pooling strategy can thus be a rapid and cost-effective approach for the initial screening in large-scale association studies.
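The comparison described above (a pooled MAF estimate, the allele-count difference against individual genotyping, and the correlation r) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes every individual contributes equally to the pool so that the read fraction approximates the allele frequency, which the abstract does not state.

```python
import math
from typing import Sequence


def pooled_maf(alt_reads: int, total_reads: int) -> float:
    """Estimate the minor allele frequency at one site from pooled read counts."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    freq = alt_reads / total_reads
    # The minor allele is whichever allele is rarer.
    return min(freq, 1.0 - freq)


def allele_count_difference(maf_a: float, maf_b: float, n_chromosomes: int) -> int:
    """Difference between two MAFs expressed as a whole number of alleles."""
    return abs(round(maf_a * n_chromosomes) - round(maf_b * n_chromosomes))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

For example, with a hypothetical pool of 50 diploid individuals (100 chromosomes), `allele_count_difference(0.05, 0.04, 100)` reports a one-allele difference between the two estimates.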