608 research outputs found

    Susceptibility to tuberculosis is associated with variants in the ASAP1 gene encoding a regulator of dendritic cell migration

    Human genetic factors predispose to tuberculosis (TB). We studied 7.6 million genetic variants in 5,530 people with pulmonary TB and in 5,607 healthy controls. In the combined analysis of these subjects and the follow-up cohort (15,087 TB patients and controls altogether), we found an association between TB and variants located in introns of the ASAP1 gene on chromosome 8q24 (P = 2.6 × 10⁻¹¹ for rs4733781; P = 1.0 × 10⁻¹⁰ for rs10956514). Dendritic cells (DCs) showed high ASAP1 expression that was reduced after Mycobacterium tuberculosis infection, and rs10956514 was associated with the level of reduction of ASAP1 expression. The ASAP1 protein is involved in actin and membrane remodeling and has been associated with podosomes. The ASAP1-depleted DCs showed impaired matrix degradation and migration. Therefore, genetically determined excessive reduction of ASAP1 expression in M. tuberculosis–infected DCs may lead to their impaired migration, suggesting a potential mechanism of predisposition to TB.

    Rapid genotype imputation from sequence without reference panels

    Inexpensive genotyping methods are essential for genetic studies requiring large sample sizes. In human studies, genotyping microarrays and high-density haplotype reference panels allow efficient genotype imputation for this purpose. However, these resources are typically unavailable in non-human settings. Here we describe a method (STITCH) for imputation based only on sequencing read data, without requiring additional reference panels or array data. We demonstrate its applicability even in settings of extremely low sequencing coverage, by accurately imputing 5.7 million SNPs at a mean r² value of 0.98 in 2,073 outbred laboratory mice (0.15× sequencing coverage). In a sample of 11,670 Han Chinese (1.7× coverage), we achieve accuracy similar to that of alternative approaches that require a reference panel, demonstrating that our approach can work for genetically diverse populations. Our method enables straightforward progression from low-coverage sequence to imputed genotypes, overcoming barriers that at present restrict the application of genome-wide association study technology outside humans.

    SNPpy - Database Management for SNP Data from Genome Wide Association Studies

    Background: We describe SNPpy, a hybrid script database system using the Python SQLAlchemy library coupled with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). This system makes it possible to merge study data with HapMap data and merge across studies for meta-analyses, including data filtering based on the values of phenotype and Single-Nucleotide Polymorphism (SNP) data. SNPpy and its dependencies are open source software. Results: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS studies and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses. Conclusions: By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS studies. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results as standardized GWAS file formats. It does low level and flexible data validation, including validation of patient data. SNPpy is
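The relational approach described in this abstract (genotypes linked to patients and markers, then filtered on phenotype and SNP attributes) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard-library `sqlite3` instead of SQLAlchemy with PostgreSQL, and the table names, column names, and sample rows are hypothetical, not SNPpy's actual schema.

```python
import sqlite3

# Hypothetical three-table layout; SNPpy's real schema is not given in the abstract.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, study TEXT, phenotype TEXT);
CREATE TABLE snp (id INTEGER PRIMARY KEY, rsid TEXT, chrom TEXT);
CREATE TABLE genotype (
    patient_id INTEGER REFERENCES patient(id),
    snp_id INTEGER REFERENCES snp(id),
    genotype_call TEXT
);
""")

# Made-up rows standing in for two imported studies.
conn.executemany("INSERT INTO patient VALUES (?, ?, ?)", [
    (1, "study_a", "case"), (2, "study_a", "control"), (3, "study_b", "case"),
])
conn.executemany("INSERT INTO snp VALUES (?, ?, ?)", [
    (1, "rs_example_1", "1"), (2, "rs_example_2", "8"),
])
conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)", [
    (1, 1, "AA"), (1, 2, "AG"), (2, 1, "AG"),
    (2, 2, "GG"), (3, 1, "AA"), (3, 2, "AG"),
])


def select_calls(conn, phenotype, chrom):
    """Return (study, patient id, rsid, call) rows filtered on patient and marker data."""
    query = """
        SELECT p.study, p.id, s.rsid, g.genotype_call
        FROM genotype AS g
        JOIN patient AS p ON p.id = g.patient_id
        JOIN snp AS s ON s.id = g.snp_id
        WHERE p.phenotype = ? AND s.chrom = ?
        ORDER BY p.id
    """
    return conn.execute(query, (phenotype, chrom)).fetchall()


# Merge across both studies: all case genotypes for markers on chromosome 8.
case_calls_chr8 = select_calls(conn, "case", "8")
```

Because every genotype row points at a patient and a marker, merging across studies and filtering by phenotype reduce to a single join with a `WHERE` clause.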

    CNVassoc: Association analysis of CNV data using R

    Background: Copy number variants (CNV) are a potentially important component of the genetic contribution to risk of common complex diseases. Analysis of the association between CNVs and disease requires that uncertainty in CNV copy-number calls, which can be substantial, be taken into account; failure to consider this uncertainty can lead to biased results. Therefore, there is a need to develop and use appropriate statistical tools. To address this issue, we have developed CNVassoc, an R package for carrying out association analysis of common copy number variants in population-based studies. This package includes functions for testing for association with different classes of response variables (e.g. class status, censored data, counts) under a series of study designs (case-control, cohort, etc.) and inheritance models, adjusting for covariates. The package includes functions for inferring copy number (CNV genotype calling), but can also accept copy number data generated by other algorithms (e.g. CANARY, CGHcall, IMPUTE). Results: Here we present a new R package, CNVassoc, that can deal with different types of CNV arising from different platforms such as MLPA or aCGH. Through a real data example we illustrate that our method is able to incorporate uncertainty in the association process. We also show how our package can be useful when analyzing imputed SNPs. Through a simulation study we show that CNVassoc outperforms CNVtools in terms of computing time as well as in convergence failure rate. Conclusions: We provide a package that outperforms the existing ones in terms of modelling flexibility, power, convergence rate, ease of covariate adjustment, and requirements for sample size and signal quality. Therefore, we offer CNVassoc as a method for routine use in CNV association studies.
    This work has been supported by the Spanish Ministry of Science and Innovation (MTM2008-02457 to JRG, BIO2009-12458 to RD-U and statistical genetics network MTM2010-09526-E (subprograma MTM) to JRG, IS, GL and RD-U). GL is supported by the Juan de la Cierva Program of the Spanish Ministry of Science and Innovation.

    Associations of ATR and CHEK1 Single Nucleotide Polymorphisms with Breast Cancer

    DNA damage and replication checkpoints mediated by the ATR-CHEK1 pathway are key to the maintenance of genome stability, and both ATR and CHEK1 have been proposed as potential breast cancer susceptibility genes. Many novel variants recently identified by the large resequencing projects have not yet been thoroughly tested in genome-wide association studies for breast cancer susceptibility. We therefore used a tagging SNP (tagSNP) approach, based on recent SNP data available from the 1000 Genomes Project, to investigate the roles of ATR and CHEK1 in breast cancer risk and survival. ATR and CHEK1 tagSNPs were genotyped in the Sheffield Breast Cancer Study (SBCS; 1011 cases and 1024 controls) using Illumina GoldenGate assays. Untyped SNPs were imputed using IMPUTE2, and associations between genotype and breast cancer risk and survival were evaluated using logistic and Cox proportional hazard regression models, respectively, on a per-allele basis. Significant associations were further examined in a meta-analysis of published data or confirmed in the Utah Breast Cancer Study (UBCS). The most significant associations for breast cancer risk in SBCS came from rs6805118 in ATR (p = 7.6 × 10⁻⁵) and rs2155388 in CHEK1 (p = 3.1 × 10⁻⁶), but neither remained significant after meta-analysis with other studies. However, meta-analysis of published data revealed a weak association between the ATR SNP rs1802904 (minor allele frequency 12%) and breast cancer risk, with a summary odds ratio (confidence interval) of 0.90 (0.83-0.98) [p = 0.0185] for the minor allele. Further replication of this SNP in larger studies is warranted since it is located in the target region of 2 microRNAs. No evidence of any survival effects of ATR or CHEK1 SNPs was identified. We conclude that common alleles of ATR and CHEK1 are not implicated in breast cancer risk or survival, but we cannot exclude effects of rare alleles and of common alleles with very small effect sizes.

    A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes

    Background: Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or sequence imputation. Long-range phasing and haplotype library imputation constitute a fast and accurate method to impute phase for SNP data. Methods: A long-range phasing and haplotype library imputation algorithm was developed. It combines information from surrogate parents and long haplotypes to resolve phase in a manner that is not dependent on the family structure of a dataset or on the presence of pedigree information. Results: The algorithm performed well in both simulated and real livestock and human datasets in terms of both phasing accuracy and computation efficiency. The percentage of alleles that could be phased in both simulated and real datasets of varying size generally exceeded 98% while the percentage of alleles incorrectly phased in simulated data was generally less than 0.5%. The accuracy of phasing was affected by dataset size, with lower accuracy for dataset sizes less than 1000, but was not affected by effective population size, family data structure, presence or absence of pedigree information, and SNP density. The method was computationally fast. In comparison to a commonly used statistical method (fastPHASE), the current method made about 8% fewer phasing mistakes and ran about 26 times faster for a small dataset. For larger datasets, the differences in computational time are expected to be even greater. A computer program implementing these methods has been made available. Conclusions: The algorithm and software developed in this study make feasible the routine phasing of high-density SNP chips in large datasets.

    SNP selection for genes of iron metabolism in a study of genetic modifiers of hemochromatosis

    Background: We report our experience of selecting tag SNPs in 35 genes involved in iron metabolism in a cohort study seeking to discover genetic modifiers of hereditary hemochromatosis. Methods: We combined our own and publicly available resequencing data with HapMap to maximise our coverage to select 384 SNPs in candidate genes suitable for typing on the Illumina platform. Results: Validation/design scores above 0.6 were not strongly correlated with SNP performance as estimated by Gentrain score. We contrasted results from two tag SNP selection algorithms, LDselect and Tagger. Varying r² from 0.5 to 1.0 produced a near linear correlation with the number of tag SNPs required. We examined the pattern of linkage disequilibrium of three levels of resequencing coverage for the transferrin gene and found HapMap phase 1 tag SNPs capture 45% of the ≥ 3% MAF SNPs found in SeattleSNPs where there is nearly complete resequencing. Resequencing can reveal adjacent SNPs (within 60 bp) which may affect assay performance. We report the number of SNPs present within the region of six of our larger candidate genes, for different versions of stock genotyping assays. Conclusion: A candidate gene approach should seek to maximise coverage, and this can be improved by adding to HapMap data any available sequencing data. Tag SNP software must be fast and flexible to data changes, since tag SNP selection involves iteration as investigators seek to satisfy the competing demands of coverage within and between populations, and typability on the technology platform chosen.
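To illustrate why the number of tag SNPs grows as the r² threshold is raised, here is a minimal greedy selection sketch. It is a simplified stand-in and not the LDselect or Tagger algorithm; the function names, the toy genotype vectors (allele counts 0/1/2 per individual), and the use of squared Pearson correlation as r² are all assumptions made for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two vectors of allele counts (0/1/2)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def greedy_tag_snps(genotypes, threshold):
    """Pick tags until every SNP has r^2 >= threshold with some tag (or is a tag)."""
    names = sorted(genotypes)
    # captured[s]: SNPs that s would cover at this threshold, itself included.
    captured = {
        a: {b for b in names
            if a == b or r_squared(genotypes[a], genotypes[b]) >= threshold}
        for a in names
    }
    untagged = set(names)
    tags = []
    while untagged:
        # Take the untagged SNP covering the most untagged SNPs (ties: by name).
        best = max(sorted(untagged), key=lambda s: len(captured[s] & untagged))
        tags.append(best)
        untagged -= captured[best]
    return tags


# Toy data: six individuals, five hypothetical SNPs.
toy = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [0, 1, 2, 0, 1, 2],  # identical to snp_a (r^2 = 1)
    "snp_c": [2, 1, 0, 2, 1, 0],  # mirror image of snp_a (r^2 = 1)
    "snp_d": [0, 0, 1, 2, 2, 1],  # uncorrelated with the others
    "snp_e": [0, 1, 2, 1, 0, 2],  # moderate LD with snp_a (r^2 = 0.5625)
}
tags_strict = greedy_tag_snps(toy, 0.8)
tags_loose = greedy_tag_snps(toy, 0.5)
```

On this toy data the stricter threshold needs three tags and the looser one needs two, because `snp_e` is only captured by `snp_a` once the threshold drops below its r² of 0.5625.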

    Genetics of callous-unemotional behavior in children

    Callous-unemotional behavior (CU) is currently under consideration as a subtyping index for conduct disorder diagnosis. Twin studies routinely estimate the heritability of CU as greater than 50%. It is now possible to estimate genetic influence using DNA alone from samples of unrelated individuals, not relying on the assumptions of the twin method. Here we use this new DNA method (implemented in a software package called Genome-wide Complex Trait Analysis, GCTA) for the first time to estimate genetic influence on CU. We also report the first genome-wide association (GWA) study of CU as a quantitative trait. We compare these DNA results to those from twin analyses using the same measure and the same community sample of 2,930 children rated by their teachers at ages 7, 9 and 12. GCTA estimates of heritability were near zero, even though twin analysis of CU in this sample confirmed the high heritability of CU reported in the literature, and even though GCTA estimates of heritability were substantial for cognitive and anthropological traits in this sample. No significant associations were found in GWA analysis, which, like GCTA, only detects additive effects of common DNA variants. The phrase ‘missing heritability’ was coined to refer to the gap between variance associated with DNA variants identified in GWA studies versus twin study heritability. However, GCTA heritability, not twin study heritability, is the ceiling for GWA studies because both GCTA and GWA are limited to the overall additive effects of common DNA variants, whereas twin studies are not. This GCTA ceiling is very low for CU in our study, despite its high twin study heritability estimate. The gap between GCTA and twin study heritabilities will make it challenging to identify genes responsible for the heritability of CU

    Scanning and filling : ultra-dense SNP genotyping combining genotyping-by-sequencing, SNP array and whole-genome resequencing data

    Genotyping-by-sequencing (GBS) represents a highly cost-effective high-throughput genotyping approach. By nature, however, GBS is subject to generating sizeable amounts of missing data and these will need to be imputed for many downstream analyses. The extent to which such missing data can be tolerated in calling SNPs has not been explored widely. In this work, we first explore the use of imputation to fill in missing genotypes in GBS datasets. Importantly, we use whole genome resequencing data to assess the accuracy of the imputed data. Using a panel of 301 soybean accessions, we show that over 62,000 SNPs could be called when tolerating up to 80% missing data, a five-fold increase over the number called when tolerating up to 20% missing data. At all levels of missing data examined (between 20% and 80%), the resulting SNP datasets were of uniformly high accuracy (96–98%). We then used imputation to combine complementary SNP datasets derived from GBS and a SNP array (SoySNP50K). We thus produced an enhanced dataset of >100,000 SNPs and the genotypes at the previously untyped loci were again imputed with a high level of accuracy (95%). Of the >4,000,000 SNPs identified through resequencing 23 accessions (among the 301 used in the GBS analysis), 1.4 million tag SNPs were used as a reference to impute this large set of SNPs on the entire panel of 301 accessions. These previously untyped loci could be imputed with around 90% accuracy. Finally, we used the 100K SNP dataset (GBS + SoySNP50K) to perform a GWAS on seed oil content within this collection of soybean accessions. Both the number of significant marker-trait associations and the peak significance levels were improved considerably using this enhanced catalog of SNPs relative to a smaller catalog resulting from GBS alone at 20% missing data. Our results demonstrate that imputation can be used to fill in both missing genotypes and untyped loci with very high accuracy and that this leads to more powerful genetic analyses.
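The trade-off described in this abstract, where tolerating more missing data retains more SNPs that must then be imputed, can be sketched as follows. This is a toy illustration only: the data are invented, the helper names are hypothetical, and filling each gap with the SNP's most common observed call is a naive placeholder, not the imputation method used in the study.

```python
from collections import Counter


def missing_rate(calls):
    """Fraction of accessions with no genotype call (None) at one SNP."""
    return sum(call is None for call in calls) / len(calls)


def filter_snps(genotypes, max_missing):
    """Keep only SNPs whose missing rate does not exceed max_missing."""
    return {name: calls for name, calls in genotypes.items()
            if missing_rate(calls) <= max_missing}


def impute_most_common(calls):
    """Naive placeholder imputation: fill gaps with the most common observed call."""
    observed = [call for call in calls if call is not None]
    if not observed:
        return list(calls)  # nothing to learn from; leave the gaps in place
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if call is None else call for call in calls]


# Toy panel: five accessions, four hypothetical SNPs (0/2 = homozygous calls).
panel = {
    "snp1": [0, 0, 2, 0, None],           # 20% missing
    "snp2": [None, None, 2, 2, None],     # 60% missing
    "snp3": [None, None, None, None, 0],  # 80% missing
    "snp4": [0, 2, 2, None, None],        # 40% missing
}

kept_strict = filter_snps(panel, 0.2)
kept_loose = filter_snps(panel, 0.8)
imputed = {name: impute_most_common(calls) for name, calls in kept_loose.items()}
```

With the 20% cutoff only one SNP survives, while the 80% cutoff keeps all four, at the cost of filling most of their calls by imputation.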