8 research outputs found

    Automated SNP genotype clustering algorithm to improve data completeness in high-throughput SNP genotyping datasets from custom arrays

    High-throughput SNP genotyping platforms use automated genotype calling algorithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second, complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been optimized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be advisable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.

    Reliable Single Chip Genotyping with Semi-Parametric Log-Concave Mixtures

    The common approach to SNP genotyping is to use (model-based) clustering per individual SNP, on a set of arrays. Genotyping all SNPs on a single array is much more attractive, in terms of flexibility, stability and applicability, when developing new chips. A new semi-parametric method, named SCALA, is proposed. It is based on a mixture model using semi-parametric log-concave densities. Instead of using the raw data, the mixture is fitted on a two-dimensional histogram, thereby making computation time almost independent of the number of SNPs. Furthermore, the algorithm is effective in low-MAF situations. Comparisons between SCALA and CRLMM on HapMap genotypes show very reliable calling of single arrays. Some heterozygous genotypes from HapMap are called homozygous by SCALA and, to a lesser extent, by CRLMM too. Furthermore, HapMap's NoCalls (NN) could be genotyped by SCALA, mostly with high probability. The software is available as R scripts from the website www.math.leidenuniv.nl/~rrippe

    Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples

    Background: Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount, since errors introduced by calling algorithms can lead to inflation of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS are based on multiple arrays. Because hundreds of gigabytes (GB) of raw data are generated from a GWAS, the samples are typically partitioned into batches containing subsets of the entire dataset for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., number of chips analyzed together) and of batch composition (i.e., the choice of chips in a batch) on call rate and accuracy, as well as the propagation of the effects into significantly associated SNPs identified, have not been investigated. In this paper, we analyzed both the batch size and batch composition for effects on the genotype calling algorithm BRLMM using raw data of 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set.
    Results: Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, three different batch sizes and three different batch compositions were used for genotyping using the BRLMM algorithm. Comparative analysis of the calling results and the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected genotype calling results and significantly associated SNPs. Batch size and batch composition effects were more severe on samples and SNPs with lower call rates than on ones with higher call rates, and on heterozygous genotype calls compared to homozygous genotype calls.
    Conclusion: Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogeneous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed into the same batch.

    STATISTICAL CHALLENGES IN NEXT GENERATION POPULATION GENOMICS STUDY

    Ph.D. (Doctor of Philosophy)

    Single nucleotide polymorphisms (SNPs) in occupational exposure assessment

    Significant individual variation exists in the systemic response to xenobiotic exposures that may be due to individual genetic differences in xenobiotic toxicokinetics, DNA-damage repair genes, host factors, and other environmentally responsive groups of genes and pathways. The source of this variation may be dependent upon single nucleotide polymorphisms (SNPs), which may result in different levels of exposure biomarkers, tissue distribution and elimination, as well as heritable differences in physiological function (e.g., blood pressure, respiratory rate). These individual genetic differences may directly or indirectly contribute to the associated health effects. Our knowledge of individual differences in toxicokinetics and systemic response to xenobiotic exposures, the genetic mechanisms of associated health effects, and the potential value of incorporating individual genetic variations in exposure assessment models is limited. I sought to investigate individual differences in SNPs associated with skin naphthyl-keratin adducts (NKAs) and urine naphthol levels measured in workers exposed to jet propulsion fuel-8 (JP-8) containing naphthalene. The SNP association analysis was conducted in PLINK using candidate-gene and genome-wide analyses. I further determined the relative contributions of SNP markers and the impact of personal and workplace factors on the measured biomarker levels using multiple linear regression models. Pathway and network analyses of the genetic variants identified indicated significant associations with genes involved in the regulation of cellular and homeostasis processes that contributed to the observed level of skin NKAs. Urine naphthol levels were associated with genes involved in thyroid hormone pathways and the control of metabolism that may affect the mass movement of naphthalene and its metabolites into the cells of tissues with the capacity to metabolize and eliminate xenobiotics.
    I report here a method and strategy to investigate the role of individual genetic variation in the quantitative assessment of biomarker levels in small, well-characterized exposed worker populations. These tools have the potential to provide biological relevance for the biomarker levels, mechanistic insight into the etiology of exposure-related diseases, and identification of susceptible subpopulations with respect to exposure. Therefore, these tools will provide useful input into setting exposure limits by taking into account individual genetic variation that may relate to adverse health effects.