69 research outputs found

    Redundancy in Genotyping Arrays

    Despite their unprecedented density, current SNP genotyping arrays contain large amounts of redundancy, with up to 40 oligonucleotide features used to query each SNP. By using publicly available reference genotype data from the International HapMap, we show that 93.6% sensitivity at <5% false positive rate can be obtained with only four probes per SNP, compared with 98.3% with the full data set. Removal of this redundancy will allow for more comprehensive whole-genome association studies with increased SNP density and larger sample sizes.

    ESTIMATING GENOME-WIDE COPY NUMBER USING ALLELE SPECIFIC MIXTURE MODELS

    Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because it is believed to be an underlying cause of cancer. More than a decade ago, comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology only provides a 10 Mb resolution, which limits the ability to detect copy number alterations spanning small regions. It is widely believed that a copy number alteration as small as one base can have significant downstream effects; thus, microarray manufacturers have developed technologies that provide much higher resolution. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. CGH arrays use a two-color hybridization, usually comparing a sample of interest to a reference sample, which to some degree removes the probe effect. However, the resolution is not nearly high enough to provide single-point copy number estimates. Various groups have proposed statistical procedures that pool data from neighboring locations to successfully improve precision. However, these procedures need to average across relatively large regions to work effectively, thus greatly reducing the resolution. Recently, regression-type models that account for the probe effect have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution specifically designed for single-point estimation that provides various advantages over the existing methodology. We use a 314-sample database, constructed from public datasets, to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy numbers. With the estimated models in place, we can compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (http://www.bioconductor.org).
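
The posterior-probability step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' model: it assumes a single intensity value per locus, Gaussian conditional distributions, and total (not allele-specific) copy number. The function names and all parameter values in `COMPONENTS` are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_copy_number(intensity, components):
    """Return P(copy number | intensity) for each state via Bayes' rule.

    components maps a copy-number state to (prior, mean, sd) of an
    assumed Gaussian conditional distribution of the intensity.
    """
    weighted = {
        state: prior * normal_pdf(intensity, mean, sd)
        for state, (prior, mean, sd) in components.items()
    }
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("intensity is too far from every component")
    return {state: w / total for state, w in weighted.items()}


def call_copy_number(intensity, components):
    """Prediction rule: most probable state, with its posterior as confidence."""
    post = posterior_copy_number(intensity, components)
    best = max(post, key=post.get)
    return best, post[best]


# Hypothetical mixture parameters on a log2 intensity scale (assumption).
COMPONENTS = {
    0: (0.05, 7.0, 0.5),   # homozygous deletion
    1: (0.15, 9.0, 0.5),   # hemizygous deletion
    2: (0.60, 10.0, 0.5),  # normal two copies
    3: (0.20, 10.6, 0.5),  # amplification
}

if __name__ == "__main__":
    state, confidence = call_copy_number(10.0, COMPONENTS)
    print(state, round(confidence, 3))
```

The posterior of the chosen state doubles as the per-call confidence measure, so calls that fall between two components can be flagged or filtered by thresholding it.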

    Gene Flow between the Korean Peninsula and Its Neighboring Countries

    SNP markers provide the primary data for population structure analysis. In this study, we employed whole-genome autosomal SNPs as a marker set (54,836 SNP markers) and tested their possible effects on genetic ancestry using 320 subjects covering 24 regional groups including Northern (n = 16) and Southern (n = 3) Asians, Amerindians (n = 1), and four HapMap populations (YRI, CEU, JPT, and CHB). Additionally, we evaluated the effectiveness and robustness of 50K autosomal SNPs with various clustering methods, along with their dependencies on recombination hotspots (RH), linkage disequilibrium (LD), missing calls and regional specific markers. The RH- and LD-free multi-dimensional scaling (MDS) method showed a broad picture of human migration from Africa to North-East Asia on our genome map, supporting results from previous haploid DNA studies. Of the Asian groups, the East Asian group showed greater differentiation than the Northern and Southern Asian groups with respect to Fst statistics. By extension, the analysis of monomorphic markers implied that nine out of ten historical regions in South Korea, and Tokyo in Japan, showed signs of genetic drift caused by the later settlement of East Asia (South Korea, Japan and China), while Gyeongju in South East Korea showed signs of the earliest settlement in East Asia. In the genome map, the gene flow to the Korean Peninsula from its neighboring countries indicated that some genetic signals from Northern populations such as the Siberians and Mongolians still remain in the South East and West regions, while few signals remain from the early Southern lineages.

    Conserved Role of unc-79 in Ethanol Responses in Lightweight Mutant Mice

    The mechanisms by which ethanol and inhaled anesthetics influence the nervous system are poorly understood. Here we describe the positional cloning and characterization of a new mouse mutation isolated in an N-ethyl-N-nitrosourea (ENU) forward mutagenesis screen for animals with enhanced locomotor activity. This allele, Lightweight (Lwt), disrupts the homolog of the Caenorhabditis elegans (C. elegans) unc-79 gene. While Lwt/Lwt homozygotes are perinatal lethal, Lightweight heterozygotes are dramatically hypersensitive to acute ethanol exposure. Experiments in C. elegans demonstrate a conserved hypersensitivity to ethanol in unc-79 mutants and extend this observation to the related unc-80 mutant and nca-1;nca-2 double mutants. Lightweight heterozygotes also exhibit an altered response to the anesthetic isoflurane, reminiscent of unc-79 invertebrate mutant phenotypes. Consistent with our initial mapping results, Lightweight heterozygotes are mildly hyperactive when exposed to a novel environment and are smaller than wild-type animals. In addition, Lightweight heterozygotes exhibit increased food consumption yet have a leaner body composition. Interestingly, Lightweight heterozygotes voluntarily consume more ethanol than wild-type littermates. The acute hypersensitivity to and increased voluntary consumption of ethanol observed in Lightweight heterozygous mice in combination with the observed hypersensitivity to ethanol in C. elegans unc-79, unc-80, and nca-1;nca-2 double mutants suggests a novel conserved pathway that might influence alcohol-related behaviors in humans

    Empirical Bayes analysis of single nucleotide polymorphisms

    Background: An important goal of whole-genome studies concerned with single nucleotide polymorphisms (SNPs) is the identification of SNPs associated with a covariate of interest such as the case-control status or the type of cancer. Since these studies often comprise the genotypes of hundreds of thousands of SNPs, methods are required that can cope with the corresponding multiple testing problem. For the analysis of gene expression data, approaches such as the empirical Bayes analysis of microarrays have been developed particularly for the detection of genes associated with the response. However, the empirical Bayes analysis of microarrays has only been suggested for binary responses when considering expression values, i.e. continuous predictors. Results: In this paper, we propose a modification of this empirical Bayes analysis that can be used to analyze high-dimensional categorical SNP data. This approach, along with a generalized version of the original empirical Bayes method, is available in the R package siggenes version 1.10.0 and later, which can be downloaded from http://www.bioconductor.org. Conclusion: As applications to two subsets of the HapMap data show, the empirical Bayes analysis of microarrays can not only be used to analyze continuous gene expression data, but can also be applied to categorical SNP data, where the response is not restricted to be binary. In association studies in which typically several tens to a few hundred SNPs are considered, our approach can furthermore be employed to test interactions of SNPs. Moreover, the posterior probabilities resulting from the empirical Bayes analysis of (prespecified) interactions/genotypes can also be used to quantify the importance of these interactions.

    Detecting autozygosity through runs of homozygosity: A comparison of three autozygosity detection algorithms

    Background: A central aim for studying runs of homozygosity (ROHs) in genome-wide SNP data is to detect the effects of autozygosity (stretches of the two homologous chromosomes within the same individual that are identical by descent) on phenotypes. However, it is unknown which current ROH detection program, and which set of parameters within a given program, is optimal for differentiating ROHs that are truly autozygous from ROHs that are homozygous at the marker level but vary at unmeasured variants between the markers. Method: We simulated 120 Mb of sequence data in order to know the true state of autozygosity. We then extracted common variants from this sequence to mimic the properties of SNP platforms and performed ROH analyses using three popular ROH detection programs, PLINK, GERMLINE, and BEAGLE. We varied detection thresholds for each program (e.g., prior probabilities, lengths of ROHs) to understand their effects on detecting known autozygosity. Results: Within the optimal thresholds for each program, PLINK outperformed GERMLINE and BEAGLE in detecting autozygosity from distant common ancestors. PLINK's sliding window algorithm worked best when using SNP data pruned for linkage disequilibrium (LD). Conclusion: Our results provide both general and specific recommendations for maximizing autozygosity detection in genome-wide SNP data, and should apply equally well to research on whole-genome autozygosity burden or to research on whether specific autozygous regions are predictive using association mapping methods.
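
To make the idea of a run of homozygosity and its detection thresholds concrete, here is a minimal sketch. It is deliberately simpler than any of the three programs compared above: it scans for maximal stretches of consecutive homozygous calls and is not PLINK's sliding-window algorithm. The function name, the 0/1/2 genotype coding (1 = heterozygous), and the default thresholds are assumptions made for illustration.

```python
def find_homozygous_runs(genotypes, positions, min_snps=3, min_length_bp=0):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous calls.

    genotypes: per-SNP minor-allele counts (0, 1 or 2); 1 means heterozygous.
    positions: base-pair positions of the same SNPs, in ascending order.
    A run is kept only if it has at least min_snps SNPs and spans at
    least min_length_bp base pairs.
    """
    if len(genotypes) != len(positions):
        raise ValueError("genotypes and positions must have equal length")

    runs = []
    start = None
    # A trailing heterozygous sentinel closes a run that reaches the end.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1 and start is None:
            start = i  # a new homozygous stretch begins here
        elif g == 1 and start is not None:
            n_snps = i - start
            length_bp = positions[i - 1] - positions[start]
            if n_snps >= min_snps and length_bp >= min_length_bp:
                runs.append((positions[start], positions[i - 1], n_snps))
            start = None
    return runs


if __name__ == "__main__":
    genotypes = [0, 0, 2, 1, 2, 2, 2, 2, 1, 0]
    positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
    print(find_homozygous_runs(genotypes, positions))
```

Raising `min_snps` or `min_length_bp` trades sensitivity for specificity: short runs are more likely to be homozygous by chance at the marker level without being truly autozygous, which is the distinction the study above evaluates.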

    TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays

    Background: High-throughput genotyping microarrays assess both total DNA copy number and allelic composition, which makes them a tool of choice for copy number studies in cancer, including total copy number and loss of heterozygosity (LOH) analyses. Even after state-of-the-art preprocessing methods, allelic signal estimates from genotyping arrays still suffer from systematic effects that make them difficult to use effectively for such downstream analyses. Results: We propose a method, TumorBoost, for normalizing allelic estimates of one tumor sample based on estimates from a single matched normal. The method applies to any paired tumor-normal estimates from any microarray-based technology, combined with any preprocessing method. We demonstrate that it increases the signal-to-noise ratio of allelic signals, making it significantly easier to detect allelic imbalances. Conclusions: TumorBoost increases the power to detect somatic copy-number events (including copy-neutral LOH) in the tumor from allelic signals of Affymetrix or Illumina origin. We also conclude that high-precision allelic estimates can be obtained from a single pair of tumor-normal hybridizations, if TumorBoost is combined with single-array preprocessing methods such as (allele-specific) CRMA v2 for Affymetrix or BeadStudio's (proprietary) XY-normalization method for Illumina. A bounded-memory implementation is available in the open-source and cross-platform R package aroma.cn, which is part of the Aroma Project (http://www.aroma-project.org/).

    High Differentiation among Eight Villages in a Secluded Area of Sardinia Revealed by Genome-Wide High Density SNPs Analysis

    To better design association studies for complex traits in isolated populations, it is important to understand how history and isolation moulded the genetic features of different communities. Population isolates should not “a priori” be considered homogeneous, even if the communities are not distant and are part of a small region. We studied a particular area of Sardinia called Ogliastra, characterized by the presence of several distinct villages that differ in history, immigration events and population size. Cultural and geographic isolation characterized the history of these communities. We determined LD parameters in 8 villages and defined population structure through high density SNPs (about 360 K) on 360 unrelated people (45 selected samples from each village). These isolates showed differences in LD values and LD map length. Five of these villages show high LD values, probably due to their reduced population size and extreme isolation. High genetic differentiation among villages was detected. Moreover, population structure analysis revealed a high correlation between genetic and geographic distances. Our study indicates that history, geography and biodemography have influenced the genetic features of Ogliastra communities, producing differences in LD and population structure. All these data demonstrate that we can consider each village an isolate with specific characteristics. We suggest that, in order to optimize the study design of complex traits, a thorough characterization of genetic features is useful to identify the presence of sub-populations and stratification within genetic isolates.

    Identification of PLCL1 Gene for Hip Bone Size Variation in Females in a Genome-Wide Association Study

    Osteoporosis, the most prevalent metabolic bone disease among older people, increases risk for low trauma hip fractures (HF) that are associated with high morbidity and mortality. Hip bone size (BS) has been identified as one of the key measurable risk factors for HF. Although hip BS is highly genetically determined, genetic factors underlying the trait are still poorly defined. Here, we performed the first genome-wide association study (GWAS) of hip BS interrogating ∼380,000 SNPs on the Affymetrix platform in 1,000 homogeneous unrelated Caucasian subjects, including 501 females and 499 males. We identified a gene, PLCL1 (phospholipase c-like 1), that had four SNPs associated with hip BS at, or approaching, a genome-wide significance level in our female subjects; the most significant SNP, rs7595412, achieved a p value of 3.72×10⁻⁷. The gene's importance to hip BS was replicated using the Illumina genotyping platform in an independent UK cohort containing 1,216 Caucasian females. Two SNPs of the PLCL1 gene, rs892515 and rs9789480, surrounded by the four SNPs identified in our GWAS, achieved p values of 8.62×10⁻³ and 2.44×10⁻³, respectively, for association with hip BS. Imputation analyses on our GWAS and the UK samples further confirmed the replication signals; eight SNPs of the gene achieved combined imputed p values < 10⁻⁵ in the two samples. The PLCL1 gene's relevance to HF was also observed in a Chinese sample containing 403 females, including 266 with HF and 177 control subjects. A SNP of the PLCL1 gene, rs3771362, which is only ∼0.6 kb apart from the most significant SNP detected in our GWAS (rs7595412), achieved a p value of 7.66×10⁻³ (odds ratio = 0.26) for association with HF. Additional biological support for the role of PLCL1 in BS comes from previous demonstrations that the PLCL1 protein inhibits IP3 (inositol 1,4,5-trisphosphate)-mediated calcium signaling, an important pathway regulating mechanical sensing of bone cells. Our findings suggest that PLCL1 is a novel gene associated with variation in hip BS, and provide new insights into the pathogenesis of HF.

    Powerful Bivariate Genome-Wide Association Analyses Suggest the SOX6 Gene Influencing Both Obesity and Osteoporosis Phenotypes in Males

    Current genome-wide association studies (GWAS) are normally implemented in a univariate framework and analyze different phenotypes in isolation. This univariate approach ignores the potential genetic correlation between important disease traits. Hence, it is difficult with this approach to detect pleiotropic genes, which may exist for obesity and osteoporosis, two common diseases of major public health importance that are closely correlated genetically. [...] SOX6 was previously found to be essential to both cartilage formation/chondrogenesis and obesity-related insulin resistance, suggesting the gene's dual role in both bone and fat. [...] gene's importance in co-regulation of obesity and osteoporosis.