205 research outputs found
The effect of minor allele frequency on the likelihood of obtaining false positives
Determining the most promising single-nucleotide polymorphisms (SNPs) presents a challenge in genome-wide association studies, in which hundreds of thousands of association tests are conducted. The power to detect genetic effects depends on minor allele frequency (MAF), and the SNP arrays used in genome-wide association studies include SNPs with a wide distribution of MAFs. Therefore, it is critical to understand MAF's effect on the false positive rate.
A unified framework for multi-locus association analysis of both common and rare variants
Background - Common, complex diseases are hypothesized to result from a combination of common and rare genetic variants. We developed a unified framework for the joint association testing of both types of variants. Within the framework, we developed a union-intersection test suitable for genome-wide analysis of single nucleotide polymorphisms (SNPs), candidate gene data, as well as medical sequencing data. The union-intersection test is a composite test of association of genotype frequencies and differential correlation among markers. Results - We demonstrated by computer simulation that the false positive error rate was controlled at the expected level. We also demonstrated scenarios in which the multi-locus test was more powerful than traditional single marker analysis. To illustrate use of the union-intersection test with real data, we analyzed a publicly available data set of 319,813 autosomal SNPs genotyped for 938 cases of Parkinson disease and 863 neurologically normal controls, for which no genome-wide significant results were found by traditional single marker analysis. We also analyzed an independent follow-up sample of 183 cases and 248 controls for replication. Conclusions - We identified a single risk haplotype with a directionally consistent effect in both samples in the gene GAK, which is involved in clathrin-mediated membrane trafficking. We also found suggestive evidence that directionally inconsistent marginal effects from single marker analysis appeared to result from risk being driven by different haplotypes in the two samples for the genes SYN3 and NGLY1, which are involved in neurotransmitter release and proteasomal degradation, respectively. These results illustrate the utility of our unified framework for genome-wide association analysis of common, complex diseases.
Cubic exact solutions for the estimation of pairwise haplotype frequencies: implications for linkage disequilibrium analyses and a web tool 'CubeX'
Background - The frequency of a haplotype comprising one allele at each of two loci can be expressed as a cubic equation (the 'Hill equation'), the solution of which gives that frequency. Most haplotype and linkage disequilibrium analysis programs use iteration-based algorithms which substitute an estimate of haplotype frequency into the equation, producing a new estimate which is repeatedly fed back into the equation until the values converge to a maximum likelihood estimate (expectation-maximisation). Results - We present a program, "CubeX", which calculates the biologically possible exact solution(s) and provides estimated haplotype frequencies, D', r² and χ² values for each. CubeX provides a "complete" analysis of haplotype frequencies and linkage disequilibrium for a pair of biallelic markers under situations where sampling variation and genotyping errors distort sample Hardy-Weinberg equilibrium, potentially causing more than one biologically possible solution. We also present an analysis of simulations and real data using the algebraically exact solution, which indicates that under perfect sample Hardy-Weinberg equilibrium there is only one biologically possible solution, but that under other conditions there may be more. Conclusion - Our analyses demonstrate that lower allele frequencies, lower sample numbers, population stratification and a possible |D'| value of 1 are particularly susceptible to distortion of sample Hardy-Weinberg equilibrium, which has significant implications for calculation of linkage disequilibrium in small sample sizes (e.g. HapMap) and rarer alleles (e.g. paucimorphisms, q < 0.05) that may have particular disease relevance and require improved approaches for meaningful evaluation.
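The linkage disequilibrium measures named in this abstract (D' and r²) follow from standard textbook definitions once a pairwise haplotype frequency has been estimated. The Python sketch below illustrates only that final step. It does not solve the cubic equation and is not the CubeX implementation; the function name `ld_statistics` and its parameter names are assumptions made for illustration.

```python
def ld_statistics(p_ab: float, p_a: float, p_b: float) -> dict:
    """Return D, D' and r^2 for a pair of biallelic markers.

    p_ab : estimated frequency of the haplotype carrying allele A at the
           first locus and allele B at the second (hypothetical input).
    p_a  : frequency of allele A at the first locus.
    p_b  : frequency of allele B at the second locus.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both markers must be polymorphic")

    # Raw disequilibrium coefficient: observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b

    # D' rescales D by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max

    # r^2 is the squared correlation between alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

    return {"D": d, "D_prime": d_prime, "r2": r2}


if __name__ == "__main__":
    # Two markers in complete disequilibrium with equal allele frequencies.
    print(ld_statistics(p_ab=0.5, p_a=0.5, p_b=0.5))
```

When the cubic has more than one biologically possible root, each candidate haplotype frequency could be passed through a function like this to compare the resulting D' and r² values.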
Accuracy of Predicting the Genetic Risk of Disease Using a Genome-Wide Approach
Background - The prediction of the genetic disease risk of an individual is a powerful public health tool. While predicting risk has been successful in diseases which follow simple Mendelian inheritance, it has proven challenging in complex diseases for which a large number of loci contribute to the genetic variance. The large numbers of single nucleotide polymorphisms now available provide new opportunities for predicting genetic risk of complex diseases with high accuracy. Methodology/Principal Findings - We have derived simple deterministic formulae to predict the accuracy of predicted genetic risk from population or case control studies using a genome-wide approach and assuming a dichotomous disease phenotype with an underlying continuous liability. We show that the prediction equations are special cases of the more general problem of predicting the accuracy of estimates of genetic values of a continuous phenotype. Our predictive equations are responsive to all parameters that affect accuracy and they are independent of allele frequency and effect distributions. Deterministic prediction errors when tested by simulation were generally small. The common link among the expressions for accuracy is that they are best summarized as the product of the ratio of number of phenotypic records per number of risk loci and the observed heritability. Conclusions/Significance - This study advances the understanding of the relative power of case control and population studies of disease. The predictions represent an upper bound of accuracy which may be achievable with improved effect estimation methods. The formulae derived will help researchers determine an appropriate sample size to attain a certain accuracy when predicting genetic risk.
Detection of regulator genes and eQTLs in gene networks
Genetic differences between individuals associated with quantitative phenotypic
traits, including disease states, are usually found in non-coding genomic
regions. These genetic variants are often also associated with differences in
expression levels of nearby genes (they are "expression quantitative trait
loci" or eQTLs for short) and presumably play a gene regulatory role, affecting
the status of molecular networks of interacting genes, proteins and
metabolites. Computational systems biology approaches to reconstruct causal
gene networks from large-scale omics data have therefore become essential to
understand the structure of networks controlled by eQTLs together with other
regulatory genes, and to generate detailed hypotheses about the molecular
mechanisms that lead from genotype to phenotype. Here we review the main
analytical methods and software to identify eQTLs and their associated genes,
to reconstruct co-expression networks and modules, to reconstruct causal
Bayesian gene and module networks, and to validate predicted networks in
silico.
Comment: minor revision with typos corrected; review article; 24 pages, 2 figures
Polymorphisms on SSC15q21-q26 Containing QTL for reproduction in Swine and its association with litter size
Several quantitative trait loci (QTL) for important reproductive traits (ovulation rate) have been identified on porcine chromosome 15 (SSC15). To assist in the selection of positional candidate genes for these QTL, twenty-one genes had previously been assigned to SSC15 in our lab using the IMpRH radiation hybrid panel. Further polymorphism studies were carried out on these positional candidate genes in four pig breeds (Duroc, Erhualian, Dahuabai and Landrace) that differ significantly in reproductive traits. A total of nineteen polymorphisms were found in the 21 genes. Of these, seven polymorphisms in six genes were used for association studies, in which an NRP2 polymorphism was found to be significantly (p < 0.05) associated with litter-size traits. NRP2 might be a candidate gene for pig litter size based on its chromosomal location (Du et al., 2006), its significant association with litter-size traits and its relationships with Sema and the VEGF superfamilies.
Polymorphisms of XRCC4 are involved in reduced colorectal cancer risk in Chinese schizophrenia patients
Background - Genetic factors related to the regulation of apoptosis in schizophrenia patients may be involved in a reduced vulnerability to cancer. XRCC4 is one of the potential candidate genes associated with schizophrenia which might induce colorectal cancer resistance. Methods - To examine the genetic association between colorectal cancer and schizophrenia, we analyzed five SNPs (rs6452526, rs2662238, rs963248, rs35268, rs2386275) covering ~205.7 kb in the region of XRCC4. Results - We observed that two of the five genetic polymorphisms showed statistically significant differences between 312 colorectal cancer subjects without schizophrenia and 270 schizophrenia subjects (rs6452536, p = 0.004, OR 0.61, 95% CI 0.44-0.86; rs35268, p = 0.028, OR 1.54, 95% CI 1.05-2.26). Moreover, the haplotype which combined all five markers was the most significant, giving a global p = 0.0005. Conclusions - Our data indicate for the first time that XRCC4 may be a potential protective gene towards schizophrenia, conferring reduced susceptibility to colorectal cancer in the Han Chinese population.
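The results above are reported as odds ratios with 95% confidence intervals. As a rough illustration of how such figures are commonly obtained from a 2x2 allele-count table, the Python sketch below uses the standard log-odds (Woolf) interval. The abstract does not say which method the authors used, so this is an assumption, and the function name `odds_ratio_ci` and the counts in the usage example are hypothetical, not the study's data.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a, b : counts of the tested allele and the other allele in group 1
    c, d : counts of the tested allele and the other allele in group 2
    z    : normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (a * d) / (b * c)

    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    or_, lo, hi = odds_ratio_ci(120, 500, 170, 370)
    print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as both reported intervals do, corresponds to a nominally significant association at the 5% level.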
Scanning and filling: ultra-dense SNP genotyping combining genotyping-by-sequencing, SNP array and whole-genome resequencing data
Genotyping-by-sequencing (GBS) represents a highly cost-effective high-throughput genotyping
approach. By nature, however, GBS is subject to generating sizeable amounts of
missing data and these will need to be imputed for many downstream analyses. The extent
to which such missing data can be tolerated in calling SNPs has not been explored widely.
In this work, we first explore the use of imputation to fill in missing genotypes in GBS datasets.
Importantly, we use whole genome resequencing data to assess the accuracy of the
imputed data. Using a panel of 301 soybean accessions, we show that over 62,000 SNPs
could be called when tolerating up to 80% missing data, a five-fold increase over the number
called when tolerating up to 20% missing data. At all levels of missing data examined
(between 20% and 80%), the resulting SNP datasets were of uniformly high accuracy (96–
98%). We then used imputation to combine complementary SNP datasets derived from
GBS and a SNP array (SoySNP50K). We thus produced an enhanced dataset of >100,000
SNPs and the genotypes at the previously untyped loci were again imputed with a high level
of accuracy (95%). Of the >4,000,000 SNPs identified through resequencing 23 accessions
(among the 301 used in the GBS analysis), 1.4 million tag SNPs were used as a reference
to impute this large set of SNPs on the entire panel of 301 accessions. These previously
untyped loci could be imputed with around 90% accuracy. Finally, we used the 100K SNP
dataset (GBS + SoySNP50K) to perform a GWAS on seed oil content within this collection
of soybean accessions. Both the number of significant marker-trait associations and the
peak significance levels were improved considerably using this enhanced catalog of SNPs
relative to a smaller catalog resulting from GBS alone at 20% missing data. Our results
demonstrate that imputation can be used to fill in both missing genotypes and untyped loci
with very high accuracy and that this leads to more powerful genetic analyses.
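Two quantities recur in this abstract: the fraction of missing calls tolerated per SNP, and the accuracy of imputed genotypes judged against resequencing data. The Python sketch below shows one simple way both could be computed. It assumes genotypes coded as 0, 1 or 2 alternate-allele copies with `None` for a missing call, and it measures accuracy as plain genotype concordance; the function names and the toy data are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 alternate-allele copies; None = missing call


def missing_fraction(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with no genotype call at one SNP."""
    return sum(g is None for g in calls) / len(calls)


def filter_by_missingness(
    snps: Dict[str, Sequence[Genotype]], max_missing: float
) -> Dict[str, Sequence[Genotype]]:
    """Keep SNPs whose missing fraction does not exceed max_missing."""
    return {
        name: calls
        for name, calls in snps.items()
        if missing_fraction(calls) <= max_missing
    }


def concordance(imputed: Sequence[Genotype], reference: Sequence[Genotype]) -> float:
    """Share of genotypes that agree, over samples called in both sets."""
    pairs = [
        (i, r) for i, r in zip(imputed, reference) if i is not None and r is not None
    ]
    if not pairs:
        raise ValueError("no overlapping genotype calls to compare")
    return sum(i == r for i, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy data: two SNPs across four samples.
    snps = {"snp1": [0, None, 2, 1], "snp2": [None, None, None, 1]}
    print(sorted(filter_by_missingness(snps, max_missing=0.2)))
    print(sorted(filter_by_missingness(snps, max_missing=0.8)))
    print(concordance([0, 1, 2, 2], [0, 1, 2, 1]))
```

Raising `max_missing` keeps more SNPs at the cost of more genotypes to impute, which is the trade-off the abstract explores between the 20% and 80% thresholds.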