AssocTests: An R Package for Genetic Association Studies
The R package AssocTests provides several procedures commonly used in genetic association studies: population stratification correction through eigenvectors, principal coordinates of clusterings, the Tracy-Widom test, distance regression, the single-marker test, the maximum test based on three Cochran-Armitage trend tests, the non-parametric trend test, and the non-parametric maximum test. The trait values for these methods can be discrete or continuous; discrete traits can be coded as 1/0 for cases/controls. The genotype values can be 0, 1, or 2, indicating the number of risk alleles at a biallelic single-nucleotide polymorphism. This article introduces the methods and algorithms implemented in the package, and some examples are provided to illustrate the package's capabilities.
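To make the maximum test concrete, here is a minimal Python sketch of the three Cochran-Armitage trend tests and their maximum (MAX3) for a 2x3 table of genotype counts coded 0/1/2 as above. It is not the AssocTests R API; the function names `catt_z` and `max3` are hypothetical, the variance uses the common asymptotic form without a finite-sample correction, and no p-value is computed for MAX3 (that requires the joint distribution of the three correlated statistics).

```python
import math

# Genotype scores (0, x, 1) for 0, 1, 2 copies of the risk allele.
# x = 0: recessive, x = 0.5: additive, x = 1: dominant.
GENETIC_MODELS = {"recessive": 0.0, "additive": 0.5, "dominant": 1.0}


def catt_z(cases, controls, x):
    """Cochran-Armitage trend statistic Z for a 2x3 genotype table.

    cases, controls: counts of genotypes with 0, 1, 2 risk alleles.
    x: score assigned to the heterozygote.
    Positive Z means cases carry more risk alleles than controls.
    """
    scores = (0.0, x, 1.0)
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_xr = sum(w * r for w, r in zip(scores, cases))
    sum_xn = sum(w * m for w, m in zip(scores, totals))
    sum_x2n = sum(w * w * m for w, m in zip(scores, totals))
    # T = N * sum(x_i r_i) - R * sum(x_i n_i)
    t = n * sum_xr - n_cases * sum_xn
    # Null variance of T is (R * S / N) * (N * sum(x^2 n) - (sum(x n))^2)
    var_core = n_cases * n_controls * (n * sum_x2n - sum_xn ** 2)
    return math.sqrt(n) * t / math.sqrt(var_core)


def max3(cases, controls):
    """MAX3: the largest |Z| over the three trend tests, plus each Z."""
    zs = {m: catt_z(cases, controls, x) for m, x in GENETIC_MODELS.items()}
    return max(abs(z) for z in zs.values()), zs
```

For example, `max3([10, 20, 30], [30, 20, 10])` returns the maximum statistic and a dictionary of the recessive, additive, and dominant Z values. Taking the maximum guards against choosing the wrong genetic model, which is the motivation for MAX3.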
On Coalescence Analysis Using Genealogy Rooted Trees
DNA sequence data are now being used to study the ancestral history of human populations. Existing methods for such coalescence inference use recursion formulas to compute the data probabilities. These methods are useful in practical applications but computationally complicated. Here we first investigate the asymptotic behavior of such inference; the results indicate that, broadly, the estimated coalescent time converges to a finite limit. We then study a relatively simple computational method for this analysis and illustrate how to use it.
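A textbook fact may help intuition for why coalescent times stay bounded as the sample grows: under the standard Kingman coalescent, with time measured in units of 2N generations, the waiting time while k lineages remain is exponential with rate k(k-1)/2, so the expected time to the most recent common ancestor of n sequences is 2(1 - 1/n), which tends to 2. The sketch below computes this exactly and checks it by simulation; it is not the estimator or computation method of this paper, and the function names are hypothetical.

```python
import random
from fractions import Fraction


def expected_tmrca(n):
    """Exact E[T_MRCA] for n lineages under Kingman's coalescent.

    Time is in units of 2N generations; while k lineages remain, the
    waiting time is exponential with rate k(k-1)/2, mean 2/(k(k-1)).
    """
    if n < 2:
        raise ValueError("need at least two lineages")
    return sum(Fraction(2, k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """One simulated T_MRCA: sum of exponential waiting times."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)
    return t
```

For instance, `expected_tmrca(10)` is `Fraction(9, 5)`, and adding more sequences only moves the expectation toward the finite limit 2.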
Nonparametric Prediction Distribution from Resolution-Wise Regression with Heterogeneous Data
Modeling and inference for heterogeneous data have gained great interest recently due to rapid developments in personalized marketing. Most existing regression approaches are based on the conditional mean and may require additional cluster information to accommodate data heterogeneity. In this paper, we propose a novel nonparametric resolution-wise regression procedure that provides an estimated distribution of the response instead of a single value. We achieve this by decomposing the information in the response and the predictors into resolutions and patterns, respectively, based on marginal binary expansions. The relationships between resolutions and patterns are modeled by penalized logistic regressions. Combining the resolution-wise predictions, we deliver a histogram of the conditional response to approximate its distribution. Moreover, we show a sure independence screening property and the consistency of the proposed method for growing dimensions. Simulations and a real estate valuation dataset further illustrate the effectiveness of the proposed method.
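The link between binary digits and a histogram can be sketched as follows, assuming the response is first mapped to (0, 1) by a rank-based empirical CDF transform. The digit at depth k is the "resolution" k bit, and chaining the conditional probabilities of each next digit yields probabilities for the 2^depth histogram bins. All helper names are hypothetical, and `cond_prob` merely stands in for the fitted penalized logistic regressions; this is an illustration, not the authors' implementation.

```python
from itertools import product


def empirical_cdf_transform(values):
    """Map values into (0, 1) via ranks: (rank + 0.5) / n (ties broken by order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order):
        u[i] = (rank + 0.5) / n
    return u


def binary_digits(u, depth):
    """First `depth` binary digits of u in [0, 1)."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    digits = []
    for _ in range(depth):
        u *= 2
        bit = int(u)  # 0 or 1
        digits.append(bit)
        u -= bit
    return digits


def histogram_from_conditionals(cond_prob, depth):
    """Bin probabilities from resolution-wise conditional probabilities.

    cond_prob(prefix) returns P(next digit = 1 | earlier digits = prefix).
    Each bin is a tuple of `depth` digits, i.e. an interval of width 2**-depth.
    """
    hist = {}
    for bits in product((0, 1), repeat=depth):
        p = 1.0
        for k, b in enumerate(bits):
            q = cond_prob(bits[:k])
            p *= q if b == 1 else 1.0 - q
        hist[bits] = p
    return hist
```

With a constant `cond_prob` of 0.5 the histogram is uniform; a model that pushes the first digit toward 1 concentrates mass in the upper half of the response distribution.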
Robust joint analysis allowing for model uncertainty in two-stage genetic association studies
Background
The cost-efficient two-stage design is often used in genome-wide association studies (GWASs) to search for genetic loci underlying susceptibility to complex diseases. Replication-based analysis, which considers data from each stage separately, often suffers from a loss of efficiency. A joint test that combines data from both stages has been proposed and widely used to improve efficiency. However, existing joint analyses are based on test statistics derived under an assumed genetic model, and thus might not perform robustly when the assumed genetic model is not appropriate.
Results
In this paper, we propose joint analyses based on two robust tests, MERT and MAX3, for GWASs under a two-stage design. We developed computationally efficient procedures and formulas for significance level evaluation and power calculation. The performance of the proposed approaches is investigated through extensive simulation studies and a real example. Numerical results show that the joint analysis based on the MAX3 test statistic has the best overall performance.
Conclusions
MAX3 joint analysis is the most robust procedure among the considered joint analyses, and we recommend using it in two-stage genome-wide association studies.
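Two building blocks behind these analyses can be sketched briefly, under stated assumptions. A common way to combine stages is the weighted-Z joint statistic sqrt(pi) * Z1 + sqrt(1 - pi) * Z2, with pi the fraction of samples in stage 1; MERT combines the recessive and dominant trend statistics as (Z_rec + Z_dom) / sqrt(2(1 + rho)), where rho is their null correlation. The function names are hypothetical, and this sketch does not reproduce the paper's significance-level or power procedures, which must also account for marker selection at stage 1.

```python
import math


def joint_z(z1, z2, n1, n2):
    """Weighted-Z joint statistic for a two-stage design.

    z1, z2: stage-specific statistics from the same test, same direction.
    n1, n2: stage sample sizes; pi = n1 / (n1 + n2).
    """
    pi = n1 / (n1 + n2)
    return math.sqrt(pi) * z1 + math.sqrt(1.0 - pi) * z2


def mert(z_rec, z_dom, rho):
    """Maximin efficiency robust test from recessive and dominant trend Zs.

    rho: null correlation between the recessive and dominant statistics.
    """
    return (z_rec + z_dom) / math.sqrt(2.0 * (1.0 + rho))
```

A MAX3-based joint analysis would instead take the largest absolute value of three joint trend statistics; its null distribution depends on their correlations, which is where dedicated formulas such as those described above are needed.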
Identifying rheumatoid arthritis susceptibility genes using high-dimensional methods
Although several genes (including a strong effect in the human leukocyte antigen (HLA) region) and some environmental factors have been implicated in susceptibility to rheumatoid arthritis (RA), the etiology of the disease is not completely understood. The ability to screen the entire genome for association with complex diseases has great potential for identifying gene effects. However, the efficiency of gene detection in this situation may be improved by methods specifically designed for high-dimensional data. The aim of this study was to compare how well three different statistical approaches, multifactor dimensionality reduction (MDR), random forests (RF), and an omnibus approach, identified gene effects (including gene-gene interactions) associated with RA. We developed a test set of genes based on previous linkage and association findings and tested all three methods. In the presence of the HLA shared-epitope factor, other genes showed weaker effects. All three methods detected SNPs in PTPN22 and TRAF1-C5 as being important, but we did not detect any new genes in this study. We conclude that the three high-dimensional methods are useful as an initial screen for gene associations, identifying promising genes for further modeling and additional replication studies.
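As a rough illustration of how MDR handles gene-gene interaction, the sketch below implements its core step for one pair of SNPs: each two-locus genotype cell is labeled high-risk when its case/control ratio meets a threshold (by default the overall case/control ratio), which collapses the pair into a single binary factor whose classification accuracy can be scored. Full MDR also searches over SNP combinations and uses cross-validation, which this sketch omits; `mdr_pair` is a hypothetical name, not the software used in the study.

```python
from collections import Counter


def mdr_pair(g1, g2, status, threshold=None):
    """Core MDR step for one SNP pair.

    g1, g2: genotypes (0/1/2) at two SNPs; status: 1 = case, 0 = control.
    Returns the set of high-risk genotype cells and the training accuracy.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(g1, g2, status):
        (cases if y == 1 else controls)[(a, b)] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    if threshold is None:
        threshold = n_cases / n_controls  # overall case/control ratio
    high_risk = set()
    for cell in set(cases) | set(controls):
        # A cell with cases but no controls is always high-risk.
        if controls[cell] == 0 or cases[cell] / controls[cell] >= threshold:
            high_risk.add(cell)
    correct = sum(((a, b) in high_risk) == (y == 1)
                  for a, b, y in zip(g1, g2, status))
    return high_risk, correct / len(status)
```

An interaction with no marginal effect, such as risk only when both genotypes match, is still captured here, because the labeling is done per joint cell rather than per SNP.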
A regulatory insertion-deletion polymorphism in the FADS gene cluster influences PUFA and lipid profiles among Chinese adults: a population-based study
Background
Arachidonic acid (AA) is the major polyunsaturated fatty acid (PUFA) substrate for potent eicosanoid signaling to modulate inflammation and thrombosis and is controlled in part by tissue abundance. Fatty acid desaturase 1 (FADS1) catalyzes the synthesis of omega-6 (n–6) AA and n–3 eicosapentaenoic acid (EPA). The rs66698963 polymorphism, a 22-base pair (bp) insertion-deletion 137 bp downstream of a sterol regulatory element in FADS2 intron 1, mediates expression of FADS1 in vitro and has been under positive selection in several human populations. The associations of rs66698963 with plasma PUFAs and with disease phenotypes are unclear.
Objective
This study aimed to evaluate the relation between rs66698963 genotypes and plasma PUFA concentrations and blood lipid profiles.
Design
Plasma fatty acids were measured from a single sample obtained at baseline in 1504 healthy Chinese adults aged between 35 and 59 y with the use of gas chromatography. Blood lipids were measured at baseline and a second time at the 18-mo follow-up. The rs66698963 genotype was determined by using agarose gel electrophoresis. Linear regression and logistic regression analyses were performed to assess the association between genotype and plasma PUFAs and blood lipids.
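The linear-regression step can be illustrated with a minimal, unadjusted sketch: the outcome is regressed on genotype coded additively, and the slope estimate and its standard error are returned. The 0/1/2 coding (copies of the I allele) and the function name are assumptions for illustration; the study's actual models and any covariates they included are not reproduced here.

```python
import math


def additive_genotype_regression(genotypes, y):
    """OLS of y on an additively coded genotype (0, 1, 2 allele copies).

    Returns (beta, se): the per-allele slope and its standard error.
    """
    n = len(y)
    x_bar = sum(genotypes) / n
    y_bar = sum(y) / n
    sxx = sum((x - x_bar) ** 2 for x in genotypes)
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(genotypes, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    # Residual sum of squares with n - 2 degrees of freedom.
    rss = sum((yi - alpha - beta * x) ** 2 for x, yi in zip(genotypes, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se
```

Under this coding, beta is the expected change in the outcome per additional allele copy, which is how per-allele effects such as those reported for triglycerides and HDL cholesterol are usually read.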
Results
A shift from the precursors linoleic acid and α-linolenic acid to produce AA and EPA, respectively, was observed, consistent with FADS1 activity increasing in the order of genotypes D/D to I/D to I/I. For I/I compared with D/D carriers, plasma concentrations of n–6 AA and the ratio of AA to n–3 EPA plus docosahexaenoic acid (DHA) were 57% and 32% higher, respectively. Carriers of the deletion (D) allele of rs66698963 tended to have higher triglycerides (β = 0.018; SE: 0.009; P = 0.05) and lower HDL cholesterol (β = −0.008; SE: 0.004; P = 0.02) than carriers of the insertion (I) allele.
Conclusions
The rs66698963 genotype is significantly associated with AA concentrations and the AA to EPA+DHA ratio, reflecting basal risk of inflammatory and related chronic disease phenotypes, and is correlated with the risk of dyslipidemia.
- …