
    A Quadratically Regularized Functional Canonical Correlation Analysis for Identifying the Global Structure of Pleiotropy with NGS Data

    Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotypes and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representations and features from high dimensional genotype and phenotype data. To explore multiple levels of representations of genetic variants, learn their internal patterns involved in the disease development, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we propose a new framework referred to as a quadratically regularized functional CCA (QRFCCA) for association analysis, which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the nine competing statistics while retaining appropriate type I error rates. To further evaluate performance, the QRFCCA and nine other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the nine other statistics. Comment: 64 pages including 12 figures.

    Genotype-Based Bayesian Analysis of Gene-Environment Interactions with Multiple Genetic Markers and Misclassification in Environmental Factors

    A key component of understanding the etiology of complex diseases, such as cancer, diabetes, and alcohol dependence, is to investigate gene-environment interactions. This work is motivated by the following two concerns in the analysis of gene-environment interactions. First, multiple genetic markers in moderate linkage disequilibrium may be involved in susceptibility to a complex disease. Second, environmental factors may be subject to misclassification. We develop a genotype-based Bayesian pseudolikelihood approach that accommodates linkage disequilibrium in genetic markers and misclassification in environmental factors. Since our approach is genotype-based, it allows the observed genetic information to enter the model directly, thus eliminating the need to infer haplotype phase and simplifying computations. The Bayesian approach allows shrinking parameter estimates towards the prior distribution to improve estimation and inference when environmental factors are subject to misclassification. Simulation experiments demonstrated that our method produced parameter estimates that are nearly unbiased even for small sample sizes. An application of our method is illustrated using a case-control study of interaction between early onset of drinking and genes involved in the dopamine pathway.

    A genome-wide association scan for rheumatoid arthritis data by Hotelling's T² tests

    We performed a genome-wide association scan on the North American Rheumatoid Arthritis Consortium (NARAC) data using Hotelling's T² tests, i.e., T_H based on allele coding and T_G based on genotype coding. The objective was to identify associations between single-nucleotide polymorphisms (SNPs) or markers and rheumatoid arthritis. In specific candidate gene regions, we evaluated the performance of Hotelling's T² tests. Then Hotelling's T² tests were used as a tool to identify new regions that contain SNPs showing strong associations with disease. As expected, the strongest association evidence was found in the region of the HLA-DRB1 locus on chromosome 6. In the region of the TRAF1-C5 genes, we identified two SNPs, rs2900180 and rs3761847, with the largest and the second largest T_H and T_G scores among all SNPs on chromosome 9. We also identified one SNP, rs2476601, in the region of the PTPN22 gene that had the largest T_H score and the second largest T_G score among all SNPs on chromosome 1. In addition, SNPs with the largest T_H score on each chromosome were identified. These SNPs may be located in the regions of genes that have modest effects on rheumatoid arthritis. These regions deserve further investigation.
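
As a rough illustration of the kind of statistic this abstract refers to, the following is a minimal sketch of a generic two-sample Hotelling's T² statistic comparing mean marker-coding vectors between cases and controls. It is not the authors' allele-coded or genotype-coded implementation: the function name `hotelling_t2`, the simulated 0/1/2 genotype codes, and the pooled-covariance F scaling are all assumptions made here for illustration.

```python
import numpy as np


def hotelling_t2(cases, controls):
    """Generic two-sample Hotelling's T^2 (illustrative; not the paper's exact tests).

    cases, controls: arrays of shape (n_samples, n_markers).
    Returns (T^2, F statistic, (df1, df2)) under the usual pooled-covariance setup.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    n1, p = cases.shape
    n2 = controls.shape[0]

    # Difference of the two sample mean vectors
    diff = cases.mean(axis=0) - controls.mean(axis=0)

    # Pooled sample covariance matrix (atleast_2d handles the single-marker case)
    s1 = np.atleast_2d(np.cov(cases, rowvar=False))
    s2 = np.atleast_2d(np.cov(controls, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

    # T^2 = (n1*n2/(n1+n2)) * diff' * pooled^{-1} * diff
    t2 = (n1 * n2) / (n1 + n2) * (diff @ np.linalg.solve(pooled, diff))

    # Scaling under which T^2 follows an F(p, n1+n2-p-1) distribution
    # when the data are multivariate normal with equal covariances
    df2 = n1 + n2 - p - 1
    f_stat = df2 / (p * (n1 + n2 - 2)) * t2
    return float(t2), float(f_stat), (p, df2)


# Hypothetical toy data: 0/1/2 genotype codes at 3 markers
rng = np.random.default_rng(0)
toy_cases = rng.integers(0, 3, size=(200, 3))
toy_controls = rng.integers(0, 3, size=(200, 3))
t2, f_stat, dfs = hotelling_t2(toy_cases, toy_controls)
print(f"T^2 = {t2:.3f}, F = {f_stat:.3f}, df = {dfs}")
```

In a scan of the kind described, a statistic like this would be computed per marker or per small marker window and the largest scores inspected; a p-value could be read from the F distribution with the returned degrees of freedom.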

    NEW EFFECTIVE TRANSFORMATIONAL COMPUTATIONAL METHODS

    Mathematics serves as a fundamental theoretical basis for computation, and mathematical analysis is very useful for developing computational methods to solve various problems in science and engineering. Integral transforms such as the Laplace transform have been playing an important role in computational methods. In this paper, we introduce the Sumudu transform in a new computational approach, in which effective computational methods are developed and implemented. Such computational methods are straightforward to understand, yet powerful enough to incorporate into computational science to solve different problems automatically. We provide computational analysis by surveying and summarizing some related recent works, with additional automatic proof details obtained by applying system built-in functions. Applications include the computation of coefficients of Taylor's expansions, calculation of generating functions, mathematical identity proofs, and solving differential equations and integral equations. For demonstration purposes, some of the methods were implemented in Maple, with results matching the expected values.
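
For orientation, the standard textbook definition of the Sumudu transform, and its formal term-by-term action on a power series, can be written as below. This is not quoted from the paper; the symbols f, u, and a_n are notation chosen here, and the series identity is a formal statement that assumes suitable convergence.

```latex
\[
  S[f](u) = \int_{0}^{\infty} f(ut)\, e^{-t}\, dt,
  \qquad
  S[t^{n}](u) = n!\, u^{n},
\]
\[
  f(t) = \sum_{n=0}^{\infty} a_n t^{n}
  \;\Longrightarrow\;
  S[f](u) = \sum_{n=0}^{\infty} n!\, a_n u^{n}.
\]
```

The second line shows why the transform is convenient for Taylor coefficients: each coefficient a_n reappears in the transformed series, scaled by n!.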

    Association analysis of complex diseases using triads, parent-child dyads and singleton monads

    Background: Triad families are routinely used to test association between genetic variants and complex diseases. Triad studies are important and popular since they are robust in terms of being less prone to false positives due to population structure. In practice, one may collect not only complete triads, but also incomplete families such as dyads (affected child with one parent) and singleton monads (affected child without parents). Since there is a lack of convenient algorithms and software to analyze the incomplete data, dyads and monads are usually discarded. This may lead to loss of power and insufficient utilization of genetic information in a study. Results: We develop likelihood-based statistical models and likelihood ratio tests to test for association between complex diseases and genetic markers by using combinations of full triads, parent-child dyads, and affected singleton monads for a unified analysis. A likelihood is calculated directly to facilitate the data analysis without imputation and to avoid computational complexity. This makes it easy to implement the models and to explain the results. Conclusion: By simulation studies, we show that the proposed models and tests are very robust in terms of accurately controlling type I error rates, and are powerful according to empirical power evaluations. The methods are applied to test for association between the transforming growth factor alpha (TGFA) gene and cleft palate in an Irish study.

    Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models

    Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis that combines data from multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT, and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies.

    Meta-analysis of Complex Diseases at Gene Level with Generalized Functional Linear Models

    We developed generalized functional linear models (GFLMs) to perform a meta-analysis of multiple case-control studies to evaluate the relationship of genetic data to dichotomous traits adjusting for covariates. Unlike the previously developed meta-analysis for sequence kernel association tests (MetaSKATs), which are based on mixed-effect models to make the contributions of major gene loci random, GFLMs are fixed models; i.e., genetic effects of multiple genetic variants are fixed. Based on GFLMs, we developed chi-squared-distributed Rao’s efficient score test and likelihood-ratio test (LRT) statistics to test for an association between a complex dichotomous trait and multiple genetic variants. We then performed extensive simulations to evaluate the empirical type I error rates and power performance of the proposed tests. The Rao’s efficient score test statistics of GFLMs are very conservative and have higher power than MetaSKATs when some causal variants are rare and some are common. When the causal variants are all rare [i.e., minor allele frequencies (MAF) < 0.03], the Rao’s efficient score test statistics have similar or slightly lower power than MetaSKATs. The LRT statistics generate accurate type I error rates for homogeneous genetic-effect models and may inflate type I error rates for heterogeneous genetic-effect models owing to the large numbers of degrees of freedom and have similar or slightly higher power than the Rao’s efficient score test statistics. GFLMs were applied to analyze genetic data of 22 gene regions of type 2 diabetes data from a meta-analysis of eight European studies and detected significant association for 18 genes (P < 3.10 × 10⁻⁶), tentative association for 2 genes (HHEX and HMGA2; P ≈ 10⁻⁵), and no association for 2 genes, while MetaSKATs detected none. In addition, the traditional additive-effect model detects association at gene HHEX. GFLMs and related tests can analyze rare or common variants or a combination of the two and can be useful in whole-genome and whole-exome association studies.

    A Functional Data Analysis Approach for Circadian Patterns of Activity of Teenage Girls

    Background: Longitudinal or time-dependent activity data are useful to characterize circadian activity patterns and to identify physical activity differences among multiple samples. Statistical methods designed to analyze multiple activity sample data are desired, and related software is needed to perform data analysis. Methods: This paper introduces a functional data analysis (fda) approach to perform a functional analysis of variance (fANOVA) for longitudinal circadian activity count data and to investigate the association of covariates such as weight or body mass index (BMI) with physical activity. For adolescent school girls in multiple age groups, the fANOVA approach is developed to study and to characterize activity patterns. The fANOVA is applied to analyze the physical activity data of adolescent girls in three grades (i.e., grades 10, 11, and 12) from the NEXT Generation Health Study 2009–2013. To test if there are activity differences among girls of the three grades, a functional version of the univariate F-statistic is used to analyze the data. To investigate if there is a longitudinal (or time-dependent activity count) difference between two samples, functional t-tests are utilized to test: (1) activity differences between grade pairs; (2) activity differences between low-BMI girls and high-BMI girls of the NEXT study. Results: Statistically significant differences existed among the physical activity patterns for adolescent school girls in different grades. Girls in grade 10 tended to be less active than girls in grades 11 and 12 between 5:30 and 9:30. Significant differences in physical activity were detected between low-BMI and high-BMI groups from 8:00 to 11:30 for grade 10 girls, and low-BMI group girls in grade 10 tended to be more active. Conclusions: The fda approach is useful in characterizing time-dependent patterns of actigraphy data. For two-sample data defined by weight or BMI values, fda can identify differences between the two time-dependent samples of activity data. Similarly, fda can identify differences among multiple physical activity time-dependent datasets. These analyses can be performed readily using the fda R program.

    A comparison study of multivariate fixed models and Gene Association with Multiple Traits (GAMuT) for next-generation sequencing

    In this paper, extensive simulations are performed to compare two statistical methods to analyze multiple correlated quantitative phenotypes: (1) approximate F-distributed tests of multivariate functional linear models (MFLM) and additive models of multivariate analysis of variance (MANOVA), and (2) Gene Association with Multiple Traits (GAMuT) for association testing of high-dimensional genotype data. It is shown that approximate F-distributed tests of MFLM and MANOVA have higher power and are more appropriate for major gene association analysis (i.e., scenarios in which some genetic variants have relatively large effects on the phenotypes); GAMuT has higher power and is more appropriate for analyzing polygenic effects (i.e., effects from a large number of genetic variants each of which contributes a small amount to the phenotypes). MFLM and MANOVA are very flexible and can be used to perform association analysis for (i) rare variants, (ii) common variants, and (iii) a combination of rare and common variants. Although GAMuT was designed to analyze rare variants, it can be applied to analyze a combination of rare and common variants and it performs well when (1) the number of genetic variants is large and (2) each variant contributes a small amount to the phenotypes (i.e., polygenes). MFLM and MANOVA are fixed effect models that perform well for major gene association analysis. GAMuT can be viewed as an extension of sequence kernel association tests (SKAT). Both GAMuT and SKAT are more appropriate for analyzing polygenic effects and they perform well not only in the rare variant case, but also in the case of a combination of rare and common variants. Data analyses of European cohorts and the Trinity Students Study are presented to compare the performance of the two methods. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/135654/1/gepi22014_am.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/135654/2/gepi22014-sup0001-Suppmat.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/135654/3/gepi22014.pd

    Rationale, design, and method of the Diabetes & Women’s Health study – a study of long-term health implications of glucose intolerance in pregnancy and their determinants

    Women who develop gestational diabetes mellitus or impaired glucose tolerance during pregnancy are at substantially increased risk for type 2 diabetes and comorbidities after pregnancy. Little is known about the role of genetic factors and their interactions with environmental factors in determining the transition from gestational diabetes mellitus to overt type 2 diabetes mellitus. These critical data gaps served as the impetus for this Diabetes & Women’s Health study, with the overall goal of investigating genetic factors and their interactions with risk factors amenable to clinical or public health interventions in relation to the transition from gestational diabetes mellitus to type 2 diabetes mellitus. To achieve the goal efficiently, we are applying a hybrid design, enrolling and collecting data longitudinally from approximately 4000 women with a medical history of gestational diabetes mellitus in two existing prospective cohorts, the Nurses’ Health Study II and the Danish National Birth Cohort. Women who had a medical history of gestational diabetes mellitus in one or more of their pregnancies are eligible for the present study. After enrollment, we follow study participants for an additional 2 years to collect updated information on major clinical and environmental factors that may predict type 2 diabetes mellitus risk, as well as biospecimens to measure genetic and biochemical markers implicated in glucose metabolism. Newly collected data will be appended to the relevant existing data for the creation of a new database inclusive of genetic, epigenetic and environmental data. Findings from the study are critical for the development of targeted and more effective strategies to prevent type 2 diabetes mellitus and its complications in this high-risk population.