Associating multiple longitudinal traits with high-dimensional single-nucleotide polymorphism data: application to the Framingham Heart Study
Cardiovascular diseases are associated with combinations of phenotypic traits, which are in turn caused by a combination of environmental and genetic factors. Because of the diversity of pathways that may lead to cardiovascular diseases, we examined so-called intermediate phenotypes, which are often measured repeatedly. We developed a penalized nonlinear canonical correlation analysis to associate multiple repeatedly measured traits with high-dimensional single-nucleotide polymorphism data.
Association of repeatedly measured intermediate risk factors for complex diseases with high dimensional SNP data
BACKGROUND: The causes of complex diseases are difficult to grasp because many different factors play a role in their onset. To find a common genetic background, many existing studies divide their population into cases and controls, a classification that is likely to cause heterogeneity within the two groups. Rather than dividing the study population into cases and controls, it is better to characterize the phenotype of a complex disease by a set of intermediate risk factors. However, these risk factors often vary over time and are therefore measured repeatedly. RESULTS: We introduce a method to associate multiple repeatedly measured intermediate risk factors with a high-dimensional set of single-nucleotide polymorphisms (SNPs). In a two-step approach, we first summarize the time course of each individual and then apply these summaries in a penalized nonlinear canonical correlation analysis to obtain sparse results. CONCLUSIONS: Application of this method to two datasets studying the genetic background of cardiovascular diseases shows that mainly the constant levels over time, rather than the progression over time, are associated with sets of SNPs.
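The abstract does not state which summary of the time courses is used in the first step. As a minimal sketch, assuming each individual's trajectory of a trait is summarized by an ordinary least-squares line, the intercept captures the constant level and the slope captures progression over time. The helpers `summarize_time_course` and `summarize_subjects` are hypothetical names; their output is a per-individual summary vector that could then be passed to the second (penalized CCA) step.

```python
def summarize_time_course(times, values):
    """Least-squares fit of value = intercept + slope * time for one individual.
    Assumption: a straight line is an adequate summary of the trajectory."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two measurements to estimate a slope")
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurement times must not all be equal")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    slope = sxy / sxx  # progression over time
    intercept = y_mean - slope * t_mean  # level at time zero
    return intercept, slope


def summarize_subjects(records):
    """records: {subject_id: {trait_name: (times, values)}}.
    Returns {subject_id: [level_1, slope_1, level_2, slope_2, ...]} with traits
    in sorted name order, so every subject's vector has the same layout."""
    summary = {}
    for subject, traits in records.items():
        features = []
        for trait in sorted(traits):
            times, values = traits[trait]
            features.extend(summarize_time_course(times, values))
        summary[subject] = features
    return summary
```

For example, `summarize_time_course([0, 1, 2], [120, 122, 124])` returns `(120.0, 2.0)`: a level of 120 at time zero and an increase of 2 per time unit.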
Sparse canonical correlation analysis for identifying, connecting and completing gene-expression networks
Background: We generalized penalized canonical correlation analysis to microarray gene-expression measurements in order to check the completeness of known metabolic pathways and to identify candidate genes for incorporation in a pathway. We used Wold's method to calculate the canonical variates, and we applied ridge penalization to the regression of the pathway genes on the canonical variates of the non-pathway genes, and the elastic net to the regression of the non-pathway genes on the canonical variates of the pathway genes. Results: We performed a small simulation to illustrate the model's ability to identify new candidate genes for incorporation in the pathway: in our simulations, a gene was correctly identified if its correlation with the pathway genes was 0.3 or more. We applied the method to a gene-expression microarray dataset of 12,209 genes measured in 45 patients with glioblastoma and considered genes for incorporation in the glioma pathway: we identified more than 25 genes that correlated > 0.9 with the canonical variates of the pathway genes. Conclusion: We concluded that penalized canonical correlation analysis is a powerful tool for identifying candidate genes in pathway analysis.
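The screening criterion in the glioblastoma application (genes correlating > 0.9 with a canonical variate of the pathway genes) can be sketched as follows. This covers only the final screening step, not the penalized CCA fit itself: it assumes the canonical variate scores per patient have already been computed, and `pearson` and `candidate_genes` are hypothetical helper names.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def candidate_genes(expression, variate, threshold=0.9):
    """Return genes whose expression correlates above `threshold` with the
    canonical variate, strongest first.
    expression: {gene: expression values per patient}; variate: scores per patient."""
    scores = {gene: pearson(values, variate) for gene, values in expression.items()}
    hits = [gene for gene, r in scores.items() if r > threshold]
    return sorted(hits, key=lambda gene: scores[gene], reverse=True)
```

Keeping only positive correlations follows the wording of the abstract; comparing `abs(r)` instead would also flag strongly anti-correlated genes.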
Improved Heterosis Prediction by Combining Information on DNA- and Metabolic Markers
Background: Hybrids represent a cornerstone in the success story of breeding programs. The fundamental principle underlying this success is the phenomenon of hybrid vigour, or heterosis. It describes an advantage of the offspring over the two parental lines with respect to parameters such as growth and resistance against abiotic or biotic stress. Models based on dominance, overdominance or epistasis are commonly used as explanations. The heterosis level is clearly a function of the combination of parents used for offspring production. This poses a major challenge for plant breeders, as usually several thousand combinations of parents have to be tested to identify the best combinations. Thus, any approach that reliably predicts heterosis levels from properties of the parental lines would be highly beneficial for plant breeding. Methodology/Principal Findings: Recently, genetic data have been used to predict heterosis. Here we show that a combination of parental genetic and metabolic markers, identified via feature selection and minimum-description-length based regression methods, significantly improves the prediction of biomass heterosis in the resulting offspring. Conclusion/Significance: These findings will help further our understanding of the molecular basis of heterosis, revealing, for instance, the presence of nonlinear genotype-phenotype relationships. In addition, we describe a possible approach for accelerated selection in plant breeding.
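The abstract names minimum-description-length (MDL) based regression but not the exact criterion. As a hedged illustration, assuming the common two-part MDL approximation for Gaussian regression, DL = (n/2) log(RSS/n) + (k/2) log(n), which equals BIC up to a factor of 2 and additive constants, a marker can be kept when a one-marker linear model describes the heterosis values more compactly than the intercept-only model. The helper names and the encoding of one marker value per cross are hypothetical, and this univariate screen is a simplification of the feature selection described in the study.

```python
import math


def description_length(rss, n, k):
    """Two-part MDL approximation for a Gaussian linear model with k parameters:
    cost of encoding the residuals plus cost of encoding the parameters."""
    rss = max(rss, 1e-12)  # guard against log(0) for a perfect fit
    return 0.5 * n * math.log(rss / n) + 0.5 * k * math.log(n)


def line_rss(x, y):
    """Residual sum of squares of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0:
        return syy  # a constant marker explains nothing beyond the intercept
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return syy - sxy * sxy / sxx


def select_markers(markers, heterosis):
    """Keep markers whose one-marker model (k = 2) has a shorter description
    length than the intercept-only model (k = 1).
    markers: {name: marker values, one per cross}; heterosis: one value per cross."""
    n = len(heterosis)
    my = sum(heterosis) / n
    dl_null = description_length(sum((v - my) ** 2 for v in heterosis), n, 1)
    selected = []
    for name, values in markers.items():
        dl_marker = description_length(line_rss(values, heterosis), n, 2)
        if dl_marker < dl_null:
            selected.append(name)
    return selected
```

The extra (1/2) log(n) per parameter is what keeps weakly informative markers out: a marker must reduce the residual cost by more than the cost of encoding its coefficient.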
Correlating multiple SNPs and multiple disease phenotypes: Penalized nonlinear canonical correlation analysis
MOTIVATION: Canonical correlation analysis (CCA) can be used to capture the underlying genetic background of a complex disease by associating two datasets containing information about patients' phenotypic and genetic characteristics. Genetic information is often measured on a qualitative scale, so ordinary CCA cannot be applied to such data. Moreover, the data in genetic studies can be enormous, which makes the results difficult to interpret. RESULTS: We developed a penalized nonlinear canonical correlation analysis approach that can deal with qualitative data by transforming each qualitative variable into a continuous variable via optimal scaling. Additionally, sparse results were obtained by adapting soft-thresholding to this nonlinear version of CCA. By means of simulation studies, we show that our method is capable of extracting relevant variables from high-dimensional sets. We applied our method to a genetic dataset containing 144 patients with glial cancer.
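Two ingredients named in this abstract can be illustrated in isolation. As a sketch, assuming the nominal (unordered) variant of optimal scaling used in alternating least squares approaches, each genotype category of a SNP is quantified by the mean canonical variate score of the individuals in that category, and the transformed variable is then standardized. Soft-thresholding shrinks each canonical weight toward zero by a penalty `lam` and sets small weights exactly to zero, which is what produces sparsity. The function names and `lam` are illustrative, not the authors' implementation.

```python
import math


def soft_threshold(weights, lam):
    """Shrink each weight toward zero by lam; weights with |w| <= lam become 0.0."""
    shrunk = []
    for w in weights:
        if abs(w) <= lam:
            shrunk.append(0.0)
        else:
            shrunk.append(math.copysign(abs(w) - lam, w))
    return shrunk


def optimal_scale(categories, target):
    """Nominal optimal scaling (assumed variant): replace each category by the mean
    of `target` (e.g. the other set's canonical variate) over individuals in that
    category, then standardize to mean 0 and unit variance."""
    sums, counts = {}, {}
    for c, t in zip(categories, target):
        sums[c] = sums.get(c, 0.0) + t
        counts[c] = counts.get(c, 0) + 1
    quantification = {c: sums[c] / counts[c] for c in sums}
    x = [quantification[c] for c in categories]
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    if sd == 0:
        return [0.0] * n  # all categories map to the same value: no information
    return [(v - mean) / sd for v in x]
```

For example, genotypes `[0, 0, 1, 1]` with variate scores `[1.0, 3.0, 5.0, 7.0]` are quantified as 2.0 and 6.0 and standardized to `[-1.0, -1.0, 1.0, 1.0]`.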