41 research outputs found

    Sparse canonical correlation analysis for identifying, connecting and completing gene-expression networks

    Background: We generalized penalized canonical correlation analysis to microarray gene-expression measurements, in order to check the completeness of known metabolic pathways and to identify candidate genes for incorporation into a pathway. We used Wold's method to calculate the canonical variates. We applied ridge penalization to the regression of pathway genes on the canonical variates of the non-pathway genes, and the elastic net to the regression of non-pathway genes on the canonical variates of the pathway genes. Results: We performed a small simulation to illustrate the model's capability to identify new candidate genes for incorporation into the pathway: in our simulations, a gene appeared to be correctly identified if its correlation with the pathway genes was 0.3 or more. We applied the methods to a gene-expression microarray data set of 12,209 genes measured in 45 patients with glioblastoma, and we considered genes for incorporation into the glioma pathway: we identified more than 25 genes whose correlation with the canonical variates of the pathway genes exceeded 0.9. Conclusion: We concluded that penalized canonical correlation analysis is a powerful tool for identifying candidate genes in pathway analysis.
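The screening idea in this abstract can be sketched in a few lines. This is not the authors' procedure: the sketch below swaps in a plain ridge-regularized CCA solved by whitening and SVD, in place of Wold's method with ridge and elastic-net regressions. The function names, the penalty values, and the toy simulation (sample size, noise level, number of genes) are all assumptions chosen for illustration only.

```python
import numpy as np

def ridge_cca_first_pair(X, Y, lam_x=1.0, lam_y=1.0):
    """First canonical weight pair for X and Y with ridge-penalized covariances.

    Assumption: a simple ridge penalty on both blocks, not the
    Wold/elastic-net scheme described in the abstract.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + lam_x * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + lam_y * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whiten both blocks with Cholesky factors: K = Lx^{-1} Cxy Ly^{-T}
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    K = np.linalg.solve(Lx, Cxy)
    K = np.linalg.solve(Ly, K.T).T
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    # Map the leading singular vectors back to the original coordinates
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return a, b, s[0]

def screen_candidates(X, Y, a):
    """Correlation of each column of Y with the canonical variate X @ a."""
    v = (X - X.mean(axis=0)) @ a
    Yc = Y - Y.mean(axis=0)
    return (Yc.T @ v) / (np.linalg.norm(Yc, axis=0) * np.linalg.norm(v))

def simulate_and_screen(seed=0, n=100, p=5, q=20):
    """Toy data: p pathway genes share a latent signal z; only candidate 0 does too."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    X = z[:, None] + 0.5 * rng.standard_normal((n, p))   # pathway genes
    Y = rng.standard_normal((n, q))                      # non-pathway genes
    Y[:, 0] = z + 0.5 * rng.standard_normal(n)           # the one true candidate
    a, _, _ = ridge_cca_first_pair(X, Y)
    return screen_candidates(X, Y, a)

correlations = simulate_and_screen()
best = int(np.argmax(np.abs(correlations)))  # sign of a is arbitrary, so use |r|
print("top candidate index:", best)
```

Candidates would then be ranked by the absolute correlation with the pathway variate; any cut-off (such as the 0.9 mentioned in the abstract) would need to be chosen for the data at hand.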

    Integrative analysis of gene expression and copy number alterations using canonical correlation analysis

    Supplementary Figure 1. Representation of the samples from the tuning set by their coordinates in the first two pairs of features (extracted from the tuning set) using regularized dual CCA, with regularization parameters tx = 0.9, ty = 0.3 (left panel), and PCA+CCA (right panel). We show the representations with respect to both the copy number features and the gene expression features in a superimposed way, where each sample is represented by two markers. The filled markers represent the coordinates in the features extracted from the copy number variables, and the open markers represent coordinates in the features extracted from the gene expression variables. Samples with different leukemia subtypes are shown with different colors. The first feature pair distinguishes the HD50 group from the rest, while the second feature pair represents the characteristics of the samples from the E2A/PBX1 subtype. The high canonical correlation obtained for the tuning samples with regularized dual CCA is apparent in the left panel, where the two points for each sample coincide. Nevertheless, the extracted features have a high generalization ability, as can be seen in the left panel of Figure 5, showing the representation of the validation samples. Supplementary Figure 2. Representation of the samples from the tuning set by their coordinates in the first two pairs of features (extracted from the tuning set) using regularized dual CCA, with regularization parameters tx = 0, ty = 0 (left panel), and tx = 1, ty = 1 (right panel). We show the representations with respect to both the copy number features and the gene expression features in a superimposed way, where each sample is represented by tw

    Improved Heterosis Prediction by Combining Information on DNA- and Metabolic Markers

    Background: Hybrids represent a cornerstone in the success story of breeding programs. The fundamental principle underlying this success is the phenomenon of hybrid vigour, or heterosis. It describes an advantage of the offspring over the two parental lines with respect to parameters such as growth and resistance against abiotic or biotic stress. Models based on dominance, overdominance or epistasis are the commonly used explanations. Conclusion/Significance: The heterosis level is clearly a function of the combination of parents used for offspring production. This poses a major challenge for plant breeders, as usually several thousand combinations of parents have to be tested to identify the best ones. Thus, any approach that reliably predicts heterosis levels from properties of the parental lines would be highly beneficial for plant breeding. Methodology/Principal Findings: Recently, genetic data have been used to predict heterosis. Here we show that a combination of parental genetic and metabolic markers, identified via feature selection and minimum-description-length based regression methods, significantly improves the prediction of biomass heterosis in resulting offspring. These findings will help further our understanding of the molecular basis of heterosis, revealing, for instance, the presence of nonlinear genotype-phenotype relationships. In addition, we describe a possible approach for accelerated selection in plant breeding.