Integrative analysis of gene expression and copy number alterations using canonical correlation analysis

Charlotte Soneson; Henrik Lilljebjörn; Thoas Fioretos; Magnus Fontes

Abstract

Supplementary Figure 1. Representation of the samples from the tuning set by their coordinates in the first two pairs of features (extracted from the tuning set) using regularized dual CCA, with regularization parameters tx = 0.9, ty = 0.3 (left panel), and PCA+CCA (right panel). We show the representations with respect to both the copy number features and the gene expression features in a superimposed way, where each sample is represented by two markers. The filled markers represent the coordinates in the features extracted from the copy number variables, and the open markers represent coordinates in the features extracted from the gene expression variables. Samples with different leukemia subtypes are shown with different colors. The first feature pair distinguishes the HD50 group from the rest, while the second feature pair represents the characteristics of the samples from the E2A/PBX1 subtype. The high canonical correlation obtained for the tuning samples with regularized dual CCA is apparent in the left panel, where the two points for each sample coincide. Nevertheless, the extracted features have a high generalization ability, as can be seen in the left panel of Figure 5, showing the representation of the validation samples.

Supplementary Figure 2. Representation of the samples from the tuning set by their coordinates in the first two pairs of features (extracted from the tuning set) using regularized dual CCA, with regularization parameters tx = 0, ty = 0 (left panel), and tx = 1, ty = 1 (right panel). We show the representations with respect to both the copy number features and the gene expression features in a superimposed way, where each sample is represented by tw