Sparse integrative clustering of multiple omics data sets
High resolution microarrays and second-generation sequencing platforms are
powerful tools to investigate genome-wide alterations in DNA copy number,
methylation and gene expression associated with a disease. An integrated
genomic profiling approach measures multiple omics data types simultaneously in
the same set of biological samples. Such an approach renders an integrated data
resolution that would not be available with any single data type. In this
study, we use penalized latent variable regression methods for joint modeling
of multiple omics data types to identify common latent variables that can be
used to cluster patient samples into biologically and clinically relevant
disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996)
267-288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005)
301-320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005)
91-108] methods to induce sparsity in the coefficient vectors, revealing
important genomic features that have significant contributions to the latent
variables. An iterative ridge regression is used to compute the sparse
coefficient vectors. In model selection, a uniform design [Monographs on
Statistics and Applied Probability (1994) Chapman & Hall] is used to seek
"experimental" points that are scattered uniformly across the search domain for
efficient sampling of tuning parameter combinations. We compared our method to
sparse singular value decomposition (SVD) and penalized Gaussian mixture model
(GMM) using both real and simulated data sets. The proposed method is applied
to integrate genomic, epigenomic and transcriptomic data for subtype analysis
in breast and lung cancer data sets.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS578 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
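The abstract says that an iterative ridge regression is used to compute the sparse coefficient vectors but gives no further detail. As a rough illustration only, the sketch below shows one common way a lasso-penalized fit can be obtained by repeatedly solving a reweighted ridge problem (a local quadratic approximation of the L1 penalty). It is written for a single ordinary regression, not for the joint latent variable model of the paper, and the function name, parameters, and simulated data are all hypothetical.

```python
import numpy as np


def lasso_via_iterative_ridge(X, y, lam, n_iter=200, eps=1e-8, tol=1e-10):
    """Approximately minimise 0.5*||y - X b||^2 + lam*||b||_1.

    Illustrative assumption: each step replaces |b_j| by the quadratic
    b_j^2 / (2*|b_j_old|), which turns the problem into a ridge
    regression with one penalty weight per coefficient.
    """
    p = X.shape[1]
    XtX = X.T @ X
    Xty = X.T @ y
    # Start from an ordinary ridge fit so no coefficient is exactly zero.
    beta = np.linalg.solve(XtX + lam * np.eye(p), Xty)
    for _ in range(n_iter):
        # Small coefficients get large ridge weights and shrink towards zero.
        weights = lam / (np.abs(beta) + eps)
        beta_new = np.linalg.solve(XtX + np.diag(weights), Xty)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # Coefficients that collapsed to numerical noise are set to exactly zero.
    beta[np.abs(beta) < 1e-6] = 0.0
    return beta


# Hypothetical simulated data: only the first two features carry signal.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((100, 10))
beta_true = np.zeros(10)
beta_true[:2] = [3.0, -2.0]
y_demo = X_demo @ beta_true + 0.1 * rng.standard_normal(100)
beta_hat = lasso_via_iterative_ridge(X_demo, y_demo, lam=20.0)
selected = np.flatnonzero(beta_hat)  # indices of features kept by the fit
```

The nonzero entries of `beta_hat` play the role of the "important features" mentioned in the abstract. In practice the penalty `lam` would be chosen by a search over tuning parameters, for which the abstract describes a uniform design.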
Broad and thematic remodeling of the surfaceome and glycoproteome on isogenic cells transformed with driving proliferative oncogenes.
The cell surface proteome, the surfaceome, is the interface for engaging the extracellular space in normal and cancer cells. Here we apply quantitative proteomics of N-linked glycoproteins to reveal how a collection of some 700 surface proteins is dramatically remodeled in an isogenic breast epithelial cell line stably expressing any of six of the most prominent proliferative oncogenes, including the receptor tyrosine kinases, EGFR and HER2, and downstream signaling partners such as KRAS, BRAF, MEK, and AKT. We find that each oncogene induces a somewhat different surfaceome, but the functions of these proteins are harmonized by common biological themes including up-regulation of nutrient transporters, down-regulation of adhesion molecules and tumor suppressing phosphatases, and alteration in immune modulators. Addition of a potent MEK inhibitor that blocks MAPK signaling brings each oncogene-induced surfaceome back to a common state, reflecting the strong dependence of the oncogene on the MAPK pathway to propagate signaling. Cell surface protein capture is mediated by covalent tagging of surface glycans, yet current methods do not afford sequencing of intact glycopeptides. Thus, we complement the surfaceome data with whole cell glycoproteomics enabled by a recently developed technique called activated ion electron transfer dissociation (AI-ETD). We found massive oncogene-induced changes to the glycoproteome and differential increases in complex hybrid glycans, especially for KRAS and HER2 oncogenes. Overall, these studies provide a broad systems-level view of how specific driver oncogenes remodel the surfaceome and the glycoproteome in a cell autologous fashion, and suggest possible surface targets, and combinations thereof, for drug and biomarker discovery.
Ants constructing rule-based classifiers.
Classifiers; Data; Data mining; Studies;