3 research outputs found

    Independent Component Analysis for Unraveling the Complexity of Cancer Omics Datasets.

    Independent component analysis (ICA) is a matrix factorization approach in which the signals captured by the individual matrix factors are optimized to be as mutually independent as possible. Initially suggested for solving blind source separation problems in various fields, ICA was shown to be successful in analyzing functional magnetic resonance imaging (fMRI) and other types of biomedical data. In the last twenty years, ICA has become part of the standard machine learning toolbox, together with other matrix factorization methods such as principal component analysis (PCA) and non-negative matrix factorization (NMF). Here, we review a number of recent works where ICA was shown to be a useful tool for unraveling the complexity of cancer biology from the analysis of different types of omics data, mainly collected for tumoral samples. Such works highlight the use of ICA in dimensionality reduction, deconvolution, data pre-processing, meta-analysis, and other tasks, applied to different data types (transcriptome, methylome, proteome, single-cell data). We particularly focus on the technical aspects of ICA application in omics studies, such as using different protocols, determining the optimal number of components, assessing and improving the reproducibility of ICA results, and comparison with other popular matrix factorization techniques. We discuss the emerging ICA applications to the integrative analysis of multi-level omics datasets and introduce a conceptual view of ICA as a tool for defining functional subsystems of a complex biological system and their interactions under various conditions. Our review is accompanied by a Jupyter notebook which illustrates the discussed concepts and provides a practical tool for applying ICA to the analysis of cancer omics datasets.
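
The idea of factors "optimized to be as mutually independent as possible" can be illustrated with a minimal sketch. This is not the Jupyter notebook that accompanies the review; it is a small NumPy implementation of a symmetric FastICA-style fixed-point iteration with a tanh nonlinearity, run on synthetic data. The function names (`whiten`, `sym_decorrelate`, `fast_ica`), the two toy sources, and the mixing matrix `A` are all assumptions made for illustration.

```python
import numpy as np

def whiten(X):
    """Center X (samples x features) and rescale it to identity covariance."""
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ (eigvec / np.sqrt(eigval))

def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Estimate independent components of X with a tanh fixed-point update."""
    Z = whiten(X)
    n, k = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        Y = Z @ W.T                      # current component estimates
        G = np.tanh(Y)                   # nonlinearity g
        G_prime = 1.0 - G ** 2           # its derivative g'
        W_new = (G.T @ Z) / n - G_prime.mean(axis=0)[:, None] * W
        W_new = sym_decorrelate(W_new)
        # Converged when every row of W only changes by a sign flip.
        change = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0))
        W = W_new
        if change < tol:
            break
    return Z @ W.T

# Synthetic example (assumption): two independent non-Gaussian sources, mixed linearly.
rng = np.random.default_rng(0)
S = np.column_stack([rng.uniform(-1, 1, 5000), rng.laplace(size=5000)])
A = np.array([[1.0, 0.5], [0.5, 2.0]])   # hypothetical mixing matrix
X = S @ A.T                               # observed mixtures
S_est = fast_ica(X)                       # recovered up to order, sign and scale
```

ICA recovers sources only up to permutation, sign, and scale, so any comparison with known sources should use absolute correlations rather than raw values.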

    Clinical Trajectory Analysis With Longitudinal Validation in COPD: A COPDGene Study

    RATIONALE: Chronic obstructive pulmonary disease (COPD) is heterogeneous in its clinical phenotypes (e.g. chronic bronchitis, emphysema) and trajectories of disease progression. Analysis of large high-dimensional datasets presents a key opportunity to address the gap in our understanding of COPD phenotypes and progression. Clinical trajectory analysis (ClinTrajAn), based on the concept of the branching principal tree, simultaneously phenotypes and determines patient trajectories within cross-sectional clinical data. Our aim was to apply ClinTrajAn to map prominent subtypes and trajectories in a large population of participants, covering the whole range of COPD severity and at-risk profiles, and validate proposed trajectories using longitudinal data. METHODS: Cross-sectional data for 8972 participants from Phase 1 of the COPDGene longitudinal study were utilized for model training, with 4585/8972 (51%) of participants having Phase 2 data (∼5 years later). Participants included current and former smokers with COPD (GOLD 1-4), normal spirometry (GOLD 0), and preserved ratio impaired spirometry (PRISm). 30 features were selected for training, covering demographics, exposure, pulmonary function, and CT imaging. The Phase 1 data matrix (8972 × 30) contained 2302 missing values (< 1%), which were imputed via singular value decomposition (SVD). Principal component analysis (PCA) was applied to this completed matrix to reduce dimensionality to the first six principal components. A bifurcating principal tree fitting this reduced data was computed by averaging over 100 iteratively grown trees fitting random 95% samples. Longitudinal displacement was determined via projection of SVD-imputed Phase 2 data using Phase 1 PCA results. RESULTS: The averaged tree contained six terminal segments and two notable bridging segments (Figure 1 A). Terminal segments divided emphysema-dominant COPD by sex, identified mild-to-severe COPD participants with bronchodilator reversibility (BDR), chronic bronchitis dominance, healthy aged participants, and PRISm dominance. Bridging segments divided healthy aged and PRISm participants from COPD, and mild COPD or chronic bronchitis from severe COPD or participants with COPD and BDR. Trajectories were defined as paths starting from a root among GOLD 0 participants. Longitudinal analysis showed most participants (69%) stayed on the same segment after 5 years, with segment displacements on average moving away from the root, and a notable increase in displacement for cases with accelerated decline leading to a COPD subtype or PRISm terminal (Figure 1 B). CONCLUSIONS: We have applied ClinTrajAn in a large longitudinal study population to model phenotypes and trajectories in COPD, and validated prediction of progression pathways through observation of projected displacements over 5 years.
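
The preprocessing steps named in the METHODS (SVD-based imputation, PCA to six components, projection of Phase 2 data onto the Phase 1 components) can be sketched as follows. This is not the ClinTrajAn code and does not fit a principal tree. The abstract does not specify the imputation algorithm, so the iterative low-rank scheme below is an assumption, as are the function names (`svd_impute`, `fit_pca`, `project`), the chosen rank, and the small synthetic matrices standing in for COPDGene data.

```python
import numpy as np

def svd_impute(X, rank=3, n_iter=50):
    """Fill NaNs by repeatedly replacing them with a rank-limited SVD fit."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column means, then refine only the missing entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        filled = np.where(missing, low_rank, X)
    return filled

def fit_pca(X, n_components):
    """Return the column means and the leading principal axes of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Project X onto previously fitted principal axes."""
    return (X - mean) @ components.T

# Synthetic stand-in data (assumption): 200 participants, 10 features.
rng = np.random.default_rng(0)
truth = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 10))
truth += 0.01 * rng.standard_normal(truth.shape)

phase1 = truth.copy()
mask = rng.random(truth.shape) < 0.01     # roughly 1% missing values
phase1[mask] = np.nan

phase2 = truth + 0.1                      # hypothetical follow-up measurements
phase2[0, 0] = np.nan

completed1 = svd_impute(phase1, rank=3)
mean, components = fit_pca(completed1, n_components=6)
scores1 = project(completed1, mean, components)
scores2 = project(svd_impute(phase2, rank=3), mean, components)
displacement = scores2 - scores1          # per-participant shift in PCA space
```

Reusing the Phase 1 mean and components for Phase 2 keeps both time points in the same coordinate system, which is what makes the displacement between them interpretable.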

    Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM.

    Single-cell transcriptomic assays have enabled the de novo reconstruction of lineage differentiation trajectories, along with the characterization of cellular heterogeneity and state transitions. Several methods have been developed for reconstructing developmental trajectories from single-cell transcriptomic data, but efforts on analyzing single-cell epigenomic data and on trajectory visualization remain limited. Here we present STREAM, an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. We have tested STREAM on several synthetic and real datasets generated with different single-cell technologies. We further demonstrate its utility for understanding myoblast differentiation and disentangling known heterogeneity in hematopoiesis for different organisms. STREAM is an open-source software package.