
    Any-way and Sparse Analyses for Multimodal Fusion and Imaging Genomics

    This dissertation aims to develop new algorithms, built upon the independent component analysis (ICA) framework, that leverage sparsity and mutual information across data modalities to improve the performance of current ICA-based multimodal fusion approaches. These algorithms are further applied to both simulated data and real neuroimaging and genomic data to examine their performance. The identified neuroimaging and genomic patterns can help better delineate the pathology of mental disorders or brain development. To alleviate the difficulty of separating signal from background in infomax-decomposed sources for genomic data, we propose a sparse infomax that enhances a robust sparsity measure, the Hoyer index. The Hoyer index is scale-invariant and therefore well suited to ICA frameworks, where the scale of decomposed sources is arbitrary. Simulation results demonstrate that sparse infomax increases component detection accuracy when the source signal-to-background ratio (SBR) is low, particularly for single nucleotide polymorphism (SNP) data. The proposed sparse infomax is further extended to two data modalities as a sparse parallel ICA for imaging genomics applications, in order to investigate the associations between brain imaging and genomics. Simulation results show that sparse parallel ICA outperforms parallel ICA, with improved accuracy in structural magnetic resonance imaging (sMRI)-SNP association detection and component spatial map recovery, as well as enhanced sparsity of sMRI and SNP components under noisy conditions. Applying the proposed sparse parallel ICA to fuse whole-brain sMRI and whole-genome SNP data from 24985 participants in the UK Biobank, we identify three stable and replicable sMRI-SNP pairs. The identified sMRI components highlight frontal, parietal, and temporal regions and are associated with multiple cognitive measures (with association strengths that differ across age groups for the temporal component).
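For reference, the Hoyer index of a length-n vector x is commonly defined as H(x) = (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1). It equals 0 when all entries have the same magnitude and 1 when exactly one entry is non-zero, and multiplying x by any non-zero constant leaves it unchanged, which is why it suits ICA sources of arbitrary scale. The minimal sketch below computes only this measure; it does not reproduce the sparse infomax optimization, and the synthetic "source" (a few large entries on a small Gaussian background) and the helper name `hoyer_index` are assumptions for illustration.

```python
import numpy as np

def hoyer_index(x):
    """Hoyer sparsity of a vector: 0 for equal-magnitude entries, 1 for one non-zero entry."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    if n < 2:
        raise ValueError("need at least two entries")
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    if l2 == 0:
        raise ValueError("Hoyer index is undefined for the zero vector")
    sqrt_n = np.sqrt(n)
    return (sqrt_n - l1 / l2) / (sqrt_n - 1)

rng = np.random.default_rng(0)
# Hypothetical component: low-amplitude background plus 20 strong "signal" entries.
background = 0.1 * rng.standard_normal(1000)
source = background.copy()
source[:20] += 3.0
one_hot = np.zeros(1000)
one_hot[0] = 1.0

print(np.isclose(hoyer_index(np.ones(1000)), 0.0))                 # True: flat vector
print(hoyer_index(one_hot))                                         # 1.0: maximally sparse
print(np.isclose(hoyer_index(source), hoyer_index(-5.0 * source)))  # True: scale invariance
print(hoyer_index(source) > hoyer_index(background))               # True: signal raises sparsity
```

Because the measure depends only on the ratio of the L1 and L2 norms, the arbitrary sign and scale that ICA assigns to each source cannot change it.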
Top SNPs in the identified SNP factor are enriched in inflammatory disease and inflammatory response pathways; these SNPs also regulate gene expression, isoform percentage, transcription expression, or methylation level in the frontal region, and the regulation effects are significantly enriched. Applying the proposed sparse parallel ICA to imaging genomics in attention-deficit/hyperactivity disorder (ADHD), we identify and replicate one SNP component related to gray matter volume (GMV) alterations in the superior and middle frontal gyri underlying working memory deficits in adults and adolescents with ADHD. The association is more significant in ADHD families than in controls and stronger in adults and older adolescents than in younger adolescents. The identified SNP component highlights SNPs in long non-coding RNAs (lncRNAs) on chromosome 5 and in several protein-coding genes implicated in ADHD, such as MEF2C, CADM2, and CADPS2. Top SNPs are enriched in human brain neuron cells and regulate gene expression, isoform percentage, transcription expression, or methylation level in the frontal region. Moreover, to increase flexibility and robustness in mining multimodal data, we propose aNy-way ICA, which optimizes the entire correlation structure of linked components across any number of modalities via Gaussian independent vector analysis (IVA) while simultaneously optimizing independence via separate (parallel) ICAs. Simulation results demonstrate that aNy-way ICA recovers sources and loadings, as well as the true covariance patterns, with improved accuracy compared to existing multimodal fusion approaches, especially under noisy conditions.
Applying the proposed aNy-way ICA to integrate structural MRI with fractal n-back and emotion identification task functional MRI (fMRI) collected in the Philadelphia Neurodevelopmental Cohort (PNC), we identify and replicate one linked GMV-threat-2-back component, and the threat and 2-back components are related to intelligence quotient (IQ) scores in both discovery and replication samples. Lastly, we extend the proposed aNy-way ICA with a reference constraint to enable prior-guided multimodal fusion. Simulation results show that, under noisy conditions, aNy-way ICA with reference recovers the designed linkages between the reference and the modalities, the cross-modality correlations, and the loading and component matrices with improved accuracy compared to multi-site canonical correlation analysis with reference (MCCAR) + joint ICA. Applying aNy-way ICA with reference to supervise the fusion of structural MRI, fractal n-back fMRI, and emotion identification task fMRI in the PNC, with IQ as the reference, we identify and replicate one IQ-related GMV-threat-2-back component, which is significantly correlated across modalities in both discovery and replication samples.
Ph.D.

    Novel Semi-Supervised Learning Models to Balance Data Inclusivity and Usability in Healthcare Applications

    Semi-supervised learning (SSL) is a sub-field of statistical machine learning that is useful for problems with only a few labeled instances, which have both predictor (X) and target (Y) information, and an abundance of unlabeled instances, which have only predictor (X) information. SSL harnesses the target information available in the limited labeled data, as well as the information in the abundant unlabeled data, to build strong predictive models. However, not all of the included information is useful. For example, some features may correspond to noise, and including them will hurt predictive model performance. Additionally, some instances may be less relevant to model building; their inclusion will increase training time and can potentially hurt model performance. The objective of this research is to develop novel SSL models that balance data inclusivity and usability. My dissertation research focuses on applications of SSL in healthcare, driven by problems in brain cancer radiomics, migraine imaging, and Parkinson’s Disease telemonitoring. The first topic integrates machine learning (ML) and a mechanistic model (PI) to develop an SSL model that predicts the cell density of glioblastoma brain cancer from multi-parametric medical images. The proposed ML-PI hybrid model combines imaging information from unbiopsied regions of the brain with underlying biological knowledge from the mechanistic model to predict spatial tumor density in the brain. The second topic develops a multi-modality imaging-based diagnostic decision support system (MMI-DDS). MMI-DDS consists of modality-wise principal component analysis (PCA) to incorporate imaging features at different aggregation levels (e.g., voxel-wise, connectivity-based), a constrained particle swarm optimization (cPSO) feature selection algorithm, and a clinical utility engine that applies inverse operators to the chosen principal components for white-box classification models.
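One way to picture modality-wise PCA with an inverse operator, assuming a linear white-box classifier: if modality m has centered features X_m and retained loadings V_m, its component scores are Z_m = X_m V_m, so classifier weights w_m on those scores map back to feature-space weights V_m w_m, and the model can be read in terms of the original imaging features. This is a minimal sketch under that assumption only. The dissertation's actual inverse operators, the cPSO selection step, and the trained classifier are not reproduced; random matrices stand in for imaging features and for classifier weights, and `modality_pca` is a hypothetical helper.

```python
import numpy as np

def modality_pca(X, k):
    """Center one modality's feature matrix and keep its top-k principal axes."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                      # (n_features, k) loading matrix
    return mean, V, Xc @ V            # scores: (n_samples, k)

rng = np.random.default_rng(1)
n, k = 40, 5
# Hypothetical modalities at two aggregation levels (e.g., voxel-wise, connectivity-based).
modalities = {"voxel": rng.standard_normal((n, 200)),
              "connectivity": rng.standard_normal((n, 45))}
fits = {name: modality_pca(X, k) for name, X in modalities.items()}
Z = np.hstack([fits[name][2] for name in modalities])   # (n, 2k) combined PC features

# Stand-in for the weights of a fitted linear classifier on the combined scores.
w = rng.standard_normal(Z.shape[1])

# Inverse mapping: component-space weights -> weights on each modality's features.
feature_weights = {name: fits[name][1] @ w[i * k:(i + 1) * k]
                   for i, name in enumerate(modalities)}

# The linear decision score is the same whether computed from PCs or original features.
score_pc = Z @ w
score_feat = sum((modalities[name] - fits[name][0]) @ feature_weights[name]
                 for name in modalities)
print(np.allclose(score_pc, score_feat))  # True
```

The equality holds because each block of Z @ w is X_m V_m w_m, which regroups as X_m (V_m w_m); nonlinear classifiers would need a different interpretation step.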
The final topic develops a new SSL regression model with integrated feature and instance selection, called s2SSL (with “s2” referring to selection in two different ways: feature and instance). s2SSL integrates cPSO feature selection and graph-based instance selection to simultaneously choose the optimal features and instances and build accurate models for continuous prediction. s2SSL was applied to smartphone-based telemonitoring of Parkinson’s Disease patients.
Dissertation/Thesis
Doctoral Dissertation Industrial Engineering 201
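s2SSL's instance selection is graph-based, but its exact construction is not described in the abstract. As a hypothetical illustration of how a similarity graph lets unlabeled instances inform continuous prediction, the sketch below uses a swapped-in technique, classic harmonic-function label propagation for regression, rather than s2SSL itself. The helper name `harmonic_regression`, the fully connected Gaussian graph, the bandwidth `sigma`, and the synthetic data are all assumptions.

```python
import numpy as np

def harmonic_regression(X_lab, y_lab, X_unl, sigma=0.3):
    """Predict continuous targets for unlabeled rows via the harmonic solution.

    Hypothetical helper for illustration; not the s2SSL algorithm.
    """
    X = np.vstack([X_lab, X_unl])
    n_l = len(X_lab)
    # Fully connected graph with Gaussian edge weights and no self-loops.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian L = D - W
    L_uu = L[n_l:, n_l:]                    # unlabeled-unlabeled block
    W_ul = W[n_l:, :n_l]                    # unlabeled-labeled weights
    # Harmonic condition L_uu f_u = W_ul y_l: each unlabeled prediction
    # equals the weighted average of its neighbors' values.
    return np.linalg.solve(L_uu, W_ul @ y_lab)

rng = np.random.default_rng(2)
X_lab = rng.uniform(0.0, 1.0, size=(10, 2))    # few labeled instances
y_lab = X_lab.sum(axis=1)                       # synthetic continuous target
X_unl = rng.uniform(0.0, 1.0, size=(50, 2))    # many unlabeled instances
f_unl = harmonic_regression(X_lab, y_lab, X_unl)
print(f_unl.shape)  # (50,)
```

Because every unlabeled prediction is a positively weighted average of its neighbors, predictions stay within the range of the labeled targets. An instance-selection step could, for example, prune unlabeled points with weak ties to the labeled set before solving, though that criterion is likewise an assumption.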