
    Group level MEG/EEG source imaging via optimal transport: minimum Wasserstein estimates

    Magnetoencephalography (MEG) and electroencephalography (EEG) are non-invasive modalities that measure the weak electromagnetic fields generated by neural activity. Inferring the location of the current sources that generated these fields is an ill-posed inverse problem known as source imaging. When considering a group study, a baseline approach is to estimate these sources independently for each subject, typically addressing the ill-posedness of each problem with sparsity-promoting regularizations; a common pattern across sources is then obtained by averaging the per-subject estimates. A more advanced alternative relies on a joint localization of sources for all subjects taken together, enforcing some similarity across all estimated sources. An important advantage of this approach is that it amounts to a single estimation in which all measurements are pooled together, making the inverse problem better posed. Such a joint estimation, however, poses a few challenges, notably the selection of a valid regularizer that can quantify such spatial similarities. We propose in this work a new procedure that can do so while taking into account the geometrical structure of the cortex. We call this procedure Minimum Wasserstein Estimates (MWE). The benefits of this model are twofold. First, joint inference makes it possible to pool together data from different brain geometries, accumulating more spatial information. Second, MWE are defined through Optimal Transport (OT) metrics, which provide a tool to model spatial proximity between cortical sources of different subjects, hence not enforcing identical source locations across the group. These benefits allow MWE to be more accurate than standard MEG source localization techniques. To support these claims, we perform source localization on realistic MEG simulations based on forward operators derived from MRI scans. On a visual task dataset, we demonstrate how MWE infer neural patterns similar to functional Magnetic Resonance Imaging (fMRI) maps.
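    The core idea of the OT regularizer can be illustrated in isolation. Below is a minimal sketch, assuming synthetic vertex coordinates and source maps (not the authors' MWE solver), of how a Wasserstein cost quantifies spatial proximity between two subjects' source estimates; it uses the POT package's ot.emd2.

```python
# A minimal sketch of the spatial-similarity idea behind Minimum Wasserstein
# Estimates (MWE): compare two subjects' source-amplitude maps with an optimal
# transport cost instead of forcing identical vertex locations. This is NOT the
# authors' solver; vertex counts, the cost matrix, and the source maps below
# are illustrative assumptions. Requires the POT package (pip install pot).
import numpy as np
import ot  # Python Optimal Transport

rng = np.random.default_rng(0)
n_vertices = 200                              # hypothetical cortical mesh size
coords = rng.normal(size=(n_vertices, 3))     # stand-in for vertex positions

# Ground cost: pairwise Euclidean distances (geodesic distances in practice).
M = ot.dist(coords, coords, metric="euclidean")

# Two subjects' nonnegative, normalized source-amplitude maps.
a = rng.random(n_vertices); a /= a.sum()
b = rng.random(n_vertices); b /= b.sum()

# Wasserstein cost: small when the two activations are spatially close,
# even if they sit on different vertices -- the property MWE exploits.
w_cost = ot.emd2(a, b, M)
print(f"OT cost between subject maps: {w_cost:.4f}")
```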

    Predicting Short-term MCI-to-AD Progression Using Imaging, CSF, Genetic Factors, Cognitive Resilience, and Demographics.

    In the Alzheimer's disease (AD) continuum, the prodromal state of mild cognitive impairment (MCI) precedes AD dementia, and identifying MCI individuals at risk of progression is important for clinical management. Our goal was to develop generalizable multivariate models that integrate high-dimensional data (multimodal neuroimaging and cerebrospinal fluid biomarkers, genetic factors, and measures of cognitive resilience) for identification of MCI individuals who progress to AD within 3 years. Our main findings were: i) we were able to build generalizable models with clinically relevant accuracy (~93%) for identifying MCI individuals who progress to AD within 3 years; ii) markers of AD pathophysiology (amyloid, tau, neuronal injury) accounted for large shares of the variance in predicting progression; and iii) our methodology allowed us to discover that expression of CR1 (complement receptor 1), an AD susceptibility gene involved in immune pathways, uniquely added independent predictive value. This work highlights the value of optimized machine learning approaches for analyzing multimodal patient information for making predictive assessments.
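    As an illustration of the kind of multivariate pipeline described (not the authors' actual model), here is a minimal sketch with scikit-learn that concatenates hypothetical imaging, CSF, genetic, and resilience feature blocks and evaluates a penalized classifier with cross-validation; all data below are synthetic placeholders.

```python
# A minimal sketch of a multimodal progression classifier. The feature blocks
# and sample size are synthetic placeholders, not ADNI data, and this is not
# the authors' pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300
X = np.hstack([
    rng.normal(size=(n, 50)),          # e.g., regional neuroimaging measures
    rng.normal(size=(n, 3)),           # e.g., CSF amyloid / tau / p-tau
    rng.integers(0, 3, size=(n, 5)),   # e.g., risk-allele counts (CR1, ...)
    rng.normal(size=(n, 2)),           # e.g., cognitive-resilience scores
])
y = rng.integers(0, 2, size=n)         # 1 = progressed to AD within 3 years

# Standardize features, then fit an L2-penalized logistic regression.
model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l2", max_iter=1000))
acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {acc.mean():.2f} +/- {acc.std():.2f}")
```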

    Recent publications from the Alzheimer's Disease Neuroimaging Initiative: Reviewing progress toward improved AD clinical trials

    INTRODUCTION: The Alzheimer's Disease Neuroimaging Initiative (ADNI) has continued development and standardization of methodologies for biomarkers and has provided an increased depth and breadth of data available to qualified researchers. This review summarizes the over 400 publications using ADNI data during 2014 and 2015. METHODS: We used standard searches to find publications using ADNI data. RESULTS: (1) Structural and functional changes, including subtle changes to hippocampal shape and texture, atrophy in areas outside of hippocampus, and disruption to functional networks, are detectable in presymptomatic subjects before hippocampal atrophy; (2) In subjects with abnormal β-amyloid deposition (Aβ+), biomarkers become abnormal in the order predicted by the amyloid cascade hypothesis; (3) Cognitive decline is more closely linked to tau than Aβ deposition; (4) Cerebrovascular risk factors may interact with Aβ to increase white-matter (WM) abnormalities, which may accelerate Alzheimer's disease (AD) progression in conjunction with tau abnormalities; (5) Different patterns of atrophy are associated with impairment of memory and executive function and may underlie psychiatric symptoms; (6) Structural, functional, and metabolic network connectivities are disrupted as AD progresses. Models of prion-like spreading of Aβ pathology along WM tracts predict known patterns of cortical Aβ deposition and declines in glucose metabolism; (7) New AD risk and protective gene loci have been identified using biologically informed approaches; (8) Cognitively normal and mild cognitive impairment (MCI) subjects are heterogeneous and include groups typified not only by "classic" AD pathology but also by normal biomarkers, accelerated decline, and suspected non-Alzheimer's pathology; (9) Selection of subjects at risk of imminent decline on the basis of one or more pathologies improves the power of clinical trials; (10) Sensitivity of cognitive outcome measures to early changes in cognition has been improved, and surrogate outcome measures using longitudinal structural magnetic resonance imaging may further reduce clinical trial cost and duration; (11) Advances in machine learning techniques such as neural networks have improved diagnostic and prognostic accuracy, especially in challenges involving MCI subjects; and (12) Network connectivity measures and genetic variants show promise in multimodal classification, and some classifiers using single modalities are rivaling multimodal classifiers. DISCUSSION: Taken together, these studies fundamentally deepen our understanding of AD progression and its underlying genetic basis, which in turn informs and improves clinical trial design.

    Multimodal Data Fusion and Quantitative Analysis for Medical Applications

    Medical big data is not only enormous in size but also heterogeneous and complex in structure, which makes it difficult for conventional systems or algorithms to process. These heterogeneous medical data include imaging data (e.g., Positron Emission Tomography (PET), Computerized Tomography (CT), Magnetic Resonance Imaging (MRI)) and non-imaging data (e.g., laboratory biomarkers, electronic medical records, and hand-written doctor notes). Multimodal data fusion is an emerging field that addresses this urgent challenge, aiming to process and analyze complex, diverse, and heterogeneous multimodal data. Fusion algorithms bring great potential to medical data analysis by 1) taking advantage of complementary information from different sources (such as the functional-structural complementarity of PET/CT images) and 2) exploiting consensus information that reflects the intrinsic essence of the data (such as the genetic essence underlying medical imaging and clinical symptoms). Thus, multimodal data fusion benefits a wide range of quantitative medical applications, including personalized patient care, optimized treatment planning, and preventive public health. Though there has been extensive research on computational approaches for multimodal fusion, three major challenges remain in quantitative medical applications, summarized as feature-level fusion, information-level fusion, and knowledge-level fusion:
    • Feature-level fusion. The first challenge is to mine multimodal biomarkers from high-dimensional, small-sample multimodal medical datasets, whose dimensionality and limited sample sizes hinder the effective discovery of informative multimodal biomarkers. Specifically, efficient dimension reduction algorithms are required to alleviate the "curse of dimensionality" and to satisfy the criteria for discovering interpretable, relevant, non-redundant, and generalizable multimodal biomarkers.
    • Information-level fusion. The second challenge is to exploit and interpret inter-modal and intra-modal information for precise clinical decisions. Although radiomics and multi-branch deep learning have been used for implicit information fusion guided by label supervision, methods that explicitly explore inter-modal relationships in medical applications are lacking. Unsupervised multimodal learning can mine inter-modal relationships, reduce the need for labor-intensive labeled data, and explore potentially undiscovered biomarkers; however, mining discriminative information without label supervision remains an open challenge. Furthermore, the interpretation of complex non-linear cross-modal associations, especially in deep multimodal learning, is another critical challenge in information-level fusion, which hinders the exploration of multimodal interactions in disease mechanisms.
    • Knowledge-level fusion. The third challenge is quantitative knowledge distillation from multi-focus regions in medical imaging. Although characterizing imaging features from single lesions using either feature engineering or deep learning has been investigated in recent years, both approaches neglect the importance of inter-region spatial relationships. A topological profiling tool for multi-focus regions is thus in high demand, yet is missing from current feature engineering and deep learning methods. Furthermore, incorporating domain knowledge along with the knowledge distilled from multi-focus regions is another challenge in knowledge-level fusion.
    To address these three challenges, this thesis provides a multi-level fusion framework for multimodal biomarker mining, multimodal deep learning, and knowledge distillation from multi-focus regions. Specifically, the major contributions of this thesis include:
    • To address the challenges in feature-level fusion, we propose an Integrative Multimodal Biomarker Mining framework to select interpretable, relevant, non-redundant, and generalizable multimodal biomarkers from high-dimensional, small-sample imaging and non-imaging data for diagnostic and prognostic applications. The feature selection criteria, including representativeness, robustness, discriminability, and non-redundancy, are addressed by consensus clustering, a Wilcoxon filter, sequential forward selection, and correlation analysis, respectively. The SHapley Additive exPlanations (SHAP) method and nomograms are employed to further enhance feature interpretability in machine learning models.
    • To address the challenges in information-level fusion, we propose an Interpretable Deep Correlational Fusion framework, based on canonical correlation analysis (CCA), for 1) cohesive multimodal fusion of medical imaging and non-imaging data and 2) interpretation of complex non-linear cross-modal associations (see the CCA sketch after this list). Specifically, two novel loss functions are proposed to optimize the discovery of informative multimodal representations in both supervised and unsupervised deep learning, by jointly learning inter-modal consensus and intra-modal discriminative information. An interpretation module is proposed to decipher complex non-linear cross-modal associations by leveraging interpretation methods from both deep learning and multimodal consensus learning.
    • To address the challenges in knowledge-level fusion, we propose a Dynamic Topological Analysis (DTA) framework, based on persistent homology, for knowledge distillation from inter-connected multi-focus regions in medical imaging and for the incorporation of domain knowledge. Unlike conventional feature engineering and deep learning, the DTA framework explicitly quantifies inter-region topological relationships, including global-level geometric structure and community-level clusters. A K-simplex Community Graph is proposed to construct the dynamic community graph representing community-level multi-scale graph structure. The constructed dynamic graph is subsequently tracked with a novel Decomposed Persistence algorithm. Domain knowledge is incorporated into the Adaptive Community Profile, which summarizes the tracked multi-scale community topology with additional customizable, clinically important factors.
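    As a reference point for the information-level fusion contribution, the sketch below shows classical CCA, the building block the Interpretable Deep Correlational Fusion framework extends, on synthetic imaging and non-imaging blocks; it is a minimal illustration, not the thesis implementation.

```python
# A minimal sketch of the information-level fusion idea: classical canonical
# correlation analysis (CCA) finding maximally correlated projections of an
# imaging block and a non-imaging block. The thesis builds deep, interpretable
# extensions on top of this; the data here are synthetic placeholders.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 200
shared = rng.normal(size=(n, 2))  # latent cross-modal signal
X_img = shared @ rng.normal(size=(2, 40)) + 0.5 * rng.normal(size=(n, 40))
X_clin = shared @ rng.normal(size=(2, 10)) + 0.5 * rng.normal(size=(n, 10))

cca = CCA(n_components=2)
U, V = cca.fit_transform(X_img, X_clin)

# Canonical correlations: strength of each recovered cross-modal association.
for k in range(2):
    r = np.corrcoef(U[:, k], V[:, k])[0, 1]
    print(f"component {k}: canonical correlation = {r:.2f}")
```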

    Bayesian Approaches For Modeling Variation

    A core focus of statistics is determining how much of the variation in data may be attributed to the signal of interest, and how much to noise. When the sources of variation are many and complex, a Bayesian approach to data analysis offers a number of advantages. In this thesis, we propose and implement new Bayesian methods for modeling variation in two general settings. The first setting is high-dimensional linear regression where the unknown error variance is also of interest. Here, we show that a commonly used class of conjugate shrinkage priors can lead to underestimation of the error variance. We then extend the Spike-and-Slab Lasso (SSL; Rockova and George, 2018) to the unknown variance case, using an alternative, independent prior framework. This extended procedure outperforms both the fixed-variance approach and alternative penalized likelihood methods on simulated and real data. For the second setting, we move from univariate response data where the predictors are known to multivariate response data in which potential predictors are unobserved. In this setting, we first consider the problem of biclustering, where a motivating example is to find subsets of genes which have similar expression in a subset of patients. For this task, we propose a new biclustering method called Spike-and-Slab Lasso Biclustering (SSLB). SSLB utilizes the SSL prior to find a doubly-sparse factorization of the data matrix via a fast EM algorithm. Applied to both a microarray dataset and a single-cell RNA-sequencing dataset, SSLB recovers biologically meaningful signal in the data. The second problem we consider in this setting is nonlinear factor analysis. The goal here is to find low-dimensional, unobserved "factors" which drive the variation in the high-dimensional observed data in a potentially nonlinear fashion. For this purpose, we develop factor analysis BART (faBART), an MCMC algorithm which alternates sampling from the posterior of (a) the factors and (b) a functional approximation to the mapping from the factors to the data. The latter step utilizes Bayesian Additive Regression Trees (BART; Chipman et al., 2010). On a variety of simulation settings, we demonstrate that, with only the observed data as input, faBART is able to recover both the unobserved factors and the nonlinear mapping.
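    For readers unfamiliar with the SSL prior, the following minimal sketch (with illustrative, not thesis, parameter values) computes the penalty induced by the spike-and-slab mixture of two Laplace densities, showing how it interpolates between heavy shrinkage of small coefficients and light shrinkage of large ones.

```python
# A minimal sketch of the Spike-and-Slab Lasso idea referenced above: the prior
# on each coefficient is a mixture of two Laplace densities, a sharp "spike"
# (large lam0) and a diffuse "slab" (small lam1). The induced penalty adapts:
# small coefficients feel the spike, large ones the slab. Parameter values are
# illustrative, not those used in the thesis.
import numpy as np

def laplace_pdf(beta, lam):
    """Density of a Laplace(0, 1/lam) distribution at beta."""
    return 0.5 * lam * np.exp(-lam * np.abs(beta))

def ssl_penalty(beta, theta=0.5, lam0=20.0, lam1=0.5):
    """Negative log of the spike-and-slab mixture prior (up to a constant)."""
    mix = (1 - theta) * laplace_pdf(beta, lam0) + theta * laplace_pdf(beta, lam1)
    return -np.log(mix)

for b in np.linspace(-2, 2, 5):
    print(f"beta = {b:+.1f}  ->  SSL penalty = {ssl_penalty(b):.3f}")
```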

    Parkinson's Disease Classification and Clinical Score Regression via United Embedding and Sparse Learning From Longitudinal Data

    Parkinson's disease (PD) is an irreversible neurodegenerative disease that mainly affects the patient's motor system. Early classification and regression of PD are essential to slow down this degenerative process from its onset. In this article, a novel adaptive unsupervised feature selection approach is proposed by exploiting manifold learning from longitudinal multimodal data. Classification and clinical score prediction are performed jointly to facilitate early PD diagnosis. Specifically, the proposed approach performs united embedding and sparse regression, which can determine the similarity matrices and discriminative features adaptively. Meanwhile, we constrain the similarity matrix among subjects and exploit the ℓ2,p-norm to conduct sparse adaptive control for obtaining the intrinsic information of the multimodal data structure. An effective iterative optimization algorithm is proposed to solve this problem. We perform extensive experiments on the Parkinson's Progression Markers Initiative (PPMI) dataset to verify the validity of the proposed approach. The results show that our approach boosts performance on the classification and clinical score regression of longitudinal data and surpasses state-of-the-art approaches.
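    For concreteness, the sketch below computes the ℓ2,p-norm as it is commonly defined in sparse feature-selection work, i.e., the p-th root of the sum of p-th powers of row-wise ℓ2 norms of a weight matrix; this illustrates the regularizer only, not the paper's optimization algorithm.

```python
# A minimal sketch of the l2,p-norm regularizer mentioned above: the p-th power
# sum of row-wise l2 norms of a weight matrix W, which drives whole rows
# (features) to zero for p < 1. Values below are illustrative.
import numpy as np

def l2p_norm(W, p=0.5):
    """(sum_i ||W_i||_2^p)^(1/p): row-sparsity-inducing matrix norm."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.sum(row_norms ** p) ** (1.0 / p)

W = np.array([[1.0, 0.0],
              [0.0, 0.0],     # an all-zero row (discarded feature) costs nothing
              [0.3, 0.4]])
print(f"l2,0.5 norm: {l2p_norm(W, p=0.5):.3f}")
```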

    Assisted Network Analysis in Cancer Genomics

    Cancer is a molecular disease. In the past two decades, we have witnessed a surge of high-throughput profiling in cancer research and a corresponding development of high-dimensional statistical techniques. In this dissertation, the focus is on gene expression, which has played a uniquely important role in cancer research. Compared to some other types of molecular measurements, for example DNA changes, gene expressions are "closer" to cancer outcomes. In addition, processed gene expression data have good statistical properties, in particular continuity. In "early" cancer gene expression data analysis, attention was on marginal properties such as mean and variance. Genes function in a coordinated way. As such, techniques that take a system perspective have been developed to also take into account the interconnections among genes. Among such techniques, graphical models, with lucid biological interpretations and satisfactory statistical properties, have attracted special attention. Graphical model-based analysis can not only lead to a deeper understanding of genes' properties but also serve as a basis for other analyses, for example regression and clustering. Cancer molecular studies usually have limited sample sizes, while in graphical model-based analysis the number of parameters to be estimated grows quadratically with the number of genes; combined, these factors lead to a serious lack of information. The overarching goal of this dissertation is to conduct more effective graphical model analysis for cancer gene expression studies. One literature review and three methodological projects have been conducted. The overall strategy is to borrow strength from additional information so as to assist gene expression graphical model estimation. In the first chapter, the literature review is conducted. The methods developed in Chapter 2 and Chapter 4 take advantage of information on regulators of gene expression (such as methylation, copy number variation, microRNA, and others). As they belong to the vertical data integration framework, we first provide a review of such data integration for gene expression data in Chapter 1. Additionally, graphical model-based analysis for gene expression data is reviewed. Research reported in this chapter has led to a paper published in Briefings in Bioinformatics. In Chapters 2-4, to accommodate the extreme complexity of information borrowing for graphical models, three different approaches are proposed. In Chapter 2, two graphical models, a gene-expression-only one and a gene-expression-regulator one, are simultaneously considered. A biologically sensible hierarchy between the sparsity structures of these two networks is developed, which is the first of its kind. This hierarchy is then used to link the estimation of the two graphical models. This work has led to a paper published in Genetic Epidemiology. In Chapter 3, additional information is mined from published literature, for example studies deposited at PubMed. The consideration is that published studies are based on many independent experiments and can contain valuable information on genes' interconnections. The challenge is to recognize that such information can be partial or even wrong. A two-step approach, consisting of information-guided and information-incorporated estimations, is developed. This work has led to a paper published in Biometrics. In Chapter 4, we slightly shift attention and examine the difference in graphs, which has important implications for understanding cancer development and progression.
    Our strategy is to link changes in gene expression graphs with those in regulator graphs, which provides additional information for estimation. It is noted that, to make individual chapters stand-alone, there can be minor overlap in descriptions. All methodological developments in this research fit the advanced penalization paradigm, which has been popular for cancer gene expression and other molecular data analysis. This methodological coherence is highly desirable. For the methods described in Chapters 2-4, we have developed new penalized estimations which have lucid interpretations and can directly lead to variable selection (and so to sparse and interpretable graphs). We have also developed effective computational algorithms and R code, which have been made publicly available at Dr. Shuangge Ma's GitHub software repository. For the methods described in Chapters 2 and 3, statistical properties under ultrahigh-dimensional settings and mild regularity conditions have been established, giving the proposed methods a uniquely strong grounding. Statistical properties for the method developed in Chapter 4 are relatively straightforward and hence omitted. For all the proposed methods, we have conducted extensive simulations, comparisons with the most relevant competitors, and data analyses. The practical advantage is fully established. Overall, this research has delivered a practically sensible information-incorporating strategy for improving graphical model-based analysis of cancer gene expression data, multiple highly competitive methods, R programs with broad utility, and new findings for multiple cancer types.
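    As background for the penalized graphical model estimation these chapters build on, here is a minimal sketch of the baseline graphical lasso in scikit-learn; the dissertation's methods add information borrowing on top of such estimators, its own software is in R, and the expression matrix below is a synthetic placeholder.

```python
# A minimal sketch of sparse Gaussian graphical model estimation via the
# graphical lasso: zeros in the estimated precision matrix encode conditional
# independence between genes. Synthetic data; Python used purely for
# illustration of the baseline technique, not the dissertation's methods.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
n_samples, n_genes = 120, 15          # small-sample, as in cancer studies
X = rng.normal(size=(n_samples, n_genes))

model = GraphicalLassoCV().fit(X)     # cross-validated sparsity level
precision = model.precision_

# Nonzero off-diagonal entries = estimated edges in the gene network.
edges = np.count_nonzero(np.triu(precision, k=1))
print(f"estimated edges: {edges} of {n_genes * (n_genes - 1) // 2} possible")
```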

    CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS

    The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting was organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically-based classification societies. It is a non-profit, non-political scientific organization whose aim is to further classification research.

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units located in Portugal is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are offered for each hotel studied.
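    To make the error decomposition concrete, the sketch below fits a normal-half-normal stochastic frontier by maximum likelihood on synthetic data (not the hotel data from the paper); the likelihood follows the standard Aigner-Lovell-Schmidt specification, and all parameter values are illustrative.

```python
# A minimal sketch of the stochastic frontier idea used in the paper: the
# composed error splits into symmetric noise v ~ N(0, sigma_v^2) and one-sided
# inefficiency u ~ |N(0, sigma_u^2)|, estimated by maximum likelihood under the
# normal-half-normal specification. Synthetic data, illustrative parameters.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)                    # log input (e.g., labor)
u = np.abs(rng.normal(0, 0.3, size=n))    # inefficiency (one-sided)
v = rng.normal(0, 0.2, size=n)            # noise (two-sided)
y = 1.0 + 0.8 * x + v - u                 # log output below the frontier

def neg_loglik(params):
    b0, b1, log_sv, log_su = params
    sv, su = np.exp(log_sv), np.exp(log_su)
    sigma = np.sqrt(sv**2 + su**2)
    lam = su / sv
    eps = y - b0 - b1 * x
    # Normal-half-normal SFA log-likelihood (Aigner, Lovell & Schmidt, 1977).
    ll = (np.log(2) - np.log(sigma) + stats.norm.logpdf(eps / sigma)
          + stats.norm.logcdf(-eps * lam / sigma))
    return -ll.sum()

res = optimize.minimize(neg_loglik, x0=[0.0, 0.0, -1.0, -1.0],
                        method="Nelder-Mead")
print("estimated [b0, b1, sigma_v, sigma_u]:",
      np.round([res.x[0], res.x[1], np.exp(res.x[2]), np.exp(res.x[3])], 3))
```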