
    Continuation of Nesterov's Smoothing for Regression with Structured Sparsity in High-Dimensional Neuroimaging

    Predictive models can be used on high-dimensional brain images for diagnosis of a clinical condition. Spatial regularization through structured sparsity offers new perspectives in this context and reduces the risk of overfitting the model while providing interpretable neuroimaging signatures by forcing the solution to adhere to domain-specific constraints. Total Variation (TV) enforces spatial smoothness of the solution while segmenting predictive regions from the background. We consider the problem of minimizing the sum of a smooth convex loss, a non-smooth convex penalty (whose proximal operator is known) and a wide range of possible complex, non-smooth convex structured penalties such as TV or overlapping group Lasso. Existing solvers are either limited in the functions they can minimize or in their practical capacity to scale to high-dimensional imaging data. Nesterov's smoothing technique can be used to minimize a large number of non-smooth convex structured penalties, but reasonable precision requires a small smoothing parameter, which slows down convergence. To benefit from the versatility of Nesterov's smoothing technique, we propose a first-order continuation algorithm, CONESTA, which automatically generates a sequence of decreasing smoothing parameters. The generated sequence maintains the optimal convergence speed towards any globally desired precision. Our main contributions are: (i) we propose an expression of the duality gap to probe the current distance to the global optimum in order to adapt the smoothing parameter and the convergence speed; (ii) we provide a convergence rate that improves on classical proximal gradient smoothing methods. We demonstrate on both simulated and high-dimensional structural neuroimaging data that CONESTA significantly outperforms many state-of-the-art solvers in regard to convergence speed and precision. Comment: 11 pages, 6 figures, accepted in IEEE TMI, IEEE Transactions on Medical Imaging 201
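
The trade-off described in the abstract above (a small smoothing parameter gives good precision but slow convergence) can be illustrated with a minimal sketch of Nesterov smoothing applied to a 1D total variation penalty. This is not the CONESTA implementation: the function names `tv_1d` and `smoothed_tv_1d` are hypothetical, the penalty is restricted to one dimension, and the duality-gap rule that CONESTA uses to choose the smoothing parameter is not shown. The smoothed penalty is the maximum over dual variables alpha in [-1, 1] of alpha·(A beta) − (mu/2)·||alpha||², where A is the finite-difference operator.

```python
def tv_1d(beta):
    """Exact 1D total variation: sum of absolute neighbouring differences."""
    return sum(abs(b1 - b0) for b0, b1 in zip(beta, beta[1:]))


def smoothed_tv_1d(beta, mu):
    """Value and gradient of the Nesterov-smoothed 1D TV penalty.

    Hypothetical helper for illustration; mu > 0 is the smoothing parameter.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    diffs = [b1 - b0 for b0, b1 in zip(beta, beta[1:])]
    # Optimal dual variable: projection of diffs / mu onto [-1, 1].
    alpha = [max(-1.0, min(1.0, d / mu)) for d in diffs]
    value = sum(a * d - 0.5 * mu * a * a for a, d in zip(alpha, diffs))
    # Gradient is A^T alpha, with (A beta)_i = beta[i + 1] - beta[i].
    grad = [0.0] * len(beta)
    for i, a in enumerate(alpha):
        grad[i] -= a
        grad[i + 1] += a
    return value, grad
```

For any mu > 0 the smoothed value lies below the exact penalty by at most mu·m/2, where m is the number of differences, so decreasing mu tightens the approximation while making the gradient change more abruptly (its Lipschitz constant grows like 1/mu). A continuation scheme exploits this by starting with a large mu and decreasing it over the iterations.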

    SepVAE: a contrastive VAE to separate pathological patterns from healthy ones

    Contrastive Analysis VAEs (CA-VAEs) are a family of variational auto-encoders (VAEs) that aim at separating the common factors of variation between a background dataset (BG) (i.e., healthy subjects) and a target dataset (TG) (i.e., patients) from the ones that only exist in the target dataset. To do so, these methods separate the latent space into a set of salient features (i.e., proper to the target dataset) and a set of common features (i.e., existing in both datasets). Currently, all models fail to prevent the sharing of information between latent spaces effectively and to capture all salient factors of variation. To this end, we introduce two crucial regularization losses: a disentangling term between common and salient representations and a classification term between background and target samples in the salient space. We show a better performance than previous CA-VAE methods on three medical applications and a natural images dataset (CelebA). Code and datasets are available on GitHub: https://github.com/neurospin-projects/2023_rlouiset_sepvae. Comment: Workshop on Interpretable ML in Healthcare at International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA. 202

    Simulated Data for Linear Regression with Structured and Sparse Penalties

    A very active field of research in bioinformatics is the integration of structure into machine learning methods. Recently developed methods claim to simultaneously link the computed model to the graphical structure of the data set and select a handful of important features in the analysis. However, there is still no way to simulate data for which we can separate the three properties that such methods claim to achieve. These properties are: (i) the sparsity of the solution, i.e., the fact that the model is based on a few features of the data; (ii) the structure of the model; (iii) the relation between the structure of the model and the graphical model behind the generation of the data.

    Imaging genetics: bio-informatics and bio-statistics challenges

    The IMAGEN study -- a very large European Research Project -- seeks to identify and characterize biological and environmental factors that influence teenagers' mental health. To this aim, the consortium plans to collect data for more than 2000 subjects at 8 neuroimaging centres. These data comprise neuroimaging data, behavioral tests (for up to 5 hours of testing), and also white blood samples which are collected and processed to obtain 650k single nucleotide polymorphisms (SNP) per subject. Data for more than 1000 subjects have already been collected. We describe the statistical aspects of these data and the challenges, such as the multiple comparison problem, created by such a large imaging genetics study (i.e., 650k for the SNP, 50k data per neuroimage). We also suggest possible strategies, and present some first investigations using univariate or multivariate methods in association with re-sampling techniques. Specifically, because the number of variables is very high, we first reduce the data size and then use multivariate (CCA, PLS) techniques in association with re-sampling techniques.

    A fast computational framework for genome-wide association studies with neuroimaging data

    In the last few years, it has become possible to acquire high-dimensional neuroimaging and genetic data on relatively large cohorts of subjects, which provides novel means to understand the large between-subject variability observed in brain organization. Genetic association studies aim at unveiling correlations between the genetic variants and the numerous phenotypes extracted from brain images and thus face a dire multiple comparisons issue. While these statistics can be accumulated across the brain volume for the sake of sensitivity, the significance of the resulting summary statistics can only be assessed through permutations. Fortunately, the increase of computational power can be exploited, but this requires designing new parallel algorithms. The MapReduce framework, coupled with efficient algorithms, makes it possible to deliver a scalable analysis tool that deals with high-dimensional data and thousands of permutations in a few hours. On a real functional MRI dataset, this tool shows promising results with a genetic variant that survives the very strict correction for multiple testing.
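
The idea of assessing a brain-wide summary statistic through permutations can be sketched as follows. This is a generic max-statistic permutation test, assumed here for illustration; it is not the paper's algorithm or its MapReduce implementation. The names `pearson`, `max_abs_corr` and `max_stat_permutation_pvalue` are hypothetical, a single genetic variant is tested against a small list of phenotype vectors, and the summary statistic is simply the largest absolute correlation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def max_abs_corr(genotype, phenotypes):
    """Summary statistic: largest |correlation| over all phenotypes."""
    return max(abs(pearson(genotype, ph)) for ph in phenotypes)


def max_stat_permutation_pvalue(genotype, phenotypes, n_perm=200, seed=0):
    """Permutation p-value of the max statistic, corrected across phenotypes."""
    rng = random.Random(seed)
    observed = max_abs_corr(genotype, phenotypes)
    shuffled = list(genotype)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling subjects breaks any genotype/phenotype association.
        rng.shuffle(shuffled)
        if max_abs_corr(shuffled, phenotypes) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one permutation.
    return (1 + exceed) / (1 + n_perm)
```

Each permutation is independent of the others, which is what makes this kind of test a natural candidate for distribution over many workers.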

    In Vivo High-Resolution 7 Tesla MRI Shows Early and Diffuse Cortical Alterations in CADASIL

    Background and Purpose: Recent data suggest that early symptoms may be related to cortex alterations in CADASIL (Cerebral Autosomal-Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy), a monogenic model of cerebral small vessel disease (SVD). The aim of this study was to investigate cortical alterations using both high-resolution T2* acquisitions obtained with 7 Tesla MRI and structural T1 images with 3 Tesla MRI in CADASIL patients with no or only mild symptomatology (modified Rankin's scale = 24). Methods: Complete reconstructions of the cortex using 7 Tesla T2* acquisitions with 0.7 mm isotropic resolution were obtained in 11 patients (52.1 +/- 13.2 years, 36% male) and 24 controls (54.8 +/- 11.0 years, 42% male). Seven Tesla T2* within the cortex and cortical thickness and morphology obtained from 3 Tesla images were compared between CADASIL and control subjects using general linear models. Results: MMSE, brain volume, cortical thickness and global sulcal morphology did not differ between groups. By contrast, T2* measured by 7 Tesla MRI was significantly increased in frontal, parietal, occipital and cingulate cortices in patients after correction for multiple testing. These changes were not related to white matter lesions, lacunes or microhemorrhages in patients having no brain atrophy compared to controls. Conclusions: Seven Tesla MRI, by contrast to state of the art post-processing of 3 Tesla acquisitions, shows diffuse T2* alterations within the cortical mantle in CADASIL whose origin remains to be determined

    Agent-based pyramidal architecture for image segmentation: application to the extraction of a lobular region from a mammogram

    A new image segmentation approach, based on a pyramidal architecture on the one hand and on agent concepts on the other, is applied to mammography images. Local interactions between agents, modeled by a plan of behavioral actions, ensure a progressive segmentation through relevant region merging. The innovative aspect of this work lies, among other things, in the transposition of the adjacency graph derived from the adaptive irregular pyramid into a network of acquainted agents, thereby best supporting the merging choices. Applying the principles of the agent pyramid to the extraction of glandular tissue in mammography shows significant results regarding the potential of this new segmentation method.

    Simulated Data for Linear Regression with Structured and Sparse Penalties: Introducing pylearn-simulate

    A currently very active field of research is how to incorporate structure and prior knowledge in machine learning methods. It has led to numerous developments in the field of non-smooth convex minimization. With recently developed methods it is possible to perform an analysis in which the computed model can be linked to a given structure of the data and simultaneously do variable selection to find a few important features in the data. However, there is still no way to unambiguously simulate data to test proposed algorithms, since the exact solutions to such problems are unknown. The main aim of this paper is to present a theoretical framework for generating simulated data. These simulated data are appropriate when comparing optimization algorithms in the context of linear regression problems with sparse and structured penalties. Additionally, this approach allows the user to control the signal-to-noise ratio, the correlation structure of the data and the optimization problem to which they are the solution. The traditional approach is to simulate random data without taking into account the actual model that will be fit to the data. But when using such an approach it is not possible to know the exact solution of the underlying optimization problem. With our contribution, it is possible to know the exact theoretical solution of a penalized linear regression problem, and it is thus possible to compare algorithms without the need to use, e.g., cross-validation. We also present our implementation, the Python package pylearn-simulate, available at https://github.com/neurospin/pylearn-simulate and released under the BSD 3-clause license. We describe the package and give examples at the end of the paper.
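
The principle of generating data whose exact penalized solution is known can be sketched for the simplest case, an L1 (Lasso) penalty. This is a simplified illustration under stated assumptions, not the pylearn-simulate API: the names `simulate_lasso_problem` and `lasso_objective` are hypothetical, only the L1 penalty is handled, and no control over signal-to-noise ratio or correlation structure is attempted. The construction picks a residual vector e and builds each column of X so that X^T e = lam · z, where z is a subgradient of the L1 norm at the chosen solution; setting y = X beta* + e then makes beta* satisfy the optimality condition of 0.5·||X beta − y||² + lam·||beta||_1.

```python
import random


def lasso_objective(X, y, beta, lam):
    """0.5 * ||X beta - y||^2 + lam * ||beta||_1 for list-of-rows X."""
    resid = [sum(x * b for x, b in zip(row, beta)) - yi for row, yi in zip(X, y)]
    return 0.5 * sum(r * r for r in resid) + lam * sum(abs(b) for b in beta)


def simulate_lasso_problem(beta_star, n_samples, lam, seed=0):
    """Return (X, y, z) such that beta_star minimizes the Lasso objective."""
    rng = random.Random(seed)
    # Residual e = y - X beta_star, drawn at random.
    e = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    ee = sum(v * v for v in e)
    # Subgradient of the L1 norm: sign where nonzero, anything in [-1, 1] at zero.
    z = [(1.0 if b > 0 else -1.0) if b != 0 else rng.uniform(-1.0, 1.0)
         for b in beta_star]
    columns = []
    for zj in z:
        r = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        # Shift the random column along e so that its dot product with e is lam * zj.
        shift = (lam * zj - sum(ri * ei for ri, ei in zip(r, e))) / ee
        columns.append([ri + shift * ei for ri, ei in zip(r, e)])
    X = [[col[i] for col in columns] for i in range(n_samples)]
    y = [sum(x * b for x, b in zip(row, beta_star)) + ei for row, ei in zip(X, e)]
    return X, y, z
```

Because the objective is convex and the optimality condition holds by construction, any solver run on `(X, y)` with the same `lam` can be scored directly by its distance to `beta_star`.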

    Feature selection and classification of imbalanced datasets. Application to PET images of children with Autistic Spectrum Disorders

    Learning with discriminative methods is generally based on minimizing the misclassification of training samples, which may be unsuitable for imbalanced datasets where the recognition might be biased in favor of the most numerous class. This problem can be addressed with a generative approach, which typically requires more parameters to be determined leading to reduced performances in high dimension. In such situations, dimension reduction becomes a crucial issue. We propose a feature selection / classification algorithm based on generative methods in order to predict the clinical status of a highly imbalanced dataset made of PET scans of forty-five low-functioning children with autism spectrum disorders (ASD) and thirteen non-ASD low-functioning children. ASDs are typically characterized by impaired social interaction, narrow interests, and repetitive behaviours, with a high variability in expression and severity. The numerous findings revealed by brain imaging studies suggest that ASD is associated with a complex and distributed pattern of abnormalities that makes the identification of a shared and common neuroimaging profile a difficult task. In this context, our goal is to identify the rest functional brain imaging abnormalities pattern associated with ASD and to validate its efficiency in individual classification. The proposed feature selection algorithm detected a characteristic pattern in the ASD group that included a hypoperfusion in the right Superior Temporal Sulcus (STS) and a hyperperfusion in the contralateral postcentral area. Our algorithm allowed for a significantly accurate (88%), sensitive (91%) and specific (77%) prediction of clinical category. For this imbalanced dataset, with only 13 control scans, the proposed generative algorithm outperformed other state-of-the-art discriminant methods. The high predictive power of the characteristic pattern, which has been automatically identified on whole brains without any priors, confirms previous findings concerning the role of STS in ASD. This work offers exciting possibilities for early autism detection and/or the evaluation of treatment response in individual patients.
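
The reported accuracy, sensitivity and specificity can be related to the class sizes (45 ASD, 13 non-ASD) with a short worked computation. The confusion-matrix counts used in the usage note below are hypothetical: they are not given in the abstract and are simply one set of counts consistent with the rounded percentages. The helper name `classification_rates` is likewise an assumption.

```python
def classification_rates(tp, fn, tn, fp):
    """Standard rates from confusion-matrix counts (positives = ASD)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Balanced accuracy weights both classes equally, which matters
    # when one class is much larger than the other.
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }
```

With the hypothetical counts `classification_rates(tp=41, fn=4, tn=10, fp=3)`, the rates round to 91% sensitivity, 77% specificity and 88% accuracy. For comparison, a classifier that always predicts ASD would reach 45/58 (about 78%) accuracy with 0% specificity, which is why accuracy alone is a weak criterion on a dataset this imbalanced.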

    Longitudinal brain metabolic changes from amnestic mild cognitive impairment to Alzheimer's disease.

    A sensitive marker for monitoring progression of early Alzheimer's disease would help to develop and test new therapeutic strategies. The present study is aimed at investigating brain metabolism changes over time, as a potential monitoring marker, in patients with amnestic mild cognitive impairment, according to their clinical outcome (converters or non-converters), and in relation to their cognitive decline. Seventeen amnestic mild cognitive impairment patients underwent magnetic resonance imaging and 18FDG-positron emission tomography scans both at inclusion and 18 months later. Baseline and follow-up positron emission tomography data were corrected for partial volume effects and spatially normalized using magnetic resonance imaging data, scaled to the vermis and compared using SPM2. 'PET-PAC' maps reflecting metabolic per cent annual changes were created for correlation analyses with cognitive decline. In the whole sample, the greatest metabolic decrease concerned the posterior cingulate-precuneus area. Converters had significantly greater metabolic decrease than non-converters in two ventro-medial prefrontal areas, the subgenual (BA25) and anterior cingulate (BA24/32). PET-PAC in BA25 and BA24/32 combined allowed complete between-group discrimination. BA25 PET-PAC significantly correlated with both cognitive decline and PET-PAC in the hippocampal region and temporal pole, while BA24/32 PET-PAC correlated with posterior cingulate PET-PAC. Finally, the metabolic change in BA8/9/10 was inversely related to that in BA25 and showed relative increase with cognitive decline, suggesting that compensatory processes may occur in this dorso-medial prefrontal region. The observed ventro-medial prefrontal disruption is likely to reflect disconnection from the hippocampus, both indirectly through the cingulum bundle and posterior cingulate cortex for BA24/32, and directly through the uncinate fasciculus for BA25. Altogether, our findings emphasize the potential of 18FDG-positron emission tomography for monitoring early Alzheimer's disease progression.