
    Detecting Outliers in High-Dimensional Neuroimaging Datasets with Robust Covariance Estimators

    Medical imaging datasets often contain deviant observations, the so-called outliers, due to acquisition or preprocessing artifacts or resulting from large intrinsic inter-subject variability. These can undermine the statistical procedures used in group studies, as the latter assume that the cohorts are composed of homogeneous samples with anatomical or functional features clustered around a central mode. The effects of outlying subjects can be mitigated by detecting and removing them with explicit statistical control. With the emergence of large medical imaging databases, exhaustive data screening is no longer possible, and automated outlier detection methods are currently gaining interest. The datasets used in medical imaging are often high-dimensional and strongly correlated, so the outlier detection procedure should rely on high-dimensional multivariate statistical models. However, state-of-the-art procedures are not well suited to such high-dimensional settings. In this work, we introduce regularization in the Minimum Covariance Determinant (MCD) framework and investigate different regularization schemes. We carry out extensive simulations to support practical choices in the absence of ground-truth knowledge. We demonstrate on functional neuroimaging datasets that outlier detection can be performed with small sample sizes and improves group studies.
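The idea of a regularized MCD can be sketched in a few lines. This is a minimal, standard-library illustration under loudly stated assumptions: the regularization is a simple shrinkage of the subset covariance toward a scaled identity (weight `alpha`, chosen by hand), which is only one possible scheme and not necessarily one of those studied in the paper; the search uses a single deterministic start (the `h` points nearest the coordinate-wise median) followed by MCD concentration steps ("C-steps"), whereas FastMCD uses many random starts plus reweighting and a consistency correction. All function names are hypothetical.

```python
import random
import statistics


def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def shrunk_cov(rows, alpha):
    # Sample covariance shrunk toward (average variance) * I; alpha in [0, 1] is an assumed knob.
    n, p = len(rows), len(rows[0])
    mu = mean_vec(rows)
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n for j in range(p)]
           for i in range(p)]
    scale = sum(cov[i][i] for i in range(p)) / p
    return [[(1 - alpha) * cov[i][j] + (alpha * scale if i == j else 0.0) for j in range(p)]
            for i in range(p)]


def inverse(m):
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [m | I].
    p = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        d = a[col][col]
        a[col] = [x / d for x in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[p:] for row in a]


def mahalanobis_sq(x, mu, prec):
    d = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d[i] * prec[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


def regularized_mcd(data, h, alpha=0.1, n_steps=20):
    p = len(data[0])
    med = [statistics.median(x[j] for x in data) for j in range(p)]
    # Deterministic start: the h points closest (Euclidean) to the coordinate-wise median.
    subset = sorted(range(len(data)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(data[i], med)))[:h]
    for _ in range(n_steps):
        # C-step: fit on the current subset, then keep the h points with the smallest distances.
        rows = [data[i] for i in subset]
        mu, prec = mean_vec(rows), inverse(shrunk_cov(rows, alpha))
        dists = [mahalanobis_sq(x, mu, prec) for x in data]
        new_subset = sorted(range(len(data)), key=lambda i: dists[i])[:h]
        if set(new_subset) == set(subset):
            break
        subset = new_subset
    return mu, prec, dists


# Toy data: 40 Gaussian inliers in 5 dimensions plus 4 planted outliers (indices 40-43).
rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(40)] + [[8.0] * 5 for _ in range(4)]
mu, prec, d2 = regularized_mcd(data, h=30)
flagged = sorted(sorted(range(len(data)), key=lambda i: d2[i])[-4:])
print(flagged)  # → [40, 41, 42, 43]
```

Taking the four largest distances is only for the demonstration; a practical rule would compare the squared distances with a chi-square quantile, and in high dimensions the choice of shrinkage weight matters far more than in this 5-dimensional toy.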

    Challenges of Big Data Analysis

    Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and how these features drive paradigm changes in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. Relying on these unverifiable assumptions can lead to wrong statistical inferences and, consequently, wrong scientific conclusions.
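Spurious correlation in high dimensions is easy to reproduce numerically. In this minimal standard-library sketch, a response and every feature are independent N(0, 1) noise, so all true correlations are zero; yet with a fixed sample size the largest absolute sample correlation grows as more features are screened. The sample size `n=50`, the dimensions, and the seed are arbitrary choices for illustration.

```python
import math
import random


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_corr(n, p, seed=0):
    # y and all p features are independent N(0, 1) noise: every true correlation is zero.
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n)]
    best = 0.0
    for _ in range(p):
        x = [rng.gauss(0, 1) for _ in range(n)]
        best = max(best, abs(pearson(x, y)))
    return best


for p in (10, 100, 1000):
    print(p, round(max_spurious_corr(n=50, p=p), 3))
```

Because the same seed is reused, the first features generated for a smaller `p` are exactly the first features for a larger `p`, so the reported maximum can only grow with `p`; any feature selected purely by its sample correlation in such a setting would be a false discovery.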

    Machine Learning Methods for Depression Detection Using SMRI and RS-FMRI Images

    Major Depressive Disorder (MDD) is a common disease throughout the world that negatively affects people's lives. Early diagnosis of MDD is beneficial, so identifying practical biomarkers would aid clinicians in its diagnosis. An automated method for finding MDD biomarkers would be helpful, although developing one is difficult. The main aim of this research is to develop a method for detecting discriminative features for MDD diagnosis based on Magnetic Resonance Imaging (MRI) data. In this research, representational similarity analysis provides a framework to compare distributed patterns and obtain the similarity/dissimilarity of brain regions. Regions are obtained by either data-driven methods (cubes) or model-driven methods (atlases). For structural MRI (sMRI), the similarity of voxels within spatial cubes (data-driven) is explored. For resting-state fMRI (rs-fMRI) images, the similarity of the time series of both cubes (data-driven) and atlas regions (model-driven) is examined. Moreover, similarity is measured with the inverse of the Minimum Covariance Determinant (MCD) estimate, which excludes outliers from the patterns and identifies regions that are conditionally independent given the rest of the regions. Next, a statistical test that is robust to outliers identifies discriminative similarity features between two groups: MDD patients and controls. The key contribution is therefore the procedure for obtaining discriminative features: computing the similarity of voxel cubes/time series using the inverse of a robust covariance estimate, followed by the statistical test. The experimental results show that these features, combined with a Bernoulli Naïve Bayes classifier, achieve superior performance compared with other methods. The performance of the method is verified on three imbalanced datasets. Moreover, the similarity-based methods are compared with deep learning and region-based approaches for detecting MDD using either sMRI or rs-fMRI.
Given that depression is widely regarded as a connectivity disorder, investigating the similarity of brain regions is valuable for understanding brain behavior. Combinations of structural and functional brain similarities are explored to investigate the brain's structural and functional properties together. Moreover, combinations of data-driven (cube) and model-driven (atlas) similarities of rs-fMRI are examined to evaluate how they affect classifier performance. In addition, discriminative similarities are visualized for both sMRI and rs-fMRI. To measure the informativeness of a cube, the relationship of atlas regions with overlapping cubes, and vice versa (cubes with overlapping regions), is explored and visualized. Furthermore, the relationship between brain structure and function is probed through similarities shared by structural and resting-state functional networks.
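The link between an inverse covariance and conditional independence can be made concrete. For a precision matrix P (the inverse of the covariance), the partial correlation between regions i and j given all other regions is -P_ij / sqrt(P_ii P_jj); for Gaussian data a zero entry means the two regions are conditionally independent given the rest. This standard-library sketch uses a hand-made 3-region covariance rather than a robust MCD estimate from real time series, which it does not compute.

```python
import math


def inverse(m):
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [m | I].
    p = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        d = a[col][col]
        a[col] = [x / d for x in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[p:] for row in a]


def partial_correlations(cov):
    # rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the precision (inverse covariance) matrix.
    prec = inverse(cov)
    p = len(cov)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(p)] for i in range(p)]


# Hand-made covariance of three "regions" forming a chain 1 - 2 - 3:
# regions 1 and 3 are marginally correlated (0.25) but independent given region 2.
cov = [[1.0, 0.5, 0.25],
       [0.5, 1.0, 0.5],
       [0.25, 0.5, 1.0]]
pc = partial_correlations(cov)
print(round(pc[0][1], 3), abs(round(pc[0][2], 3)))  # → 0.447 0.0
```

The marginal correlation of regions 1 and 3 is 0.25, but their partial correlation vanishes because region 2 explains it; replacing the sample covariance with a robust estimate before inverting is what limits the influence of outlying time points.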

    Robust Group-Level Inference in Neuroimaging Genetic Studies

    Gene-neuroimaging studies involve high-dimensional data that have a complex statistical structure and that are likely to be contaminated with outliers. Robust, outlier-resistant methods are an alternative to prior outlier removal, which is a difficult task in high-dimensional unsupervised settings. In this work, we consider robust regression and its application to neuroimaging through an example gene-neuroimaging study on a large cohort of 300 subjects. We use randomized brain parcellation to sample a set of adapted low-dimensional spatial models to analyse the data. We combine this approach with robust regression in an analysis method that we show outperforms state-of-the-art neuroimaging analysis methods.
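The abstract does not say which robust regression estimator is used; a common choice is Huber's M-estimator fitted by iteratively reweighted least squares (IRLS). The following standard-library sketch fits one covariate plus an intercept, assuming the conventional tuning constant `c=1.345` and a residual scale of median absolute residual / 0.6745; it illustrates robust regression in general, not the paper's parcellation-based method.

```python
import statistics


def weighted_ls(x, y, w):
    # Weighted least squares for y = a + b * x.
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def huber_fit(x, y, c=1.345, n_iter=50):
    # IRLS for Huber's M-estimator: weight 1 within c * scale, c * scale / |r| beyond it.
    w = [1.0] * len(x)
    for _ in range(n_iter):
        a, b = weighted_ls(x, y, w)
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual / 0.6745, floored to avoid division by zero.
        s = max(statistics.median(abs(ri) for ri in r) / 0.6745, 1e-12)
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
    return weighted_ls(x, y, w)


x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi for xi in x]
for i in (3, 9, 14):
    y[i] += 20.0  # gross outliers in the response
print(weighted_ls(x, y, [1.0] * 20))  # ordinary least squares, pulled toward the outliers
print(huber_fit(x, y))                # close to the true line (2.0, 0.5)
```

Each outlier's pull on the Huber fit is capped at `c * s`, so the estimate stays near the clean line; note that Huber weights protect against outlying responses, not against high-leverage covariate values.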

    Robust methods for outlier detection and regression for SHM applications

    In this paper, robust statistical methods are presented for the data-based approach to structural health monitoring (SHM). The discussion initially focuses on the high-level removal of the ‘masking effect’ of inclusive outliers. Multiple outliers commonly occur when novelty detection in the form of unsupervised learning is used for damage diagnosis; benign variations in the operating or environmental conditions of the structure must then be handled very carefully, as they can lead to false alarms. It is shown that recent developments in robust regression provide a means of exploring and visualising SHM data, both to examine the different characteristics of outliers and to remove the effects of benign variations. The paper is not, in any sense, a survey; it is an overview and summary of recent work by the authors.
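The masking effect already appears in one dimension. When several outliers are present, they inflate the mean and standard deviation used to judge them, so a classical z-score rule can miss every one of them, while a median/MAD rule does not. This standard-library sketch uses made-up toy data and the conventional cutoff of 3; the SHM methods in the paper are multivariate and are not reproduced here.

```python
import statistics


def classical_z(xs):
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]


def robust_z(xs):
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    # 1.4826 makes the MAD consistent with the standard deviation for Gaussian data.
    return [(x - med) / (1.4826 * mad) for x in xs]


inliers = [10 + 0.1 * k for k in range(-10, 10)]  # 9.0, 9.1, ..., 10.9
outliers = [30.0] * 6
data = inliers + outliers
flag_classical = [i for i, z in enumerate(classical_z(data)) if abs(z) > 3]
flag_robust = [i for i, z in enumerate(robust_z(data)) if abs(z) > 3]
print(flag_classical)  # → []
print(flag_robust)     # → [20, 21, 22, 23, 24, 25]
```

The six outliers pull the mean to about 14.6 and the standard deviation to about 8.6, so their classical z-scores are only about 1.8; the median and MAD are barely affected, which gives the outliers robust z-scores above 20.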

    Harmonized-Multinational qEEG Norms (HarMNqEEG)

    This paper extends frequency-domain quantitative electroencephalography (qEEG) methods, pursuing higher sensitivity for detecting Brain Developmental Disorders. Prior qEEG work lacked integration of cross-spectral information, omitting important functional connectivity descriptors. Lack of geographical diversity precluded accounting for site-specific variance, increasing qEEG nuisance variance. We ameliorate these weaknesses. i) We create lifespan Riemannian multinational qEEG norms for cross-spectral tensors. These norms result from the HarMNqEEG project fostered by the Global Brain Consortium. We calculate the norms with data from 9 countries, 12 devices, and 14 studies, including 1564 subjects. Instead of raw data, only anonymized metadata and EEG cross-spectral tensors were shared. After visual and automatic quality control, developmental equations for the mean and standard deviation of traditional and Riemannian qEEG DPs were calculated using additive mixed-effects models. We demonstrate qEEG "batch effects" and provide methods to calculate harmonized z-scores. ii) We also show that the multinational harmonized Riemannian norms produce z-scores with increased diagnostic accuracy in predicting brain dysfunction at school age caused by malnutrition limited to the first year of life. iii) We offer open code and data to calculate different individual z-scores from the HarMNqEEG dataset. These results contribute to developing bias-free, low-cost neuroimaging technologies applicable in various health settings.
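A harmonized z-score compares an individual measurement with the norm for the same age and site: z = (x - mean(age, site)) / sd(age). In the paper, the mean and standard deviation come from additive mixed-effects models fitted to the multinational data; in this standard-library sketch they are replaced by HYPOTHETICAL hand-written functions with invented illustrative parameters (`SITE_EFFECT`, `norm_mean`, `norm_sd` are not HarMNqEEG values or APIs), purely to show how removing a site ("batch") effect changes the z-score.

```python
import math

# HYPOTHETICAL norm parameters for a single qEEG measure: illustrative numbers only,
# not values from the HarMNqEEG norms.
SITE_EFFECT = {"site_A": 0.0, "site_B": 0.4}


def norm_mean(age, site):
    # Hypothetical developmental curve (linear in log-age) plus an additive site ("batch") effect.
    return 1.0 - 0.3 * math.log(age) + SITE_EFFECT[site]


def norm_sd(age):
    # Hypothetical normative standard deviation (age-independent here for simplicity).
    return 0.5


def harmonized_z(value, age, site):
    # z-score relative to the age- and site-specific norm.
    return (value - norm_mean(age, site)) / norm_sd(age)


# The same raw value recorded at two sites maps to different z-scores
# once each site's batch effect is accounted for.
z_a = harmonized_z(1.4, age=10, site="site_A")
z_b = harmonized_z(1.4, age=10, site="site_B")
print(round(z_a, 3), round(z_b, 3))
```

Without the site term, both recordings would receive the same z-score even though site_B systematically reads higher; the difference between the two z-scores here equals the hypothetical site offset divided by the normative standard deviation.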