
    Scaling Multidimensional Inference for Big Structured Data

    In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications [151]. In a world of increasing sensor modalities, cheaper storage, and more data-oriented questions, we are quickly passing the limits of tractable computation using traditional statistical analysis methods. Methods that often show great results on simple data have difficulty processing complicated multidimensional data. Accuracy alone can no longer justify unwarranted memory use and computational complexity. Improving the scaling properties of these methods for multidimensional data is the only way to make them relevant. In this work we explore methods for improving the scaling properties of parametric and nonparametric models. Namely, we focus on the structure of the data to lower the complexity of a specific family of problems. The two types of structure considered in this work are distributive optimization with separable constraints (Chapters 2-3) and scaling Gaussian processes for multidimensional lattice input (Chapters 4-5). By improving the scaling of these methods, we can expand their use to a wide range of applications which were previously intractable and open the door to new research questions.

    Incorporating structured assumptions with probabilistic graphical models in fMRI data analysis

    With the wide adoption of functional magnetic resonance imaging (fMRI) by cognitive neuroscience researchers, large volumes of brain imaging data have been accumulated in recent years. Aggregating these data to derive scientific insights often faces the challenge that fMRI data are high-dimensional, heterogeneous across people, and noisy. These challenges demand the development of computational tools that are tailored both for the neuroscience questions and for the properties of the data. We review a few recently developed algorithms in various domains of fMRI research: fMRI in naturalistic tasks, analyzing full-brain functional connectivity, pattern classification, inferring representational similarity and modeling structured residuals. These algorithms all tackle the challenges in fMRI similarly: they start by making clear statements of assumptions about neural data and existing domain knowledge, incorporate those assumptions and domain knowledge into probabilistic graphical models, and use those models to estimate properties of interest or latent structures in the data. Such approaches can avoid erroneous findings, reduce the impact of noise, better utilize known properties of the data, and better aggregate data across groups of subjects. With these successful cases, we advocate wider adoption of explicit model construction in cognitive neuroscience. Although we focus on fMRI, the principle illustrated here is generally applicable to brain data of other modalities. Comment: update with the version accepted by Neuropsychologi

    Modern Views of Machine Learning for Precision Psychiatry

    In light of the NIMH's Research Domain Criteria (RDoC), the advent of functional neuroimaging together with novel technologies and methods provides new opportunities to develop precise and personalized prognosis and diagnosis of mental disorders. Machine learning (ML) and artificial intelligence (AI) technologies are playing an increasingly critical role in the new era of precision psychiatry. Combining ML/AI with neuromodulation technologies can potentially provide explainable solutions in clinical practice and effective therapeutic treatment. Advanced wearable and mobile technologies also call for a new role of ML/AI in digital phenotyping for mobile mental health. In this article, we provide a comprehensive review of ML methodologies and applications that combine neuroimaging, neuromodulation, and advanced mobile technologies in psychiatric practice. Additionally, we review the role of ML in molecular phenotyping and cross-species biomarker identification in precision psychiatry. We further discuss explainable AI (XAI) and causality testing in a closed-human-in-the-loop manner, and highlight the potential of ML in multimedia information extraction and multimodal data fusion. Finally, we discuss conceptual and practical challenges in precision psychiatry and highlight ML opportunities in future research.

    DATA MINING AND IMAGE CLASSIFICATION USING GENETIC PROGRAMMING

    Genetic programming (GP), a capable machine learning and search method inspired by Darwinian evolution, is an evolutionary learning algorithm that automatically evolves computer programs in the form of trees to solve problems. This thesis studies the application of GP to data mining and image processing. Knowledge discovery and data mining have been widely used in business, healthcare, and scientific fields. In data mining, classification is a supervised learning task that identifies new patterns and maps the data to predefined targets. A GP-based classifier is developed to perform these mappings. GP has been investigated in a series of studies to classify data; however, certain aspects have not previously been studied. We propose an optimized GP classifier based on a combination of subtree pruning and a new fitness function. An orthogonal least squares algorithm is also applied in the training phase to create a robust GP classifier. The proposed GP classifier is validated by 10-fold cross validation. Three areas were studied in this thesis. The first investigation resulted in an optimized genetic-programming-based classifier that directly solves multi-class classification problems. Instead of defining static thresholds as boundaries to differentiate between multiple labels, our work presents a method of classification where a GP system learns the relationships among experiential data and models them mathematically during the evolutionary process. Our approach has been assessed on six multiclass datasets. The second investigation was to develop a GP classifier to segment and detect brain tumors on magnetic resonance imaging (MRI) images. The findings indicated the high accuracy of brain tumor classification provided by our GP classifier. The results confirm the strong ability of the developed technique for complicated image classification problems. The third was to develop a hybrid system for multiclass imbalanced data classification using GP and SMOTE, which was tested on satellite images. The findings showed that the proposed approach improves both training and test results when the SMOTE technique is incorporated. We also compared the speed of our approach with that of previous GP algorithms. The analyzed results illustrate that the developed classifier provides an efficient and rapid method for classification tasks that outperforms the previous methods on more challenging multiclass classification problems. We tested the approaches presented in this thesis on publicly available datasets and images. The findings were statistically tested to confirm the robustness of the developed approaches.
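    As a rough illustration of the SMOTE step mentioned in this abstract, the following minimal Python sketch creates one synthetic minority-class point by interpolating between a minority sample and one of its nearest minority neighbours. It is not the thesis's implementation: the function name `smote_sample`, the parameter `k`, and the toy data are assumptions made for illustration only.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point from a list of minority-class vectors.

    Hypothetical helper: picks a random minority sample, finds its k nearest
    minority neighbours (Euclidean distance), and interpolates toward one.
    """
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    base = rng.choice(minority)
    # Sort the other samples by squared distance to the base sample.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


# Toy minority class (assumed data, not from the thesis).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
synthetic = smote_sample(minority, k=2, rng=random.Random(0))
```

    Because the synthetic point lies on the segment between two existing minority samples, it stays inside the region the minority class already occupies, which is how this kind of oversampling rebalances classes without simply duplicating rows.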

    Parameters from site classification to harmonize MRI clinical studies: Application to a multi-site Parkinson's disease dataset

    Multi-site MRI datasets are crucial for big data research. However, neuroimaging studies must face the batch effect. Here, we propose an approach that uses the predictive probabilities provided by Gaussian processes (GPs) to harmonize clinical-based studies. A multi-site dataset of 216 Parkinson's disease (PD) patients and 87 healthy subjects (HS) was used. We performed a site GP classification using MRI data. The outcomes estimated from this classification, redefined as Weighted HARMonization PArameters (WHARMPA), were used as regressors in two different clinical studies: a PD versus HS machine learning classification using GP, and a VBM comparison (FWE-p < .05, k = 100). The same studies were also conducted using conventional Boolean site covariates, and without any information about site membership. The site GP classification provided high scores: balanced accuracy (BAC) was 98.39% for grey matter images. PD versus HS classification performed better when the WHARMPA were used to harmonize (BAC = 78.60%; AUC = 0.90) than when using the Boolean site information (BAC = 56.31%; AUC = 0.71) or no site information (BAC = 57.22%; AUC = 0.73). The VBM analysis harmonized using WHARMPA provided larger and more statistically robust clusters in regions previously reported in PD than when the Boolean site covariates or no corrections were added to the model. In conclusion, WHARMPA might encode global site effects quantitatively and allow the harmonization of data. This method is user-friendly and provides a powerful solution, without complex implementations, to clean the analyses by removing variability associated with the differences between sites.
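    The abstract reports balanced accuracy (BAC) without defining it. Assuming the standard definition (the mean of per-class recall), the short Python sketch below shows why BAC is a natural choice for an imbalanced cohort such as 216 patients versus 87 controls. The function name `balanced_accuracy` and the toy labels are illustrative assumptions, not the authors' code or data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes, (sensitivity + specificity) / 2."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Toy labels (assumed): 8 patients ("PD") and 2 healthy subjects ("HS").
y_true = ["PD"] * 8 + ["HS"] * 2
y_pred = ["PD"] * 8 + ["PD", "HS"]  # one healthy subject is misclassified

bac = balanced_accuracy(y_true, y_pred)  # (8/8 + 1/2) / 2
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 9/10
```

    In this toy case plain accuracy is 0.9 while BAC is 0.75, because the error on the smaller class is weighted as heavily as the majority class.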

    Classification Approaches in Neuroscience: A Geometrical Point of View

    Functional magnetic resonance imaging (fMRI) produces brain scans that an MRI machine acquires over time. Several studies have investigated methods for analyzing such images (or, more precisely, the data drawn from them), and interest in them is growing. For example, models can predict the behaviours and actions of people based on their brain patterns, which can be useful in many fields. We study the classification and prediction of fMRI data, and we develop several approaches, and modifications of them, which have not previously been used in such classification problems. The proposed approaches were assessed by comparing the classification error rates in a real fMRI data study. In addition, code for reading fMRI scans and for applying the classification approaches is provided so that fMRI data can be manipulated in practice. The code can later be gathered into an R package. There is also a steadily growing interest in analyzing functional data, which can often exploit Riemannian geometry. As a prototypical example of this kind of data, we consider the functional data arising from an electroencephalography (EEG) signal in a brain-computer interface (BCI), which translates brain signals into commands for a machine. A BCI can be used by people with physical disabilities and movement problems, or even in video games, an application that has attracted increasing interest. To that end, a classification study on EEG signals is proposed in which the data to be classified are matrices. A multiplicative algorithm (MPM), which is fast and efficient, was developed to compute the power means of matrices, which is the crucial step in our proposed classification approaches. In addition, simulation studies were used to examine the performance of MPM against existing algorithms. We compare the behavior of different power means in terms of classification accuracy, which had not been investigated previously. We show that it is hard to guess the optimal power mean for higher accuracy, since it depends on the multivariate distribution of the available data. We therefore also develop an approach, a combination of power means, that draws on the benefits of all of them to improve classification performance. All the code related to the fast MPM algorithm and to manipulating EEG signals for classification is written in MATLAB and can later be developed into a package.

    Machine Learning Methods for Depression Detection Using SMRI and RS-FMRI Images

    Major Depressive Disorder (MDD) is a common disease throughout the world that negatively influences people’s lives. Early diagnosis of MDD is beneficial, so detecting practical biomarkers would aid clinicians in the diagnosis of MDD. An automated method to find biomarkers for MDD would be helpful, although it is difficult to build. The main aim of this research is to develop a method for detecting discriminative features for MDD diagnosis based on Magnetic Resonance Imaging (MRI) data. In this research, representational similarity analysis provides a framework to compare distributed patterns and obtain the similarity/dissimilarity of brain regions. Regions are obtained by either data-driven or model-driven methods, such as cubes and atlases, respectively. For structural MRI (sMRI), the similarity of voxels of spatial cubes (data-driven) is explored. For resting-state fMRI (rs-fMRI) images, the similarity of the time series of both cubes (data-driven) and atlases (model-driven) is examined. Moreover, a similarity measure based on the inverse of the Minimum Covariance Determinant is applied, which excludes outliers from patterns and finds regions that are conditionally independent given the rest of the regions. Next, a statistical test that is robust to outliers identifies discriminative similarity features between the two groups, MDD patients and controls. The key contribution is therefore the way discriminative features are obtained: computing the similarity of voxel cubes or time series using the inverse of a robust covariance, followed by the statistical test. The experimental results show that combining these features with the Bernoulli Naïve Bayes classifier achieves superior performance compared with other methods. The performance of our method is verified by applying it to three imbalanced datasets. Moreover, the similarity-based methods are compared with deep learning and regional-based approaches for detecting MDD using either sMRI or rs-fMRI. Given that depression is known to be a connectivity disorder, investigating the similarity of the brain’s regions is valuable for understanding the behavior of the brain. The combinations of structural and functional brain similarities are explored to investigate the brain’s structural and functional properties together. Moreover, the combination of data-driven (cube) and model-driven (atlas) similarities of rs-fMRI is examined to evaluate how they affect the performance of the classifier. In addition, discriminative similarities are visualized for both sMRI and rs-fMRI. To measure the informativeness of a cube, the relationship of atlas regions with overlapping cubes and vice versa (cubes with overlapping regions) is explored and visualized. Furthermore, the relationship between brain structure and function has been probed through common similarities between structural and resting-state functional networks.

    Uncertainty Estimation, Explanation and Reduction with Insufficient Data

    Human beings constantly make smart decisions under uncertainty, trading off swift action against collecting sufficient evidence. A generalized artificial intelligence (GAI) is naturally expected to navigate uncertainty while still predicting precisely. In this thesis, we aim to propose strategies that underpin machine learning with uncertainties from three perspectives: uncertainty estimation, explanation and reduction. Estimation quantifies the variability in the model inputs and outputs. It enables us to evaluate the model's predictive confidence. Explanation provides a tool to interpret the mechanism of uncertainties and to pinpoint the potential for uncertainty reduction, which focuses on stabilizing model training, especially when the data is insufficient. We hope that this thesis can motivate related studies on quantifying predictive uncertainties in deep learning. It also aims to raise awareness among other stakeholders in the fields of smart transportation and automated medical diagnosis, where data insufficiency induces high uncertainty. The thesis is divided into the following sections. Introduction. We justify the need to investigate AI uncertainties and clarify the challenges in the latest studies, followed by our research objective. Literature review. We break down the review of the state-of-the-art methods into uncertainty estimation, explanation and reduction. We also make comparisons with related fields, including meta learning, anomaly detection and continual learning. Uncertainty estimation. We introduce a variational framework, the neural process, that approximates Gaussian processes to handle uncertainty estimation. Two variants from the neural process family are proposed to enhance neural processes with scalability and continual learning. Uncertainty explanation. We inspect the functional distribution of neural processes to discover the global and local factors that affect the degree of predictive uncertainty. Uncertainty reduction. We validate the proposed uncertainty framework on two scenarios: urban irregular behaviour detection and neurological disorder diagnosis, where the intrinsic data insufficiency undermines the performance of existing deep learning models. Conclusion. We provide promising directions for future work and conclude the thesis.