A supervised clustering approach for fMRI-based inference of brain states
We propose a method that combines signals from many brain regions observed in
functional Magnetic Resonance Imaging (fMRI) to predict the subject's behavior
during a scanning session. Such predictions suffer from the huge number of
brain regions sampled on the voxel grid of standard fMRI data sets: the curse
of dimensionality. Dimensionality reduction is thus needed, but it is often
performed using a univariate feature selection procedure, that handles neither
the spatial structure of the images, nor the multivariate nature of the signal.
By introducing a hierarchical clustering of the brain volume that incorporates
connectivity constraints, we reduce the span of the possible spatial
configurations to a single tree of nested regions tailored to the signal. We
then prune the tree in a supervised setting, hence the name supervised
clustering, in order to extract a parcellation (division of the volume) such
that parcel-based signal averages best predict the target information.
Dimensionality reduction is thus achieved by feature agglomeration, and the
constructed features now provide a multi-scale representation of the signal.
Comparisons with reference methods on both simulated and real data show that
our approach yields higher prediction accuracy than standard voxel-based
approaches. Moreover, the method infers an explicit weighting of the regions
involved in the regression or classification task.
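The core idea of the abstract, agglomerating spatially connected voxels into parcels and predicting from parcel averages, can be sketched with scikit-learn primitives. This is not the paper's supervised tree-pruning algorithm, only a minimal illustration of connectivity-constrained feature agglomeration on synthetic data (grid size, cluster count, and the classifier are arbitrary choices for the sketch).

```python
# Sketch: connectivity-constrained feature agglomeration, then prediction
# from parcel averages. NOT the paper's supervised pruning; synthetic data.
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_extraction.image import grid_to_graph
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(0)
nx, ny = 10, 10                      # toy 2-D "brain slice" of 100 voxels
n_samples = 80
X = rng.normal(size=(n_samples, nx * ny))
y = (X[:, :10].mean(axis=1) > 0).astype(int)  # signal in one spatial blob

# Connectivity constraint: only spatially adjacent voxels may be merged,
# so every cluster is a connected parcel of the grid.
connectivity = grid_to_graph(nx, ny)

agglo = FeatureAgglomeration(n_clusters=20, connectivity=connectivity)
X_reduced = agglo.fit_transform(X)   # parcel-based signal averages
clf = RidgeClassifier().fit(X_reduced, y)
print(X_reduced.shape)               # (80, 20)
```

Dimensionality drops from 100 voxels to 20 parcel averages while respecting the spatial structure that univariate feature selection ignores.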
Partial least squares discriminant analysis: A dimensionality reduction method to classify hyperspectral data
The recent development of more sophisticated spectroscopic methods allows acquisition of high dimensional datasets from which valuable information may be extracted using multivariate statistical analyses, such as dimensionality reduction and automatic classification (supervised and unsupervised). In this work, a supervised classification through a partial least squares discriminant analysis (PLS-DA) is performed on the hyperspectral data. The obtained results are compared with those obtained by the most commonly used classification approaches
Dimensionality reduction with image data
A common objective in image analysis is dimensionality reduction. The most commonly used exploratory technique for this purpose is principal component analysis. We propose a new method based on the projection of the images as matrices after a Procrustes rotation and show that it leads to a better reconstruction of images
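The baseline this abstract compares against, PCA on vectorized images, looks as follows. The proposed matrix/Procrustes projection itself is not reproduced here; this sketch (with invented image sizes and component counts) only shows the standard approach and the reconstruction it yields:

```python
# Baseline referenced by the abstract: PCA applied to vectorized images,
# with reconstruction from the low-dimensional codes. Synthetic data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_images, h, w = 100, 16, 16
images = rng.random((n_images, h, w))

X = images.reshape(n_images, -1)     # flatten each image to a 256-vector
pca = PCA(n_components=10).fit(X)
codes = pca.transform(X)             # low-dimensional representation
X_rec = pca.inverse_transform(codes) # reconstruction from 10 components
mse = np.mean((X - X_rec) ** 2)
print(codes.shape)                   # (100, 10)
```

Note that flattening discards the row/column structure of each image, which is precisely what matrix-based projections aim to preserve.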
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Bayesian Inference on Matrix Manifolds for Linear Dimensionality Reduction
We reframe linear dimensionality reduction as a problem of Bayesian inference
on matrix manifolds. This natural paradigm extends the Bayesian framework to
dimensionality reduction tasks in higher dimensions with simpler models at
greater speeds. Here an orthogonal basis is treated as a single point on a
manifold and is associated with a linear subspace on which observations vary
maximally. Throughout this paper, we employ the Grassmann and Stiefel manifolds
for various dimensionality reduction problems, explore the connection between
the two manifolds, and use Hybrid Monte Carlo for posterior sampling on the
Grassmannian for the first time. We delineate in which situations either
manifold should be considered. Further, matrix manifold models are used to
yield scientific insight in the context of cognitive neuroscience, and we
conclude that our methods are suitable for basic inference as well as accurate
prediction.

Comment: All datasets and computer programs are publicly available at
http://www.ics.uci.edu/~babaks/Site/Codes.htm
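The abstract's central object, an orthogonal basis treated as a single point on the Stiefel manifold that spans a subspace of maximal variation, can be illustrated without the Bayesian machinery. The sketch below (not the paper's HMC procedure) takes the top-k right singular vectors of centered data as one such point and checks the Stiefel constraint:

```python
# Illustration only: an orthonormal basis W (a Stiefel-manifold point)
# spanning the subspace of maximal variation, obtained here by plain SVD
# rather than the paper's Bayesian/HMC inference.
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 30))
X -= X.mean(axis=0)                  # center the observations

k = 3
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k].T                         # 30 x 3 orthonormal basis

# Stiefel constraint: W^T W = I_k
print(np.allclose(W.T @ W, np.eye(k)))   # True
Z = X @ W                            # observations projected onto subspace
```

Bayesian inference on the manifold replaces this single point estimate with a posterior distribution over such bases, which is where the Grassmann/Stiefel distinction (subspace versus ordered basis) becomes relevant.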
Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data
Similarity-based approaches represent a promising direction for time series
analysis. However, many such methods rely on parameter tuning, and some have
shortcomings when the time series are multivariate (MTS), due to dependencies
between attributes, or when the time series contain missing data. In this paper, we
address these challenges within the powerful context of kernel methods by
proposing the robust \emph{time series cluster kernel} (TCK). The approach
taken leverages the missing data handling properties of Gaussian mixture models
(GMM) augmented with informative prior distributions. An ensemble learning
approach is exploited to ensure robustness to parameters by combining the
clustering results of many GMM to form the final kernel.
We evaluate the TCK on synthetic and real data and compare to other
state-of-the-art techniques. The experimental results demonstrate that the TCK
is robust to parameter choices, provides competitive results for MTS without
missing data and outstanding results for missing data.

Comment: 23 pages, 6 figures
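The ensemble idea behind TCK can be sketched in a heavily simplified form: fit many mixture models under varied hyperparameters and accumulate the inner products of their soft cluster assignments into one kernel. This omits the paper's key ingredients (missing-data handling via informative priors, per-time-step GMMs), so it only conveys why the ensemble makes the kernel robust to any single parameter choice:

```python
# Simplified ensemble-kernel sketch inspired by TCK. NOT the paper's
# algorithm: no missing-data handling, no informative priors, and time
# series are flattened to plain vectors.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
n_series, length = 40, 12
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_series // 2, length)),
    rng.normal(1.0, 1.0, size=(n_series // 2, length)),
])

K = np.zeros((n_series, n_series))
for n_components in (2, 3, 4):       # vary hyperparameters across ensemble
    for seed in range(3):            # ...and initializations
        gmm = GaussianMixture(n_components=n_components,
                              random_state=seed).fit(X)
        P = gmm.predict_proba(X)     # soft cluster assignments
        K += P @ P.T                 # similarity of assignment vectors

# K is a symmetric positive semidefinite kernel by construction.
print(np.allclose(K, K.T))           # True
```

Because each term P @ P.T is a Gram matrix, their sum is automatically a valid kernel, so no single GMM's parameter choice can dominate the similarity.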