Spectral Mixture Decomposition by Least Dependent Component Analysis
A recently proposed mutual information based algorithm for decomposing data
into least dependent components (MILCA) is applied to spectral analysis, namely
to blind recovery of concentrations and pure spectra from their linear
mixtures. The algorithm is based on precise estimates of mutual information
between measured spectra, which make it possible to assess and exploit the
actual statistical dependencies between them. We show that linear filtering performed
by taking second derivatives effectively reduces the dependencies caused by
overlapping spectral bands and, thereby, assists resolving pure spectra. In
combination with second derivative preprocessing and alternating least squares
postprocessing, MILCA shows decomposition performance comparable with or
superior to specialized chemometrics algorithms. The results are illustrated on
a number of simulated and experimental (infrared and Raman) mixture problems,
including spectroscopy of complex biological materials.
MILCA is available online at http://www.fz-juelich.de/nic/cs/software
Comment: 27 pages, 7 figures, 1 table; uses elsart.cl
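The second-derivative preprocessing described above can be sketched as a simple discrete filter; a minimal illustration (not the MILCA implementation itself), assuming unit spacing between spectral channels:

```python
def second_derivative(spectrum):
    """Discrete second derivative of a sampled spectrum (unit channel
    spacing). Broad, slowly varying bands are suppressed while narrow
    bands are sharpened, reducing overlap between mixture components."""
    return [spectrum[i - 1] - 2.0 * spectrum[i] + spectrum[i + 1]
            for i in range(1, len(spectrum) - 1)]

# A linear baseline is removed entirely; a sharp band survives as the
# characteristic (+1, -2, +1) pattern.
baseline = second_derivative([0.5 * i for i in range(10)])
peak = second_derivative([0.0, 0.0, 1.0, 0.0, 0.0])
```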
Convex Cauchy-Schwarz Independent Component Analysis for Blind Source Separation
We present a new high-performance Convex Cauchy-Schwarz Divergence (CCS-DIV)
measure for Independent Component Analysis (ICA) and Blind Source Separation
(BSS). The CCS-DIV measure is developed by integrating convex functions into
the Cauchy-Schwarz inequality. By including a convexity quality parameter, the
measure has a broad control range over its convexity curvature. With this
measure, a new CCS-ICA algorithm is structured, and a nonparametric form is
developed incorporating the Parzen-window-based distribution. Furthermore,
pairwise iterative schemes are employed to tackle the high-dimensional problem
in BSS. We present two schemes of pairwise nonparametric ICA algorithms: one
based on gradient descent and the other on the Jacobi iterative method.
Several case-study scenarios are carried out on noise-free and noisy mixtures
of speech and music signals. Finally, the superiority of the proposed CCS-ICA
algorithm is demonstrated through performance-metric comparisons with FastICA,
RobustICA, convex ICA (C-ICA), and other leading existing algorithms.
Comment: 13 page
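The Cauchy-Schwarz inequality behind this measure already yields a plain (non-convexified) divergence; a minimal discrete sketch of that base quantity, not the paper's convexified CCS-DIV:

```python
import math

def cs_divergence(p, q):
    """Cauchy-Schwarz divergence between two discretized densities:
    nonnegative, and zero exactly when p and q are proportional
    (the equality case of the Cauchy-Schwarz inequality)."""
    cross = sum(a * b for a, b in zip(p, q))
    return -math.log((cross * cross) /
                     (sum(a * a for a in p) * sum(b * b for b in q)))

uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.70, 0.10, 0.10, 0.10]
d_same = cs_divergence(uniform, uniform)   # zero for identical densities
d_diff = cs_divergence(uniform, peaked)    # strictly positive
```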
Supervised Dictionary Learning and Sparse Representation-A Review
Dictionary learning and sparse representation (DLSR) is a recent and
successful mathematical model for data representation that achieves
state-of-the-art performance in various fields such as pattern recognition,
machine learning, computer vision, and medical imaging. The original
formulation for DLSR is based on the minimization of the reconstruction error
between the original signal and its sparse representation in the space of the
learned dictionary. Although this formulation is optimal for solving problems
such as denoising, inpainting, and coding, it may not lead to an optimal
solution in classification tasks, where the ultimate goal is to make the
learned dictionary and the corresponding sparse representation as
discriminative as possible. This has motivated the emergence of a new category
of techniques, appropriately called supervised dictionary learning and sparse
representation (S-DLSR), which yields a more discriminative dictionary and
sparse representation in classification tasks. Despite many research efforts on
S-DLSR, the literature lacks a comprehensive view of these techniques, their
connections, advantages and shortcomings. In this paper, we address this gap
and provide a review of the recently proposed algorithms for S-DLSR. We first
present a taxonomy of these algorithms, grouping them into six categories based
on the approach taken to incorporate label information into the learning of the
dictionary and/or sparse representation. For each category, we draw connections between
the algorithms in this category and present a unified framework for them. We
then provide guidelines for applied researchers on how to represent and learn
the building blocks of an S-DLSR solution based on the problem at hand. This
review provides a broad yet deep view of the state-of-the-art methods for
S-DLSR and supports further research and development in this emerging area.
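The reconstruction-error formulation behind DLSR can be illustrated, for a fixed dictionary, with the classic ISTA sparse-coding iteration; a minimal sketch assuming a small dense dictionary and an L1 penalty (the lam and step values are purely illustrative):

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def ista(D, y, lam, step, iters=100):
    """Sparse coding of signal y in a fixed dictionary D: minimize
    0.5 * ||D x - y||^2 + lam * ||x||_1 by proximal gradient steps."""
    Dt = transpose(D)
    x = [0.0] * len(D[0])
    for _ in range(iters):
        residual = [r - t for r, t in zip(matvec(D, x), y)]
        grad = matvec(Dt, residual)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)],
                           lam * step)
    return x

# With an orthonormal (identity) dictionary, ISTA reduces to
# soft-thresholding the signal: small coefficients are zeroed out.
D = [[1.0, 0.0], [0.0, 1.0]]
code = ista(D, [3.0, -0.5], lam=1.0, step=1.0)
```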
A survey of dimensionality reduction techniques
Experimental life sciences such as biology and chemistry have seen an
explosion of available experimental data in recent decades. Laboratory
instruments have become increasingly complex and report hundreds or thousands
of measurements for a single experiment, so statistical methods face
challenging tasks when dealing with such high-dimensional data. However, much
of the data is highly redundant and can be efficiently reduced to a much
smaller number of variables without a significant loss of information. The
mathematical procedures making possible this reduction are called
dimensionality reduction techniques; they have widely been developed by fields
like Statistics or Machine Learning, and are currently a hot research topic. In
this review we categorize the plethora of available dimensionality reduction
techniques and give the mathematical insight behind them.
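As a concrete instance of such a reduction, principal component analysis projects data onto the top eigenvector of the covariance matrix; a minimal 2-D sketch via power iteration (a real implementation would handle arbitrary dimension):

```python
import math

def leading_component(points, iters=50):
    """First principal axis of 2-D data: power iteration on the
    mean-centered covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    xs = [p[0] - mx for p in points]
    ys = [p[1] - my for p in points]
    cxx = sum(a * a for a in xs) / n
    cxy = sum(a * b for a, b in zip(xs, ys)) / n
    cyy = sum(b * b for b in ys) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return v

# Points on the line y = 2x are fully described by one coordinate along
# the unit direction (1, 2) / sqrt(5).
axis = leading_component([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```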
Independent Component Analysis via Energy-based and Kernel-based Mutual Dependence Measures
We apply both distance-based (Jin and Matteson, 2017) and kernel-based
(Pfister et al., 2016) mutual dependence measures to independent component
analysis (ICA), and generalize dCovICA (Matteson and Tsay, 2017) to MDMICA,
minimizing empirical dependence measures as an objective function in both
deflation and parallel manners. To solve this minimization problem, we
introduce Latin hypercube sampling (LHS) (McKay et al., 2000) and a global
optimization method, Bayesian optimization (BO) (Mockus, 1994), to improve the
initialization of the Newton-type local optimization method. The performance of MDMICA is
evaluated in various simulation studies and an image data example. When the ICA
model is correct, MDMICA achieves competitive results compared to existing
approaches. When the ICA model is misspecified, the estimated independent
components are less mutually dependent than the observed components using
MDMICA, while they are prone to be even more mutually dependent than the
observed components using other approaches.
Comment: 11 pages, 4 figure
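The Latin hypercube sampling step used for initialization can be sketched in a few lines; a minimal version in the unit hypercube (the paper then maps such points onto its own optimization domain):

```python
import random

def latin_hypercube(n, d, seed=0):
    """n samples in [0, 1)^d such that, in every dimension, each of the
    n equal-width strata contains exactly one sample; this spreads
    initializations more evenly than plain random draws."""
    rng = random.Random(seed)
    cols = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)
        cols.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in cols) for i in range(n)]

pts = latin_hypercube(5, 2)
# Each dimension covers all 5 strata exactly once.
```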
Two Pairwise Iterative Schemes For High Dimensional Blind Source Separation
This paper addresses the high dimensionality problem in blind source
separation (BSS), where the number of sources is greater than two. Two pairwise
iterative schemes are proposed to tackle this high dimensionality problem. The
two pairwise schemes realize nonparametric independent component analysis (ICA)
algorithms based on a new high-performance Convex Cauchy-Schwarz Divergence
(CCS-DIV). These two schemes enable fast and efficient demixing of sources in
real-world high-dimensional source applications. Finally, the performance
superiority of the proposed schemes is demonstrated in metric comparisons with
FastICA, RobustICA, convex ICA (C-ICA), and other leading existing algorithms.
Comment: 10 pages, 1 figure, 6 tables. arXiv admin note: substantial text
overlap with arXiv:1408.019
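A pairwise scheme reduces d-dimensional separation to a sequence of 2-D rotations; the sketch below is purely illustrative, using absolute correlation as a stand-in contrast for the CCS-DIV measure and a grid search over the rotation angle:

```python
import math

def correlation(x, y):
    """Pearson correlation of two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_rotation(x, y, steps=360):
    """Grid-search the Givens rotation angle minimizing the pairwise
    contrast (|correlation| here, standing in for CCS-DIV)."""
    best_val, best_pair = abs(correlation(x, y)), (x, y)
    for k in range(1, steps):
        t = math.pi * k / steps
        c, s = math.cos(t), math.sin(t)
        u = [c * a - s * b for a, b in zip(x, y)]
        v = [s * a + c * b for a, b in zip(x, y)]
        val = abs(correlation(u, v))
        if val < best_val:
            best_val, best_pair = val, (u, v)
    return best_pair

def pairwise_sweep(signals, sweeps=2):
    """Tackle a d-channel problem as a sequence of 2-channel ones:
    rotate every channel pair toward its contrast minimum."""
    s = [list(ch) for ch in signals]
    for _ in range(sweeps):
        for i in range(len(s)):
            for j in range(i + 1, len(s)):
                s[i], s[j] = best_rotation(s[i], s[j])
    return s

# Two toy sources with unequal variances, mixed by a 30-degree rotation.
s1 = [math.sin(0.3 * k) for k in range(200)]
s2 = [0.3 * math.cos(0.7 * k) for k in range(200)]
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
mixed = [[c * a - s * b for a, b in zip(s1, s2)],
         [s * a + c * b for a, b in zip(s1, s2)]]
out = pairwise_sweep(mixed)
```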
Decoding the Encoding of Functional Brain Networks: an fMRI Classification Comparison of Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Sparse Coding Algorithms
Brain networks in fMRI are typically identified using spatial independent
component analysis (ICA), yet mathematical constraints such as sparse coding
and positivity both provide alternate biologically-plausible frameworks for
generating brain networks. Non-negative Matrix Factorization (NMF) would
suppress negative BOLD signal by enforcing positivity. Spatial sparse coding
algorithms (L1 Regularized Learning and K-SVD) would impose local
specialization and a discouragement of multitasking, where the total observed
activity in a single voxel originates from a restricted number of possible
brain networks.
The assumptions of independence, positivity, and sparsity to encode
task-related brain networks are compared; the resulting brain networks for
different constraints are used as basis functions to encode the observed
functional activity at a given time point. These encodings are decoded using
machine learning to compare both the algorithms and their assumptions, using
the time series weights to predict whether a subject is viewing a video,
listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects.
For classifying cognitive activity, the sparse coding algorithm of
L1 Regularized Learning consistently outperformed 4 variations of ICA across
different numbers of networks and noise levels (p < 0.001). The NMF algorithms,
which suppressed negative BOLD signal, had the poorest accuracy. Within each
algorithm, encodings using sparser spatial networks (containing more
zero-valued voxels) had higher classification accuracy (p < 0.001). The success
of sparse coding algorithms may suggest that algorithms which enforce sparse
coding, discourage multitasking, and promote local specialization may better
capture the underlying source processes than those which allow inexhaustible
local processes, such as ICA.
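The positivity constraint that NMF enforces can be illustrated with Lee-Seung multiplicative updates; a minimal rank-1 sketch on toy data, not the fMRI pipeline itself (eps is only a guard against division by zero):

```python
def nmf_rank1(X, iters=200, eps=1e-9):
    """Rank-1 NMF X ~= outer(w, h) via Lee-Seung multiplicative updates;
    all factor entries stay nonnegative by construction, so any negative
    signal is suppressed rather than modeled."""
    rows, cols = len(X), len(X[0])
    w = [1.0] * rows
    h = [1.0] * cols
    for _ in range(iters):
        # h_j <- h_j * (w . X[:, j]) / (||w||^2 h_j)
        ww = sum(v * v for v in w)
        h = [hj * sum(w[i] * X[i][j] for i in range(rows)) / (ww * hj + eps)
             for j, hj in enumerate(h)]
        # w_i <- w_i * (X[i, :] . h) / (w_i ||h||^2)
        hh = sum(v * v for v in h)
        w = [wi * sum(X[i][j] * h[j] for j in range(cols)) / (wi * hh + eps)
             for i, wi in enumerate(w)]
    return w, h

X = [[1.0, 0.0, 3.0], [2.0, 0.0, 6.0]]   # exactly rank 1 and nonnegative
w, h = nmf_rank1(X)
```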
Linked Component Analysis from Matrices to High Order Tensors: Applications to Biomedical Data
With the increasing availability of various sensor technologies, we now have
access to large amounts of multi-block (also called multi-set,
multi-relational, or multi-view) data that need to be jointly analyzed to
explore their latent connections. Various component analysis methods have
played an increasingly important role for the analysis of such coupled data. In
this paper, we first provide a brief review of existing matrix-based (two-way)
component analysis methods for the joint analysis of such data with a focus on
biomedical applications. Then, we discuss their important extensions and
generalization to multi-block multiway (tensor) data. We show how constrained
multi-block tensor decomposition methods are able to extract similar or
statistically dependent common features that are shared by all blocks, by
incorporating the multiway nature of data. Special emphasis is given to the
flexible common and individual feature analysis of multi-block data with the
aim to simultaneously extract common and individual latent components with
desired properties and types of diversity. Illustrative examples are given to
demonstrate their effectiveness for biomedical data analysis.
Comment: 20 pages, 11 figures, Proceedings of the IEEE, 201
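The simplest coupled case gives a feel for a "common feature shared by all blocks": the leading left singular vector of the column-wise concatenation [X1 | X2]; a minimal 2-row sketch via power iteration (real linked component and tensor methods are far more general):

```python
import math

def shared_component(blocks):
    """Leading left singular vector of M = [X1 | X2 | ...]: the single
    component common to coupled 2-row data blocks, via power iteration
    on the 2x2 Gram matrix M M^T (2 rows assumed for brevity)."""
    cols = [col for X in blocks for col in zip(*X)]
    g00 = sum(c[0] * c[0] for c in cols)
    g01 = sum(c[0] * c[1] for c in cols)
    g11 = sum(c[1] * c[1] for c in cols)
    v = (1.0, 0.0)
    for _ in range(50):
        w = (g00 * v[0] + g01 * v[1], g01 * v[0] + g11 * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return v

# Both blocks are generated by the same loading vector a = (1, 2).
X1 = [[1.0, 3.0], [2.0, 6.0]]   # entries a[i] * b1[j]
X2 = [[2.0], [4.0]]             # entries a[i] * b2[j]
common = shared_component([X1, X2])
```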
Inhomogeneous Hypergraph Clustering with Applications
Hypergraph partitioning is an important problem in machine learning, computer
vision and network analytics. A widely used method for hypergraph partitioning
relies on minimizing a normalized sum of the costs of partitioning hyperedges
across clusters. Algorithmic solutions based on this approach assume that
different partitions of a hyperedge incur the same cost. However, this
assumption fails to leverage the fact that different subsets of vertices within
the same hyperedge may have different structural importance. We hence propose a
new hypergraph clustering technique, termed inhomogeneous hypergraph
partitioning, which assigns different costs to different hyperedge cuts. We
prove that inhomogeneous partitioning produces a quadratic approximation to the
optimal solution if the inhomogeneous costs satisfy submodularity constraints.
Moreover, we demonstrate that inhomogeneous partitioning offers significant
performance improvements in applications such as structure learning of
rankings, subspace segmentation, and motif clustering.
Comment: To appear in NIPS 201
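The key object, a per-hyperedge cost that depends on which subset is separated, can be sketched directly; a toy example assuming the submodular cost |S| * |e \ S| (the cut function of a clique, chosen here only for illustration):

```python
def inhomogeneous_cut_cost(hyperedges, labels):
    """Total partition cost where each hyperedge e contributes a cost
    depending on WHICH subset S of its vertices is split off, here the
    (submodular) clique-cut cost |S| * |e \\ S|, rather than the
    homogeneous 0/1 'is this edge cut at all' cost."""
    total = 0
    for e in hyperedges:
        s = sum(1 for v in e if labels[v] == 1)
        total += s * (len(e) - s)
    return total

edges = [{0, 1, 2}, {2, 3}]
# Splitting {0, 1} from {2, 3}: the 3-vertex edge is cut 1-vs-2 and the
# 2-vertex edge stays whole.
cost = inhomogeneous_cut_cost(edges, [0, 0, 1, 1])
```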