Concise Fuzzy System Modeling Integrating Soft Subspace Clustering and Sparse Learning
The superior interpretability and uncertainty modeling ability of the
Takagi-Sugeno-Kang fuzzy system (TSK FS) make it possible to describe complex
nonlinear systems intuitively and efficiently. However, the classical TSK FS
usually adopts the whole feature space of the data for model construction,
which can result in lengthy rules for high-dimensional data and degrade
interpretability. Furthermore, highly nonlinear modeling tasks usually require
a large number of rules, which further weakens the clarity and
interpretability of the TSK FS. To address these issues, a
concise zero-order TSK FS construction method, called ESSC-SL-CTSK-FS, is
proposed in this paper by integrating the techniques of enhanced soft subspace
clustering (ESSC) and sparse learning (SL). In this method, ESSC is used to
generate the antecedents and distinct sparse subspaces for the different fuzzy
rules, whereas SL is used to optimize the consequent parameters of the fuzzy
rules, based on which the number of fuzzy rules can be effectively reduced.
Finally, the proposed ESSC-SL-CTSK-FS method is used to construct concise
zero-order TSK FSs that can explain high-dimensional data modeling scenarios
more clearly and easily. Experiments are conducted on various real-world
datasets to confirm the advantages of the proposed method.
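As a concrete illustration of the zero-order TSK inference this abstract builds on (a generic sketch, not the authors' ESSC-SL-CTSK-FS pipeline), the following evaluates rules with Gaussian memberships restricted to per-rule sparse subspaces and constant consequents; all parameter names and the subspace masks are hypothetical.

```python
import numpy as np

def tsk_zero_order_predict(x, centers, widths, masks, consequents):
    """Evaluate a zero-order TSK fuzzy system at input x.

    centers, widths : (n_rules, n_features) antecedent Gaussian parameters
    masks           : (n_rules, n_features) boolean subspace selectors per rule
    consequents     : (n_rules,) constant (zero-order) rule outputs
    """
    firing = np.empty(len(centers))
    for k, (c, s, m) in enumerate(zip(centers, widths, masks)):
        # Product of Gaussian memberships over the rule's own sparse subspace.
        z = ((x[m] - c[m]) / s[m]) ** 2
        firing[k] = np.exp(-0.5 * z.sum())
    firing /= firing.sum() + 1e-12          # normalized firing strengths
    return firing @ consequents             # weighted average of constants

# Toy usage: 2 rules over 4 features, each rule using a 2-feature subspace.
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 4)); widths = np.full((2, 4), 1.0)
masks = np.array([[True, True, False, False], [False, False, True, True]])
print(tsk_zero_order_predict(rng.normal(size=4), centers, widths, masks,
                             np.array([0.5, -1.0])))
```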
Full Quantification of Left Ventricle via Deep Multitask Learning Network Respecting Intra- and Inter-Task Relatedness
Cardiac left ventricle (LV) quantification is among the most clinically
important tasks for identification and diagnosis of cardiac diseases, yet it
remains challenging due to the high variability of cardiac structure and the
complexity of temporal dynamics. Full quantification, i.e., to simultaneously quantify all
of temporal dynamics. Full quantification, i.e., to simultaneously quantify all
LV indices including two areas (cavity and myocardium), six regional wall
thicknesses (RWT), three LV dimensions, and one cardiac phase, is even more
challenging, since the uncertain relatedness within and between the different
types of indices may prevent the learning procedure from reaching better
convergence and generalization. In this paper, we propose a newly designed
multitask learning network (FullLVNet), which consists of a deep convolutional
neural network (CNN) for expressive feature embedding of cardiac structure,
two subsequent parallel recurrent neural network (RNN) modules for temporal
dynamics modeling, and four linear models for the final estimation. During the final estimation,
both intra- and inter-task relatedness are modeled to improve
generalization: 1) respecting intra-task relatedness, group lasso is applied to
each of the regression tasks for sparse and common feature selection and
consistent prediction; 2) respecting inter-task relatedness, three phase-guided
constraints are proposed to penalize violation of the temporal behavior of the
obtained LV indices. Experiments on MR sequences of 145 subjects show that
FullLVNet achieves highly accurate prediction with our intra- and inter-task
relatedness modeling, leading to MAEs of 190 mm², 1.41 mm, and 2.68 mm for the
average areas, RWT, and dimensions, respectively, and an error rate of 10.4%
for phase classification. This endows our method with great potential for
comprehensive clinical assessment of
global, regional, and dynamic cardiac function.

Comment: Accepted at MICCAI 2017
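For readers unfamiliar with the group lasso used here for intra-task relatedness, its proximal operator is block soft-thresholding, sketched below as a generic illustration with hypothetical group sizes, not as FullLVNet's training code.

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal operator of the group lasso penalty lam * sum_g ||w_g||_2.

    w      : flat parameter vector
    groups : list of index arrays, one per group
    lam    : threshold (penalty weight times step size)
    """
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        # Shrink the whole group toward zero; drop it if its norm is small.
        out[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * w[g]
    return out

w = np.array([0.1, -0.2, 3.0, 4.0])
groups = [np.array([0, 1]), np.array([2, 3])]
print(group_soft_threshold(w, groups, lam=1.0))  # first group zeroed out
```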
Bayesian Structured Sparsity from Gaussian Fields
Substantial research on structured sparsity has contributed to the analysis of
many different applications; however, this work includes few Bayesian procedures.
among this work. Here, we develop a Bayesian model for structured sparsity that
uses a Gaussian process (GP) to share parameters of the sparsity-inducing prior
in proportion to feature similarity as defined by an arbitrary positive
definite kernel. For linear regression, this sparsity-inducing prior on
regression coefficients is a relaxation of the canonical spike-and-slab prior
that flattens the mixture model into a scale mixture of normals. This prior
retains the explicit posterior probability on inclusion parameters, now with
GP probit prior distributions, but enables tractable computation via
elliptical slice sampling for the latent Gaussian field. We motivate
development of this prior using the genomic application of association mapping,
or identifying genetic variants associated with a continuous trait. Our
Bayesian structured sparsity model produced sparse results with substantially
improved sensitivity and precision relative to comparable methods. Through
simulations, we show that three properties are key to this improvement: i)
modeling structure in the covariates, ii) significance testing using the
posterior probabilities of inclusion, and iii) model averaging. We present
results from applying this model to a large genomic dataset to demonstrate
computational tractability.

Comment: 23 pages, 7 figures
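The elliptical slice sampling step mentioned above has a compact generic form (Murray, Adams & MacKay, 2010); the sketch below shows one such update for a latent Gaussian field under an arbitrary log-likelihood, independent of the paper's specific model.

```python
import numpy as np

def elliptical_slice(f, log_lik, chol_cov, rng):
    """One elliptical slice sampling update for a latent Gaussian field.

    f        : current state, with prior N(0, Sigma)
    log_lik  : function returning the log-likelihood of a state
    chol_cov : Cholesky factor of the prior covariance Sigma
    """
    nu = chol_cov @ rng.standard_normal(f.shape)   # auxiliary prior draw
    log_u = log_lik(f) + np.log(rng.uniform())     # slice height
    theta = rng.uniform(0.0, 2.0 * np.pi)          # initial proposal angle
    lo, hi = theta - 2.0 * np.pi, theta            # shrinking bracket
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_new) > log_u:
            return f_new
        # Shrink the bracket toward the current state and retry.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

# Toy usage: N(0, I) prior with a Gaussian likelihood centered at 1.
rng = np.random.default_rng(0)
f, chol = np.zeros(5), np.eye(5)
loglik = lambda f: -0.5 * np.sum((f - 1.0) ** 2)
for _ in range(100):
    f = elliptical_slice(f, loglik, chol, rng)
print(f.mean())  # drifts toward ~0.5, the prior-likelihood compromise
```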
Global Sensitivity Analysis with Dependence Measures
Global sensitivity analysis with variance-based measures suffers from several
theoretical and practical limitations, since these measures focus only on the
variance of the output and handle multivariate variables in a limited way. In this paper,
we introduce a new class of sensitivity indices based on dependence measures
which overcomes these insufficiencies. Our approach originates from the idea
of comparing the output distribution with its conditional counterpart when one
of the input variables is fixed. We establish that this comparison yields
previously proposed indices when it is performed with Csiszár f-divergences, as
well as sensitivity indices which are well-known dependence measures between
random variables. This leads us to investigate completely new sensitivity
indices based on recent state-of-the-art dependence measures, such as distance
correlation and the Hilbert-Schmidt independence criterion. We also emphasize
the potential of feature selection techniques relying on such dependence
measures as alternatives to screening in high dimensions.
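As a minimal sketch of one of the dependence measures named above, the biased empirical HSIC estimator of Gretton et al. can be computed directly from kernel Gram matrices; the bandwidth choice and toy data here are illustrative assumptions.

```python
import numpy as np

def rbf_gram(x, gamma):
    """RBF kernel Gram matrix for a 1-D sample."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-gamma * d2)

def hsic(x, y, gamma=1.0):
    """Biased empirical HSIC between two 1-D samples (Gretton et al., 2005)."""
    n = len(x)
    k, l = rbf_gram(x, gamma), rbf_gram(y, gamma)
    h = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(hsic(x, x**2), hsic(x, rng.normal(size=200)))  # dependent >> independent
```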
Feature Selection: A Data Perspective
Feature selection, as a data preprocessing strategy, has been proven to be
effective and efficient in preparing data (especially high-dimensional data)
for various data mining and machine learning problems. The objectives of
feature selection include: building simpler and more comprehensible models,
improving data mining performance, and preparing clean, understandable data.
The recent proliferation of big data has presented some substantial challenges
and opportunities to feature selection. In this survey, we provide a
comprehensive and structured overview of recent advances in feature selection
research. Motivated by current challenges and opportunities in the era of big
data, we revisit feature selection research from a data perspective and review
representative feature selection algorithms for conventional data, structured
data, heterogeneous data and streaming data. Methodologically, to emphasize the
differences and similarities of most existing feature selection algorithms for
conventional data, we categorize them into four main groups: similarity based,
information theoretical based, sparse learning based and statistical based
methods. To facilitate and promote the research in this community, we also
present an open-source feature selection repository that consists of most of
the popular feature selection algorithms
(http://featureselection.asu.edu/). Also, we use it as an example to show
how to evaluate feature selection algorithms. At the end of the survey, we
present a discussion of some open problems and challenges that require more
attention in future research.
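To make the sparse-learning-based family concrete, here is a small, hypothetical example in which an L1-penalized linear model selects features through its nonzero coefficients:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = X[:, 0] * 3.0 - X[:, 7] * 2.0 + rng.normal(scale=0.1, size=200)

# L1 regularization drives irrelevant coefficients exactly to zero.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(selected)  # expected to recover features 0 and 7
```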
Sparse and Smooth Prior for Bayesian Linear Regression with Application to ETEX Data
Sparsity of the solution of a linear regression model is a common
requirement, and many prior distributions have been designed for this purpose.
A combination of the sparsity requirement with smoothness of the solution is
also common in applications; however, considerably fewer prior models exist
for this purpose. In this paper, we compare two prior structures, the Bayesian fused
lasso (BFL) and least-squares with adaptive prior covariance matrix (LS-APC).
Since only a variational solution has been published for the latter, we derive
a Gibbs sampling algorithm for its inference and for Bayesian model selection.
The method is designed for high-dimensional problems; therefore, we discuss
numerical issues associated with evaluation of the posterior. In simulations,
we show that the LS-APC prior achieves results comparable to those of the
Bayesian fused lasso for piecewise-constant parameters and outperforms the BFL
for parameters of more general shapes. Another advantage of the LS-APC prior
is revealed in a real application to estimation of the release profile of the
European Tracer Experiment (ETEX). Specifically, the LS-APC model provides
more conservative uncertainty bounds when the regressor matrix is not informative.
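For orientation, the fused lasso structure referenced above combines a sparsity penalty on magnitudes with a smoothness penalty on successive differences; the sketch below evaluates the corresponding unnormalized log-prior, with both weights as hypothetical hyperparameters.

```python
import numpy as np

def fused_lasso_log_prior(beta, lam1, lam2):
    """Unnormalized log-density of a Bayesian fused lasso prior.

    lam1 penalizes magnitudes (sparsity); lam2 penalizes successive
    differences (smoothness), mirroring the fused lasso objective.
    """
    sparsity = lam1 * np.abs(beta).sum()
    smoothness = lam2 * np.abs(np.diff(beta)).sum()
    return -(sparsity + smoothness)

beta = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0])
print(fused_lasso_log_prior(beta, lam1=1.0, lam2=5.0))
```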
Interpretability of Multivariate Brain Maps in Brain Decoding: Definition and Quantification
Brain decoding is a popular multivariate approach for hypothesis testing in
neuroimaging. It is well known that the brain maps derived from weights of
linear classifiers are hard to interpret because of high correlations between
predictors, low signal-to-noise ratios, and the high dimensionality of
neuroimaging data. Therefore, improving the interpretability of brain decoding
approaches is of primary interest in many neuroimaging studies. Despite
extensive work of this kind, there is at present no formal definition of
interpretability of multivariate brain maps. As a consequence, there is no
quantitative measure for evaluating the interpretability of different brain
decoding methods. In this paper, first, we present a theoretical definition of
interpretability in brain decoding; we show that the interpretability of
multivariate brain maps can be decomposed into their reproducibility and
representativeness. Second, as an application of the proposed theoretical
definition, we formalize a heuristic method for approximating the
interpretability of multivariate brain maps in a binary magnetoencephalography
(MEG) decoding scenario. Third, we propose to combine the approximated
interpretability and the performance of the brain decoding model into a new
multi-objective criterion for model selection. Our results for the MEG data
show that optimizing the hyper-parameters of the regularized linear classifier
based on the proposed criterion results in more informative multivariate brain
maps. More importantly, the presented definition provides the theoretical
background for quantitative evaluation of interpretability, and hence,
facilitates the development of more effective brain decoding algorithms in the
future.
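One ingredient of the proposed decomposition, reproducibility, can be approximated generically by the similarity of weight maps refit on resampled data. The sketch below is a hypothetical proxy (mean pairwise correlation across bootstrap refits), not the paper's exact estimator.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import RidgeClassifier

def map_reproducibility(X, y, n_splits=10, seed=0):
    """Mean pairwise correlation of classifier weight maps across bootstraps."""
    rng = np.random.default_rng(seed)
    maps = []
    for _ in range(n_splits):
        idx = rng.choice(len(y), size=len(y), replace=True)
        clf = RidgeClassifier(alpha=1.0).fit(X[idx], y[idx])
        maps.append(clf.coef_.ravel())
    corrs = [np.corrcoef(a, b)[0, 1] for a, b in combinations(maps, 2)]
    return float(np.mean(corrs))

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 300))
y = (X[:, 0] + 0.1 * rng.normal(size=120) > 0).astype(int)
print(map_reproducibility(X, y))
```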
HSR: L1/2 Regularized Sparse Representation for Fast Face Recognition using Hierarchical Feature Selection
In this paper, we propose a novel method for fast face recognition called
L1/2 Regularized Sparse Representation using Hierarchical Feature Selection
(HSR). By employing hierarchical feature selection, we can compress the scale
and dimension of the global dictionary, which directly reduces the
computational cost of the sparse representation in which our approach is
rooted. The hierarchical feature selection consists of Gabor wavelets and an
Extreme Learning Machine Auto-Encoder (ELM-AE). In the Gabor wavelets part,
local features can be extracted at multiple scales and orientations to form a
Gabor-feature based image, which in turn improves the recognition rate.
Moreover, in the presence of occluded face images, the scale of the
Gabor-feature based global dictionary can be compressed accordingly because
redundancies exist in the Gabor-feature based occlusion dictionary. In the
ELM-AE part, the dimension of
Gabor-feature based global dictionary can be compressed because
high-dimensional face images can be rapidly represented by low-dimensional
features. By introducing L1/2 regularization, our approach can produce a
sparser and more robust representation than conventional L1-regularized Sparse
Representation based Classification (SRC), which also contributes to the
decrease of the computational cost of sparse representation. In comparison with related work
such as SRC and Gabor-feature based SRC (GSRC), experimental results on a
variety of face databases demonstrate the great advantage of our method in
computational cost, while achieving comparable or even better recognition rates.

Comment: Submitted to IEEE Computational Intelligence Magazine in 09/201
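Unlike the L1 penalty, the L1/2 penalty has no soft-thresholding proximal map, but a closed-form half-thresholding operator is known (Xu et al., 2012). The sketch below implements that operator as one plausible building block for L1/2-regularized sparse coding; it is not the authors' exact solver.

```python
import numpy as np

def half_threshold(t, lam):
    """Componentwise half-thresholding operator for L1/2 regularization
    (Xu et al., 2012), the closed-form analogue of soft thresholding."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    # Entries below this threshold are set exactly to zero.
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    big = np.abs(t) > thresh
    phi = np.arccos((lam / 8.0) * (np.abs(t[big]) / 3.0) ** -1.5)
    out[big] = (2.0 / 3.0) * t[big] * (1.0 + np.cos(2.0 * np.pi / 3.0
                                                    - 2.0 * phi / 3.0))
    return out

# Small entries vanish; large entries are shrunk, approaching identity.
print(half_threshold([0.1, -0.5, 2.0, -3.0], lam=1.0))
```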
The Knowledge Gradient Policy Using A Sparse Additive Belief Model
We propose a sequential learning policy for noisy discrete global
optimization and ranking and selection (R&S) problems with high-dimensional
sparse belief functions, where there are hundreds or even thousands of
features but only a small portion of them carries explanatory power.
We aim to identify the sparsity pattern and select the best alternative before
the finite budget is exhausted. We derive a knowledge gradient policy for
sparse linear models (KGSpLin) with a group Lasso penalty. This policy is a
unique and novel hybrid of Bayesian R&S and frequentist learning. In
particular, our method naturally incorporates B-spline basis expansion and
generalizes to the nonparametric additive model (KGSpAM) and the functional
ANOVA model. Theoretically, we provide estimation error bounds for the
posterior mean estimate and the functional estimate. Controlled experiments
show that the algorithm efficiently learns the correct set of nonzero
parameters even when the model is embedded with hundreds of dummy parameters.
It also outperforms the knowledge gradient policy for a linear model.
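For context, the basic knowledge gradient policy for independent normal beliefs (Frazier et al., 2008) that KGSpLin builds on can be computed in closed form; the sketch below shows those KG factors, without the sparse group-lasso belief model that is the paper's contribution.

```python
import numpy as np
from scipy.stats import norm

def kg_factors(mu, sigma, noise_sd):
    """Knowledge gradient factors for independent normal beliefs:
    the expected one-step gain from measuring each alternative once."""
    sigma_tilde = sigma ** 2 / np.sqrt(sigma ** 2 + noise_sd ** 2)
    best_other = np.array([np.max(np.delete(mu, i)) for i in range(len(mu))])
    z = -np.abs(mu - best_other) / sigma_tilde
    return sigma_tilde * (z * norm.cdf(z) + norm.pdf(z))

mu = np.array([1.0, 0.8, 0.0])      # posterior means
sigma = np.array([0.1, 1.0, 0.5])   # posterior standard deviations
nu = kg_factors(mu, sigma, noise_sd=0.5)
print(nu, nu.argmax())  # measure the alternative with the highest KG value
```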
Lass-0: sparse non-convex regression by local search
We compute approximate solutions to L0-regularized linear regression using L1
regularization, also known as the Lasso, as an initialization step. Our
algorithm, Lass-0 ("Lass-zero"), uses a computationally efficient stepwise
search to determine a locally optimal L0 solution given any L1-regularized
solution. We present theoretical results on consistency under orthogonality and on
appropriate handling of redundant features. Empirically, we use synthetic data
to demonstrate that Lass-0 solutions are closer to the true sparse support than
L1 regularization models. Additionally, in real-world data Lass-0 finds more
parsimonious solutions than L1 regularization while maintaining similar
predictive accuracy.

Comment: 8 pages, 1 figure. NIPS 2015 Workshop on Optimization (OPT2015)
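To illustrate the Lasso-initialized local search idea (a hypothetical sketch, not the released Lass-0 code), the following greedy routine toggles single features in and out of the support while an L0-penalized least-squares objective keeps improving.

```python
import numpy as np

def lasszero_like_search(X, y, support, lam0):
    """Greedy local search for an L0-penalized least-squares objective,
    starting from an initial support (e.g., the Lasso's nonzeros)."""
    def score(s):
        s = sorted(s)
        if not s:
            return np.sum(y ** 2)
        beta, *_ = np.linalg.lstsq(X[:, s], y, rcond=None)
        return np.sum((y - X[:, s] @ beta) ** 2) + lam0 * len(s)
    support, best = set(support), None
    while best != score(support):
        best = score(support)
        # Try every single add/remove move; keep the first improvement.
        for j in range(X.shape[1]):
            trial = support ^ {j}            # toggle feature j
            if score(trial) < best:
                support = trial
                break
    return sorted(support)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 2.0 * X[:, 3] - 1.5 * X[:, 11] + 0.05 * rng.normal(size=100)
print(lasszero_like_search(X, y, support=[0, 3, 11], lam0=1.0))
```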