2,431 research outputs found

    Concise Fuzzy System Modeling Integrating Soft Subspace Clustering and Sparse Learning

    Full text link
    The superior interpretability and uncertainty modeling ability of Takagi-Sugeno-Kang fuzzy system (TSK FS) make it possible to describe complex nonlinear systems intuitively and efficiently. However, classical TSK FS usually adopts the whole feature space of the data for model construction, which can result in lengthy rules for high-dimensional data and lead to degeneration in interpretability. Furthermore, for highly nonlinear modeling task, it is usually necessary to use a large number of rules which further weakens the clarity and interpretability of TSK FS. To address these issues, a concise zero-order TSK FS construction method, called ESSC-SL-CTSK-FS, is proposed in this paper by integrating the techniques of enhanced soft subspace clustering (ESSC) and sparse learning (SL). In this method, ESSC is used to generate the antecedents and various sparse subspace for different fuzzy rules, whereas SL is used to optimize the consequent parameters of the fuzzy rules, based on which the number of fuzzy rules can be effectively reduced. Finally, the proposed ESSC-SL-CTSK-FS method is used to construct con-cise zero-order TSK FS that can explain the scenes in high-dimensional data modeling more clearly and easily. Experiments are conducted on various real-world datasets to confirm the advantages

    Full Quantification of Left Ventricle via Deep Multitask Learning Network Respecting Intra- and Inter-Task Relatedness

    Full text link
    Cardiac left ventricle (LV) quantification is among the most clinically important tasks for identification and diagnosis of cardiac diseases, yet still a challenge due to the high variability of cardiac structure and the complexity of temporal dynamics. Full quantification, i.e., to simultaneously quantify all LV indices including two areas (cavity and myocardium), six regional wall thicknesses (RWT), three LV dimensions, and one cardiac phase, is even more challenging since the uncertain relatedness intra and inter each type of indices may hinder the learning procedure from better convergence and generalization. In this paper, we propose a newly-designed multitask learning network (FullLVNet), which is constituted by a deep convolution neural network (CNN) for expressive feature embedding of cardiac structure; two followed parallel recurrent neural network (RNN) modules for temporal dynamic modeling; and four linear models for the final estimation. During the final estimation, both intra- and inter-task relatedness are modeled to enforce improvement of generalization: 1) respecting intra-task relatedness, group lasso is applied to each of the regression tasks for sparse and common feature selection and consistent prediction; 2) respecting inter-task relatedness, three phase-guided constraints are proposed to penalize violation of the temporal behavior of the obtained LV indices. Experiments on MR sequences of 145 subjects show that FullLVNet achieves high accurate prediction with our intra- and inter-task relatedness, leading to MAE of 190mm2^2, 1.41mm, 2.68mm for average areas, RWT, dimensions and error rate of 10.4\% for the phase classification. This endows our method a great potential in comprehensive clinical assessment of global, regional and dynamic cardiac function.Comment: Accepted at MICCAI 201

    Bayesian Structured Sparsity from Gaussian Fields

    Full text link
    Substantial research on structured sparsity has contributed to analysis of many different applications. However, there have been few Bayesian procedures among this work. Here, we develop a Bayesian model for structured sparsity that uses a Gaussian process (GP) to share parameters of the sparsity-inducing prior in proportion to feature similarity as defined by an arbitrary positive definite kernel. For linear regression, this sparsity-inducing prior on regression coefficients is a relaxation of the canonical spike-and-slab prior that flattens the mixture model into a scale mixture of normals. This prior retains the explicit posterior probability on inclusion parameters---now with GP probit prior distributions---but enables tractable computation via elliptical slice sampling for the latent Gaussian field. We motivate development of this prior using the genomic application of association mapping, or identifying genetic variants associated with a continuous trait. Our Bayesian structured sparsity model produced sparse results with substantially improved sensitivity and precision relative to comparable methods. Through simulations, we show that three properties are key to this improvement: i) modeling structure in the covariates, ii) significance testing using the posterior probabilities of inclusion, and iii) model averaging. We present results from applying this model to a large genomic dataset to demonstrate computational tractability.Comment: 23 pages, 7 figure

    Global Sensitivity Analysis with Dependence Measures

    Full text link
    Global sensitivity analysis with variance-based measures suffers from several theoretical and practical limitations, since they focus only on the variance of the output and handle multivariate variables in a limited way. In this paper, we introduce a new class of sensitivity indices based on dependence measures which overcomes these insufficiencies. Our approach originates from the idea to compare the output distribution with its conditional counterpart when one of the input variables is fixed. We establish that this comparison yields previously proposed indices when it is performed with Csiszar f-divergences, as well as sensitivity indices which are well-known dependence measures between random variables. This leads us to investigate completely new sensitivity indices based on recent state-of-the-art dependence measures, such as distance correlation and the Hilbert-Schmidt independence criterion. We also emphasize the potential of feature selection techniques relying on such dependence measures as alternatives to screening in high dimension

    Feature Selection: A Data Perspective

    Full text link
    Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data mining and machine learning problems. The objectives of feature selection include: building simpler and more comprehensible models, improving data mining performance, and preparing clean, understandable data. The recent proliferation of big data has presented some substantial challenges and opportunities to feature selection. In this survey, we provide a comprehensive and structured overview of recent advances in feature selection research. Motivated by current challenges and opportunities in the era of big data, we revisit feature selection research from a data perspective and review representative feature selection algorithms for conventional data, structured data, heterogeneous data and streaming data. Methodologically, to emphasize the differences and similarities of most existing feature selection algorithms for conventional data, we categorize them into four main groups: similarity based, information theoretical based, sparse learning based and statistical based methods. To facilitate and promote the research in this community, we also present an open-source feature selection repository that consists of most of the popular feature selection algorithms (\url{http://featureselection.asu.edu/}). Also, we use it as an example to show how to evaluate feature selection algorithms. At the end of the survey, we present a discussion about some open problems and challenges that require more attention in future research

    Sparse and Smooth Prior for Bayesian Linear Regression with Application to ETEX Data

    Full text link
    Sparsity of the solution of a linear regression model is a common requirement, and many prior distributions have been designed for this purpose. A combination of the sparsity requirement with smoothness of the solution is also common in application, however, with considerably fewer existing prior models. In this paper, we compare two prior structures, the Bayesian fused lasso (BFL) and least-squares with adaptive prior covariance matrix (LS-APC). Since only variational solution was published for the latter, we derive a Gibbs sampling algorithm for its inference and Bayesian model selection. The method is designed for high dimensional problems, therefore, we discuss numerical issues associated with evaluation of the posterior. In simulation, we show that the LS-APC prior achieves results comparable to that of the Bayesian Fused Lasso for piecewise constant parameter and outperforms the BFL for parameters of more general shapes. Another advantage of the LS-APC priors is revealed in real application to estimation of the release profile of the European Tracer Experiment (ETEX). Specifically, the LS-APC model provides more conservative uncertainty bounds when the regressor matrix is not informative

    Interpretability of Multivariate Brain Maps in Brain Decoding: Definition and Quantification

    Full text link
    Brain decoding is a popular multivariate approach for hypothesis testing in neuroimaging. It is well known that the brain maps derived from weights of linear classifiers are hard to interpret because of high correlations between predictors, low signal to noise ratios, and the high dimensionality of neuroimaging data. Therefore, improving the interpretability of brain decoding approaches is of primary interest in many neuroimaging studies. Despite extensive studies of this type, at present, there is no formal definition for interpretability of multivariate brain maps. As a consequence, there is no quantitative measure for evaluating the interpretability of different brain decoding methods. In this paper, first, we present a theoretical definition of interpretability in brain decoding; we show that the interpretability of multivariate brain maps can be decomposed into their reproducibility and representativeness. Second, as an application of the proposed theoretical definition, we formalize a heuristic method for approximating the interpretability of multivariate brain maps in a binary magnetoencephalography (MEG) decoding scenario. Third, we propose to combine the approximated interpretability and the performance of the brain decoding model into a new multi-objective criterion for model selection. Our results for the MEG data show that optimizing the hyper-parameters of the regularized linear classifier based on the proposed criterion results in more informative multivariate brain maps. More importantly, the presented definition provides the theoretical background for quantitative evaluation of interpretability, and hence, facilitates the development of more effective brain decoding algorithms in the future

    HSR: L1/2 Regularized Sparse Representation for Fast Face Recognition using Hierarchical Feature Selection

    Full text link
    In this paper, we propose a novel method for fast face recognition called L1/2 Regularized Sparse Representation using Hierarchical Feature Selection (HSR). By employing hierarchical feature selection, we can compress the scale and dimension of global dictionary, which directly contributes to the decrease of computational cost in sparse representation that our approach is strongly rooted in. It consists of Gabor wavelets and Extreme Learning Machine Auto-Encoder (ELM-AE) hierarchically. For Gabor wavelets part, local features can be extracted at multiple scales and orientations to form Gabor-feature based image, which in turn improves the recognition rate. Besides, in the presence of occluded face image, the scale of Gabor-feature based global dictionary can be compressed accordingly because redundancies exist in Gabor-feature based occlusion dictionary. For ELM-AE part, the dimension of Gabor-feature based global dictionary can be compressed because high-dimensional face images can be rapidly represented by low-dimensional feature. By introducing L1/2 regularization, our approach can produce sparser and more robust representation compared to regularized Sparse Representation based Classification (SRC), which also contributes to the decrease of the computational cost in sparse representation. In comparison with related work such as SRC and Gabor-feature based SRC (GSRC), experimental results on a variety of face databases demonstrate the great advantage of our method for computational cost. Moreover, we also achieve approximate or even better recognition rate.Comment: Submitted to IEEE Computational Intelligence Magazine in 09/201

    The Knowledge Gradient Policy Using A Sparse Additive Belief Model

    Full text link
    We propose a sequential learning policy for noisy discrete global optimization and ranking and selection (R\&S) problems with high dimensional sparse belief functions, where there are hundreds or even thousands of features, but only a small portion of these features contain explanatory power. We aim to identify the sparsity pattern and select the best alternative before the finite budget is exhausted. We derive a knowledge gradient policy for sparse linear models (KGSpLin) with group Lasso penalty. This policy is a unique and novel hybrid of Bayesian R\&S with frequentist learning. Particularly, our method naturally combines B-spline basis expansion and generalizes to the nonparametric additive model (KGSpAM) and functional ANOVA model. Theoretically, we provide the estimation error bounds of the posterior mean estimate and the functional estimate. Controlled experiments show that the algorithm efficiently learns the correct set of nonzero parameters even when the model is imbedded with hundreds of dummy parameters. Also it outperforms the knowledge gradient for a linear model

    Lass-0: sparse non-convex regression by local search

    Full text link
    We compute approximate solutions to L0 regularized linear regression using L1 regularization, also known as the Lasso, as an initialization step. Our algorithm, the Lass-0 ("Lass-zero"), uses a computationally efficient stepwise search to determine a locally optimal L0 solution given any L1 regularization solution. We present theoretical results of consistency under orthogonality and appropriate handling of redundant features. Empirically, we use synthetic data to demonstrate that Lass-0 solutions are closer to the true sparse support than L1 regularization models. Additionally, in real-world data Lass-0 finds more parsimonious solutions than L1 regularization while maintaining similar predictive accuracy.Comment: 8 pages, 1 figure. NIPS 2015 Workshop of Optimization (OPT2015
    corecore