35 research outputs found

    An Effective Feature Selection Method Based on Pair-Wise Feature Proximity for High Dimensional Low Sample Size Data

    Feature selection has been studied widely in the literature, but the efficacy of selection criteria in low sample size applications is neglected in most cases. Most existing feature selection criteria are based on sample similarity, yet distance measures become uninformative for high dimensional low sample size (HDLSS) data. Moreover, the variance of a feature estimated from a few samples is meaningless unless it represents the data distribution well. Instead of looking at the samples in groups, we evaluate their efficiency in a pairwise fashion. In our investigation, we found that considering one pair of samples at a time and selecting the features that bring them closer together or push them farther apart is a better choice for feature selection. Experimental results on benchmark data sets demonstrate the effectiveness of the proposed method at low sample sizes, where it outperforms many other state-of-the-art feature selection methods. Comment: European Signal Processing Conference 201
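
The pairwise idea in this abstract can be illustrated with a small sketch. The voting rule below is an assumption made for illustration, not the paper's criterion: it assumes class labels are available, that features share a comparable scale, and that each sample pair votes for the single feature that best separates it (different labels) or best brings it together (same label). The function name `pairwise_feature_votes` is hypothetical.

```python
from itertools import combinations

def pairwise_feature_votes(samples, labels):
    """Count, per feature, how many sample pairs it serves best.

    Assumed rule (illustrative only): a same-label pair votes for the
    feature with the smallest absolute difference, and a different-label
    pair votes for the feature with the largest absolute difference.
    """
    n_features = len(samples[0])
    votes = [0] * n_features
    for (a, label_a), (b, label_b) in combinations(zip(samples, labels), 2):
        diffs = [abs(a[f] - b[f]) for f in range(n_features)]
        if label_a == label_b:
            winner = min(range(n_features), key=diffs.__getitem__)
        else:
            winner = max(range(n_features), key=diffs.__getitem__)
        votes[winner] += 1
    return votes

# Feature 0 separates the two classes; feature 1 is noise.
samples = [[0.0, 0.5], [0.1, 0.9], [3.0, 0.6], [3.1, 0.1]]
labels = [0, 0, 1, 1]
print(pairwise_feature_votes(samples, labels))
```

Features with the most votes would be kept. Because each pair is scored on its own, no group-level variance or neighborhood estimate is needed, which is the property the abstract emphasizes for small sample sizes.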

    Similarity Learning via Kernel Preserving Embedding

    Data similarity is a key concept in many data-driven applications, and many algorithms are sensitive to the similarity measure used. To tackle this fundamental problem, automatic learning of similarity information from data via self-expression has been developed and successfully applied in various models, such as low-rank representation, sparse subspace learning, and semi-supervised learning. However, self-expression only tries to reconstruct the original data, and some valuable information, e.g., the manifold structure, is largely ignored. In this paper, we argue that it is beneficial to preserve the overall relations when extracting similarity information. Specifically, we propose a novel similarity learning framework that minimizes the reconstruction error of kernel matrices, rather than the reconstruction error of the original data adopted by existing work. Taking the clustering task as an example to evaluate our method, we observe considerable improvements compared to other state-of-the-art methods. More importantly, the proposed framework is very general and provides a novel and fundamental building block for many other similarity-based tasks. In addition, the proposed kernel preserving embedding opens up a large number of possibilities for embedding high-dimensional data into low-dimensional space. Comment: Published in AAAI 201
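
The contrast drawn in this abstract can be written out as a sketch. The symbols are notational assumptions: X is the data matrix with samples as columns, Z the learned similarity matrix, K the kernel matrix, ρ a regularizer, and γ its weight. The kernel-space form shown is one plausible reading of "reconstruction error of kernel matrices", not a quotation from the paper.

```latex
% Self-expression on the original data:
\min_{Z} \; \lVert X - XZ \rVert_F^2 + \gamma\,\rho(Z)

% Kernel-preserving variant, with K = \phi(X)^\top \phi(X):
\min_{Z} \; \lVert K - Z^\top K Z \rVert_F^2 + \gamma\,\rho(Z)
```

With a linear kernel, K = XᵀX and ZᵀKZ = (XZ)ᵀ(XZ), so the second objective compares the inner products among the reconstructed samples with those among the originals, rather than comparing the samples themselves.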

    Unsupervised Feature Selection with Adaptive Structure Learning

    The problem of feature selection has attracted considerable interest in the past decade. Traditional unsupervised methods select the features that faithfully preserve the intrinsic structures of the data, where those structures are estimated using all the input features. However, the estimated intrinsic structures are unreliable when redundant and noisy features are not removed. We therefore face a dilemma: one needs the true structures of the data to identify the informative features, and one needs the informative features to accurately estimate the true structures of the data. To address this, we propose a unified learning framework that performs structure learning and feature selection simultaneously. The structures are adaptively learned from the results of feature selection, and the informative features are reselected to preserve the refined structures of the data. By leveraging the interactions between these two essential tasks, we are able to capture accurate structures and select more informative features. Experimental results on many benchmark data sets demonstrate that the proposed method outperforms many state-of-the-art unsupervised feature selection methods.

    Masking Strategies for Image Manifolds

    We consider the problem of selecting an optimal mask for an image manifold, i.e., choosing a subset of the pixels of the image that preserves the manifold's geometric structure present in the original data. Such masking implements a form of compressive sensing through emerging imaging sensor platforms for which the power expense grows with the number of pixels acquired. Our goal is for the manifold learned from masked images to resemble its full image counterpart as closely as possible. More precisely, we show that one can indeed accurately learn an image manifold without having to consider a large majority of the image pixels. In doing so, we consider two masking methods that preserve the local and global geometric structure of the manifold, respectively. In each case, the process of finding the optimal masking pattern can be cast as a binary integer program, which is computationally expensive but can be approximated by a fast greedy algorithm. Numerical experiments show that the relevant manifold structure is preserved through the data-dependent masking process, even for modest mask sizes.
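
The greedy approximation mentioned in this abstract can be sketched as forward selection over pixels. The objective here is an assumption chosen for illustration: it measures how well the pairwise squared distances between masked images match those between full images, which is one simple proxy for "global geometric structure" and not necessarily the paper's formulation. The names `pairwise_sq_dists` and `greedy_mask` are hypothetical.

```python
from itertools import combinations

def pairwise_sq_dists(images, pixels):
    """Squared Euclidean distance for every image pair, using only `pixels`."""
    pixels = list(pixels)
    return [
        sum((a[p] - b[p]) ** 2 for p in pixels)
        for a, b in combinations(images, 2)
    ]

def greedy_mask(images, mask_size):
    """Greedily pick `mask_size` pixel indices.

    At each step, add the pixel that most reduces the squared error
    between masked and full pairwise distances (assumed objective).
    """
    n_pixels = len(images[0])
    full = pairwise_sq_dists(images, range(n_pixels))
    mask = []
    for _ in range(mask_size):
        best_pixel, best_err = None, float("inf")
        for p in range(n_pixels):
            if p in mask:
                continue
            masked = pairwise_sq_dists(images, mask + [p])
            err = sum((m - f) ** 2 for m, f in zip(masked, full))
            if err < best_err:
                best_pixel, best_err = p, err
        mask.append(best_pixel)
    return sorted(mask)

# Three flattened 4-pixel "images" in which only pixel 2 varies.
images = [[0, 0, 0, 1], [0, 0, 5, 1], [0, 0, 9, 1]]
print(greedy_mask(images, 1))
```

Each greedy step costs one pass over the remaining pixels, whereas the exact binary integer program searches over all subsets of the requested size. That difference is the trade-off the abstract describes.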

    An Unsupervised Based Stochastic Parallel Gradient Descent For Fcm Learning Algorithm With Feature Selection For Big Data

    Big datasets can contain millions of instances and hundreds or thousands of features, which readily brings their size to the terabyte level. In computer visualization and medical imaging applications, selecting among these features is handled with data mining methods such as clustering, classification, and feature selection. Among these, clustering methods efficiently group similar features into one cluster and dissimilar features into others. This paper presents a novel unsupervised cluster learning method for feature selection on big dataset samples. The proposed method removes irrelevant and unimportant features through the FCM objective function. The performance of the proposed unsupervised FCM learning algorithm depends strongly on the initial centroid values and the fuzzification parameter (m). The selection of initial cluster centroids is therefore very important for improving feature selection results on big dataset samples. To this end, we propose a novel Stochastic Parallel Gradient Descent (SPGD) method that automatically selects the initial cluster centroids for FCM, which speeds up the grouping of similar features and improves cluster quality. The resulting clustering method is named SPFCM; its fuzzification parameter (m) is optimized using a Hybrid Particle Swarm with Genetic (HPSG) algorithm. The algorithm selects features by computing the distance between two feature samples via kernel learning, in an unsupervised manner, and is especially easy to apply.
The proposed SPFCM and existing clustering methods were evaluated on larger dataset samples from the UCI machine learning repository. The results show that the proposed SPFCM clustering method produces better feature selection results than existing clustering-based feature selection algorithms while being computationally efficient. DOI: 10.17762/ijritcc2321-8169.15072
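
For orientation, the two quantities this abstract calls critical, the initial centroids and the fuzzification parameter (m), appear explicitly in the standard fuzzy c-means (FCM) updates. The sketch below is plain textbook FCM and assumes Euclidean distance. It does not implement the SPGD initialization, the HPSG optimization of m, or the kernel-based distance described in the abstract. The function name `fcm` and its parameters are illustrative.

```python
import math

def fcm(points, centroids, m=2.0, n_iter=50, eps=1e-12):
    """Standard fuzzy c-means with caller-supplied initial centroids.

    Requires m > 1. Returns the final centroids and the membership
    matrix (one row per point, one column per cluster) computed in the
    last iteration before the final centroid update.
    """
    centroids = [list(c) for c in centroids]
    power = 2.0 / (m - 1.0)
    n_dims = len(points[0])
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)).
        memberships = []
        for x in points:
            dists = [max(math.dist(x, c), eps) for c in centroids]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]
            )
        # Centroid update: mean of the points weighted by u_ij^m.
        for i in range(len(centroids)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centroids[i] = [
                sum(w * x[d] for w, x in zip(weights, points)) / total
                for d in range(n_dims)
            ]
    return centroids, memberships

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, memberships = fcm(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Because the updates only refine whatever starting centroids are supplied, a poor initialization can settle into a poor local solution, and m controls how soft the memberships are. This is why the abstract treats both as the quantities to tune.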