
    Unsupervised Feature Selection Based on Self-configuration Approaches using Multidimensional Scaling

    Researchers often collect many features so that no principal information is lost. However, a large number of features can cause problems: irrelevant or redundant features degrade the validity of analysis results. Feature selection is one solution. Feature selection methods are divided into two classes, supervised and unsupervised. Supervised feature selection can only be carried out on labelled data, while unsupervised feature selection follows three approaches: correlation, configuration, and variance. This study proposes an unsupervised feature selection method that combines correlation and configuration using multidimensional scaling (MDS). The proposed algorithm, MDS-Clustering, uses both hierarchical and non-hierarchical clustering. Its results are compared with existing feature selection methods under three schemes, retaining 75%, 50%, and 25% of the features, on datasets from the UCI repository. Validity is assessed with the goodness-of-fit of the proximity matrix (GoFP) and the classification accuracy obtained with the selected features. The comparison shows that the proposed method is worth recommending as a new approach to feature selection; moreover, on certain datasets it outperforms the existing methods.
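
    A minimal sketch of the combined correlation-and-configuration idea, assuming the standard scikit-learn MDS and k-means APIs. The function name mds_cluster_select and the representative-per-cluster rule are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch: unsupervised feature selection by clustering
# features in an MDS configuration derived from their correlations.
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

def mds_cluster_select(X, keep_ratio=0.5, random_state=0):
    """Select about keep_ratio * n_features features from X (n_samples, n_features)."""
    n_features = X.shape[1]
    n_keep = max(1, round(keep_ratio * n_features))
    # Dissimilarity between features: 1 - |correlation|.
    corr = np.corrcoef(X, rowvar=False)
    dissim = 1.0 - np.abs(corr)
    # MDS configuration of the features (not the samples).
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=random_state).fit_transform(dissim)
    # Non-hierarchical clustering of the embedded features.
    labels = KMeans(n_clusters=n_keep, n_init=10,
                    random_state=random_state).fit_predict(coords)
    # Keep the feature closest to each cluster centre as its representative.
    selected = []
    for k in range(n_keep):
        members = np.flatnonzero(labels == k)
        centre = coords[members].mean(axis=0)
        selected.append(int(members[np.argmin(
            np.linalg.norm(coords[members] - centre, axis=1))]))
    return sorted(selected)
```

    Embedding the features rather than the samples is what lets correlation structure (the dissimilarities) and configuration (the MDS coordinates) drive the selection jointly.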

    Labeling the Features Not the Samples: Efficient Video Classification with Minimal Supervision

    Feature selection is essential for effective visual recognition. We propose an efficient joint classifier learning and feature selection method that discovers sparse, compact representations of input features from a vast sea of candidates, with an almost unsupervised formulation. Our method requires only the following knowledge, which we call the feature sign: whether or not a particular feature has on average stronger values over positive samples than over negatives. We show how this can be estimated using as few as a single labeled training sample per class. Then, using these feature signs, we extend an initial supervised learning problem into an (almost) unsupervised clustering formulation that can incorporate new data without requiring ground truth labels. Our method works both as a feature selection mechanism and as a fully competitive classifier. It has important properties: low computational cost and excellent accuracy, especially in difficult cases of very limited training data. We experiment on large-scale recognition in video and show superior speed and performance over established feature selection approaches such as AdaBoost, Lasso, and greedy forward-backward selection, as well as powerful classifiers such as SVM.
    Comment: arXiv admin note: text overlap with arXiv:1411.771
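
    The feature-sign estimate itself is simple enough to sketch. The snippet below is an illustration under the assumption that one labelled example per class is available; estimate_signs and score are hypothetical helper names, not the authors' code.

```python
# Hedged sketch of the "feature sign" idea: decide, from a single labelled
# example per class, whether each feature is on average stronger on
# positives, then score new samples with the sign-corrected features.
import numpy as np

def estimate_signs(x_pos, x_neg):
    # +1 where the feature is larger on the positive example, else -1.
    return np.where(x_pos > x_neg, 1.0, -1.0)

def score(X, signs):
    # Unlabelled samples score higher when sign-corrected features are large;
    # thresholding these scores yields a crude classifier.
    return (X * signs).mean(axis=1)
```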

    An Unsupervised Based Stochastic Parallel Gradient Descent For Fcm Learning Algorithm With Feature Selection For Big Data

    Huge datasets consist of millions of records and hundreds or thousands of features, easily reaching terabytes in size. Selecting among these hundreds of features, for problems in computer vision and medical imaging, is addressed with learning algorithms from data mining, such as clustering, classification, and feature selection methods. Among these, clustering methods efficiently group similar features together while separating dissimilar features into other clusters. This paper presents a novel unsupervised cluster learning method for feature selection on big-data samples. The proposed method removes irrelevant and unimportant features through the FCM objective function. The performance of the unsupervised FCM learning algorithm is strongly influenced by the initial centroid values and the fuzzification parameter (m); the selection of initial cluster centroids is therefore very important for improving feature selection results on big datasets. To carry out this process, we propose a novel Stochastic Parallel Gradient Descent (SPGD) method that automatically selects the initial cluster centroids for FCM, speeding up the grouping of similar features and improving cluster quality. The resulting clustering method is named SPFCM clustering, and the fuzzification parameter (m) is optimized using a Hybrid Particle Swarm with Genetic (HPSG) algorithm. The algorithm selects features by computing the distance between pairs of feature samples via kernel learning, in a fully unsupervised manner, and is especially easy to apply. Experiments on larger UCI machine learning datasets show that the proposed SPFCM clustering method produces better feature selection results than existing feature selection clustering algorithms while being computationally very efficient. DOI: 10.17762/ijritcc2321-8169.15072
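
    For orientation, a plain FCM iteration looks as follows. This sketch deliberately omits the paper's SPGD centroid initialisation and HPSG tuning of m, seeding centroids randomly instead.

```python
# Minimal fuzzy c-means (FCM) sketch; the paper's SPGD initialisation and
# HPSG optimisation of m are not reproduced here.
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Membership of each point in each cluster (standard FCM update).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        u = 1.0 / (d ** (2.0 / (m - 1.0)))
        u /= u.sum(axis=1, keepdims=True)
        # Centroid update weighted by memberships raised to the power m.
        w = u ** m
        centroids = (w.T @ X) / w.sum(axis=0)[:, None]
    return u, centroids
```

    The sensitivity the abstract describes is visible here: both the random seed (initial centroids) and m enter every membership update, which is why the paper devotes separate machinery (SPGD and HPSG) to choosing them.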

    Study on Unsupervised Feature Selection Method Based on Extended Entropy

    Feature selection techniques are designed to find the relevant subset of the original features that can facilitate clustering, classification, and retrieval. It is an important research topic in pattern recognition and machine learning. Feature selection methods are mainly partitioned into two classes, supervised and unsupervised. Current research mostly concentrates on supervised methods; few efficient unsupervised feature selection methods have been developed, because no label information is available and it is difficult to evaluate the selected features. An unsupervised feature selection method based on extended entropy is proposed here. The information loss based on extended entropy is used to measure the correlation between features. The method ensures that the selected features carry high individual information content and little redundant information with respect to the features already selected. Finally, the efficiency of the proposed method is illustrated on several practical datasets.
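
    The selection criterion can be approximated with ordinary histogram entropy and mutual information as a stand-in for the paper's extended-entropy information loss. In the following greedy sketch, hist_entropy, mutual_info, and greedy_select are hypothetical names: it picks features with high individual entropy and low redundancy with those already chosen.

```python
# Greedy sketch of the "high individual information, low redundancy"
# criterion, using histogram entropy and mutual information as proxies.
import numpy as np

def hist_entropy(x, bins=10):
    p, _ = np.histogram(x, bins=bins)
    p = p[p > 0] / p.sum()
    return -(p * np.log(p)).sum()

def mutual_info(x, y, bins=10):
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return (pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum()

def greedy_select(X, n_keep, alpha=1.0):
    # Start from the individually most informative feature.
    selected = [int(np.argmax([hist_entropy(X[:, j]) for j in range(X.shape[1])]))]
    while len(selected) < n_keep:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        # Reward individual entropy, penalise redundancy with selected features.
        scores = [hist_entropy(X[:, j]) -
                  alpha * max(mutual_info(X[:, j], X[:, s]) for s in selected)
                  for j in rest]
        selected.append(rest[int(np.argmax(scores))])
    return sorted(selected)
```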

    Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology

    This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or, more generally, an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for representing documents and its impact on the emerging classification, mixing the selection of structured features with fine-grained textual selection based on syntactic characteristics. We illustrate and evaluate this approach on a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or different chapters of these activity reports. We then compare the results of clustering with different feature selections against the official theme structure used by Inria.
    Comment: (postprint); this version corrects a couple of errors in authors' names in the bibliography.
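
    A rough sketch of this kind of pipeline, assuming scikit-learn and SciPy rather than the authors' actual tooling: combine structural tag counts with textual TF-IDF features, cluster, and score the result against the official typology with the adjusted Rand index.

```python
# Illustrative sketch (not the paper's pipeline): mix structural features
# (XML tag counts) with textual TF-IDF features, then validate clusters
# against an existing typology.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from scipy.sparse import hstack, csr_matrix

def cluster_against_typology(texts, tag_counts, official_labels, n_themes):
    # texts: list of document strings; tag_counts: (n_docs, n_tags) array.
    tfidf = TfidfVectorizer(max_features=2000).fit_transform(texts)
    X = hstack([tfidf, csr_matrix(np.asarray(tag_counts))])
    pred = KMeans(n_clusters=n_themes, n_init=10, random_state=0).fit_predict(X)
    # Agreement with the official structure (1.0 = identical partition).
    return adjusted_rand_score(official_labels, pred)
```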

    Localized Feature Selection For Unsupervised Learning

    Clustering is the unsupervised classification of data objects into groups (clusters) such that objects in one group are similar to each other and dissimilar from objects in other groups. Feature selection for unsupervised learning is a technique that chooses the best feature subset for clustering. In general, unsupervised feature selection algorithms conduct feature selection in a global sense, producing a common feature subset for all the clusters. This can be invalid in clustering practice, where the local intrinsic properties of the data matter more, implying that localized feature selection is more desirable. In this dissertation, we focus on cluster-wise feature selection for unsupervised learning. We first propose a Cross-Projection method to achieve localized feature selection. The proposed algorithm computes adjusted and normalized scatter separability for individual clusters, and a sequential backward search is then applied to find the optimal (perhaps local) feature subset for each cluster. Our experimental results show the need for feature selection in clustering and the benefits of selecting features locally. We also present an approach based on maximum likelihood with a Gaussian mixture: the feature relevance for an individual cluster is treated as a probability, represented by a localized feature saliency and estimated through the Expectation-Maximization (EM) algorithm during the clustering process, while the number of clusters is determined by a Minimum Message Length (MML) criterion. Experiments carried out on both synthetic and real-world datasets illustrate the performance of the approach in finding embedded clusters. Finally, a novel approach based on a Bayesian framework is implemented: we place prior distributions over the parameters of the Gaussian mixture model and maximize the marginal log-likelihood given the mixing coefficients and feature saliency, estimating the parameters by Bayesian variational learning. This approach computes the feature saliency for each cluster and detects the number of clusters simultaneously.
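
    As a small illustration of the cluster-wise view (not any of the dissertation's three algorithms), one can score each feature per cluster by how much the cluster shrinks its variance relative to the whole dataset; a feature can then be salient for one cluster and irrelevant for another.

```python
# Hedged illustration of localized (per-cluster) feature scoring: a feature
# is locally salient when the cluster concentrates on it, i.e. its
# within-cluster variance is small relative to the total variance.
import numpy as np

def local_feature_saliency(X, labels):
    total_var = X.var(axis=0) + 1e-12
    saliency = {}
    for k in np.unique(labels):
        within_var = X[labels == k].var(axis=0)
        saliency[k] = 1.0 - within_var / total_var  # higher = more salient
    return saliency
```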

    Removing redundant features via clustering: preliminary results in mental task separation

    Recent clustering algorithms have been designed to take into account the degree of relevance of each feature by automatically calculating feature weights. However, as they tend to evaluate one feature at a time, these algorithms may have difficulty dealing with features containing similar information: should this information be relevant, they would assign high weights to all such features instead of removing some due to their redundant nature. In this paper we introduce an unsupervised feature selection method that targets redundant features. Our method clusters similar features together and selects a subset of representative features for each cluster. This selection is based on the maximum information compression index between each feature and its respective cluster centroid. We empirically validate our method by comparing it with a popular unsupervised feature selection method on three EEG data sets. We find that ours selects features that produce better cluster recovery, without the need for an extra user-defined parameter.
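
    The maximum information compression index between two features has a closed form: it is the smallest eigenvalue of their 2x2 covariance matrix, and it vanishes exactly when the features are linearly dependent, i.e. fully redundant. A minimal sketch, assuming only NumPy:

```python
# Maximum information compression index (MICI) between two feature vectors:
# the smallest eigenvalue of their 2x2 covariance matrix. Zero means one
# feature is a linear function of the other (perfect redundancy).
import numpy as np

def mici(x, y):
    vx, vy = x.var(), y.var()
    rho = np.corrcoef(x, y)[0, 1]
    return 0.5 * (vx + vy -
                  np.sqrt((vx + vy) ** 2 - 4.0 * vx * vy * (1.0 - rho ** 2)))
```

    Unlike plain correlation, MICI accounts for the variances as well, so a low-variance copy of a feature and a high-variance one are not treated identically.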