24 research outputs found

    Clustering based feature selection using Partitioning Around Medoids (PAM)

    Get PDF
    High-dimensional data contains a large number of features. With many features, high dimensional data requires immense computational resources, including space and time. Several studies indicate that not all features of high dimensional data are relevant to classification result. Dimensionality reduction is inevitable and is required due to classifier performance improvement. Several dimensionality reduction techniques were carried out, including feature selection techniques and feature extraction techniques. Sequential forward feature selection and backward feature selection are feature selection using the greedy approach. The heuristics approach is also applied in feature selection, using the Genetic Algorithm, PSO, and Forest Optimization Algorithm. PCA is the most well-known feature extraction method. Besides, other methods such as multidimensional scaling and linear discriminant analysis. In this work, a different approach is applied to perform feature selection. Cluster analysis based feature selection using Partitioning Around Medoids (PAM) clustering is carried out. Our experiment results showed that classification accuracy gained when using feature vectors' medoids to represent the original dataset is high, above 80%

    Unsupervised Feature Selection with Adaptive Structure Learning

    Full text link
    The problem of feature selection has raised considerable interests in the past decade. Traditional unsupervised methods select the features which can faithfully preserve the intrinsic structures of data, where the intrinsic structures are estimated using all the input features of data. However, the estimated intrinsic structures are unreliable/inaccurate when the redundant and noisy features are not removed. Therefore, we face a dilemma here: one need the true structures of data to identify the informative features, and one need the informative features to accurately estimate the true structures of data. To address this, we propose a unified learning framework which performs structure learning and feature selection simultaneously. The structures are adaptively learned from the results of feature selection, and the informative features are reselected to preserve the refined structures of data. By leveraging the interactions between these two essential tasks, we are able to capture accurate structures and select more informative features. Experimental results on many benchmark data sets demonstrate that the proposed method outperforms many state of the art unsupervised feature selection methods

    Hybrid Multi Attribute Relation Method for Document Clustering for Information Mining

    Get PDF
    Text clustering has been widely utilized with the aim of partitioning speci?c documents’ collection into different subsets using homogeneity/heterogeneity criteria. It has also become a very complicated area of research, including pattern recognition, information retrieval, and text mining. In the applications of enterprises, information mining faces challenges due to the complex distribution of data by an enormous number of different sources. Most of these information sources are from different domains which create difficulties in identifying the relationships among the information. In this case, a single method for clustering limits related information, while enhancing computational overheadsand processing times. Hence, identifying suitable clustering models for unsupervised learning is a challenge, specifically in the case of MultipleAttributesin data distributions. In recent works attribute relation based solutions are given significant importance to suggest the document clustering. To enhance further, in this paper, Hybrid Multi Attribute Relation Methods (HMARs) are presented for attribute selections and relation analyses of co-clustering of datasets. The proposed HMARs allowanalysis of distributed attributes in documents in the form of probabilistic attribute relations using modified Bayesian mechanisms. It also provides solutionsfor identifying most related attribute model for the multiple attribute documents clustering accurately. An experimental evaluation is performed to evaluate the clustering purity and normalization of the information utilizing UCI Data repository which shows 25% better when compared with the previous techniques
    corecore