7 research outputs found

    Implementation of Feature Selection to Reduce the Number of Features in Determining the Initial Centroid of K-Means Algorithm

    Clustering is a data mining method that groups data based on their features or attributes. One popular clustering algorithm is K-Means, which is often optimized with methods such as the genetic algorithm (GA) to overcome the problem of random initial centroid selection. A large number of features in a dataset can reduce accuracy and increase the computation time of model execution. Feature selection is a technique that reduces data dimensionality by removing features that are less relevant for modeling. This research therefore applies feature selection to the K-Means algorithm optimized with the Dynamic Artificial Chromosome Genetic Algorithm (DAC GA). Experimental results on ten datasets show that reducing the number of features with feature selection speeds up the computation of the DAC GA K-Means process by 17.5%. However, all experiments yielded higher Sum of Squared Distance (SSD) and Davies-Bouldin Index (DBI) values when clustering with the selected features.
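    The DAC GA optimizer itself is not available from this listing, so the sketch below only illustrates the pipeline the abstract describes: feature selection before K-Means, compared against the full feature set by runtime, SSD, and DBI. The dataset, the variance-threshold selector, and the plain k-means++ initialization are all stand-ins, not the authors' setup.

```python
# Illustrative pipeline only: variance-threshold feature selection before
# K-Means, evaluated by runtime, SSD (inertia), and the Davies-Bouldin Index.
# The paper's DAC GA centroid optimizer is not reproduced; k-means++ stands in.
import time

from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.feature_selection import VarianceThreshold
from sklearn.metrics import davies_bouldin_score
from sklearn.preprocessing import StandardScaler

X_raw = load_wine().data  # stand-in dataset with 13 features

def cluster_and_score(features, k=3):
    """Scale the features, run K-Means, and report runtime, SSD, and DBI."""
    X = StandardScaler().fit_transform(features)
    start = time.perf_counter()
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    elapsed = time.perf_counter() - start
    return elapsed, km.inertia_, davies_bouldin_score(X, km.labels_)

# Drop low-variance features; the threshold is arbitrary, chosen only so
# that some columns are actually removed on this dataset.
X_sel = VarianceThreshold(threshold=1.0).fit_transform(X_raw)

for name, data in [("all features", X_raw), ("selected features", X_sel)]:
    t, ssd, dbi = cluster_and_score(data)
    print(f"{name:17s} time={t:.4f}s  SSD={ssd:.1f}  DBI={dbi:.3f}")
```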

    Deep Dimension Reduction for Supervised Representation Learning

    The success of deep supervised learning depends on its automatic data representation abilities. Among all the characteristics of an ideal representation for high-dimensional complex data, information preservation, low dimensionality, and disentanglement are the most essential. In this work, we propose a deep dimension reduction (DDR) approach to achieving a good data representation with these characteristics for supervised learning. At the population level, we formulate the ideal representation learning task as finding a nonlinear dimension reduction map that minimizes the sum of losses characterizing conditional independence and disentanglement. We estimate the target map at the sample level nonparametrically with deep neural networks. We derive a bound on the excess risk of the deep nonparametric estimator. The proposed method is validated via comprehensive numerical experiments and real data analysis in the context of regression and classification.
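    The abstract does not spell out the losses that characterize conditional independence and disentanglement, so the following is only a loose analogue of the setup it describes: a small PyTorch encoder as the nonlinear dimension reduction map, with a cross-entropy term standing in for the conditional-independence loss and an off-diagonal covariance penalty standing in for disentanglement. All dimensions and hyperparameters are assumptions.

```python
# Loose analogue of a DDR-style setup: a nonlinear map f: R^d -> R^q fitted
# with deep nets. The cross-entropy term and the off-diagonal covariance
# penalty below are stand-ins for the paper's conditional-independence and
# disentanglement losses, not the authors' actual formulation.
import torch
import torch.nn as nn

d, q, n_classes = 64, 8, 5  # arbitrary dimensions for illustration

encoder = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, q))
head = nn.Linear(q, n_classes)  # classification context, as in the abstract
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(head.parameters()), lr=1e-3
)

def decorrelation_penalty(z):
    """Penalize off-diagonal entries of the representation's covariance."""
    zc = z - z.mean(dim=0)
    cov = zc.T @ zc / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum()

x = torch.randn(256, d)                 # placeholder batch
y = torch.randint(0, n_classes, (256,)) # placeholder labels

for step in range(100):
    z = encoder(x)
    loss = nn.functional.cross_entropy(head(z), y) \
        + 0.1 * decorrelation_penalty(z)
    opt.zero_grad()
    loss.backward()
    opt.step()
```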

    Supervised dimensionality reduction via distance correlation maximization

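    Only the title of this output is available, but the quantity it names, distance correlation (Székely, Rizzo & Bakirov, 2007), has a standard empirical form, sketched below. How the paper maximizes it over a projection of the features cannot be recovered from this listing, so that optimization step is omitted.

```python
# Empirical distance correlation (Székely, Rizzo & Bakirov, 2007). Only the
# statistic itself is sketched; the projection the paper optimizes over is
# unknown from this listing.
import numpy as np

def distance_correlation(x, y):
    """Empirical distance correlation between samples x and y (n rows each)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def centered_dists(m):
        # Pairwise Euclidean distances, double-centered.
        d = np.linalg.norm(m[:, None, :] - m[None, :, :], axis=-1)
        return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()

    A, B = centered_dists(x), centered_dists(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = x[:, :1] ** 2 + 0.1 * rng.normal(size=(200, 1))  # nonlinear dependence
print(distance_correlation(x, y))  # clearly positive despite ~zero Pearson r
```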