Can Genetic Programming Do Manifold Learning Too?
Exploratory data analysis is a fundamental aspect of knowledge discovery that
aims to find the main characteristics of a dataset. Dimensionality reduction,
such as manifold learning, is often used to reduce the number of features in a
dataset to a manageable level for human interpretation. However, most
manifold learning techniques explain nothing about the original
features or the true characteristics of a dataset. In this paper, we propose a
genetic programming approach to manifold learning called GP-MaL which evolves
functional mappings from a high-dimensional space to a lower dimensional space
through the use of interpretable trees. We show that GP-MaL is competitive with
existing manifold learning algorithms, while producing models that can be
interpreted and re-used on unseen data. A number of promising future directions
for research are identified in the process.

Comment: 16 pages, accepted in EuroGP '1
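The abstract describes individuals as interpretable trees that map a high-dimensional point to a lower-dimensional one. A minimal sketch of that representation idea, with all names and the example trees invented here for illustration (not taken from GP-MaL itself):

```python
# Illustrative sketch only: a GP individual for manifold learning can be
# represented as one expression tree per output (embedding) dimension.
# Tree nodes are tuples: ('x', i) reads input feature i; (op, left, right)
# applies a binary operator. Operator set and trees below are hypothetical.

def eval_tree(tree, x):
    """Recursively evaluate an expression tree on feature vector x."""
    if tree[0] == 'x':
        return x[tree[1]]
    op, left, right = tree
    a, b = eval_tree(left, x), eval_tree(right, x)
    return a + b if op == '+' else a - b if op == '-' else a * b

# A hand-written 2-output individual: maps a 4-feature point to 2 dimensions.
individual = [
    ('+', ('x', 0), ('*', ('x', 1), ('x', 2))),  # embedding dimension 1
    ('-', ('x', 3), ('x', 0)),                   # embedding dimension 2
]

point = [1.0, 2.0, 3.0, 4.0]
embedding = [eval_tree(t, point) for t in individual]
# embedding == [1 + 2*3, 4 - 1] == [7.0, 3.0]
```

Because each output dimension is an explicit expression over the original features, the mapping can be read directly and applied to unseen points, which is the interpretability and re-use property the abstract emphasises.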
New representations in genetic programming for feature construction in k-means clustering
© Springer International Publishing AG 2017. k-means is one of the fundamental and most well-known algorithms in data mining. It has been widely used in clustering tasks, but suffers from a number of limitations on large or complex datasets. Genetic Programming (GP) has been used to improve the performance of data mining algorithms by performing feature construction: the process of combining multiple attributes (features) of a dataset to produce more powerful constructed features. In this paper, we propose novel representations for using GP to perform feature construction to improve the clustering performance of the k-means algorithm. Our experiments show significant performance improvements compared to k-means across a variety of difficult datasets. Several GP programs are also analysed to provide insight into how feature construction is able to improve clustering performance.
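The pipeline the abstract describes (evolve a constructed feature, then cluster with k-means in the constructed space) can be sketched as follows. This is a hand-made illustration, not the paper's representation: the "evolved" program is fixed by hand, and the tiny 1-D k-means is written inline rather than taken from a library.

```python
# Illustrative sketch only: a GP-constructed feature combines raw
# attributes into a derived feature, and k-means then clusters in the
# constructed space. The feature below stands in for an evolved GP tree.
import random

def constructed_feature(x):
    # Hypothetical evolved program: f(x) = x0 * x1 - x2
    return x[0] * x[1] - x[2]

def kmeans_1d(values, k=2, iters=20, seed=0):
    """Tiny 1-D k-means over the constructed feature values."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            clusters[min(range(k), key=lambda c: abs(v - centers[c]))].append(v)
        # Recompute each center as its cluster mean (keep old center if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two groups that are hard to separate attribute-by-attribute but split
# cleanly under the constructed feature.
data = [[1, 2, 0], [2, 1, 1], [5, 6, 2], [6, 5, 1]]
feats = [constructed_feature(x) for x in data]  # [2, 1, 28, 29]
centers = kmeans_1d(feats)                      # converges to [1.5, 28.5]
```

In the paper's setting the constructed feature would be found by evolutionary search against a clustering-quality fitness measure rather than written by hand; the sketch only shows why a good constructed feature can make the k-means step easier.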