unknown

Design and analysis of clustering algorithms for numerical, categorical and mixed data

Abstract

In recent times, several machine learning techniques have been applied successfully to discover useful knowledge from data. Cluster analysis that aims at finding similar subgroups from a large heterogeneous collection of records, is one o f the most useful and popular of the available techniques o f data mining. The purpose of this research is to design and analyse clustering algorithms for numerical, categorical and mixed data sets. Most clustering algorithms are limited to either numerical or categorical attributes. Datasets with mixed types o f attributes are common in real life and so to design and analyse clustering algorithms for mixed data sets is quite timely. Determining the optimal solution to the clustering problem is NP-hard. Therefore, it is necessary to find solutions that are regarded as “good enough” quickly. Similarity is a fundamental concept for the definition of a cluster. It is very common to calculate the similarity or dissimilarity between two features using a distance measure. Attributes with large ranges will implicitly assign larger contributions to the metrics than the application to attributes with small ranges. There are only a few papers especially devoted to normalisation methods. Usually data is scaled to unit range. This does not secure equal average contributions of all features to the similarity measure. For that reason, a main part o f this thesis is devoted to normalisation.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Similar works