
Flexible information management strategies in machine learning and data mining

Author: Nguyen Duc-Cuong

In recent years, a number of data mining and machine learning techniques have been applied successfully to discover useful knowledge from data. Of the available techniques, rule induction and data clustering are two of the most useful and popular. Rule induction techniques express the knowledge they discover as If-Then rules, which are easy for users to understand and verify and can be employed as classification or prediction models. Data clustering techniques are used to explore irregularities in the data distribution. Although rule induction and data clustering techniques have been applied successfully in several applications, the assumptions and constraints built into their approaches have limited their capabilities. The main aim of this work is to develop flexible management strategies for these techniques to improve their performance.

The first part of the thesis introduces a new covering algorithm, called Rule Extraction System with Adaptivity, which forms the whole rule set simultaneously instead of one rule at a time. The rule set is managed flexibly during the learning phase: rules can be added to or omitted from it depending on the knowledge available at the time. In addition, the Rule Extraction System with Adaptivity algorithm implements facilities to process continuous attributes directly and to prune the rule set automatically.

The second part introduces improvements to the K-means algorithm for data clustering. Flexible management of clusters is applied during the learning process to help the algorithm find the optimal solution. Another flexible management strategy is used to facilitate the processing of very large data sets. Finally, an effective method to determine the most suitable number of clusters for the K-means algorithm is proposed. The method has overcome all deficiencies of K-means

Online Research @ Cardiff
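The abstract does not give the details of the modified K-means, so the following is only a minimal sketch, in pure Python, of the standard Lloyd-style K-means that such improvements build on: alternate between assigning each point to its nearest centre and moving each centre to the mean of its members, then report the sum of squared errors (SSE). The function names (`kmeans`, `nearest`, `sq_dist`), the random choice of initial centres and the convergence test are illustrative assumptions, not the method of the thesis.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Point, centres: List[Point]) -> int:
    """Index of the centre closest to `point` (ties go to the lower index)."""
    return min(range(len(centres)), key=lambda j: sq_dist(point, centres[j]))


def kmeans(data: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int], float]:
    """Lloyd-style K-means (hypothetical helper). Returns (centres, labels, SSE)."""
    rng = random.Random(seed)
    centres = rng.sample(data, k)  # assumption: k data points chosen at random as seeds
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centre.
        labels = [nearest(p, centres) for p in data]
        # Update step: move each centre to the mean of the points assigned to it.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(centres[j])  # empty cluster: keep its old centre
        if new_centres == centres:
            break  # converged: no centre moved
        centres = new_centres
    labels = [nearest(p, centres) for p in data]
    sse = sum(sq_dist(p, centres[lab]) for p, lab in zip(data, labels))
    return centres, labels, sse


if __name__ == "__main__":
    # Toy data: two well-separated groups of four points each.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    for k in range(1, 5):
        centres, labels, err = kmeans(data, k)
        print(f"k={k}  SSE={err:.3f}")
```

The SSE generally falls as k grows, so raw SSE alone cannot pick the number of clusters; that is the gap a dedicated method for choosing k is meant to fill.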

Data clustering using the Bees Algorithm and the Kd-tree structure

Author: Al-Jabbouli Hasan
Publication date: 01/01/2009

Data clustering has been studied intensively during the past decade. The K-means and C-means algorithms are the most popular clustering techniques: the former is suited to 'crisp' clustering and the latter to 'fuzzy' clustering. Clustering with the K-means or C-means algorithm is generally fast and produces good results. Although these algorithms have been implemented successfully in several areas, they still have a number of limitations. The main aim of this work is to develop flexible data management strategies that address some of those limitations and improve the performance of the algorithms.

The first part of the thesis introduces improvements to the K-means algorithm. A flexible data structure was applied to help the algorithm find stable results and to reduce the number of nearest-neighbour queries needed to assign data points to clusters. The method has overcome most of the deficiencies of the K-means algorithm.

The second and third parts of the thesis present two new clustering algorithms that are capable of locating near-optimal solutions efficiently. The proposed algorithms combine the simplicity of the K-means and C-means algorithms, respectively, with the ability of a new optimisation method called the Bees Algorithm to avoid local optima in crisp and fuzzy clustering. Experimental results on different data sets show that the new clustering algorithms perform better than other algorithms that combine an evolutionary optimisation tool with the K-means and C-means clustering methods.

The fourth part of this thesis presents an improvement to the basic Bees Algorithm that applies the concept of recursion to reduce the randomness of its local search procedure. The improved Bees Algorithm was applied to crisp and fuzzy clustering of several data sets.
The results obtained confirm the superior performance of the new algorithm.

EThOS - Electronic Theses Online Service, GB, United Kingdom
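To illustrate the idea in the second abstract, pairing a K-means-style objective with the Bees Algorithm's global search, here is a minimal sketch of the basic Bees Algorithm searching over sets of k cluster centres to minimise the crisp K-means cost (SSE). It follows the commonly described basic form (n scout bees, m selected sites, e elite sites, nep and nsp recruited bees, neighbourhood size ngh). The parameter defaults, the bounding-box initialisation, expressing ngh as a fraction of each dimension's range and keeping a site unchanged when none of its recruits improves on it are all assumptions made for this sketch. It is not the algorithm from the thesis, and it omits both the Kd-tree and the recursive local search that the abstract describes.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def sse(data: List[Point], centres: List[Point]) -> float:
    """Sum of squared distances from each point to its nearest centre (crisp K-means cost)."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centres)
        for p in data
    )


# Hypothetical helper; parameter defaults are illustrative assumptions, not values from the thesis.
def bees_clustering(data: List[Point], k: int, n: int = 20, m: int = 5, e: int = 2,
                    nep: int = 7, nsp: int = 3, ngh: float = 0.1,
                    iterations: int = 100, seed: int = 0) -> Tuple[List[Point], float]:
    """Basic Bees Algorithm over candidate sets of k centres, minimising SSE (sketch)."""
    rng = random.Random(seed)
    dims = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dims)]
    hi = [max(p[d] for p in data) for d in range(dims)]

    def cost(sol: List[Point]) -> float:
        return sse(data, sol)

    def random_solution() -> List[Point]:
        # A scout bee: k centres drawn uniformly inside the data's bounding box.
        return [tuple(rng.uniform(lo[d], hi[d]) for d in range(dims)) for _ in range(k)]

    def neighbour(sol: List[Point]) -> List[Point]:
        # A recruited bee: shift every coordinate by up to +/- ngh * (range of that
        # dimension), clamped to the bounding box.
        return [
            tuple(
                min(hi[d], max(lo[d], ctr[d] + rng.uniform(-ngh, ngh) * (hi[d] - lo[d])))
                for d in range(dims)
            )
            for ctr in sol
        ]

    # Initial population: n scouts, sorted best (lowest SSE) first.
    population = sorted((random_solution() for _ in range(n)), key=cost)
    for _ in range(iterations):
        next_pop = []
        for i, site in enumerate(population[:m]):   # the m best sites
            recruits = nep if i < e else nsp         # elite sites get more bees
            patch = [site] + [neighbour(site) for _ in range(recruits)]
            next_pop.append(min(patch, key=cost))    # keep the fittest bee in the patch
        # The remaining n - m bees scout randomly for new sites.
        next_pop.extend(random_solution() for _ in range(n - m))
        population = sorted(next_pop, key=cost)
    best = population[0]
    return best, cost(best)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    centres, score = bees_clustering(data, 2)
    print("centres:", [tuple(round(c, 2) for c in ctr) for ctr in centres])
    print("SSE:", round(score, 3))
```

Keeping the fittest bee in each patch means the best solution found so far is never lost. Meanwhile, the n - m random scouts keep sampling the whole search space. That global sampling is what gives this kind of hybrid a chance to escape the local optima in which a single K-means run can get stuck.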

Archivio Ricerca Ca'Foscari

Institutional research archive - Ca' Foscari University of Venice

OpenGrey Repository