There has been much interest in applying techniques that incorporate knowledge from unlabelled data into a supervised learning system, but less effort has been made to compare the effectiveness of different approaches on real-world problems and to analyse the behaviour of the learning system when using different amounts of unlabelled data. In this paper an analysis of the performance of supervised methods reinforced by unlabelled data and some semi-supervised approaches using different ratios of labelled to unlabelled samples is presented. The experimental results show that when supported by unlabelled samples much less labelled data is generally required to build a classifier without compromising the classification performance. If only a very limited amount of labelled data is available, the results show high variability and the performance of the final classifier is more dependent on how reliable the labelled data samples are than on the use of additional unlabelled data. Semi-supervised clustering utilising both labelled and unlabelled data has been shown to offer the most significant improvements when natural clusters are present in the considered problem.
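The kind of experiment summarised above, varying the ratio of labelled to unlabelled samples and comparing a purely supervised classifier with one that also uses unlabelled data, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the method or data used in the paper: the synthetic two-cluster data, the nearest-centroid classifier, the seeded k-means-style pseudo-labelling loop, and the names `make_blobs`, `centroids`, `predict`, `accuracy` and `run_experiment` are all hypothetical.

```python
import random


def make_blobs(n_per_class, centres, spread, rng):
    """Draw n_per_class 2-D Gaussian points around each centre (synthetic data)."""
    points, labels = [], []
    for label, (cx, cy) in enumerate(centres):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def centroids(points, labels, n_classes):
    """Mean of the points currently assigned to each class."""
    sums = [[0.0, 0.0, 0] for _ in range(n_classes)]
    for (x, y), lab in zip(points, labels):
        sums[lab][0] += x
        sums[lab][1] += y
        sums[lab][2] += 1
    return [(sx / n, sy / n) for sx, sy, n in sums]


def predict(point, cents):
    """Index of the nearest centroid under squared Euclidean distance."""
    x, y = point
    return min(
        range(len(cents)),
        key=lambda k: (x - cents[k][0]) ** 2 + (y - cents[k][1]) ** 2,
    )


def accuracy(cents, points, labels):
    """Fraction of points whose nearest centroid matches the true label."""
    hits = sum(predict(p, cents) == lab for p, lab in zip(points, labels))
    return hits / len(points)


def run_experiment(n_labelled_per_class, n_unlabelled_per_class, seed=0, rounds=5):
    """Return (supervised accuracy, semi-supervised accuracy) on a held-out set."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (6.0, 6.0)]  # assumed: two well-separated natural clusters
    lab_x, lab_y = make_blobs(n_labelled_per_class, centres, 1.0, rng)
    unl_x, _ = make_blobs(n_unlabelled_per_class, centres, 1.0, rng)  # labels discarded
    test_x, test_y = make_blobs(200, centres, 1.0, rng)

    # Supervised baseline: centroids estimated from the labelled samples only.
    cents = centroids(lab_x, lab_y, 2)
    supervised_acc = accuracy(cents, test_x, test_y)

    # Semi-supervised variant: pseudo-label the unlabelled samples with the
    # current centroids, then re-estimate the centroids from both sets.
    # The labelled samples always keep their true labels.
    for _ in range(rounds):
        pseudo = [predict(p, cents) for p in unl_x]
        cents = centroids(lab_x + unl_x, lab_y + pseudo, 2)
    semi_acc = accuracy(cents, test_x, test_y)
    return supervised_acc, semi_acc


if __name__ == "__main__":
    # Sweep the amount of labelled data while keeping the unlabelled pool fixed.
    for n_lab in (1, 2, 5, 20):
        sup, semi = run_experiment(n_lab, 100)
        print(f"labelled/class={n_lab:2d}  supervised={sup:.3f}  semi-supervised={semi:.3f}")
```

Repeating the sweep over several values of `seed` gives a rough picture of the variability that arises when very few labelled samples are available. The cluster separation is fixed here purely for simplicity; moving the two centres closer together would weaken the natural cluster structure on which this kind of approach relies.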