Combining Labelled and Unlabelled Data in the Design of Pattern Classification Systems

By Bogdan Gabrys

Abstract

There has been much interest in applying techniques that incorporate knowledge from unlabelled data into a supervised learning system, but less effort has been made to compare the effectiveness of different approaches on real-world problems and to analyse the behaviour of the learning system when using different amounts of unlabelled data. In this paper an analysis of the performance of supervised methods enforced by unlabelled data and some semi-supervised approaches using different ratios of labelled to unlabelled samples is presented. The experimental results show that, when supported by unlabelled samples, much less labelled data is generally required to build a classifier without compromising the classification performance. If only a very limited amount of labelled data is available, the results show high variability, and the performance of the final classifier depends more on how reliable the labelled data samples are than on the use of additional unlabelled data. Semi-supervised clustering utilising both labelled and unlabelled data has been shown to offer the most significant improvements when natural clusters are present in the considered problem.

Topics: aintel, csi
Year: 2002
OAI identifier: oai:eprints.bournemouth.ac.uk:8608
