34,475 research outputs found
Cluster-based semi-supervised ensemble learning
Semi-supervised classification consists of acquiring knowledge from both labelled and unlabelled data to classify test instances. The cluster assumption represents one of the potential relationships between true classes and data distribution that semi-supervised algorithms assume in order to use unlabelled data. Ensemble algorithms have been widely and successfully employed in both supervised and semi-supervised contexts. In this Thesis, we focus on the cluster assumption to study ensemble learning based on a new cluster regularisation technique for multi-class semi-supervised classification. Firstly, we introduce a multi-class cluster-based classifier, the Cluster-based Regularisation (Cluster- Reg) algorithm. ClusterReg employs a new regularisation mechanism based on posterior probabilities generated by a clustering algorithm in order to avoid generating decision boundaries that traverses high-density regions. Such a method possesses robustness to overlapping classes and to scarce labelled instances on uncertain and low-density regions, when data follows the cluster assumption. Secondly, we propose a robust multi-class boosting technique, Cluster-based Boosting (CBoost), which implements the proposed cluster regularisation for ensemble learning and uses ClusterReg as base learner. CBoost is able to overcome possible incorrect pseudo-labels and produces better generalisation than existing classifiers. And, finally, since there are often datasets with a large number of unlabelled instances, we propose the Efficient Cluster-based Boosting (ECB) for large multi-class datasets. ECB extends CBoost and has lower time and memory complexities than state-of-the-art algorithms. Such a method employs a sampling procedure to reduce the training set of base learners, an efficient clustering algorithm, and an approximation technique for nearest neighbours to avoid the computation of pairwise distance matrix. Hence, ECB enables semi-supervised classification for large-scale datasets
Semi-Supervised Radio Signal Identification
Radio emitter recognition in dense multi-user environments is an important
tool for optimizing spectrum utilization, identifying and minimizing
interference, and enforcing spectrum policy. Radio data is readily available
and easy to obtain from an antenna, but labeled and curated data is often
scarce making supervised learning strategies difficult and time consuming in
practice. We demonstrate that semi-supervised learning techniques can be used
to scale learning beyond supervised datasets, allowing for discerning and
recalling new radio signals by using sparse signal representations based on
both unsupervised and supervised methods for nonlinear feature learning and
clustering methods
- …