34,475 research outputs found

    Cluster-based semi-supervised ensemble learning

    Get PDF
    Semi-supervised classification consists of acquiring knowledge from both labelled and unlabelled data to classify test instances. The cluster assumption represents one of the potential relationships between true classes and data distribution that semi-supervised algorithms assume in order to use unlabelled data. Ensemble algorithms have been widely and successfully employed in both supervised and semi-supervised contexts. In this Thesis, we focus on the cluster assumption to study ensemble learning based on a new cluster regularisation technique for multi-class semi-supervised classification. Firstly, we introduce a multi-class cluster-based classifier, the Cluster-based Regularisation (Cluster- Reg) algorithm. ClusterReg employs a new regularisation mechanism based on posterior probabilities generated by a clustering algorithm in order to avoid generating decision boundaries that traverses high-density regions. Such a method possesses robustness to overlapping classes and to scarce labelled instances on uncertain and low-density regions, when data follows the cluster assumption. Secondly, we propose a robust multi-class boosting technique, Cluster-based Boosting (CBoost), which implements the proposed cluster regularisation for ensemble learning and uses ClusterReg as base learner. CBoost is able to overcome possible incorrect pseudo-labels and produces better generalisation than existing classifiers. And, finally, since there are often datasets with a large number of unlabelled instances, we propose the Efficient Cluster-based Boosting (ECB) for large multi-class datasets. ECB extends CBoost and has lower time and memory complexities than state-of-the-art algorithms. Such a method employs a sampling procedure to reduce the training set of base learners, an efficient clustering algorithm, and an approximation technique for nearest neighbours to avoid the computation of pairwise distance matrix. Hence, ECB enables semi-supervised classification for large-scale datasets

    Semi-Supervised Radio Signal Identification

    Full text link
    Radio emitter recognition in dense multi-user environments is an important tool for optimizing spectrum utilization, identifying and minimizing interference, and enforcing spectrum policy. Radio data is readily available and easy to obtain from an antenna, but labeled and curated data is often scarce making supervised learning strategies difficult and time consuming in practice. We demonstrate that semi-supervised learning techniques can be used to scale learning beyond supervised datasets, allowing for discerning and recalling new radio signals by using sparse signal representations based on both unsupervised and supervised methods for nonlinear feature learning and clustering methods
    • …
    corecore