
    Semi-supervised model-based clustering with controlled clusters leakage

    In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of the Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by a user-defined leakage level, which controls the maximal inconsistency between the initial categorization and the resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters that combine expert knowledge with the true distribution of the data. Moreover, it can be used to improve the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents an extensive theoretical analysis of the model and a fast algorithm for its optimization. Experimental results show that C3L finds a high-quality clustering model, which can be applied to discover meaningful groups in partially classified data.

    How can sustainable public transport be improved? A traffic sign recognition approach using convolutional neural network

    Sustainable public transport is an important factor in boosting urban economic development, and it is also an important part of building a low-carbon society. The application of driverless technology in public transport gives new impetus to its sustainable development. Road traffic sign recognition is the key technology of driverless public transport, so it is particularly important to adopt innovative algorithms that improve the accuracy of traffic sign recognition. Therefore, this paper proposes a convolutional neural network (CNN) based on k-means to improve the accuracy of traffic sign recognition, and it proposes a sparse maximum CNN to identify difficult traffic signs through hierarchical classification. In the rough classification stage, the k-means CNN is used to extract features, and an improved support vector machine (SVM) is used for classification. Then, in the fine classification stage, the sparse maximum CNN is used for classification. The research results show that the algorithm improves the accuracy of traffic sign recognition more comprehensively and effectively, and that it can be applied in unmanned driving technology, which will also bring new breakthroughs for the sustainable development of public transport.
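The two-stage (rough, then fine) classification described in this abstract can be illustrated with a minimal sketch. This is not the paper's method: the k-means CNN, improved SVM, and sparse maximum CNN are all replaced here by a plain nearest-centroid rule, and the group names, class names, and centroid values are hypothetical, chosen only to show how a coarse decision routes a sample to a group-specific fine classifier.

```python
def nearest_centroid(x, centroids):
    """Return the key of the centroid closest to x (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
    )


def classify_hierarchical(x, coarse_centroids, fine_centroids):
    """Rough stage picks a sign group; fine stage picks a class inside that group."""
    group = nearest_centroid(x, coarse_centroids)       # rough classification
    label = nearest_centroid(x, fine_centroids[group])  # fine classification
    return group, label


# Hypothetical 2-D feature space standing in for learned CNN features.
coarse_centroids = {"circular": (0.0, 0.5), "triangular": (5.0, 0.5)}
fine_centroids = {
    "circular": {"speed_30": (0.0, 0.0), "speed_50": (0.0, 1.0)},
    "triangular": {"yield": (5.0, 0.0), "warning": (5.0, 1.0)},
}

result = classify_hierarchical((0.1, 0.9), coarse_centroids, fine_centroids)
```

The point of the design is that each fine classifier only has to separate the classes inside one group, which is where the abstract places the "difficult" signs.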

    Training from a Better Start Point: Active Self-Semi-Supervised Learning for Few Labeled Samples

    Training with fewer annotations is a key issue for applying deep models to various practical domains. To date, semi-supervised learning has achieved great success in training with few annotations. However, confirmation bias increases dramatically as the number of annotations decreases, making it difficult to continue reducing the number of annotations. Based on the observation that the quality of pseudo-labels early in semi-supervised training plays an important role in mitigating confirmation bias, in this paper we propose an active self-semi-supervised learning (AS3L) framework. AS3L bootstraps semi-supervised models with prior pseudo-labels (PPL), where the PPL are obtained by label propagation over self-supervised features. We illustrate that the accuracy of the PPL is affected not only by the quality of the features, but also by the selection of the labeled samples. We develop active learning and label propagation strategies to obtain better PPL. Consequently, our framework can significantly improve the performance of models in the case of few annotations while reducing the training time. Experiments on four semi-supervised learning benchmarks demonstrate the effectiveness of the proposed methods. Our method outperforms the baseline method by an average of 7% on the four datasets and outperforms the baseline method in accuracy while taking about 1/3 of the training time. Comment: 12 pages, 8 figures
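As an illustration of how pseudo-labels can be obtained by label propagation over features, here is a minimal sketch. It uses a generic iterative propagation scheme on a dense Gaussian-affinity graph with clamped labeled samples, which is an assumption: the abstract does not specify the AS3L propagation or active-selection strategy. The function name, the `sigma` and `n_iter` parameters, and the toy 1-D features (standing in for self-supervised features) are all hypothetical.

```python
import math


def propagate_labels(features, labeled, n_classes, sigma=1.0, n_iter=50):
    """Spread a few known labels to every sample over a dense similarity graph.

    features: list of feature tuples; labeled: dict {sample index: class id}.
    Returns one predicted class id per sample.
    """
    n = len(features)
    # Gaussian affinity between every pair of samples (no self-loops).
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                weights[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # Row-normalise so each sample averages over its neighbours.
    for row in weights:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    # Start from one-hot scores on labeled samples and zeros elsewhere.
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, c in labeled.items():
        scores[i][c] = 1.0
    for _ in range(n_iter):
        new_scores = [
            [sum(weights[i][j] * scores[j][c] for j in range(n)) for c in range(n_classes)]
            for i in range(n)
        ]
        # Clamp the labeled samples back to their known labels.
        for i, c in labeled.items():
            new_scores[i] = [1.0 if k == c else 0.0 for k in range(n_classes)]
        scores = new_scores
    return [max(range(n_classes), key=lambda c: scores[i][c]) for i in range(n)]


# Two well-separated groups with one labeled sample each.
features = [(0.0,), (0.2,), (0.4,), (5.0,), (5.2,), (5.4,)]
pseudo_labels = propagate_labels(features, labeled={0: 0, 5: 1}, n_classes=2)
```

Which samples are labeled determines how far each label has to travel through the graph, which is consistent with the abstract's remark that pseudo-label accuracy depends on the selection of labeled samples as well as on feature quality.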

    Supervised and Semi-Supervised Self-Organizing Maps for Regression and Classification Focusing on Hyperspectral Data

    Machine learning approaches are valuable methods in hyperspectral remote sensing, especially for the classification of land cover or for the regression of physical parameters. While the recording of hyperspectral data has become affordable with innovative technologies, the acquisition of reference data (ground truth) has remained expensive and time-consuming. There is a need for methodological approaches that can handle datasets with significantly more hyperspectral input data than reference data. We introduce the Supervised Self-organizing Maps (SuSi) framework, which can perform unsupervised, supervised and semi-supervised classification as well as regression on high-dimensional data. The methodology of the SuSi framework is presented and compared to other frameworks. Its different parts are evaluated on two hyperspectral datasets. The results of the evaluations can be summarized in four major findings: (1) The supervised and semi-supervised self-organizing maps (SOM) outperform random forest in the regression of soil moisture. (2) In the classification of land cover, the supervised and semi-supervised SOM reveal great potential. (3) The unsupervised SOM is a valuable tool to understand the data. (4) The SuSi framework is versatile, flexible, and easy to use. The SuSi framework is provided as an open-source Python package on GitHub.