Semi-supervised model-based clustering with controlled clusters leakage
In this paper, we focus on finding clusters in partially categorized data
sets. We propose a semi-supervised version of the Gaussian mixture model, called
C3L, which retrieves natural subgroups of given categories. In contrast to
other semi-supervised models, C3L is parametrized by a user-defined leakage
level, which controls the maximal inconsistency between the initial
categorization and the resulting clustering. Our method can be implemented as a
module in practical expert systems to detect clusters that combine expert
knowledge with the true distribution of the data. Moreover, it can be used to
improve the results of less flexible clustering techniques, such as projection
pursuit clustering. The paper presents an extensive theoretical analysis of the
model and a fast algorithm for its optimization. Experimental results show that
C3L finds a high-quality clustering model, which can be applied to discover
meaningful groups in partially classified data.
How can sustainable public transport be improved? A traffic sign recognition approach using convolutional neural network
Sustainable public transport is an important factor in boosting urban economic development, and it is also an important part of building a low-carbon society. The application of driverless technology in public transport injects new impetus into its sustainable development. Road traffic sign recognition is a key technology for driverless public transport, so it is particularly important to adopt innovative algorithms that improve the accuracy of traffic sign recognition. Therefore, this paper proposes a convolutional neural network (CNN) based on k-means to improve the accuracy of traffic sign recognition, and it proposes a sparse maximum CNN to identify difficult traffic signs through hierarchical classification. In the rough classification stage, the k-means CNN is used to extract features, and an improved support vector machine (SVM) is used for classification. Then, in the fine classification stage, the sparse maximum CNN is used for classification. The research results show that the algorithm improves the accuracy of traffic sign recognition and can be applied effectively in unmanned driving technology, which will also bring new breakthroughs for the sustainable development of public transport.
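The two-stage (rough, then fine) routing described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: nearest-centroid lookups stand in for both the k-means CNN with SVM and the sparse maximum CNN, and every name, class label, and coordinate below is hypothetical.

```python
import math

def nearest(point, centroids):
    """Return the key of the centroid closest to point (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))

def classify_hierarchical(point, coarse_centroids, fine_centroids):
    """Two-stage prediction: pick a coarse group, then a class within that group."""
    group = nearest(point, coarse_centroids)        # rough classification stage
    label = nearest(point, fine_centroids[group])   # fine classification stage
    return group, label

# Hypothetical 2-D "features"; a real system would use learned CNN features.
coarse_centroids = {"circular": (0.0, 0.0), "triangular": (10.0, 0.0)}
fine_centroids = {
    "circular": {"speed_limit_30": (-1.0, 0.0), "speed_limit_50": (1.0, 0.0)},
    "triangular": {"yield": (9.0, 0.0), "warning": (11.0, 0.0)},
}

print(classify_hierarchical((0.8, 0.1), coarse_centroids, fine_centroids))
```

The point of the structure is that the fine stage only has to separate classes inside one coarse group, which is where the hard-to-distinguish signs are expected to sit.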
Training from a Better Start Point: Active Self-Semi-Supervised Learning for Few Labeled Samples
Training with fewer annotations is a key issue for applying deep models to
various practical domains. To date, semi-supervised learning has achieved great
success in training with few annotations. However, confirmation bias increases
dramatically as the number of annotations decreases, making it difficult to
continue reducing the number of annotations. Based on the observation that the
quality of pseudo-labels early in semi-supervised training plays an important
role in mitigating confirmation bias, in this paper we propose an active
self-semi-supervised learning (AS3L) framework. AS3L bootstraps semi-supervised
models with prior pseudo-labels (PPL), where PPL is obtained by label
propagation over self-supervised features. We illustrate that the accuracy of
PPL is not only affected by the quality of features, but also by the selection
of the labeled samples. We develop active learning and label propagation
strategies to obtain better PPL. Consequently, our framework can significantly
improve the performance of models in the case of few annotations while reducing
the training time. Experiments on four semi-supervised learning benchmarks
demonstrate the effectiveness of the proposed methods. Our method outperforms
the baseline method by an average of 7% on the four datasets, and it still
outperforms the baseline in accuracy while taking about 1/3 of the training time.
Comment: 12 pages, 8 figures
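The idea of obtaining prior pseudo-labels by label propagation over features can be sketched as follows. This is a generic k-nearest-neighbour propagation with clamped labels, assumed here for illustration; the abstract does not specify AS3L's actual propagation or sample-selection strategy, and the function name, parameters, and toy features are hypothetical.

```python
import math

def propagate_labels(features, labeled, n_classes, k=2, iters=50):
    """Spread labels from a few annotated points over a k-NN graph.

    features: list of feature vectors (stand-ins for self-supervised embeddings)
    labeled:  dict {sample_index: class_id} for the annotated samples
    Returns one pseudo-label per sample.
    """
    n = len(features)
    # k nearest neighbours of each point by Euclidean distance
    neighbours = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(features[i], features[j]))
        neighbours.append(order[:k])

    # Class score vectors: one-hot for labeled points, zeros elsewhere
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, c in labeled.items():
        scores[i][c] = 1.0

    for _ in range(iters):
        new_scores = []
        for i in range(n):
            if i in labeled:
                new_scores.append(scores[i])  # clamp the known labels
            else:
                # Average the neighbours' scores from the previous iteration
                new_scores.append([sum(scores[j][c] for j in neighbours[i]) / k
                                   for c in range(n_classes)])
        scores = new_scores

    # Pseudo-label = class with the highest propagated score
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]

# Toy example: two well-separated groups, one annotated sample in each
features = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 0.0), (5.1, 0.0), (5.2, 0.0)]
pseudo = propagate_labels(features, labeled={0: 0, 5: 1}, n_classes=2)
print(pseudo)
```

In this sketch, which samples are annotated determines which parts of the graph receive any label signal at all, which is one way to read the abstract's remark that pseudo-label accuracy depends on the selection of labeled samples.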
Supervised and Semi-Supervised Self-Organizing Maps for Regression and Classification Focusing on Hyperspectral Data
Machine learning approaches are valuable methods in hyperspectral remote sensing, especially for the classification of land cover or for the regression of physical parameters. While the recording of hyperspectral data has become affordable with innovative technologies, the acquisition of reference data (ground truth) has remained expensive and time-consuming. There is a need for methodological approaches that can handle datasets with significantly more hyperspectral input data than reference data. We introduce the Supervised Self-organizing Maps (SuSi) framework, which can perform unsupervised, supervised and semi-supervised classification as well as regression on high-dimensional data. The methodology of the SuSi framework is presented and compared to other frameworks. Its different parts are evaluated on two hyperspectral datasets. The results of the evaluations can be summarized in four major findings: (1) The supervised and semi-supervised self-organizing maps (SOMs) outperform random forest in the regression of soil moisture. (2) In the classification of land cover, the supervised and semi-supervised SOMs reveal great potential. (3) The unsupervised SOM is a valuable tool to understand the data. (4) The SuSi framework is versatile, flexible, and easy to use. The SuSi framework is provided as an open-source Python package on GitHub.
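For readers unfamiliar with self-organizing maps, the unsupervised building block can be sketched in a few lines. This is a generic one-dimensional SOM written from scratch, not the SuSi package or its API; the function names, the linear decay schedule, and the evenly spaced initialization are assumptions chosen to keep the example deterministic.

```python
import math

def bmu_index(weights, x):
    """Index of the best-matching unit (the node closest to sample x)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def train_som(data, n_nodes=4, epochs=20, lr0=0.5, sigma0=1.0):
    """Train a 1-D self-organizing map and return the node weight vectors."""
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    # Deterministic init: nodes spaced evenly between the data extremes
    weights = [[lo[d] + (hi[d] - lo[d]) * i / (n_nodes - 1) for d in range(dim)]
               for i in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs           # linear decay (an assumption)
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 1e-3)
        for x in data:
            b = bmu_index(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood on the node grid, centred on the BMU
                h = math.exp(-((i - b) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Toy data: two groups of 2-D points
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10)]
som = train_som(data)
print(bmu_index(som, (0, 0)), bmu_index(som, (10, 10)))
```

After training, samples from the two groups map to different nodes. Supervised variants typically attach a label or regression value to each node on top of a map like this, but how SuSi does so is not described in the abstract.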