11 research outputs found

    Radar-based Hail-producing Storm Detection Using Positive Unlabeled Classification

    Machine learning methods have been widely used in many fields of weather forecasting. However, some severe weather events, such as hailstorms, are difficult to record completely and accurately. The resulting inaccurate data sets degrade the performance of machine-learning-based forecasting models. In this paper, a weather-radar-based method for detecting hail-producing storms is proposed. The method uses a bagging class-weighted support vector machine to learn from partly labeled hail case data together with the remaining unlabeled data, with features extracted from radar and sounding data. Real case data from three radars in North China are used for evaluation. Results suggest that the proposed method could improve both forecast accuracy and forecast lead time compared with commonly used radar-parameter methods. In addition, the proposed method works better than a supervised learning model in all situations, especially when the unlabeled set is contaminated with a large number of positive samples.

    A Robust Ensemble Approach to Learn From Positive and Unlabeled Data Using SVM Base Models

    We present a novel approach to learn binary classifiers when only positive and unlabeled instances are available (PU learning). This problem is routinely cast as a supervised task with label noise in the negative set. We use an ensemble of SVM models trained on bootstrap resamples of the training data for increased robustness against label noise. The approach can be considered in a bagging framework which provides an intuitive explanation for its mechanics in a semi-supervised setting. We compared our method to state-of-the-art approaches in simulations using multiple public benchmark data sets. The included benchmark comprises three settings with increasing label noise: (i) fully supervised, (ii) PU learning and (iii) PU learning with false positives. Our approach shows a marginal improvement over existing methods in the second setting and a significant improvement in the third.
    Journal: Neurocomputing (Elsevier). Copyright © 2015 Elsevier B.V. All rights reserved. Article link: http://dx.doi.org/10.1016/j.neucom.2014.10.081
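
The bootstrap-ensemble idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: to stay dependency-free, the SVM base model is swapped for a nearest-centroid classifier, and the function names, the toy data, and the number of models are all assumptions made for illustration. Each base model is trained on a bootstrap resample of the positives and a bootstrap resample of the unlabeled set (treated as noisy negatives), and the ensemble score is the fraction of models voting "positive".

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bagged_pu_score(x, positives, unlabeled, n_models=50, seed=0):
    """Fraction of bootstrap-trained base models that vote x positive.

    Base model (an assumption, standing in for an SVM): nearest centroid,
    with the unlabeled resample treated as the negative class.
    """
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_models):
        # Bootstrap resamples (sampling with replacement) of both sets.
        pos_sample = rng.choices(positives, k=len(positives))
        neg_sample = rng.choices(unlabeled, k=len(unlabeled))
        c_pos = centroid(pos_sample)
        c_neg = centroid(neg_sample)
        if sq_dist(x, c_pos) < sq_dist(x, c_neg):
            votes += 1
    return votes / n_models


# Hypothetical toy data: labeled positives near (5.5, 5.5); the unlabeled
# set is mostly negatives near the origin plus two hidden positives.
positives = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
unlabeled = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
             (0.5, 0.5), (0.2, 0.8), (5.5, 5.5), (5.2, 5.8)]

for point in unlabeled:
    print(point, bagged_pu_score(point, positives, unlabeled))
```

Averaging votes over many resamples is what gives the ensemble its tolerance to hidden positives in the unlabeled set: any single resample may contain several of them, but they rarely dominate every resample.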

    Automated Machine Learning for Positive-Unlabelled Learning

    Positive-Unlabelled (PU) learning is a field of machine learning that involves learning classifiers from data consisting of positive-class and unlabelled instances. Unlabelled instances may be either positive or negative, but their labels are unknown. PU learning differs from standard binary classification due to the absence of negative instances. This difference is non-trivial and requires differing classification frameworks and evaluation metrics. This thesis looks to address gaps in the PU learning literature and make PU learning more accessible to non-experts by introducing Automated Machine Learning (Auto-ML) systems specific to PU learning. Three such systems have been developed: GA-Auto-PU, a Genetic Algorithm (GA)-based Auto-ML system; BO-Auto-PU, a Bayesian Optimisation (BO)-based Auto-ML system; and EBO-Auto-PU, an Evolutionary/Bayesian Optimisation (EBO) hybrid-based Auto-ML system. These three Auto-ML systems are the three primary contributions of this work. EBO, the optimiser component of EBO-Auto-PU, is by itself a novel optimisation method developed in this work that has proved effective for the task of Auto-ML and represents another contribution. EBO was developed with the aim of acting as a trade-off between GA, which achieved high predictive performance but at high computational expense, and BO, which, when utilised by the Auto-PU system, did not perform as well as the GA-based system but did execute much faster. EBO achieved this aim, providing high predictive performance with a computational runtime much faster than the GA-based system, and not substantially slower than the BO-based system. The proposed Auto-ML systems for PU learning were evaluated on three versions of 40 datasets, and thus on 120 learning tasks in total. The 40 datasets consist of 20 real-world biomedical datasets and 20 synthetic datasets. The main evaluation measure was the F-measure, a popular measure in PU learning.
    Based on the F-measure results, the three proposed systems generally outperformed two baseline PU learning methods, usually with statistically significant results. Among the three proposed systems, there was in general no statistically significant difference between their results, whilst a version of the EBO-Auto-PU system performed overall slightly better than the other systems in terms of F-measure. The two other main contributions of this work relate specifically to the field of PU learning. Firstly, in this work we present and utilise a robust evaluation approach. Evaluating PU learning classifiers is non-trivial and little guidance has been provided in the literature on how to do so. In this work, we present a clear framework for evaluation and use this framework to evaluate the proposed systems. Secondly, when evaluating the proposed systems, an analysis of the most frequently selected components of the optimised PU learning algorithm is presented. That is, the components that constitute the PU learning algorithms produced by the optimisers (for example, the choice of classifiers used in the algorithm, the number of iterations, etc.). This analysis is used to provide guidance on the construction of PU learning algorithms for specific dataset characteristics.
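
For readers unfamiliar with the F-measure named in this abstract, the following is a minimal sketch of the standard definition (the weighted harmonic mean of precision and recall). It is not the thesis's evaluation framework, whose details the abstract does not give; the function name and the toy labels are assumptions, and the sketch presumes that ground-truth labels are available for the evaluation set.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """Standard F-beta score for binary labels (1 = positive, 0 = negative).

    Returns 0.0 when there are no true positives, since precision and
    recall are then zero or undefined.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labels: two true positives, one false positive, one miss.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(f_measure(y_true, y_pred))
```

With `beta=1.0` precision and recall are weighted equally; larger values of `beta` weight recall more heavily.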