Improved Training for Self-Training by Confidence Assessments
It is well known that for some tasks, labeled data sets may be hard to
gather. Therefore, we address the problem of having insufficient
training data. We examined methods for learning from unlabeled data after an
initial training on a limited labeled data set. The suggested approach can be
used as an online learning method on the unlabeled test set. In the general
classification task, whenever we predict a label with high enough confidence,
we treat it as the true label and train on that sample accordingly. For the semantic
segmentation task, a classic example of an expensive data-labeling process, we
do so pixel-wise. Our suggested approaches were applied to the MNIST dataset
as a proof of concept for a vision classification task and to the ADE20K
dataset in order to tackle the semi-supervised semantic segmentation problem.
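
The confidence-gated step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid model, the softmax confidence score, the function names, and the `threshold` and `rounds` parameters are all assumptions chosen to keep the example small. Any classifier that outputs class probabilities could stand in for the model.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Return one mean vector per class (a stand-in for any trainable model)."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def self_train(X_lab, y_lab, X_unlab, n_classes, threshold=0.95, rounds=5):
    """Repeatedly pseudo-label unlabeled points predicted with confidence
    at or above `threshold` and add them to the training set."""
    X_train, y_train = X_lab.copy(), y_lab.copy()
    remaining = X_unlab.copy()
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        centroids = fit_centroids(X_train, y_train, n_classes)
        proba = predict_proba(centroids, remaining)
        confidence = proba.max(axis=1)
        pseudo_labels = proba.argmax(axis=1)
        keep = confidence >= threshold
        if not keep.any():
            break  # nothing confident enough; stop early
        # Treat confident predictions as true labels and train on them.
        X_train = np.concatenate([X_train, remaining[keep]])
        y_train = np.concatenate([y_train, pseudo_labels[keep]])
        remaining = remaining[~keep]
    # Final refit on labeled plus accepted pseudo-labeled data.
    return fit_centroids(X_train, y_train, n_classes), len(y_train)
```

For segmentation, the same gate would be applied per pixel rather than per sample, so that only confidently predicted pixels contribute to the loss.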
Self-Updating Models with Error Remediation
Many environments currently employ machine learning models for data
processing and analytics that were built using a limited number of training
data points. Once deployed, the models are exposed to significant amounts of
previously-unseen data, not all of which is representative of the original,
limited training data. However, updating these deployed models can be difficult
due to logistical, bandwidth, time, hardware, and/or data sensitivity
constraints. We propose a framework, Self-Updating Models with Error
Remediation (SUMER), in which a deployed model updates itself as new data
becomes available. SUMER uses techniques from semi-supervised learning and
noise remediation to iteratively retrain a deployed model using
intelligently-chosen predictions from the model as the labels for new training
iterations. A key component of SUMER is the notion of error remediation as
self-labeled data can be susceptible to the propagation of errors. We
investigate the use of SUMER across various data sets and iterations. We find
that self-updating models (SUMs) generally perform better than models that do
not attempt to self-update when presented with additional previously-unseen
data. This performance gap is accentuated in cases where there are only limited
amounts of initial training data. We also find that the performance of SUMER is
generally better than the performance of SUMs, demonstrating a benefit in
applying error remediation. Consequently, SUMER can autonomously enhance the
operational capabilities of existing data processing systems by intelligently
updating models in dynamic environments.
Comment: 17 pages, 13 figures, published in the proceedings of the Artificial
Intelligence and Machine Learning for Multi-Domain Operations Applications II
conference in the SPIE Defense + Commercial Sensing, 2020 symposium.
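
The self-updating loop described in the SUMER abstract can be sketched as follows. The abstract does not say which remediation technique is used, so the k-nearest-neighbour agreement filter here is purely an assumption for illustration, as are the nearest-centroid model and every function and parameter name. Setting `remediate=False` gives a plain self-updating model with no error remediation.

```python
import numpy as np

def fit(X, y, n_classes):
    """Nearest-centroid model: one mean vector per class (illustrative stand-in)."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict(centroids, X):
    """Assign each point to the class of its closest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def knn_agrees(X_ref, y_ref, X_new, y_new, n_classes, k=3):
    """True where a self-assigned label matches the majority label of the
    k nearest points already in the training set (hypothetical filter)."""
    d2 = ((X_new[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_ref[nearest]  # shape (n_new, k)
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
    return counts.argmax(axis=1) == y_new

def self_update(X_init, y_init, batches, n_classes, remediate=True, k=3):
    """Retrain on the model's own predictions as each new batch arrives."""
    X_train, y_train = X_init.copy(), y_init.copy()
    model = fit(X_train, y_train, n_classes)
    for X_new in batches:
        y_new = predict(model, X_new)  # self-labeling step
        if remediate:
            # Drop self-labeled points that look inconsistent with known data.
            ok = knn_agrees(X_train, y_train, X_new, y_new, n_classes, k=k)
            X_new, y_new = X_new[ok], y_new[ok]
        X_train = np.concatenate([X_train, X_new])
        y_train = np.concatenate([y_train, y_new])
        model = fit(X_train, y_train, n_classes)  # update the deployed model
    return model, X_train, y_train
```

The filter only discards suspect self-labels; a remediation step could equally relabel or down-weight them, and the abstract leaves that choice open.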