21 research outputs found
Active Learning for Deep Detection Neural Networks
The cost of drawing object bounding boxes (i.e. labeling) for millions of
images is prohibitively high. For instance, labeling pedestrians in a regular
urban image could take 35 seconds on average. Active learning aims to reduce
the cost of labeling by selecting only those images that are informative to
improve the detection network accuracy. In this paper, we propose a method to
perform active learning of object detectors based on convolutional neural
networks. We propose a new image-level scoring process to rank unlabeled images
for their automatic selection, which clearly outperforms classical scores. The
proposed method can be applied to videos and sets of still images. In the
former case, temporal selection rules can complement our scoring process. As a
relevant use case, we extensively study the performance of our method on the
task of pedestrian detection. Overall, the experiments show that the proposed
method performs better than random selection. Our code is publicly available
at www.gitlab.com/haghdam/deep_active_learning. Comment: Accepted at ICCV 201
Classification Committee for Active Deep Object Detection
In object detection, the cost of labeling is very high because an annotator must not
only confirm the categories of multiple objects in an image but also
accurately determine the bounding box of each object. Thus, integrating
active learning into object detection is of considerable practical value.
In this paper, we propose a classification committee method for active deep object
detection, which introduces a discrepancy mechanism across multiple classifiers
for sample selection when training object detectors. The model contains a
main detector and a classification committee. The main detector denotes the
target object detector trained from a labeled pool composed of the selected
informative images. The role of the classification committee is to select the
most informative images according to their uncertainty values from a
classification perspective, which is expected to focus more on the discrepancy and
representativeness of instances. Specifically, the committee computes the uncertainty of a
given instance within the image by measuring the discrepancy among the outputs of the
committee, which is pre-trained via the proposed Maximum Classifiers Discrepancy Group
Loss (MCDGL). The most informative images are finally determined by selecting
the ones with many high-uncertainty instances. Besides, to mitigate the impact
of interference instances, we design a Focus on Positive Instances Loss (FPIL)
to give the committee the ability to automatically focus on the representative
instances as well as precisely encode their discrepancies for the same
instance. Experiments are conducted on the Pascal VOC and COCO datasets with several
popular object detectors. The results show that our method outperforms
state-of-the-art active learning methods, which verifies the effectiveness of
the proposed method.
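The selection step described in this abstract (score each instance by how much the committee members disagree, then prefer images with many high-uncertainty instances) can be sketched roughly as follows. This is only an illustration under stated assumptions: the mean pairwise L1 distance, the `threshold` value, the function names, and the toy probabilities are all hypothetical and are not taken from the paper, which trains the committee with MCDGL and FPIL.

```python
from itertools import combinations


def instance_discrepancy(committee_probs):
    """Mean pairwise L1 distance between the committee members' class
    distributions for one instance (assumed measure; needs >= 2 members)."""
    pairs = list(combinations(committee_probs, 2))
    total = sum(sum(abs(a - b) for a, b in zip(p, q)) for p, q in pairs)
    return total / len(pairs)


def image_score(instances, threshold=0.5):
    """Number of instances whose discrepancy exceeds the (hypothetical) threshold."""
    return sum(1 for probs in instances if instance_discrepancy(probs) > threshold)


def select_images(pool, budget, threshold=0.5):
    """Return the `budget` image names with the most high-uncertainty instances."""
    ranked = sorted(pool, key=lambda name: image_score(pool[name], threshold),
                    reverse=True)
    return ranked[:budget]


# Toy unlabeled pool: image -> instances -> one class distribution per committee member.
pool = {
    "img_a": [  # both instances: members agree
        [[0.9, 0.1], [0.8, 0.2]],
        [[0.1, 0.9], [0.2, 0.8]],
    ],
    "img_b": [  # both instances: members disagree
        [[0.9, 0.1], [0.2, 0.8]],
        [[0.3, 0.7], [0.8, 0.2]],
    ],
    "img_c": [  # one contested instance, one agreed instance
        [[0.9, 0.1], [0.1, 0.9]],
        [[0.5, 0.5], [0.5, 0.5]],
    ],
}

selected = select_images(pool, budget=2)
print(selected)  # ['img_b', 'img_c']
```

Counting instances above a threshold is one simple reading of "many high-uncertainty instances"; summing or averaging the discrepancies would be an equally plausible aggregation.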
Introducing artificial data generation in active learning for land use/land cover classification
Fonseca, J., Douzas, G., & Bacao, F. (2021). Increasing the effectiveness of active learning: Introducing artificial data generation in active learning for land use/land cover classification. Remote Sensing, 13(13), 1-20. [2619]. https://doi.org/10.3390/rs13132619
In remote sensing, Active Learning (AL) has become an important technique to collect informative ground truth data “on-demand” for supervised classification tasks. Despite its effectiveness, it is still significantly reliant on user interaction, which makes it both expensive and time consuming to implement. Most of the current literature focuses on the optimization of AL by modifying the selection criteria and the classifiers used. Although improvements in these areas will result in more effective data collection, the use of artificial data sources to reduce human–computer interaction remains unexplored. In this paper, we introduce a new component to the typical AL framework, the data generator, a source of artificial data to reduce the amount of user-labeled data required in AL. The implementation of the proposed AL framework is done using Geometric SMOTE as the data generator. We compare the new AL framework to the original one using similar acquisition functions and classifiers over three AL-specific performance metrics in seven benchmark datasets. We show that this modification of the AL framework significantly reduces cost and time requirements for a successful AL implementation in all of the datasets used in the experiment.
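One iteration of an AL loop with a data generator might look like the sketch below. It is a simplified stand-in, not the authors' implementation: the generator uses plain SMOTE-style linear interpolation rather than Geometric SMOTE, the classifier is a nearest-centroid rule, the acquisition function is a distance margin, and all names and data points are hypothetical.

```python
import random


def generate_synthetic(labeled, n_new, rng):
    """SMOTE-style generator (assumed stand-in for Geometric SMOTE):
    interpolate between two labeled points of the same class."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    # Only classes with at least two points can be interpolated.
    eligible = sorted(c for c, xs in by_class.items() if len(xs) >= 2)
    synthetic = []
    for _ in range(n_new):
        cls = rng.choice(eligible)
        a, b = rng.sample(by_class[cls], 2)
        t = rng.random()
        synthetic.append(([ai + t * (bi - ai) for ai, bi in zip(a, b)], cls))
    return synthetic


def centroids(samples):
    """Per-class mean vector, used here as a minimal classifier."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def margin(x, cents):
    """Gap between the distances to the two nearest centroids; small means uncertain."""
    d = sorted(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) ** 0.5
               for c in cents.values())
    return d[1] - d[0]


rng = random.Random(0)
labeled = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1), ([0.9, 1.2], 1)]
unlabeled = [[0.1, 0.1], [0.55, 0.6], [1.0, 1.1]]

# Augment the user-labeled data with artificial samples before fitting.
augmented = labeled + generate_synthetic(labeled, n_new=6, rng=rng)
cents = centroids(augmented)

# Acquisition: send the most uncertain unlabeled point to the annotator.
query = min(unlabeled, key=lambda x: margin(x, cents))
print(query)  # [0.55, 0.6]
```

The point between the two classes is queried because it is nearly equidistant from both centroids; the synthetic samples only enlarge the training set and never replace the human-labeled query.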
An active learning framework for duplicate detection in SaaS platforms
With the rapid growth of users’ data in SaaS (Software-as-a-service)
platforms using micro-services, it becomes essential to detect duplicated entities for ensuring the integrity and consistency of data
in many companies and businesses (primarily multinational corporations). Due to the large volume of databases today,
duplicate detection algorithms need to be not only accurate but also
practical, which means that they must return detection results as
fast as possible for a given request. Among existing algorithms for
the duplicate detection problem, using Siamese neural networks
with the triplet loss has become one of the robust ways to measure the similarity of two entities (texts, paragraphs, or documents)
for identifying all possible duplicated items. In this paper, we first
propose a practical framework for building a duplicate detection
system in a SaaS platform. Second, we present a new active learning
schema for training and updating duplicate detection algorithms.
In this schema, we not only allow the crowd to provide more annotated data for enhancing the chosen learning model but also use
Siamese neural networks together with the triplet loss to construct an
efficient model for the problem. Finally, we design a user interface
for our proposed duplicate detection system, which can easily
be applied in practice in different companies.
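For readers unfamiliar with the triplet loss mentioned in this abstract, a minimal sketch is given below. It uses the standard formulation max(0, d(anchor, positive) - d(anchor, negative) + margin) with Euclidean distance on precomputed embeddings; the embedding values, the `margin` default, and the `is_duplicate` threshold rule are illustrative assumptions, not details from the paper, where the embeddings would come from the Siamese network.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: zero once the positive is closer to the
    anchor than the negative by at least `margin`."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


def is_duplicate(emb_a, emb_b, threshold):
    """Hypothetical decision rule: two entities are flagged as duplicates
    when their embeddings are closer than `threshold`."""
    return euclidean(emb_a, emb_b) < threshold


# Toy embeddings (assumed values, chosen so the distances are 5 and 10).
anchor = [0.0, 0.0]
positive = [3.0, 4.0]   # a known duplicate of the anchor
negative = [6.0, 8.0]   # a known non-duplicate

print(triplet_loss(anchor, positive, negative))  # 0.0, already well separated
print(triplet_loss(anchor, negative, positive))  # 6.0, roles swapped, so penalized
print(is_duplicate(anchor, positive, threshold=5.5))  # True
```

In an active learning setting, newly annotated pairs from the crowd would supply additional (anchor, positive, negative) triplets for retraining.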