13 research outputs found

    Learning Intelligent Dialogs for Bounding Box Annotation

    We introduce Intelligent Annotation Dialogs for bounding box annotation. We train an agent to automatically choose a sequence of actions for a human annotator to produce a bounding box in a minimal amount of time. Specifically, we consider two actions: box verification, where the annotator verifies a box generated by an object detector, and manual box drawing. We explore two kinds of agents, one based on predicting the probability that a box will be positively verified, and the other based on reinforcement learning. We demonstrate that (1) our agents are able to learn efficient annotation strategies in several scenarios, automatically adapting to the image difficulty, the desired quality of the boxes, and the detector strength; (2) in all scenarios the resulting annotation dialogs speed up annotation compared to manual box drawing alone and box verification alone, while also outperforming any fixed combination of verification and drawing in most scenarios; (3) in a realistic scenario where the detector is iteratively re-trained, our agents evolve a series of strategies that reflect the shifting trade-off between verification and drawing as the detector grows stronger. Comment: This paper appeared at CVPR 201
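    The trade-off between verification and drawing described in this abstract can be illustrated with a small expected-time calculation. This is only a sketch under simplifying assumptions, not the paper's agent: it considers a single candidate box, and the function names, the acceptance probability `p_accept`, and the timings `t_verify` and `t_draw` are all hypothetical inputs chosen for illustration.

```python
def expected_time_verify_first(p_accept, t_verify, t_draw):
    """Expected seconds if the annotator verifies one box and draws only on rejection."""
    # Verification is always paid; drawing is paid with probability (1 - p_accept).
    return t_verify + (1.0 - p_accept) * t_draw


def choose_action(p_accept, t_verify, t_draw):
    """Pick the first action with the lower expected time (hypothetical one-step rule)."""
    verify_cost = expected_time_verify_first(p_accept, t_verify, t_draw)
    # Drawing directly always costs t_draw in this simplified model.
    return "verify" if verify_cost < t_draw else "draw"


if __name__ == "__main__":
    # Assumed timings: 2 s to verify a box, 8 s to draw one.
    for p in (0.1, 0.5, 0.9):
        print(p, choose_action(p, t_verify=2.0, t_draw=8.0))
```

    Under this model, verifying first pays off exactly when `p_accept` exceeds `t_verify / t_draw`, which is one way to see why the preferred strategy shifts as the detector grows stronger.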

    Unsupervised Learning of Discriminative Relative Visual Attributes

    Multi-level adaptive active learning for scene classification

    Semantic scene classification is a challenging problem in computer vision. In this paper, we present a novel multi-level active learning approach to reduce the human annotation effort for training robust scene classification models. Unlike most existing active learning methods, which can only query labels for selected instances at the target categorization level, i.e., the scene class level, our approach establishes a semantic framework that predicts scene labels based on a latent object-based semantic representation of images, and is capable of querying labels at two different levels: the target scene class level (abstractive high level) and the latent object class level (semantic middle level). Specifically, we develop an adaptive active learning strategy to perform multi-level label queries, which maintains the default label query at the target scene class level, but switches to the latent object class level whenever an "unexpected" target class label is returned by the labeler. We conduct experiments on two standard scene classification datasets to investigate the efficacy of the proposed approach. Our empirical results show that the proposed adaptive multi-level active learning approach can outperform both baseline active learning methods and a state-of-the-art multi-level active learning method.

    FAST ROTATED BOUNDING BOX ANNOTATIONS FOR OBJECT DETECTION

    Traditionally, object detection models use a large amount of annotated data, and axis-aligned bounding boxes (AABBs) are often chosen as the image annotation technique for both training and predictions. The purpose of annotating the objects in the images is to indicate the regions of interest with the corresponding labels. Accurate object annotations help computer vision models understand the distinct patterns of the image features in order to recognize and localize different classes of objects. However, AABBs are often a poor fit for elongated object instances. It is also challenging to localize objects with AABBs in densely packed aerial images because adjacent bounding boxes overlap. Alternatively, rectangular annotations that can be oriented diagonally, also known as rotated bounding boxes (RBBs), can provide a much tighter fit for elongated objects and reduce the potential bounding box overlap between adjacent objects. However, RBBs are much more time-consuming and tedious to annotate than AABBs for large datasets. In this work, we propose a novel annotation tool named FastRoLabelImg (Fast Rotated LabelImg) for producing high-quality RBB annotations with low time and effort. The tool generates accurate RBB proposals for objects of interest as the annotator makes progress through the dataset. It can also adapt available AABBs to generate RBB proposals. Furthermore, a multipoint box drawing system is provided to reduce manual RBB annotation time compared to the existing methods. Across three diverse datasets, we show that the proposal generation methods can achieve a maximum of 88.9% manual workload reduction. By conducting a participant study, we also show that our proposed manual annotation method is twice as fast as the existing system with the same accuracy. Lastly, we publish the RBB annotations for two public datasets in order to motivate future research that will contribute to developing more competent object detection algorithms capable of RBB predictions.

    Learning of classification models from group-based feedback

    Learning of classification models in practice often relies on a nontrivial amount of human annotation effort. The most widely adopted human labeling process assigns class labels to individual data instances. However, such a process is very rigid and may end up being very time-consuming and costly to conduct in practice. Finding more effective ways to reduce human annotation effort has become critical for building machine learning systems that require human feedback. In this thesis, we propose and investigate a new machine learning approach, Group-Based Active Learning, to learn classification models from limited human feedback. A group is defined by a set of instances represented by conjunctive patterns that are value ranges over the input features. Such conjunctive patterns define hypercubic regions of the input data space. A human annotator assesses the group solely based on its region-based description by providing an estimate of the class proportion for the subpopulation covered by the region. The advantage of this labeling process is that it allows a human to label many instances at the same time, which can, in turn, improve the labeling efficiency. In general, there are infinitely many regions one can define over a real-valued input space. To identify and label groups/regions important for classification learning, we propose and develop a Hierarchical Active Learning framework that actively builds and labels a hierarchy of input regions. Briefly, our framework starts by identifying general regions covering substantial portions of the input data space. After that, it progressively splits the regions into smaller and smaller sub-regions and also acquires class proportion labels for the new regions. The proportion labels for these regions are used to gradually improve and refine a classification model induced by the regions. We develop three versions of the idea. The first two versions aim to build a single hierarchy of regions. One builds it statically using hierarchical clustering, while the other builds it dynamically, similarly to the decision tree learning process. The third approach builds multiple hierarchies simultaneously, and it offers additional flexibility for identifying more informative and simpler regions. We have conducted comprehensive empirical studies to evaluate our framework. The results show that the methods based on region-based active learning can learn very good classifiers from very few, simple region queries, and hence are promising for reducing the human annotation effort needed for building a variety of classification models.
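    The notion of a group as a conjunction of value ranges can be made concrete with a short sketch. This is an illustration only, not the thesis's implementation: the dictionary-based region format and the function names are hypothetical, and the class proportion here is computed from known labels as a stand-in for the estimate a human annotator would provide.

```python
def in_region(x, region):
    """True if point x satisfies every value range in the conjunctive pattern.

    region maps a feature index to an inclusive (low, high) range; features
    that are not listed are unconstrained (hypothetical format).
    """
    return all(low <= x[i] <= high for i, (low, high) in region.items())


def class_proportion(points, labels, region):
    """Fraction of positive labels among the points covered by the region."""
    covered = [y for x, y in zip(points, labels) if in_region(x, region)]
    # An empty region has no defined proportion.
    return sum(covered) / len(covered) if covered else None


if __name__ == "__main__":
    points = [(1, 1), (2, 5), (3, 2), (9, 9)]
    labels = [1, 0, 1, 0]
    general = {0: (0, 4)}                # broad region over feature 0 only
    refined = {0: (0, 4), 1: (0, 3)}     # sub-region adding a range on feature 1
    print(class_proportion(points, labels, general))
    print(class_proportion(points, labels, refined))
```

    Splitting the broad region into the refined one changes its class proportion, which mirrors how progressively smaller sub-regions can yield more informative proportion labels.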

    Actively Selecting Annotations Among Objects and Attributes

    We present an active learning approach to choose image annotation requests among both object category labels and the objects' attribute labels. The goal is to solicit those labels that will best use human effort when training a multiclass object recognition model. In contrast to previous work in active visual category learning, our approach directly exploits the dependencies between human-nameable visual attributes and the objects they describe, shifting its requests in either label space accordingly. We adopt a discriminative latent model that captures object-attribute and attribute-attribute relationships, and then define a suitable entropy reduction selection criterion to predict the influence a new label might have throughout those connections. On three challenging datasets, we demonstrate that the method can more successfully accelerate object learning relative to both passive learning and traditional active learning approaches.

    Discovering visual attributes from image and video data
