Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank
For many applications the collection of labeled data is expensive and laborious.
Exploitation of unlabeled data during training is thus a long pursued objective
of machine learning. Self-supervised learning addresses this by positing an
auxiliary task (different, but related to the supervised task) for which data
is abundantly available. In this paper, we show how ranking can be used as a
proxy task for some regression problems. As another contribution, we propose an
efficient backpropagation technique for Siamese networks which prevents the
redundant computation introduced by the multi-branch network architecture. We
apply our framework to two regression problems: Image Quality Assessment (IQA)
and Crowd Counting. For both we show how to automatically generate ranked image
sets from unlabeled data. Our results show that networks trained to regress to
the ground truth targets for labeled data and to simultaneously learn to rank
unlabeled data obtain significantly better, state-of-the-art results for both
IQA and crowd counting. In addition, we show that measuring network uncertainty
on the self-supervised proxy task is a good measure of informativeness of
unlabeled data. This can be used to drive an algorithm for active learning and
we show that this reduces labeling effort by up to 50%.
Comment: Accepted at TPAMI. (Keywords: Learning from rankings, image quality
assessment, crowd counting, active learning). arXiv admin note: text overlap
with arXiv:1803.0309
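To illustrate the idea of using ranking as a proxy task, a pairwise margin (hinge) ranking loss can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the margin value, and the example scores are assumptions made purely for illustration.

```python
def pairwise_ranking_loss(score_high, score_low, margin=1.0):
    """Hinge loss for one ranked pair (hypothetical helper).

    score_high: network output for the image that should rank higher.
    score_low:  network output for the image that should rank lower.
    The loss is zero once score_high exceeds score_low by at least `margin`.
    """
    return max(0.0, score_low - score_high + margin)


def mean_ranking_loss(pairs, margin=1.0):
    """Average the hinge loss over a list of (score_high, score_low) pairs."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    return sum(pairwise_ranking_loss(h, l, margin) for h, l in pairs) / len(pairs)


# Assumed example scores: the first pair is correctly ordered, the second is not.
example_pairs = [(3.0, 1.0), (1.0, 3.0)]
print(mean_ranking_loss(example_pairs))
```

Only the relative order of the two scores matters here, which is why such a loss can be computed on unlabeled images whose ranking is known by construction.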
Salient object subitizing
We study the problem of salient object subitizing, i.e. predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1–4). To this end, we present a salient object subitizing image dataset of about 14 K everyday images which are annotated using an online crowdsourcing marketplace. We show that using an end-to-end trained convolutional neural network (CNN) model, we achieve prediction accuracy comparable to human performance in identifying images with zero or one salient object. For images with multiple salient objects, our model also provides significantly better than chance performance without requiring any localization process. Moreover, we propose a method to improve the training of the CNN subitizing model by leveraging synthetic images. In experiments, we demonstrate the accuracy and generalizability of our CNN subitizing model and its applications in salient object detection and image retrieval.
This research was supported in part by US NSF Grants 0910908 and 1029430, and gifts from Adobe and NVIDIA. (0910908 - US NSF; 1029430 - US NSF)
https://arxiv.org/abs/1607.07525
https://arxiv.org/pdf/1607.07525.pdf
Accepted manuscript
Convolutional Neural Networks for Counting Fish in Fisheries Surveillance Video
We present a computer vision tool that analyses video from a CCTV system installed on fishing trawlers to monitor discarded fish catch. The system aims to support expert observers who review the footage and verify numbers, species and sizes of discarded fish. The operational environment presents a significant challenge for these tasks. Fish are processed below deck under fluorescent lights; they are randomly oriented and there are multiple occlusions. The scene is unstructured and complicated by the presence of fishermen processing the catch. We describe an approach to segmenting the scene and counting fish that exploits the $N^4$-Fields algorithm. We performed extensive tests of the algorithm on a data set comprising 443 frames from 6 belts. Results indicate the relative count error (for individual fish) ranges from 2% to 16%. We believe this is the first system that is able to handle footage from operational trawlers.
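The abstract reports a relative count error but does not define it. A minimal sketch, assuming the common definition |predicted - true| / true, is shown below; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def relative_count_error(predicted, true):
    """Relative count error under the assumed definition |predicted - true| / true."""
    if true <= 0:
        raise ValueError("true count must be positive")
    return abs(predicted - true) / true


# Assumed example: 46 fish counted by the system where an observer counted 50.
error = relative_count_error(46, 50)
print(f"{error:.0%}")
```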
TasselNet: Counting maize tassels in the wild via local counts regression network
Accurately counting maize tassels is important for monitoring the growth
status of maize plants. This tedious task, however, is still mainly done by
manual efforts. In the context of modern plant phenotyping, automating this
task is required to meet the need of large-scale analysis of genotype and
phenotype. In recent years, computer vision technologies have experienced a
significant breakthrough due to the emergence of large-scale datasets and
increased computational resources. Naturally, image-based approaches have also
received much attention in plant-related studies. Yet most
image-based systems for plant phenotyping are deployed in controlled
laboratory environments. When transferring the application scenario to
unconstrained in-field conditions, intrinsic and extrinsic variations in the
wild pose great challenges for accurate counting of maize tassels, a task that goes
beyond the ability of conventional image processing techniques. This calls for
more robust computer vision approaches to address in-field variations. This
paper studies the in-field counting problem of maize tassels. To our knowledge,
this is the first time that a plant-related counting problem is considered
using computer vision technologies in an unconstrained field-based environment.
Comment: 14 pages
The Visual Social Distancing Problem
One of the main and most effective measures to contain the recent viral
outbreak is maintaining so-called Social Distancing (SD). To comply
with this constraint, workplaces, public institutions, transport and schools
will likely adopt restrictions on the minimum inter-personal distance between
people. Given the current scenario, it is crucial to measure compliance with
this physical constraint at scale, in order to identify the
reasons for possible breaches of such distance limitations and to understand
whether they imply a possible threat given the scene context. All of this must
be done while complying with privacy policies and keeping the measurement acceptable. To this end, we
introduce the Visual Social Distancing (VSD) problem, defined as the automatic
estimation of the inter-personal distance from an image, and the
characterization of the related people aggregations. VSD is pivotal for a
non-invasive analysis of whether people comply with the SD restriction, and for
providing statistics about the level of safety of specific areas whenever this
constraint is violated. We then discuss how VSD relates to previous
literature in Social Signal Processing and indicate which existing Computer
Vision methods can be used to address this problem. We conclude with future
challenges related to the effectiveness of VSD systems, ethical implications
and future application scenarios.
Comment: 9 pages, 5 figures. All the authors equally contributed to this
manuscript and they are listed in alphabetical order. Under submission
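As a rough illustration of the final step of a VSD pipeline, the check for pairs of people who stand too close can be sketched as below. This is a minimal sketch under strong assumptions: it presumes that each person's ground-plane position in metres has already been estimated from the image (the hard part of VSD, which is not shown), and the function name, the 2.0 m threshold, and the example coordinates are all hypothetical.

```python
from itertools import combinations
from math import dist  # available from Python 3.8


def find_distance_violations(positions, min_distance=2.0):
    """Return index pairs of people closer than `min_distance` metres.

    positions: list of (x, y) ground-plane coordinates in metres, assumed
    to be estimated upstream from the image.
    """
    violations = []
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        if dist(p, q) < min_distance:
            violations.append((i, j))
    return violations


# Assumed example: persons 0 and 1 stand 1 m apart, person 2 is far away.
people = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(find_distance_violations(people))  # prints [(0, 1)]
```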
People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting
In this paper we propose a technique to adapt a convolutional neural network
(CNN) based object counter to additional visual domains and object types while
still preserving the original counting function. Domain-specific normalisation
and scaling operators are trained to allow the model to adjust to the
statistical distributions of the various visual domains. The developed
adaptation technique is used to produce a single patch-based counting
regressor capable of counting various object types including people, vehicles,
cell nuclei and wildlife. As part of this study a challenging new cell counting
dataset in the context of tissue culture and patient diagnosis is constructed.
This new collection, referred to as the Dublin Cell Counting (DCC) dataset, is
the first of its kind to be made available to the wider computer vision
community. State-of-the-art object counting performance is achieved in both the
ShanghaiTech (parts A and B) and Penguins datasets while competitive
performance is observed on the TRANCOS and Modified Bone Marrow (MBM) datasets,
all using a shared counting model.
Comment: 10 pages