Webly Supervised Learning of Convolutional Networks
We present an approach to utilize large amounts of web data for learning
CNNs. Specifically inspired by curriculum learning, we present a two-step
approach for CNN training. First, we use easy images to train an initial visual
representation. We then use this initial CNN and adapt it to harder, more
realistic images by leveraging the structure of data and categories. We
demonstrate that our two-stage CNN outperforms a fine-tuned CNN trained on
ImageNet on Pascal VOC 2012. We also demonstrate the strength of webly
supervised learning by localizing objects in web images and training an R-CNN
style detector. It achieves the best performance on VOC 2007 where no VOC
training data is used. Finally, we show our approach is quite robust to noise
and performs comparably even when we use image search results from March 2013
(pre-CNN image search era).
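The two-step, easy-then-hard schedule described in this abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: a logistic-regression model stands in for the CNN, synthetic 1-D points stand in for images, and the names `sgd_epochs` and `make_samples` are hypothetical. The paper's use of data and category structure in the second stage is not modeled here.

```python
import math
import random


def sgd_epochs(weights, samples, lr, epochs, rng):
    """Run logistic-regression SGD, starting from the given weights."""
    w = list(weights)
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for features, label in order:
            z = sum(wi * xi for wi, xi in zip(w, features))
            pred = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(features):
                w[i] += lr * (label - pred) * xi
    return w


def accuracy(weights, samples):
    """Fraction of samples whose label matches the sign of the linear score."""
    hits = 0
    for features, label in samples:
        z = sum(wi * xi for wi, xi in zip(weights, features))
        hits += (1 if z > 0 else 0) == label
    return hits / len(samples)


def make_samples(rng, n, spread, flip_rate):
    """Toy data: class 1 centred at +2, class 0 at -2; second feature is a bias term."""
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = (2.0 if label else -2.0) + rng.uniform(-spread, spread)
        if rng.random() < flip_rate:
            label = 1 - label  # simulate a noisy web label
        samples.append(((x, 1.0), label))
    return samples


rng = random.Random(0)
easy = make_samples(rng, 200, spread=0.5, flip_rate=0.0)   # clean, well separated
hard = make_samples(rng, 200, spread=1.5, flip_rate=0.1)   # wider spread, noisy labels
clean_test = make_samples(rng, 200, spread=1.5, flip_rate=0.0)

# Stage 1: learn an initial representation from easy samples only.
stage1 = sgd_epochs([0.0, 0.0], easy, lr=0.1, epochs=5, rng=rng)
# Stage 2: continue from the stage-1 weights on harder, noisier samples.
stage2 = sgd_epochs(stage1, hard, lr=0.01, epochs=5, rng=rng)
```

The only point of the sketch is the ordering: the second call starts from `stage1` rather than from fresh weights, and uses a smaller learning rate so the noisy samples adjust the initial model instead of overwriting it.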
Learning without Prejudice: Avoiding Bias in Webly-Supervised Action Recognition
Webly-supervised learning has recently emerged as an alternative paradigm to
traditional supervised learning based on large-scale datasets with manual
annotations. The key idea is that models such as CNNs can be learned from the
noisy visual data available on the web. In this work we aim to exploit web data
for video understanding tasks such as action recognition and detection. One of
the main problems in webly-supervised learning is cleaning the noisy labeled
data from the web. The state-of-the-art paradigm relies on training a first
classifier on noisy data that is then used to clean the remaining dataset. Our
key insight is that this procedure biases the second classifier towards samples
that the first one understands. Here we train two independent CNNs, an RGB
network on web images and video frames and a second network using temporal
information from optical flow. We show that training the networks independently
is vastly superior to selecting the frames for the flow classifier by using our
RGB network. Moreover, we show benefits in enriching the training set with
different data sources from heterogeneous public web databases. We demonstrate
that our framework outperforms all other webly-supervised methods on two public
benchmarks, UCF-101 and Thumos'14.
Comment: Submitted to CVIU SI: Computer Vision and the We
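The cleaning paradigm that this abstract criticizes, where a first classifier filters the noisy web data for a second one, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `clean_with_first_classifier` and `toy_score` are hypothetical names, and the two-number feature vectors are invented stand-ins for appearance and motion cues.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, noisy web label)


def clean_with_first_classifier(
    samples: List[Sample],
    score: Callable[[Sequence[float], int], float],
    threshold: float = 0.5,
) -> List[Sample]:
    """Keep only samples whose noisy label the first classifier agrees with."""
    return [(x, y) for x, y in samples if score(x, y) >= threshold]


def toy_score(features: Sequence[float], label: int) -> float:
    # Hypothetical first classifier: it only looks at feature 0 and is
    # confident when the sign of that feature matches the label.
    agrees = (features[0] > 0) == (label == 1)
    return 0.9 if agrees else 0.1


web_samples: List[Sample] = [
    ((1.0, 0.2), 1),
    ((-0.1, 0.9), 1),   # feature 1 supports the label, feature 0 does not
    ((-2.0, 0.1), 0),
    ((0.3, -0.7), 0),   # same situation for the other class
]

kept = clean_with_first_classifier(web_samples, toy_score)
```

In this toy setup, `kept` contains only the two samples that the first scorer already handles; the samples whose evidence sits in the other feature are discarded. A second classifier trained on `kept` never sees them, which is the bias the abstract describes.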
DeepSolarEye: Power Loss Prediction and Weakly Supervised Soiling Localization via Fully Convolutional Networks for Solar Panels
The impact of soiling on solar panels is an important and well-studied
problem in the renewable energy sector. In this paper, we present the first
convolutional neural network (CNN) based approach for solar panel soiling and
defect analysis. Our approach takes an RGB image of a solar panel and
environmental factors as inputs to predict power loss, soiling localization,
and soiling type. In computer vision, localization is a complex task which
typically requires manually labeled training data such as bounding boxes or
segmentation masks. Our proposed approach consists of four specialized stages
that completely avoid localization ground truth and need only panel images
with power loss labels for training. The impact regions obtained from
the predicted localization masks are classified into soiling types using
webly supervised learning. To improve the localization capabilities of CNNs, we
introduce a novel bi-directional input-aware fusion (BiDIAF) block that
reinforces the input at different levels of CNN to learn input-specific feature
maps. Our empirical study shows that BiDIAF improves the power loss prediction
accuracy by about 3% and localization accuracy by about 4%. Our end-to-end
model yields further improvement of about 24% on localization when learned in a
weakly supervised manner. Our approach is generalizable and showed promising
results on web crawled solar panel images. Our system has a frame rate of 22
fps (including all steps) on an NVIDIA TitanX GPU. Additionally, we collected
a first-of-its-kind dataset for solar panel image analysis consisting of 45,000+
images.
Comment: Accepted for publication at WACV 201