11,512 research outputs found
Unconstrained Scene Text and Video Text Recognition for Arabic Script
Building robust recognizers for Arabic has always been challenging. We
demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid
architecture in recognizing Arabic text in videos and natural scenes. We
outperform previous state-of-the-art on two publicly available video text
datasets - ALIF and ACTIV. For the scene text recognition task, we introduce a
new Arabic scene text dataset and establish baseline results. For scripts like
Arabic, a major challenge in developing robust recognizers is the lack of large
quantity of annotated data. We overcome this by synthesising millions of Arabic
text images from a large vocabulary of Arabic words and phrases. Our
implementation is built on top of the model introduced here [37] which is
proven quite effective for English scene text recognition. The model follows a
segmentation-free, sequence to sequence transcription approach. The network
transcribes a sequence of convolutional features from the input image to a
sequence of target labels. This does away with the need for segmenting input
image into constituent characters/glyphs, which is often difficult for Arabic
script. Further, the ability of RNNs to model contextual dependencies yields
superior recognition results.Comment: 5 page
Rain Removal in Traffic Surveillance: Does it Matter?
Varying weather conditions, including rainfall and snowfall, are generally
regarded as a challenge for computer vision algorithms. One proposed solution
to the challenges induced by rain and snowfall is to artificially remove the
rain from images or video using rain removal algorithms. It is the promise of
these algorithms that the rain-removed image frames will improve the
performance of subsequent segmentation and tracking algorithms. However, rain
removal algorithms are typically evaluated on their ability to remove synthetic
rain on a small subset of images. Currently, their behavior is unknown on
real-world videos when integrated with a typical computer vision pipeline. In
this paper, we review the existing rain removal algorithms and propose a new
dataset that consists of 22 traffic surveillance sequences under a broad
variety of weather conditions that all include either rain or snowfall. We
propose a new evaluation protocol that evaluates the rain removal algorithms on
their ability to improve the performance of subsequent segmentation, instance
segmentation, and feature tracking algorithms under rain and snow. If
successful, the de-rained frames of a rain removal algorithm should improve
segmentation performance and increase the number of accurately tracked
features. The results show that a recent single-frame-based rain removal
algorithm increases the segmentation performance by 19.7% on our proposed
dataset, but it eventually decreases the feature tracking performance and
showed mixed results with recent instance segmentation methods. However, the
best video-based rain removal algorithm improves the feature tracking accuracy
by 7.72%.Comment: Published in IEEE Transactions on Intelligent Transportation System
Unsupervised Discovery of Parts, Structure, and Dynamics
Humans easily recognize object parts and their hierarchical structure by
watching how they move; they can then predict how each part moves in the
future. In this paper, we propose a novel formulation that simultaneously
learns a hierarchical, disentangled object representation and a dynamics model
for object parts from unlabeled videos. Our Parts, Structure, and Dynamics
(PSD) model learns to, first, recognize the object parts via a layered image
representation; second, predict hierarchy via a structural descriptor that
composes low-level concepts into a hierarchical structure; and third, model the
system dynamics by predicting the future. Experiments on multiple real and
synthetic datasets demonstrate that our PSD model works well on all three
tasks: segmenting object parts, building their hierarchical structure, and
capturing their motion distributions.Comment: ICLR 2019. The first two authors contributed equally to this wor
- …