6,839 research outputs found
WordSup: Exploiting Word Annotations for Character based Text Detection
Imagery texts are usually organized as a hierarchy of several visual
elements, i.e. characters, words, text lines and text blocks. Among these
elements, character is the most basic one for various languages such as
Western, Chinese, Japanese, mathematical expression and etc. It is natural and
convenient to construct a common text detection engine based on character
detectors. However, training character detectors requires a vast of location
annotated characters, which are expensive to obtain. Actually, the existing
real text datasets are mostly annotated in word or line level. To remedy this
dilemma, we propose a weakly supervised framework that can utilize word
annotations, either in tight quadrangles or the more loose bounding boxes, for
character detector training. When applied in scene text detection, we are thus
able to train a robust character detector by exploiting word annotations in the
rich large-scale real scene text datasets, e.g. ICDAR15 and COCO-text. The
character detector acts as a key role in the pipeline of our text detection
engine. It achieves the state-of-the-art performance on several challenging
scene text detection benchmarks. We also demonstrate the flexibility of our
pipeline by various scenarios, including deformed text detection and math
expression recognition.Comment: 2017 International Conference on Computer Visio
Dense Associative Memory is Robust to Adversarial Inputs
Deep neural networks (DNN) trained in a supervised way suffer from two known
problems. First, the minima of the objective function used in learning
correspond to data points (also known as rubbish examples or fooling images)
that lack semantic similarity with the training data. Second, a clean input can
be changed by a small, and often imperceptible for human vision, perturbation,
so that the resulting deformed input is misclassified by the network. These
findings emphasize the differences between the ways DNN and humans classify
patterns, and raise a question of designing learning algorithms that more
accurately mimic human perception compared to the existing methods.
Our paper examines these questions within the framework of Dense Associative
Memory (DAM) models. These models are defined by the energy function, with
higher order (higher than quadratic) interactions between the neurons. We show
that in the limit when the power of the interaction vertex in the energy
function is sufficiently large, these models have the following three
properties. First, the minima of the objective function are free from rubbish
images, so that each minimum is a semantically meaningful pattern. Second,
artificial patterns poised precisely at the decision boundary look ambiguous to
human subjects and share aspects of both classes that are separated by that
decision boundary. Third, adversarial images constructed by models with small
power of the interaction vertex, which are equivalent to DNN with rectified
linear units (ReLU), fail to transfer to and fool the models with higher order
interactions. This opens up a possibility to use higher order models for
detecting and stopping malicious adversarial attacks. The presented results
suggest that DAM with higher order energy functions are closer to human visual
perception than DNN with ReLUs
Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network
We propose an heterogeneous multi-task learning framework for human pose
estimation from monocular image with deep convolutional neural network. In
particular, we simultaneously learn a pose-joint regressor and a sliding-window
body-part detector in a deep network architecture. We show that including the
body-part detection task helps to regularize the network, directing it to
converge to a good solution. We report competitive and state-of-art results on
several data sets. We also empirically show that the learned neurons in the
middle layer of our network are tuned to localized body parts
- …