22,962 research outputs found

COCO_TS Dataset: Pixel-level Annotations Based on Weak Supervision for Scene Text Segmentation

Author: B. Gatos
LC Chen
Max Jaderberg
N Otsu
P Andreini
S Bonechi
T-Y Lin
Y Tang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

The absence of large-scale datasets with pixel-level supervision is a significant obstacle to training deep convolutional networks for scene text segmentation. For this reason, synthetic data generation is normally employed to enlarge the training dataset. Nonetheless, synthetic data cannot reproduce the complexity and variability of natural images. In this paper, a weakly supervised learning approach is used to reduce the shift between training on real and synthetic data. Pixel-level supervision is generated for a text detection dataset (i.e., one where only bounding-box annotations are available). In particular, the COCO-Text-Segmentation (COCO_TS) dataset, which provides pixel-level supervision for the COCO-Text dataset, is created and released. The generated annotations are used to train a deep convolutional neural network for semantic segmentation. Experiments show that the proposed dataset can be used instead of synthetic data, allowing us to use only a fraction of the training samples and significantly improving performance.

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università degli Studi di Siena

Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation

Author: Alexander Kolesnikov
HJ Scudder
J Carreira
L Zhang
L Zhang
M Everingham
O Russakovsky
S Liu
S Nowozin
T Toyoda
Publication date: 01/01/2016

We introduce a new loss function for the weakly-supervised training of semantic image segmentation models based on three guiding principles: to seed with weak localization cues, to expand objects based on the information about which classes can occur in an image, and to constrain the segmentations to coincide with object boundaries. We show experimentally that training a deep convolutional neural network using the proposed loss function leads to substantially better segmentations than previous state-of-the-art methods on the challenging PASCAL VOC 2012 dataset. We furthermore give insight into the working mechanism of our method by a detailed experimental study that illustrates how the segmentation quality is affected by each term of the proposed loss function as well as their combinations. Comment: ECCV 201

arXiv.org e-Print Archive

Crossref

IST Austria: PubRep (Institute of Science and Technology)

Connectionist Temporal Modeling for Weakly Supervised Action Labeling

Author: A Graves
Chih-Chung Chang
DE Rumelhart
GW Taylor
JC Niebles
P Bojanowski
R Achanta
Publication date: 28/07/2016

We propose a weakly-supervised framework for action labeling in video, where only the order of occurring actions is required during training. The key challenge is that the per-frame alignments between the input (video) and label (action) sequences are unknown during training. We address this by introducing the Extended Connectionist Temporal Classification (ECTC) framework to efficiently evaluate all possible alignments via dynamic programming and explicitly enforce their consistency with frame-to-frame visual similarities. This protects the model from distractions of visually inconsistent or degenerated alignments without the need for temporal supervision. We further extend our framework to the semi-supervised case when a few frames are sparsely annotated in a video. With less than 1% of labeled frames per video, our method is able to outperform existing semi-supervised approaches and achieve comparable performance to that of fully supervised approaches. Comment: To appear in ECCV 201

arXiv.org e-Print Archive

Crossref
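
The abstract above mentions evaluating all possible alignments between frames and an ordered action sequence via dynamic programming. The sketch below illustrates only that general idea and is not the ECTC method itself: it omits the frame-to-frame visual similarity term and the blank label of standard CTC. The function names and the input layout (one list of per-class log-probabilities per frame) are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b)) for two values."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def alignment_log_likelihood(log_probs, labels):
    """Log of the summed probability of all monotonic frame alignments.

    log_probs[t][k] is the (hypothetical) log-probability of action k at
    frame t; labels is the known order of actions. Every action must cover
    at least one frame, and actions appear in the given order.
    """
    num_frames, num_labels = len(log_probs), len(labels)
    if num_labels == 0 or num_labels > num_frames:
        return NEG_INF  # no valid alignment exists

    # alpha[j]: log-probability of all partial alignments that end at the
    # current frame with the j-th action in the sequence active.
    alpha = [NEG_INF] * num_labels
    alpha[0] = log_probs[0][labels[0]]

    for t in range(1, num_frames):
        new_alpha = [NEG_INF] * num_labels
        for j in range(num_labels):
            stay = alpha[j]                               # same action continues
            advance = alpha[j - 1] if j > 0 else NEG_INF  # next action starts
            total = logsumexp2(stay, advance)
            if total != NEG_INF:
                new_alpha[j] = total + log_probs[t][labels[j]]
        alpha = new_alpha

    # A complete alignment must end in the last action of the sequence.
    return alpha[num_labels - 1]
```

For three frames with uniform probabilities over two classes and the order [0, 1], there are two valid alignments, each with probability 0.125, so the function returns log(0.25). The table has one entry per frame and label position, so the cost grows with their product rather than with the number of alignments.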

"'Who are you?' - Learning person specific classifiers from video"

Author: Everingham M.
Sivic J.
Zisserman A.
Publication date: 01/06/2009

We investigate the problem of automatically labelling faces of characters in TV or movie material with their names, using only weak supervision from automatically aligned subtitle and script text. Our previous work (Everingham et al. [8]) demonstrated promising results on the task, but the coverage of the method (proportion of video labelled) and generalization were limited by a restriction to frontal faces and nearest neighbour classification. In this paper we build on that method, extending the coverage greatly by the detection and recognition of characters in profile views. In addition, we make the following contributions: (i) seamless tracking, integration and recognition of profile and frontal detections, and (ii) a character specific multiple kernel classifier which is able to learn the features best able to discriminate between the characters. We report results on seven episodes of the TV series “Buffy the Vampire Slayer”, demonstrating significantly increased coverage and performance with respect to previous methods on this material.

CiteSeerX

White Rose Research Online