Cascaded Segmentation-Detection Networks for Word-Level Text Spotting
We introduce an algorithm for word-level text spotting that is able to
accurately and reliably determine the bounding regions of individual words of
text "in the wild". Our system is formed by the cascade of two convolutional
neural networks. The first network is fully convolutional and is in charge of
detecting areas containing text. This results in a very reliable but possibly
inaccurate segmentation of the input image. The second network (inspired by the
popular YOLO architecture) analyzes each segment produced in the first stage,
and predicts oriented rectangular regions containing individual words. No
post-processing (e.g. text line grouping) is necessary. With an execution time of
450 ms for a 1000-by-560 image on a Titan X GPU, our system achieves the
highest score to date among published algorithms on the ICDAR 2015 Incidental
Scene Text dataset benchmark.
Comment: 7 pages, 8 figures
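The abstract above says the second network predicts oriented rectangular regions but does not give their parametrization. As a minimal sketch, assuming a hypothetical (center, width, height, angle) encoding that is not confirmed by the paper, such a prediction could be converted to corner points like this:

```python
import math


def oriented_box_corners(cx, cy, width, height, angle_rad):
    """Return the four corners of a rectangle rotated about its center.

    The (cx, cy, width, height, angle_rad) encoding is an assumption made
    for illustration; the paper's actual output format is not stated.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    half_w, half_h = width / 2.0, height / 2.0
    # Corner offsets in the box's own (unrotated) frame, in order around the box.
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    # Rotate each offset by the box angle, then translate to the center.
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in offsets]


if __name__ == "__main__":
    for corner in oriented_box_corners(10, 5, 4, 2, math.pi / 6):
        print(corner)
```

Corner lists of this kind are a common input format for polygon-based evaluation of word regions.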
Automatic Semantic Content Removal by Learning to Neglect
We introduce a new system for automatic image content removal and inpainting.
Unlike traditional inpainting algorithms, which require advance knowledge of
the region to be filled in, our system automatically detects the area to be
removed and infilled. Region segmentation and inpainting are performed jointly
in a single pass. In this way, potential segmentation errors are more naturally
alleviated by the inpainting module. The system is implemented as an
encoder-decoder architecture, with two decoder branches, one tasked with
segmentation of the foreground region, the other with inpainting. The encoder
and the two decoder branches are linked via neglect nodes, which guide the
inpainting process in selecting which areas need reconstruction. The whole
model is trained using a conditional GAN strategy. Comparative experiments show
that our algorithm outperforms state-of-the-art inpainting techniques (which,
unlike our system, do not segment the input image and thus must be aided by an
external segmentation module).
Comment: Accepted to BMVC 2018 as an oral presentation
Multiple Instance Curriculum Learning for Weakly Supervised Object Detection
When supervising an object detector with weakly labeled data, most existing
approaches are prone to getting trapped in discriminative object parts, e.g.,
finding the face of a cat instead of the full body, because they lack
supervision on the full extent of objects. To address this challenge, we
incorporate object segmentation into the detector training, which guides the
model to correctly localize the full objects. We propose the multiple instance
curriculum learning (MICL) method, which injects curriculum learning (CL) into
the multiple instance learning (MIL) framework. The MICL method starts by
automatically picking the easy training examples, where the extent of the
segmentation masks agrees with the detection bounding boxes. The training set is
gradually expanded to include harder examples to train strong detectors that
handle complex images. The proposed MICL method with segmentation in the loop
outperforms the state-of-the-art weakly supervised object detectors by a
substantial margin on the PASCAL VOC datasets.
Comment: Published in BMVC 201
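The easy-example selection described above can be sketched as follows. This is only an illustration: it assumes that "agreement" is measured as the IoU between the tight box around a segmentation mask and the detection box, and that the curriculum is widened by lowering a threshold. Neither the measure nor the names `select_easy_examples` and `iou_threshold` come from the paper.

```python
def mask_bounding_box(mask):
    """Tight (x_min, y_min, x_max, y_max) box around nonzero pixels, or None."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def box_iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_easy_examples(examples, iou_threshold):
    """Indices of (mask, detection_box) pairs whose extents agree.

    Agreement via IoU is an assumed criterion, used here for illustration.
    """
    easy = []
    for index, (mask, box) in enumerate(examples):
        mask_box = mask_bounding_box(mask)
        if mask_box is not None and box_iou(mask_box, box) >= iou_threshold:
            easy.append(index)
    return easy


if __name__ == "__main__":
    mask = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
    examples = [(mask, (1, 1, 3, 3)), (mask, (0, 0, 4, 4))]
    # Lowering the threshold step by step admits harder examples.
    for threshold in (0.5, 0.2):
        print(threshold, select_easy_examples(examples, threshold))
```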
Large eddy simulation of round jets with mild temperature difference
Understanding the behaviour of hot jets is crucial for various engineering and environmental applications. The present work studies the influence of heat transfer on the dynamics of horizontal round hot jets through Large Eddy Simulations (LES). Our focus lies on trajectory development, large-scale coherent structures, and turbulent kinetic energy budget analysis in the near-field and intermediate-field regions. LES of two horizontal round hot jets with Reynolds numbers of 3934 and 5100 and corresponding Froude numbers of 32.98 and 17.07 were carried out using the buoyantPimpleFoam solver in OpenFOAM, and a simulation of an isothermal jet was also performed as a baseline for comparison. The results reveal that the jet core temperature decays faster in the streamwise direction but more slowly in the radial direction, indicating that the temperature spreads more widely than the velocity; the maximum difference between the temperature and velocity spreads is about 0.5D. Moreover, the energy associated with the large-scale coherent structures decreases with increasing initial jet temperature. The energy of the first two modes of snapshot Proper Orthogonal Decomposition (POD) and extended POD dropped by 12% and 14%, respectively. The coherent motion with the greatest correlation between the temperature and velocity fluctuations is identified as four pairs of Q1 and Q3 events, which are Reynolds shear stress dominant events. Furthermore, compared with the isothermal jet, the turbulent kinetic energy budgets of the hot jets indicate that the diffusion and generation terms are both reduced by approximately 50%, suggesting that more kinetic energy is transferred into potential energy rather than turbulence. This finding highlights the potential of elevated temperatures to mitigate instabilities associated with large-scale motions in hot jets.
This study addresses the lack of a comprehensive analysis of heat transfer effects on jet dynamics and provides quantitative insights into the large-scale coherent structures, contributing to a better understanding of hot jet behaviour.
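The abstract refers to Q1 and Q3 events without defining them. A minimal sketch of quadrant classification, assuming the common convention in which quadrants are numbered in the plane of two fluctuation signals (Q1: both positive, Q3: both negative), is given below. Which two signals the authors pair, and the function name `quadrant_counts`, are assumptions made for illustration.

```python
def fluctuations(samples):
    """Subtract the sample mean to obtain the fluctuating part of a signal."""
    mean = sum(samples) / len(samples)
    return [value - mean for value in samples]


def quadrant_counts(signal_a, signal_b):
    """Count samples falling in each quadrant of the (a', b') plane.

    Assumed convention: Q1 (a'>0, b'>0), Q2 (a'<0, b'>0),
    Q3 (a'<0, b'<0), Q4 (a'>0, b'<0).
    """
    counts = {"Q1": 0, "Q2": 0, "Q3": 0, "Q4": 0}
    for a, b in zip(fluctuations(signal_a), fluctuations(signal_b)):
        if a > 0 and b > 0:
            counts["Q1"] += 1
        elif a < 0 and b > 0:
            counts["Q2"] += 1
        elif a < 0 and b < 0:
            counts["Q3"] += 1
        elif a > 0 and b < 0:
            counts["Q4"] += 1
        # Samples lying exactly on an axis are left uncounted.
    return counts


if __name__ == "__main__":
    velocity = [1.0, 2.0, 3.0, 4.0]
    temperature = [10.0, 20.0, 30.0, 40.0]
    print(quadrant_counts(velocity, temperature))
```

Under this convention, Q1 and Q3 samples are those where the two fluctuations share a sign, so they contribute positively to the correlation between the signals.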
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
We propose Hierarchical Text Spotter (HTS), a novel method for the joint task
of word-level text spotting and geometric layout analysis. HTS can recognize
text in an image and identify its 4-level hierarchical structure: characters,
words, lines, and paragraphs. The proposed HTS is characterized by two novel
components: (1) a Unified-Detector-Polygon (UDP) that produces Bezier Curve
polygons of text lines and an affinity matrix for paragraph grouping between
detected lines; (2) a Line-to-Character-to-Word (L2C2W) recognizer that splits
lines into characters and further merges them back into words. HTS achieves
state-of-the-art results on multiple word-level text spotting benchmark
datasets as well as geometric layout analysis tasks.
Comment: Accepted to WACV 202
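The abstract above mentions an affinity matrix for grouping detected lines into paragraphs but does not say how groups are extracted from it. One plausible approach, sketched here under the assumption of a symmetric matrix, a fixed threshold, and connected-component grouping (none of which is confirmed by the paper), uses a small union-find structure:

```python
def group_lines(affinity, threshold=0.5):
    """Group line indices whose pairwise affinity reaches the threshold.

    `affinity` is assumed to be a symmetric square matrix of scores in
    [0, 1]; only the upper triangle is read. Groups are the connected
    components of the thresholded graph.
    """
    n = len(affinity)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i][j] >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    affinity = [
        [1.0, 0.9, 0.1, 0.0],
        [0.9, 1.0, 0.1, 0.2],
        [0.1, 0.1, 1.0, 0.8],
        [0.0, 0.2, 0.8, 1.0],
    ]
    print(group_lines(affinity))
```

Connected components make grouping transitive: two lines end up in the same paragraph if a chain of high-affinity pairs links them, even when their direct score is low.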
Towards End-to-End Unified Scene Text Detection and Layout Analysis
Scene text detection and document layout analysis have long been treated as
two separate tasks in different image domains. In this paper, we bring them
together and introduce the task of unified scene text detection and layout
analysis. The first hierarchical scene text dataset is introduced to enable
this novel research task. We also propose a novel method that is able to
simultaneously detect scene text and form text clusters in a unified way.
Comprehensive experiments show that our unified model achieves better
performance than multiple well-designed baseline methods. Additionally, this
model achieves state-of-the-art results on multiple scene text detection
datasets without the need for complex post-processing. Dataset and code:
https://github.com/google-research-datasets/hiertext
Comment: To appear at CVPR 202