Scene Text Eraser
The character information in natural scene images can contain personal
information, such as telephone numbers and home addresses, so publishing
such images carries a high risk of information leakage. In this paper, we
propose a scene text erasing method that hides this information via an
inpainting convolutional neural network (CNN) model. The input is a scene
text image, and the output is expected to be an image with the text erased,
with all character regions filled in with the colors of the surrounding
background pixels. This is accomplished by a CNN model with a
convolution-to-deconvolution structure and interconnections between
corresponding layers. The training samples and the corresponding inpainted
images serve as teaching signals for training. To evaluate the text erasing
performance, the output images are processed by a novel scene text detection
method, and the same text detection measurement is then applied to the images
in the benchmark dataset ICDAR2013. Compared with direct text detection, the
scene text erasing process yields a drastic decrease in precision, recall and
f-score, which demonstrates the effectiveness of the proposed method for
erasing text in natural scene images.
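The detection-based evaluation above can be sketched in code: run a text detector before and after erasing, and measure how far precision, recall and f-score fall. A minimal sketch, assuming a simple IoU-threshold box-matching criterion (not specified in the abstract):

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection-over-union of two boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall_f(detections, ground_truth, thresh=0.5):
    """Greedily match each detection to at most one ground-truth box by IoU."""
    matched = set()
    tp = 0
    for d in detections:
        for i, g in enumerate(ground_truth):
            if i not in matched and iou(d, g) >= thresh:
                matched.add(i)
                tp += 1
                break
    p = tp / len(detections) if detections else 0.0
    r = tp / len(ground_truth) if ground_truth else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
# Before erasing: the detector finds both text regions.
p, r, f = precision_recall_f([(1, 1, 10, 10), (20, 20, 31, 31)], gt)
# After erasing: the detector finds nothing, so all three scores collapse.
p2, r2, f2 = precision_recall_f([], gt)
```

A successful eraser drives the post-erasure scores toward zero: the detector can no longer find the text that was there.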
MTRNet: A Generic Scene Text Eraser
Text removal algorithms have been proposed for uni-lingual scripts with
regular shapes and layouts. However, to the best of our knowledge, a generic
text removal method which is able to remove all or user-specified text regions
regardless of font, script, language or shape is not available. Developing such
a generic text eraser for real scenes is a challenging task, since it inherits
all the challenges of multi-lingual and curved text detection and inpainting.
To fill this gap, we propose a mask-based text removal network (MTRNet). MTRNet
is a conditional generative adversarial network (cGAN) with an auxiliary mask.
The introduced auxiliary mask not only makes the cGAN a generic text eraser,
but also enables stable training and early convergence on a challenging
large-scale synthetic dataset, initially proposed for text detection in real
scenes. Moreover, MTRNet achieves state-of-the-art results on several
real-world datasets, including ICDAR 2013, ICDAR 2017 MLT, and CTW1500,
without being explicitly trained on this data, outperforming previous
state-of-the-art methods trained directly on these datasets.
Comment: Presented at the ICDAR2019 Conference
MTRNet++: One-stage Mask-based Scene Text Eraser
A precise, controllable, interpretable and easily trainable text removal
approach is necessary for both user-specific and large-scale text removal
applications. To achieve this, we propose a one-stage mask-based text
inpainting network, MTRNet++. It has a novel architecture that includes
mask-refine, coarse-inpainting and fine-inpainting branches, and attention
blocks. With this architecture, MTRNet++ can remove text either with or without
an external mask. It achieves state-of-the-art results on both the Oxford and
SCUT datasets without using external ground-truth masks. The results of
ablation studies demonstrate that the proposed multi-branch architecture with
attention blocks is effective and essential. It also demonstrates
controllability and interpretability.
Comment: This paper is under CVIU review (after major revision)
EnsNet: Ensconce Text in the Wild
A new method is proposed for removing text from natural images. The challenge
is to first accurately localize text on the stroke-level and then replace it
with a visually plausible background. Unlike previous methods that require
image patches to erase scene text, our method, namely ensconce network
(EnsNet), can operate end-to-end on a single image without any prior knowledge.
The overall structure is an end-to-end trainable FCN-ResNet-18 network with a
conditional generative adversarial network (cGAN). The features of the former
are first enhanced by a novel lateral connection structure and then refined by
four carefully designed losses: a multiscale regression loss and a content loss, which
capture the global discrepancy of different level features; texture loss and
total variation loss, which primarily target filling the text region and
preserving the reality of the background. The latter is a novel local-sensitive
GAN, which attentively assesses the local consistency of the text erased
regions. Both qualitative and quantitative sensitivity experiments on synthetic
images and the ICDAR 2013 dataset demonstrate that each component of the EnsNet
is essential to achieve a good performance. Moreover, our EnsNet can
significantly outperform previous state-of-the-art methods in terms of all
metrics. In addition, a qualitative experiment conducted on the SMBNet dataset
further demonstrates that the proposed method can also perform well on general
object (such as pedestrian) removal tasks. EnsNet is extremely fast and can
run at 333 fps on an i5-8600 CPU device.
Comment: 8 pages, 8 figures, 2 tables, accepted to appear in AAAI 2019
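Of the four losses, total variation is the simplest to state: it penalizes abrupt pixel-to-pixel changes, nudging the filled-in region toward a smooth, plausible background. A minimal numpy sketch, using the anisotropic L1 variant (an assumption; EnsNet's exact formulation may differ):

```python
import numpy as np

def total_variation(img):
    """Anisotropic L1 total variation of an (H, W, C) image: the sum of
    absolute differences between vertically and horizontally adjacent
    pixels. Smooth regions score low; cluttered regions score high."""
    dh = np.abs(img[1:, :, :] - img[:-1, :, :]).sum()
    dw = np.abs(img[:, 1:, :] - img[:, :-1, :]).sum()
    return dh + dw

flat = np.ones((8, 8, 3))        # perfectly smooth region: TV == 0
noisy = np.random.rand(8, 8, 3)  # high-frequency clutter: TV > 0
```

In training, this term would be computed over the generated image (or the erased region) and added to the weighted sum of the other three losses.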
Progressive Scene Text Erasing with Self-Supervision
Scene text erasing seeks to erase text contents from scene images and current
state-of-the-art text erasing models are trained on large-scale synthetic data.
Although data synthetic engines can provide vast amounts of annotated training
samples, there are differences between synthetic and real-world data. In this
paper, we employ self-supervision for feature representation on unlabeled
real-world scene text images. A novel pretext task is designed to enforce
consistency among the text stroke masks of image variants. We design a
Progressive Erasing Network to remove residual text: the scene text is erased
progressively by leveraging the intermediate generated results, which provide
the foundation for subsequent, higher-quality results. Experiments show that
our method significantly improves the generalization of the text erasing task
and achieves state-of-the-art performance on public benchmarks.
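The progressive idea can be sketched as a fixed-point loop: feed each intermediate output back in as the next input, stopping once nothing changes. The `erase_step` below is a toy stand-in (nested brackets play the role of residual text), not the paper's network:

```python
import re

def progressive_erase(image, erase_step, max_rounds=3):
    """Apply an erasing function repeatedly, using each intermediate result
    as the input to the next round; stop early once the output no longer
    changes, i.e. no residual text remains."""
    out = image
    for _ in range(max_rounds):
        nxt = erase_step(out)
        if nxt == out:
            break
        out = nxt
    return out

# Toy stand-in: each round removes the innermost bracketed "text" regions,
# so residual (nested) text survives the first pass but not the second.
strip_innermost = lambda s: re.sub(r"\[[^\[\]]*\]", "", s)
demo = progressive_erase("bg[[txt]]bg", strip_innermost)
# demo == "bgbg": round 1 leaves "bg[]bg", round 2 cleans the residue
```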
Gesture-Based Input for Drawing Schematics on a Mobile Device
We present a system for drawing metro map style schematics using a gesture-based interface. This work brings together techniques in gesture recognition on touch-sensitive devices with research in schematic layout of networks. The software allows users to create and edit schematic networks, and provides an automated layout method for improving the appearance of the schematic. A case study using the metro map metaphor to visualize social networks and web site structure is described
The Remanence of Medieval Media
The Remanence of Medieval Media (uncorrected, pre-publication version)
For: The Routledge Handbook of Digital Medieval Literature, edited by Jen Boyle and Helen Burgess (2017
Selective Scene Text Removal
Scene text removal (STR) is the image transformation task to remove text
regions in scene images. Conventional STR methods remove all scene text;
that is, they cannot select which text to remove. In this
paper, we propose a novel task setting named selective scene text removal
(SSTR) that removes only target words specified by the user. Although SSTR is a
more complex task than STR, the proposed multi-module structure enables
efficient training for SSTR. Experimental results show that the proposed method
can remove target words as expected.
Comment: 12 pages, 8 figures, Accepted at the 34th British Machine Vision
Conference
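The selection step in SSTR reduces to filtering detected-and-recognized text regions against the user's target words; only the surviving boxes are handed to the inpainting stage. A minimal sketch with hypothetical (word, box) detection results (the actual system uses trained detection, recognition and removal modules):

```python
def select_regions(detections, target_words):
    """From (word, box) detection results, keep only the boxes whose
    recognized word is among the user-specified targets -- the only
    regions that should be erased under the SSTR task setting."""
    targets = {w.lower() for w in target_words}
    return [box for word, box in detections if word.lower() in targets]

detections = [("SALE", (10, 10, 60, 30)),
              ("EXIT", (100, 10, 150, 30)),
              ("sale", (10, 50, 60, 70))]
boxes = select_regions(detections, ["sale"])
# Only the two "sale" regions are selected; "EXIT" is left untouched.
```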
A game engine designed to simplify 2D video game development
In recent years, the increasing popularity of casual games for mobile and web has
promoted the development of new editors to make video games easier to create. The
development of these interactive applications is on its way to becoming democratized, so
that anyone who is interested, without any advanced knowledge of programming, can
create them for devices such as mobile phones or consoles. Nevertheless, most game
development environments rely on the traditional way of programming and require advanced technical skills, despite today's improvements. This paper presents a new 2D
game engine that reduces the complexity of video game development processes. The
game specification has been simplified, decreasing the complexity of the engine architecture and introducing a very easy-to-use editing environment for game creation. The
engine presented here allows the behaviour of the game objects to be defined using a very
small set of conditions and actions, without the need to use complex data structures. Some
experiments have been designed in order to validate its ease of use and its capacity in the
creation of a wide variety of games. To test it, users with little programming experience
developed arcade games using the presented environment as a proof of its ease of use
with respect to other comparable software. The results obtained support the hypothesis
of its ease of use and demonstrate the engine's potential.
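A behaviour model of that kind, where each game object carries a small list of (condition, action) rules evaluated every frame, can be sketched as follows (the names are illustrative, not the engine's actual API):

```python
class GameObject:
    """An object whose behaviour is just a list of (condition, action)
    rules: each frame, every rule whose condition holds fires its action.
    No complex data structures -- behaviour is the rule list itself."""
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y
        self.rules = []

    def add_rule(self, condition, action):
        self.rules.append((condition, action))

    def update(self):
        # One frame: evaluate every rule against the current state.
        for condition, action in self.rules:
            if condition(self):
                action(self)

# A ball that falls one unit per frame until it reaches the floor at y == 0.
ball = GameObject(x=5, y=3)
ball.add_rule(lambda o: o.y > 0, lambda o: setattr(o, "y", o.y - 1))
for _ in range(10):
    ball.update()
# ball.y == 0: the condition stops firing once the floor is reached
```

Non-programmers compose behaviour by picking conditions and actions from a fixed palette, which is the simplification the engine's editing environment provides.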