Cascaded Segmentation-Detection Networks for Word-Level Text Spotting
We introduce an algorithm for word-level text spotting that is able to
accurately and reliably determine the bounding regions of individual words of
text "in the wild". Our system is formed by the cascade of two convolutional
neural networks. The first network is fully convolutional and is in charge of
detecting areas containing text. This results in a very reliable but possibly
inaccurate segmentation of the input image. The second network (inspired by the
popular YOLO architecture) analyzes each segment produced in the first stage,
and predicts oriented rectangular regions containing individual words. No
post-processing (e.g. text line grouping) is necessary. With an execution time of
450 ms for a 1000-by-560 image on a Titan X GPU, our system achieves the
highest score to date among published algorithms on the ICDAR 2015 Incidental
Scene Text dataset benchmark.

Comment: 7 pages, 8 figures
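The two-stage cascade this abstract describes, a coarse segmentation network followed by a per-segment word detector, can be sketched as a simple pipeline. The function names, shapes, and the threshold stand-ins for the two CNNs below are illustrative assumptions, not the authors' code:

```python
import numpy as np

def stage1_segment(image: np.ndarray) -> np.ndarray:
    """Mock of the fully convolutional first network: a coarse binary
    mask of regions likely to contain text (reliable but inexact).
    Here a plain intensity threshold stands in for the CNN."""
    return (image > 0.5).astype(np.uint8)

def connected_segments(mask: np.ndarray):
    """Axis-aligned bounding boxes of 4-connected mask components."""
    seen = np.zeros(mask.shape, dtype=bool)
    boxes = []
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                stack, ys, xs = [(i, j)], [], []
                seen[i, j] = True
                while stack:  # iterative flood fill
                    y, x = stack.pop()
                    ys.append(y)
                    xs.append(x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                boxes.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1))
    return boxes

def stage2_words(image: np.ndarray, box):
    """Mock of the YOLO-inspired second network: oriented word
    rectangles inside one segment. Here it simply echoes the segment
    box with a zero rotation angle."""
    y0, x0, y1, x1 = box
    return [(y0, x0, y1, x1, 0.0)]

def spot_words(image: np.ndarray):
    """Cascade: segment coarsely, then refine each segment into words."""
    words = []
    for box in connected_segments(stage1_segment(image)):
        words.extend(stage2_words(image, box))
    return words
```

The point of the cascade is that the cheap first stage only has to be reliable, not precise; the second stage sees small crops and can afford per-word precision.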
Deep Learning of Atomically Resolved Scanning Transmission Electron Microscopy Images: Chemical Identification and Tracking Local Transformations
Recent advances in scanning transmission electron and scanning probe
microscopies have opened exciting opportunities in probing the materials
structural parameters and various functional properties in real space with
angstrom-level precision. This progress has been accompanied by an exponential
increase in the size and quality of datasets produced by microscopic and
spectroscopic experimental techniques. These developments necessitate adequate
methods for extracting relevant physical and chemical information from the
large datasets, for which a priori information on the structures of various
atomic configurations and lattice defects is limited or absent. Here we
demonstrate an application of deep neural networks to extract information from
atomically resolved images including location of the atomic species and type of
defects. We develop a 'weakly-supervised' approach that uses information on the
coordinates of all atomic species in the image, extracted via a deep neural
network, to identify a rich variety of defects that are not part of an initial
training set. We further apply our approach to interpret complex atomic and
defect transformations, including switching between different coordinations of
silicon dopants in graphene as a function of time, the formation of a peculiar
silicon dimer with mixed 3-fold and 4-fold coordination, and the motion of a
molecular 'rotor'. This deep-learning-based approach resembles the logic of a
human operator, but can be scaled up, leading to a significant shift in the way
information is extracted and analyzed from raw experimental data.
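The weakly-supervised step, identifying defects from atom coordinates produced by the upstream network, can be illustrated with a coordination-number analysis. The cutoff radius, lattice, and expected coordination below are hypothetical, chosen only to show the idea:

```python
import numpy as np

def coordination_numbers(coords: np.ndarray, cutoff: float) -> np.ndarray:
    """Count neighbors within `cutoff` for each atom. `coords` is an
    (N, 2) array of positions, e.g. as output by the deep network."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    return (dist < cutoff).sum(axis=1) - 1  # subtract self-distance 0

def flag_defects(coords: np.ndarray, cutoff: float, expected: int) -> np.ndarray:
    """Indices of atoms whose coordination deviates from the expected
    lattice value -- candidate defect sites that need not appear in
    any training set, since only coordinates were learned."""
    return np.flatnonzero(coordination_numbers(coords, cutoff) != expected)
```

Because the classification runs on coordinates rather than pixels, new defect types (e.g. a 3-fold vs 4-fold silicon dopant) fall out of simple geometry instead of requiring labeled examples.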
LiDAR-assisted Large-scale Privacy Protection in Street-view Cycloramas
Privacy has recently gained importance in several domains, especially for
street-view images. The conventional way to achieve this is to automatically
detect and blur sensitive information from these images. However, the
processing cost of blurring increases with the ever-growing resolution of
images. We propose a system that is cost-effective even after increasing the
resolution by a factor of 2.5. The new system utilizes depth data obtained from
LiDAR to significantly reduce the search space for detection, thereby reducing
the processing cost. In addition, we test several detectors after reducing the
detection space, and provide an alternative to the existing HoG-SVM-Deep
system, based on state-of-the-art deep learning detectors, that is faster and
performs better.

Comment: Accepted at Electronic Imaging 201
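The core cost-saving idea, restricting the detector to image regions whose LiDAR depth is plausible for privacy-sensitive objects, can be sketched as a depth gate. The depth range and the dense per-pixel depth-map layout are illustrative assumptions:

```python
import numpy as np

def detection_roi(depth: np.ndarray, near: float = 1.0, far: float = 15.0) -> np.ndarray:
    """Boolean mask of pixels whose LiDAR-derived depth falls in the
    plausible range for faces/plates; only these reach the detector."""
    return (depth >= near) & (depth <= far)

def search_space_fraction(depth: np.ndarray, near: float = 1.0, far: float = 15.0) -> float:
    """Fraction of the image the detector still has to scan -- the
    quantity that keeps processing cost flat as resolution grows."""
    return float(detection_roi(depth, near, far).mean())
```

Since blurring and detection cost scale with the number of pixels processed, shrinking this fraction is what lets the pipeline stay cost-effective at 2.5x the resolution.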
ADD: An Automatic Desensitization Fisheye Dataset for Autonomous Driving
Autonomous driving systems require many images for analyzing the surrounding
environment. However, private information in these captured images, such as
pedestrian faces or vehicle license plates, receives little protection, which
has become a significant issue. In this paper, in response to data security
laws and regulations, and exploiting the large Field of View (FoV) of the
fisheye camera, we build the first Autopilot Desensitization Dataset, called
ADD, and formulate the first deep-learning-based image desensitization
framework, to promote the study of image desensitization in autonomous driving
scenarios. The compiled dataset
consists of 650K images, including different face and vehicle license plate
information captured by the surround-view fisheye camera. It covers various
autonomous driving scenarios, including diverse facial characteristics and
license plate colors. Then, we propose an efficient multitask desensitization
network called DesCenterNet as a benchmark on the ADD dataset, which can
perform face and vehicle license plate detection and desensitization tasks.
Based on ADD, we further provide an evaluation criterion for desensitization
performance, and extensive comparison experiments verify the effectiveness and
superiority of our method for image desensitization.
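The desensitization step itself, making each detected face or plate region unrecoverable, can be sketched with block pixelation. The box format `(y0, x0, y1, x1)` and the block size are assumptions for illustration; the paper's DesCenterNet performs detection and desensitization jointly:

```python
import numpy as np

def pixelate_region(image: np.ndarray, box, block: int = 4) -> np.ndarray:
    """Replace each `block`-by-`block` tile inside `box` with its mean
    value, destroying identifiable detail while keeping the rest of
    the image untouched."""
    out = image.copy()
    y0, x0, y1, x1 = box
    for y in range(y0, y1, block):
        for x in range(x0, x1, block):
            tile = out[y:min(y + block, y1), x:min(x + block, x1)]
            tile[...] = tile.mean()  # in-place: tile is a view of `out`
    return out
```

Applied to every detected face and license-plate box, this yields an image usable for perception research without exposing private information.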
MTRNet: A Generic Scene Text Eraser
Text removal algorithms have been proposed for uni-lingual scripts with
regular shapes and layouts. However, to the best of our knowledge, a generic
text removal method which is able to remove all or user-specified text regions
regardless of font, script, language or shape is not available. Developing such
a generic text eraser for real scenes is a challenging task, since it inherits
all the challenges of multi-lingual and curved text detection and inpainting.
To fill this gap, we propose a mask-based text removal network (MTRNet). MTRNet
is a conditional adversarial generative network (cGAN) with an auxiliary mask.
The introduced auxiliary mask not only makes the cGAN a generic text eraser,
but also enables stable training and early convergence on a challenging
large-scale synthetic dataset, initially proposed for text detection in real
scenes. Moreover, MTRNet achieves state-of-the-art results on several
real-world datasets, including ICDAR 2013, ICDAR 2017 MLT, and CTW1500,
without being explicitly trained on them, outperforming previous
state-of-the-art methods trained directly on these datasets.

Comment: Presented at the ICDAR2019 Conference
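The role of the auxiliary mask can be sketched as follows: the mask is stacked onto the image as an extra conditioning channel, and the generator only has to synthesize content where the mask is on. The trivial compositing below is a stand-in for illustration, not the MTRNet architecture:

```python
import numpy as np

def make_generator_input(image: np.ndarray, text_mask: np.ndarray) -> np.ndarray:
    """Concatenate an (H, W, 3) image with an (H, W) auxiliary text
    mask as a fourth channel -- the conditioning input of the cGAN
    generator. The mask is what makes the eraser generic: any region
    the user (or a detector) marks gets removed."""
    return np.concatenate([image, text_mask[..., None]], axis=-1)

def apply_erasure(image: np.ndarray, text_mask: np.ndarray,
                  generated: np.ndarray) -> np.ndarray:
    """Composite: keep original pixels outside the mask, take the
    generator's inpainted pixels inside it."""
    m = text_mask[..., None]
    return image * (1 - m) + generated * m
```

Constraining the generator to the masked region is also what the abstract credits for stable training: the network never has to reproduce the unmasked background.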
Robust Iris Segmentation Based on Fully Convolutional Networks and Generative Adversarial Networks
The iris can be considered as one of the most important biometric traits due
to its high degree of uniqueness. Iris-based biometrics applications depend
mainly on the iris segmentation whose suitability is not robust for different
environments such as near-infrared (NIR) and visible (VIS) ones. In this paper,
two approaches for robust iris segmentation based on Fully Convolutional
Networks (FCNs) and Generative Adversarial Networks (GANs) are described.
Similar to a common convolutional network, but without the fully connected
layers (i.e., the classification layers), an FCN employs at its end a
combination of pooling layers from different convolutional layers. Based on
game theory, a GAN is designed as two networks competing with each other to
generate the best segmentation. The proposed segmentation networks achieved
promising results on all evaluated datasets of NIR images (BioSec, CasiaI3,
CasiaT4 and IITD-1) and of VIS images (NICE.I, CrEye-Iris and MICHE-I), in
both non-cooperative and cooperative domains, outperforming the baseline
techniques, which are the best reported so far in the literature, thus
establishing a new state of the art for these datasets. Furthermore, we
manually labeled 2,431 images from the CasiaT4, CrEye-Iris and MICHE-I
datasets, making the masks available for research purposes.

Comment: Accepted for presentation at the Conference on Graphics, Patterns and
Images (SIBGRAPI) 201
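The FCN design mentioned above, combining feature maps taken after pooling layers of different depths, can be sketched as upsampling each map to a common resolution and summing, in the style of FCN skip connections. The strides and map sizes here are illustrative:

```python
import numpy as np

def upsample_nearest(fmap: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor upsampling of an (H, W) feature/score map."""
    return np.repeat(np.repeat(fmap, factor, axis=0), factor, axis=1)

def fuse_pooled_maps(maps_with_strides):
    """Fuse score maps produced after pooling stages of different
    strides into one full-resolution map: upsample each to input
    resolution and sum, so deep (coarse) and shallow (fine) evidence
    both contribute to the final segmentation."""
    fused = None
    for fmap, stride in maps_with_strides:
        up = upsample_nearest(fmap, stride)
        fused = up if fused is None else fused + up
    return fused
```

Thresholding the fused map would then give the binary iris mask; the GAN variant instead lets a discriminator score such masks against ground truth.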