9 research outputs found

Scene Text Eraser

Author: Nakamura Toshiki
Uchida Seiichi
Yanai Keiji
Zhu Anna
Publication date: 08/05/2017

Text in natural scene images often carries personal information, such as telephone numbers and home addresses, and publishing such images risks leaking it. In this paper, we propose a scene text erasing method that hides this information using an inpainting convolutional neural network (CNN) model. The input is a scene text image, and the expected output is a text-erased image in which all character regions are filled with the colors of the surrounding background pixels. The CNN model passes the input through convolution and then deconvolution layers with interconnections between them. The training samples and their corresponding inpainted images serve as teaching signals for training. To evaluate text erasing performance, a novel scene text detection method is applied to the output images. The same detection measurement is then applied to the images of the ICDAR2013 benchmark dataset. Compared with direct text detection, detection after scene text erasing shows a drastic decrease in precision, recall and F-score, which demonstrates the effectiveness of the proposed method for erasing text in natural scene images.

arXiv.org e-Print Archive

Crossref

MTRNet: A Generic Scene Text Eraser

Author: Denman Simon
Fookes Clinton
Sivapalan Sabesan
Sridharan Sridha
Tursun Osman
Zeng Rui
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2019

Text removal algorithms have been proposed for monolingual scripts with regular shapes and layouts. However, to the best of our knowledge, a generic text removal method which is able to remove all or user-specified text regions regardless of font, script, language or shape is not available. Developing such a generic text eraser for real scenes is a challenging task, since it inherits all the challenges of multi-lingual and curved text detection and inpainting. To fill this gap, we propose a mask-based text removal network (MTRNet). MTRNet is a conditional generative adversarial network (cGAN) with an auxiliary mask. The introduced auxiliary mask not only makes the cGAN a generic text eraser, but also enables stable training and early convergence on a challenging large-scale synthetic dataset, initially proposed for text detection in real scenes. Moreover, MTRNet achieves state-of-the-art results on several real-world datasets including ICDAR 2013, ICDAR 2017 MLT, and CTW1500, without being explicitly trained on this data, outperforming previous state-of-the-art methods trained directly on these datasets. Comment: Presented at ICDAR2019 Conference

arXiv.org e-Print Archive

Crossref

Queensland University of Technology ePrints Archive

Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos

Author: Cuzzolin Fabio
Saha Suman
Sapienza Michael
Singh Gurkirt
Torr Philip H. S.
Publication date: 01/01/2016

In this work, we propose an approach to the spatiotemporal localisation (detection) and classification of multiple concurrent actions within temporally untrimmed videos. Our framework is composed of three stages. In stage 1, appearance and motion detection networks are employed to localise and score actions from colour images and optical flow. In stage 2, the appearance network detections are boosted by combining them with the motion detection scores, in proportion to their respective spatial overlap. In stage 3, sequences of detection boxes most likely to be associated with a single action instance, called action tubes, are constructed by solving two energy maximisation problems via dynamic programming. In the first pass, action paths spanning the whole video are built by linking detection boxes over time using their class-specific scores and their spatial overlap; in the second pass, temporal trimming is performed by ensuring label consistency for all constituting detection boxes. We demonstrate the performance of our algorithm on the challenging UCF101, J-HMDB-21 and LIRIS-HARL datasets, achieving new state-of-the-art results across the board and significantly increasing detection speed at test time. We achieve a huge leap forward in action detection performance and report a 20% and 11% gain in mAP (mean average precision) on UCF-101 and J-HMDB-21 datasets respectively when compared to the state-of-the-art. Comment: Accepted by British Machine Vision Conference 201
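
The first-pass linking in stage 3 can be pictured as a small Viterbi-style dynamic program. The sketch below is a simplified illustration, not the authors' implementation. It assumes a single action class, at least one detection per frame, and an energy equal to the sum of detection scores plus a weighted IoU between consecutive boxes. The names `iou`, `link_action_path` and `overlap_weight` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, class-specific score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_action_path(frames: List[List[Detection]],
                     overlap_weight: float = 1.0) -> List[int]:
    """Pick one detection index per frame so that the summed scores plus
    the weighted IoU between consecutive picks is maximal (assumed energy)."""
    if not frames or any(len(dets) == 0 for dets in frames):
        raise ValueError("every frame needs at least one detection")

    # best[i] = best energy of any path ending at detection i of the current frame
    best = [score for _, score in frames[0]]
    backpointers: List[List[int]] = []

    for t in range(1, len(frames)):
        new_best, pointers = [], []
        for box, score in frames[t]:
            # Energy of reaching this box from each detection in the previous frame.
            candidates = [best[j] + overlap_weight * iou(prev_box, box)
                          for j, (prev_box, _) in enumerate(frames[t - 1])]
            j_star = max(range(len(candidates)), key=candidates.__getitem__)
            new_best.append(candidates[j_star] + score)
            pointers.append(j_star)
        best = new_best
        backpointers.append(pointers)

    # Backtrack from the best final detection to recover the whole path.
    idx = max(range(len(best)), key=best.__getitem__)
    path = [idx]
    for pointers in reversed(backpointers):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return path
```

Calling `link_action_path(frames)` on a list of per-frame detections returns one detection index per frame. The second pass described in the abstract (temporal trimming by label consistency) is not covered by this sketch.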

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Oxford Brookes University: RADAR

MTRNet++: One-stage Mask-based Scene Text Eraser

Author: Denman Simon
Fookes Clinton
Sivapalan Sabesan
Sridharan Sridha
Tursun Osman
Zeng Rui
Publication venue: 'Elsevier BV'
Publication date: 04/06/2020

A precise, controllable, interpretable and easily trainable text removal approach is necessary for both user-specific and large-scale text removal applications. To achieve this, we propose a one-stage mask-based text inpainting network, MTRNet++. It has a novel architecture that includes mask-refine, coarse-inpainting and fine-inpainting branches, and attention blocks. With this architecture, MTRNet++ can remove text either with or without an external mask. It achieves state-of-the-art results on both the Oxford and SCUT datasets without using external ground-truth masks. The results of ablation studies demonstrate that the proposed multi-branch architecture with attention blocks is effective and essential. It also demonstrates controllability and interpretability. Comment: This paper is under CVIU review (after major revision

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

RGB-D-based Action Recognition Datasets: A Survey

Author: Li Wanqing
Ogunbona Philip O.
Tang Chang
Wang Pichao
Zhang Jing
Publication date: 01/01/2016

Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it in providing a fair and objective comparative evaluation against state-of-the-art methods. To address this issue, this paper provides a comprehensive review of the most commonly used action recognition related RGB-D video datasets, including 27 single-view datasets, 10 multi-view datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets is a useful resource in guiding insightful selection of datasets for future research. In addition, the issues with current algorithm evaluation vis-à-vis limitations of the available datasets and evaluation protocols are also highlighted, resulting in a number of recommendations for collection of new datasets and use of evaluation protocols.

arXiv.org e-Print Archive

Crossref

Research Online

Evaluation of video activity localizations integrating quality and quantity measurements

Author: Baccouche Moez
Bichot Charles-Edmond
Celiktutan Oya
Dellandréa Emmanuel
Dogan Emre
Eren Gonen
Garcia Christophe
Jiu Mingyuan
Lombardi Eric
Mille Julien
Sankur Bülent
Wolf Christian
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

Evaluating the performance of computer vision algorithms is classically done by reporting classification error or accuracy, if the problem at hand is the classification of an object in an image, the recognition of an activity in a video or the categorization and labeling of the image or video. If in addition the detection of an item in an image or a video, and/or its localization are required, frequently used metrics are Recall and Precision, as well as ROC curves. These metrics give quantitative performance values which are easy to understand and to interpret even by non-experts. However, an inherent problem is the dependency of quantitative performance measures on the quality constraints that we need to impose on the detection algorithm. In particular, an important quality parameter of these measures is the spatial or spatio-temporal overlap between a ground-truth item and a detected item, and this needs to be taken into account when interpreting the results. We propose a new performance metric addressing and unifying the qualitative and quantitative aspects of the performance measures. The performance of a detection and recognition algorithm is illustrated intuitively by performance graphs which present quantitative performance values, like Recall, Precision and F-Score, depending on quality constraints of the detection. In order to compare the performance of different computer vision algorithms, a representative single performance measure is computed from the graphs, by integrating out all quality parameters. The evaluation method can be applied to different types of activity detection and recognition algorithms. The performance metric has been tested on several activity recognition algorithms participating in the ICPR 2012 HARL competition.
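
The idea of plotting F-Score against a quality constraint and then integrating the constraint out can be sketched in a few lines. This is a deliberately reduced illustration and not the metric defined in the paper: it assumes a single quality parameter (a spatial IoU threshold), greedy one-to-one matching, and a plain average over an evenly spaced threshold grid. The names `f_score_at` and `integrated_f_score` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f_score_at(ground_truth: List[Box], detections: List[Box],
               threshold: float) -> float:
    """F-Score when a detection counts only if its IoU with an unmatched
    ground-truth box reaches `threshold` (greedy matching, an assumption)."""
    matched = set()
    true_positives = 0
    for det in detections:
        best_j, best_overlap = None, 0.0
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(det, gt)
            if overlap >= threshold and overlap > best_overlap:
                best_j, best_overlap = j, overlap
        if best_j is not None:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def integrated_f_score(ground_truth: List[Box], detections: List[Box],
                       steps: int = 10) -> float:
    """Average the F-Score over thresholds 1/steps, 2/steps, ..., 1.0,
    a discrete stand-in for integrating out the quality parameter."""
    thresholds = [(i + 1) / steps for i in range(steps)]
    scores = [f_score_at(ground_truth, detections, t) for t in thresholds]
    return sum(scores) / len(scores)
```

A detector whose boxes overlap the ground truth only loosely scores well at low thresholds and poorly at high ones, so the averaged value rewards both finding items and localizing them tightly.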

HAL

Crossref