Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation
Scene text detection attracts much attention in computer vision, because it
can be widely used in many applications such as real-time text translation,
automatic information entry, blind person assistance, robot sensing and so on.
Though many methods have been proposed for horizontal and oriented texts,
detecting irregular shape texts such as curved texts is still a challenging
problem. To solve the problem, we propose a robust scene text detection method
with adaptive text region representation. Given an input image, a text region
proposal network is first used for extracting text proposals. Then, these
proposals are verified and refined with a refinement network. Here, recurrent
neural network based adaptive text region representation is proposed for text
region refinement, where a pair of boundary points is predicted at each
time step until no new points are found. In this way, text regions of
arbitrary shapes are detected and represented with an adaptive number of
boundary points, which gives a more accurate description of text regions.
Experimental results on five benchmarks, namely CTW1500, TotalText,
ICDAR2013, ICDAR2015 and MSRA-TD500, show that the proposed method
achieves state-of-the-art performance in scene text detection.
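The refinement loop described above, in which one pair of boundary points is predicted per recurrent time step until a stop signal fires, can be sketched as follows. This is a minimal sketch: the `step_fn` interface, the 0.5 stop threshold and the toy geometry are our assumptions, not the paper's network.

```python
def decode_boundary(step_fn, hidden, max_steps=20):
    """Greedy decoding of a text-region polygon: at each time step the
    recurrent cell predicts one (top, bottom) pair of boundary points
    plus a stop probability; decoding halts once the stop probability
    exceeds 0.5, so each region gets an adaptive number of points."""
    points = []
    for _ in range(max_steps):
        pair, stop_prob, hidden = step_fn(hidden)
        if stop_prob > 0.5:
            break
        points.append(pair)
    return points

# Toy stand-in for the refinement network: emits three point pairs
# along a straight region, then signals "stop".
def toy_step(state):
    t = state["t"]
    pair = ((10.0 * t, 0.0), (10.0 * t, 8.0))  # (top point, bottom point)
    stop_prob = 0.0 if t < 3 else 1.0
    state["t"] += 1
    return pair, stop_prob, state

pts = decode_boundary(toy_step, {"t": 0})
print(len(pts))  # 3 pairs -> a 6-point polygon
```

The `max_steps` cap guards against a predictor that never emits a stop signal.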
Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation
Deep Neural Networks (DNNs) have been widely applied in various recognition
tasks. However, DNNs have recently been shown to be vulnerable to
adversarial examples, which can mislead DNNs to make arbitrary incorrect
predictions. While adversarial examples are well studied in classification
tasks, other learning problems may have different properties. For instance,
semantic segmentation requires additional components such as dilated
convolutions and multiscale processing. In this paper, we aim to characterize
adversarial examples based on spatial context information in semantic
segmentation. We observe that spatial consistency information can be
potentially leveraged to detect adversarial examples robustly even when a
strong adaptive attacker has access to the model and detection strategies. We
also show that adversarial examples based on attacks considered within the
paper barely transfer among models, even though transferability is common in
classification. Our observations shed new light on developing adversarial
attacks and defenses to better understand the vulnerabilities of DNNs.
Comment: Accepted to ECCV 201
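A consistency check along these lines can be sketched as follows. This is our reading of the abstract, not the authors' code: the patch size, the half-patch overlap and the pixel-agreement score are all illustrative choices.

```python
import numpy as np

def spatial_consistency(image, segment, patch=64, k=8, rng=None):
    """Segment k pairs of randomly chosen overlapping patches
    independently and measure how often the two predictions agree on
    the overlapping pixels. Benign images tend to give high agreement;
    adversarial ones tend not to."""
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    dy = dx = patch // 2                      # second patch shifted by half
    scores = []
    for _ in range(k):
        y = rng.integers(0, h - patch - dy)
        x = rng.integers(0, w - patch - dx)
        p1 = segment(image[y:y + patch, x:x + patch])
        p2 = segment(image[y + dy:y + dy + patch, x + dx:x + dx + patch])
        o1 = p1[dy:, dx:]                     # overlap region in patch 1
        o2 = p2[:patch - dy, :patch - dx]     # same pixels seen by patch 2
        scores.append((o1 == o2).mean())
    return float(np.mean(scores))

# Toy "model": label = quantized intensity, so predictions are
# perfectly consistent across patches on a benign image.
img = np.tile(np.linspace(0, 1, 256), (256, 1))
seg = lambda x: (x * 4).astype(int)
print(spatial_consistency(img, seg, rng=0))  # 1.0 for this consistent model
```

A detector would threshold this score; an adaptive attacker must then fool the model on every random overlapping crop simultaneously, which is what makes the defense robust.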
Single Image Dehazing through Improved Atmospheric Light Estimation
Image contrast enhancement for outdoor vision is important for smart car
auxiliary transport systems. The video frames captured in poor weather
conditions are often characterized by poor visibility. Most image dehazing
algorithms rely on hard-threshold assumptions or user input to estimate the
atmospheric light. However, the brightest pixels are sometimes objects such
as car lights or streetlights, especially in smart car auxiliary transport
systems, so simply using a hard threshold can lead to a wrong estimate. In
this paper, we propose an optimized single-image dehazing method that
estimates the atmospheric light efficiently and removes haze through the
estimation of a semi-globally adaptive filter. The enhanced images are
characterized by low noise and good exposure in dark regions. The textures
and edges of the processed images are also enhanced significantly.
Comment: Multimedia Tools and Applications (2015)
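For context, dehazing methods work with the haze model I(x) = J(x)t(x) + A(1 - t(x)), where A is the atmospheric light the abstract discusses. The sketch below illustrates why a plain brightest-pixel rule fails, using a dark-channel-style estimate in the spirit of He et al. rather than the paper's semi-globally adaptive filter; the per-pixel dark channel (no patch erosion) and the 0.1% fraction are simplifying assumptions.

```python
import numpy as np

def estimate_atmospheric_light(img, top_frac=0.001):
    """Rank pixels by their minimum over the colour channels (the
    per-pixel dark channel) and average the image over the top
    fraction of that ranking, instead of trusting the single
    brightest pixel, which may be a headlight."""
    dark = img.min(axis=2)                        # per-pixel dark channel
    n = max(1, int(top_frac * dark.size))
    idx = np.argpartition(dark.ravel(), -n)[-n:]  # n haziest pixels
    flat = img.reshape(-1, img.shape[2])
    return flat[idx].mean(axis=0)                 # estimated airlight A

# Toy scene: uniform grey haze at 0.8 plus one saturated headlight.
img = np.full((100, 100, 3), 0.8)
img[50, 50] = [1.0, 1.0, 1.0]
A = estimate_atmospheric_light(img)
print(np.round(A, 2))
```

A hard brightest-pixel rule would return 1.0 (the headlight); averaging over a small fraction keeps the estimate close to the true haze value of 0.8.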
Background Subtraction in Real Applications: Challenges, Current Models and Future Directions
Computer vision applications based on videos often require the detection of
moving objects in their first step. Background subtraction is then applied in
order to separate the background and the foreground. In the literature,
background subtraction is among the most investigated topics in computer
vision, with a large number of publications. Most of them apply
mathematical and machine-learning models to be more robust to the
challenges met in videos. However, the ultimate goal is for the background
subtraction methods developed in research to be employed in real
applications such as traffic surveillance. Looking at the literature, we
can remark that there is often a gap between the methods used in real
applications and the current methods in fundamental research. In addition,
the videos evaluated in large-scale datasets are not exhaustive, in that
they cover only part of the complete spectrum of challenges met in real
applications. In this context, we attempt to provide as exhaustive a
survey as possible of real applications that use background subtraction,
in order to identify the real challenges met in practice, the background
models currently used, and future directions. Challenges are investigated
in terms of camera, foreground objects and environments. In addition, we
identify the background models that are effectively used in these
applications in order to find recent background models that are
potentially usable in terms of robustness, time and memory requirements.
Comment: Submitted to Computer Science Review
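As background for the gap the survey discusses, the simplest background model actually used in practice is a running average with per-pixel thresholding; the sketch below (all constants illustrative) shows the basic mechanism that more robust models refine.

```python
import numpy as np

def running_average_bg(frames, alpha=0.05, thresh=0.2):
    """Minimal running-average background model: the background
    estimate B is updated as B <- (1 - alpha) * B + alpha * frame,
    and pixels farther than `thresh` from B are marked foreground."""
    bg = frames[0].astype(float)
    masks = []
    for f in frames[1:]:
        mask = np.abs(f - bg) > thresh   # foreground mask for this frame
        bg = (1 - alpha) * bg + alpha * f
        masks.append(mask)
    return masks, bg

# Static scene at intensity 0.5 with a bright 2x2 "object" entering
# in the last two frames.
frames = [np.full((8, 8), 0.5) for _ in range(5)]
for f in frames[3:]:
    f[2:4, 2:4] = 1.0
masks, _ = running_average_bg(frames)
print(masks[2].sum())  # 4 pixels flagged when the object appears
```

This baseline fails under camera jitter, illumination change and dynamic backgrounds, which is exactly the challenge spectrum the survey catalogues.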
Skeleton Matching based approach for Text Localization in Scene Images
In this paper, we propose a skeleton matching based approach which aids in
text localization in scene images. The input image is preprocessed and
segmented into blocks using connected component analysis. We obtain the
skeleton of the segmented block using a morphology-based approach. The
skeletonized images are compared with the trained templates in the database to
categorize into text and non-text blocks. Further, the newly designed
geometrical rules and morphological operations are employed on the detected
text blocks for scene text localization. The experimental results obtained on
publicly available standard datasets illustrate that the proposed method can
detect and localize text of various sizes, fonts and colors.
Comment: 10 pages, 8 figures, Eighth International Conference on Image and
Signal Processing, Elsevier Publications, pp. 145-153, held at UVCE,
Bangalore in July 2014. ISBN: 978935107252
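The abstract does not specify how skeletons are compared with the trained templates; one simple, hypothetical scoring that fits the description treats each skeleton as a point set and uses a symmetric mean nearest-neighbour distance (lower = more similar).

```python
import numpy as np

def skeleton_match(skel_a, skel_b):
    """Score two binary skeleton images by the symmetric mean
    nearest-neighbour distance between their pixel coordinate sets."""
    pa = np.argwhere(skel_a)
    pb = np.argwhere(skel_b)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return float((d.min(axis=1).mean() + d.min(axis=0).mean()) / 2)

# A vertical-stroke "template" matches a shifted vertical stroke
# better than a horizontal one.
tpl = np.zeros((7, 7), bool); tpl[1:6, 3] = True   # vertical stroke
v   = np.zeros((7, 7), bool); v[1:6, 4]  = True    # shifted vertical
hzt = np.zeros((7, 7), bool); hzt[3, 1:6] = True   # horizontal stroke
print(skeleton_match(tpl, v) < skeleton_match(tpl, hzt))  # True
```

A block would then be labelled text if its skeleton's best template score falls below a threshold; the real method's matching criterion may of course differ.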
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Recognizing text in the wild is a challenging task because of complex
backgrounds, various illuminations and diverse distortions, even with deep
neural networks (convolutional neural networks and recurrent neural networks).
In the end-to-end training procedure for scene text recognition, the
outputs of deep neural networks at different iterations consistently
exhibit diversity and complementarity with respect to the target object
(text). Here, a simple but effective deep learning method, an adaptive
ensemble of deep neural networks (AdaDNNs), is proposed to select and
adaptively combine classifier components from different iterations of the
whole learning system. Furthermore, the ensemble is formulated within a
Bayesian framework for classifier weighting and combination. A variety of
experiments on several widely acknowledged benchmarks, i.e., the ICDAR
Robust Reading Competition (Challenges 1, 2 and 4) datasets, verify the
surprising improvement over the baseline DNNs and the effectiveness of
AdaDNNs compared with recent state-of-the-art methods.
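The core combination step can be sketched as a weighted sum of snapshot outputs. This is a simplified stand-in for the paper's Bayesian weighting; the two snapshots and their hand-picked weights are illustrative.

```python
import numpy as np

def ensemble_predict(snapshot_probs, weights):
    """Combine class-probability outputs from classifier snapshots
    taken at different training iterations: normalise the weights,
    form the weighted sum, and return the argmax as the ensemble
    prediction."""
    probs = np.asarray(snapshot_probs, dtype=float)   # (snapshots, classes)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalise weights
    combined = (w[:, None] * probs).sum(axis=0)
    return int(combined.argmax()), combined

# Two snapshots disagree; the higher-weighted later snapshot wins.
p_early = [0.6, 0.4]   # early iteration favours class 0
p_late  = [0.3, 0.7]   # later iteration favours class 1
label, probs = ensemble_predict([p_early, p_late], weights=[0.2, 0.8])
print(label)  # 1
```

The interesting part of AdaDNNs is how the weights are chosen; the point here is only that snapshots which individually err can complement one another once combined.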
Compressive Hyperspectral Imaging via Approximate Message Passing
We consider a compressive hyperspectral imaging reconstruction problem, where
three-dimensional spatio-spectral information about a scene is sensed by a
coded aperture snapshot spectral imager (CASSI). The CASSI imaging process can
be modeled as superimposing three-dimensional coded and shifted voxels and
projecting these onto a two-dimensional plane, such that the number of acquired
measurements is greatly reduced. On the other hand, because the measurements
are highly compressive, the reconstruction process becomes challenging. We
previously proposed a compressive imaging reconstruction algorithm that is
applied to two-dimensional images based on the approximate message passing
(AMP) framework. AMP is an iterative algorithm that can be used in signal and
image reconstruction by performing denoising at each iteration. We employed an
adaptive Wiener filter as the image denoiser, and called our algorithm
"AMP-Wiener." In this paper, we extend AMP-Wiener to three-dimensional
hyperspectral image reconstruction, and call it "AMP-3D-Wiener." Applying the
AMP framework to the CASSI system is challenging because the matrix that
models the CASSI system is highly sparse; such a matrix is ill-suited to
AMP and makes it difficult for AMP to converge. Therefore, we modify the
adaptive Wiener filter and employ a technique called damping to mitigate
the divergence issue of AMP. Numerical experiments show that AMP-3D-Wiener
outperforms existing widely used algorithms such as gradient projection
for sparse reconstruction (GPSR) and two-step iterative
shrinkage/thresholding (TwIST) given a similar amount of runtime.
Moreover, in contrast to GPSR and TwIST, AMP-3D-Wiener does not require
parameter tuning, which simplifies the reconstruction process.
Comment: 13 pages, to appear in Journal of Selected Topics in Signal
Processing
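Damping itself is a generic stabiliser: replace the raw update with a convex combination of the old and proposed estimates. The toy fixed-point iteration below (not the CASSI system; the map and the damping factor 0.3 are illustrative) shows how a divergent update becomes convergent.

```python
def damped_update(x_old, x_proposed, beta=0.3):
    """Damped iteration step: instead of jumping to the proposed
    estimate, take x_new = beta * x_proposed + (1 - beta) * x_old,
    which slows the iteration but suppresses divergence."""
    return beta * x_proposed + (1 - beta) * x_old

# Toy fixed-point iteration x <- g(x) with fixed point x = 1.
# |g'(x)| = 1.5 > 1, so the undamped iteration diverges, while the
# damped map has slope 0.3 * (-1.5) + 0.7 = 0.25 and converges.
g = lambda x: -1.5 * x + 2.5
x_plain, x_damped = 0.0, 0.0
for _ in range(50):
    x_plain = g(x_plain)
    x_damped = damped_update(x_damped, g(x_damped), beta=0.3)
print(abs(x_plain - 1) > 1e3, abs(x_damped - 1) < 1e-6)
```

In AMP the same trade-off applies: damping costs extra iterations but keeps the estimates bounded when the measurement matrix violates AMP's usual assumptions.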
Hierarchy Composition GAN for High-fidelity Image Synthesis
Despite the rapid progress of generative adversarial networks (GANs) in image
synthesis in recent years, existing image synthesis approaches work in
either the geometry domain or the appearance domain alone, which often
introduces various synthesis artifacts. This paper presents an innovative
Hierarchical Composition GAN (HIC-GAN) that incorporates image synthesis
in the geometry and appearance domains into an end-to-end trainable
network and achieves superior synthesis realism in both domains
simultaneously. We design an innovative hierarchical composition
mechanism that is capable of learning realistic composition geometry and
handling occlusions when multiple foreground objects are involved in
image composition. In addition, we introduce a novel attention mask
mechanism that guides the adaptation of foreground-object appearance and
also helps to provide a better training reference for learning in the
geometry domain. Extensive experiments on scene text image synthesis,
portrait editing and indoor rendering tasks show that the proposed
HIC-GAN achieves superior synthesis performance both qualitatively and
quantitatively.
Comment: 11 pages, 8 figures
Audio-visual foreground extraction for event characterization
This paper presents a new method that integrates audio and visual
information for scene analysis in a typical surveillance scenario, using
only one camera and one monaural microphone. Visual information is
analyzed by a standard visual background/foreground (BG/FG) modelling
module, enhanced with a novelty detection stage and coupled with an audio
BG/FG modelling scheme. The audio-visual association is performed on-line
by exploiting the concept of synchrony. Experiments carrying out
classification and clustering of events show the potential of the
proposed approach, also in comparison with the results obtained using the
single modalities.
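A synchrony-based association can be rendered, very roughly, as correlating a per-frame audio foreground signal against each visual blob's foreground activity. This is our toy illustration; the paper's on-line scheme and its audio/visual features are more involved.

```python
import numpy as np

def synchrony(audio_fg, visual_fg):
    """Normalised correlation between a per-frame audio foreground
    signal and one visual blob's per-frame foreground size; the sound
    is associated with the best-correlated blob."""
    a = audio_fg - audio_fg.mean()
    v = visual_fg - visual_fg.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(v) + 1e-12
    return float((a * v).sum() / denom)

audio  = np.array([0., 0., 1., 1., 0., 1.])   # frames with sound
blob_a = np.array([0., 0., 5., 6., 0., 5.])   # moves whenever sound occurs
blob_b = np.array([3., 3., 3., 3., 3., 3.])   # static blob
print(synchrony(audio, blob_a) > synchrony(audio, blob_b))  # True
```

The moving blob correlates with the audio activity while the static one does not, so the sound would be attributed to the former.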
Spatio-Temporal Filter Adaptive Network for Video Deblurring
Video deblurring is a challenging task due to spatially variant blur
caused by camera shake, object motion and depth variations. Existing
methods usually estimate optical flow in the blurry video to align consecutive
frames or approximate blur kernels. However, they tend to generate artifacts or
cannot effectively remove blur when the estimated optical flow is not accurate.
To overcome the limitation of separate optical flow estimation, we propose a
Spatio-Temporal Filter Adaptive Network (STFAN) for the alignment and
deblurring in a unified framework. The proposed STFAN takes the blurry
and restored images of the previous frame as well as the blurry image of
the current frame as input, and dynamically generates spatially adaptive
filters for the alignment and deblurring. We then propose a new Filter
Adaptive Convolutional (FAC) layer to align the deblurred features of the
previous frame with the current frame and to remove the spatially variant
blur from the features of the current frame. Finally, we develop a
reconstruction network which takes the fusion of the two transformed
features to restore clear frames. Both quantitative and qualitative
evaluation results on benchmark datasets and real-world videos
demonstrate that the proposed algorithm performs favorably against
state-of-the-art methods in terms of accuracy, speed and model size.
Comment: ICCV 201
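The idea behind a filter-adaptive convolution can be sketched as follows. This is heavily simplified relative to the FAC layer: single channel, per-pixel k x k kernels given as an explicit array, no learning, and plain zero padding.

```python
import numpy as np

def filter_adaptive_conv(feat, filters):
    """Apply a different k x k kernel at every spatial position:
    filters[i, j] is the kernel used at pixel (i, j), instead of one
    kernel shared across the whole feature map as in a standard
    convolution."""
    h, w = feat.shape
    k = filters.shape[-1]
    r = k // 2
    padded = np.pad(feat, r)                  # zero padding
    out = np.empty_like(feat, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + k, j:j + k]
            out[i, j] = (patch * filters[i, j]).sum()
    return out

# Sanity check: with every per-pixel kernel set to the identity
# kernel, the layer reduces to a no-op.
feat = np.arange(16, dtype=float).reshape(4, 4)
ident = np.zeros((4, 4, 3, 3)); ident[..., 1, 1] = 1.0
out = filter_adaptive_conv(feat, ident)
print(np.allclose(out, feat))  # True
```

Because the kernels vary per pixel, a network predicting them can express spatially variant alignment and deblurring in a single operation, which is the role the FAC layer plays in STFAN.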