54,892 research outputs found

    Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation

    Full text link
    Scene text detection attracts much attention in computer vision, because it can be widely used in many applications such as real-time text translation, automatic information entry, blind person assistance, robot sensing and so on. Though many methods have been proposed for horizontal and oriented texts, detecting irregular shape texts such as curved texts is still a challenging problem. To solve the problem, we propose a robust scene text detection method with adaptive text region representation. Given an input image, a text region proposal network is first used for extracting text proposals. Then, these proposals are verified and refined with a refinement network. Here, recurrent neural network based adaptive text region representation is proposed for text region refinement, where a pair of boundary points are predicted each time step until no new points are found. In this way, text regions of arbitrary shapes are detected and represented with adaptive number of boundary points. This gives more accurate description of text regions. Experimental results on five benchmarks, namely, CTW1500, TotalText, ICDAR2013, ICDAR2015 and MSRATD500, show that the proposed method achieves state-of-the-art in scene text detection

    Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation

    Full text link
    Deep Neural Networks (DNNs) have been widely applied in various recognition tasks. However, recently DNNs have been shown to be vulnerable against adversarial examples, which can mislead DNNs to make arbitrary incorrect predictions. While adversarial examples are well studied in classification tasks, other learning problems may have different properties. For instance, semantic segmentation requires additional components such as dilated convolutions and multiscale processing. In this paper, we aim to characterize adversarial examples based on spatial context information in semantic segmentation. We observe that spatial consistency information can be potentially leveraged to detect adversarial examples robustly even when a strong adaptive attacker has access to the model and detection strategies. We also show that adversarial examples based on attacks considered within the paper barely transfer among models, even though transferability is common in classification. Our observations shed new light on developing adversarial attacks and defenses to better understand the vulnerabilities of DNNs.Comment: Accepted to ECCV 201

    Single Image Dehazing through Improved Atmospheric Light Estimation

    Full text link
    Image contrast enhancement for outdoor vision is important for smart car auxiliary transport systems. The video frames captured in poor weather conditions are often characterized by poor visibility. Most image dehazing algorithms consider to use a hard threshold assumptions or user input to estimate atmospheric light. However, the brightest pixels sometimes are objects such as car lights or streetlights, especially for smart car auxiliary transport systems. Simply using a hard threshold may cause a wrong estimation. In this paper, we propose a single optimized image dehazing method that estimates atmospheric light efficiently and removes haze through the estimation of a semi-globally adaptive filter. The enhanced images are characterized with little noise and good exposure in dark regions. The textures and edges of the processed images are also enhanced significantly.Comment: Multimedia Tools and Applications (2015

    Background Subtraction in Real Applications: Challenges, Current Models and Future Directions

    Full text link
    Computer vision applications based on videos often require the detection of moving objects in their first step. Background subtraction is then applied in order to separate the background and the foreground. In literature, background subtraction is surely among the most investigated field in computer vision providing a big amount of publications. Most of them concern the application of mathematical and machine learning models to be more robust to the challenges met in videos. However, the ultimate goal is that the background subtraction methods developed in research could be employed in real applications like traffic surveillance. But looking at the literature, we can remark that there is often a gap between the current methods used in real applications and the current methods in fundamental research. In addition, the videos evaluated in large-scale datasets are not exhaustive in the way that they only covered a part of the complete spectrum of the challenges met in real applications. In this context, we attempt to provide the most exhaustive survey as possible on real applications that used background subtraction in order to identify the real challenges met in practice, the current used background models and to provide future directions. Thus, challenges are investigated in terms of camera, foreground objects and environments. In addition, we identify the background models that are effectively used in these applications in order to find potential usable recent background models in terms of robustness, time and memory requirements.Comment: Submitted to Computer Science Revie

    Skeleton Matching based approach for Text Localization in Scene Images

    Full text link
    In this paper, we propose a skeleton matching based approach which aids in text localization in scene images. The input image is preprocessed and segmented into blocks using connected component analysis. We obtain the skeleton of the segmented block using morphology based approach. The skeletonized images are compared with the trained templates in the database to categorize into text and non-text blocks. Further, the newly designed geometrical rules and morphological operations are employed on the detected text blocks for scene text localization. The experimental results obtained on publicly available standard datasets illustrate that the proposed method can detect and localize the texts of various sizes, fonts and colors.Comment: 10 pages, 8 figures, Eighth International Conference on Image and Signal Processing,Elsevier Publications,pp: 145-153, held at UVCE, Bangalore in July 2014. ISBN: 978935107252

    AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition

    Full text link
    Recognizing text in the wild is a really challenging task because of complex backgrounds, various illuminations and diverse distortions, even with deep neural networks (convolutional neural networks and recurrent neural networks). In the end-to-end training procedure for scene text recognition, the outputs of deep neural networks at different iterations are always demonstrated with diversity and complementarity for the target object (text). Here, a simple but effective deep learning method, an adaptive ensemble of deep neural networks (AdaDNNs), is proposed to simply select and adaptively combine classifier components at different iterations from the whole learning system. Furthermore, the ensemble is formulated as a Bayesian framework for classifier weighting and combination. A variety of experiments on several typical acknowledged benchmarks, i.e., ICDAR Robust Reading Competition (Challenge 1, 2 and 4) datasets, verify the surprised improvement from the baseline DNNs, and the effectiveness of AdaDNNs compared with the recent state-of-the-art methods

    Compressive Hyperspectral Imaging via Approximate Message Passing

    Full text link
    We consider a compressive hyperspectral imaging reconstruction problem, where three-dimensional spatio-spectral information about a scene is sensed by a coded aperture snapshot spectral imager (CASSI). The CASSI imaging process can be modeled as suppressing three-dimensional coded and shifted voxels and projecting these onto a two-dimensional plane, such that the number of acquired measurements is greatly reduced. On the other hand, because the measurements are highly compressive, the reconstruction process becomes challenging. We previously proposed a compressive imaging reconstruction algorithm that is applied to two-dimensional images based on the approximate message passing (AMP) framework. AMP is an iterative algorithm that can be used in signal and image reconstruction by performing denoising at each iteration. We employed an adaptive Wiener filter as the image denoiser, and called our algorithm "AMP-Wiener." In this paper, we extend AMP-Wiener to three-dimensional hyperspectral image reconstruction, and call it "AMP-3D-Wiener." Applying the AMP framework to the CASSI system is challenging, because the matrix that models the CASSI system is highly sparse, and such a matrix is not suitable to AMP and makes it difficult for AMP to converge. Therefore, we modify the adaptive Wiener filter and employ a technique called damping to solve for the divergence issue of AMP. Our approach is applied in nature, and the numerical experiments show that AMP-3D-Wiener outperforms existing widely-used algorithms such as gradient projection for sparse reconstruction (GPSR) and two-step iterative shrinkage/thresholding (TwIST) given a similar amount of runtime. Moreover, in contrast to GPSR and TwIST, AMP-3D-Wiener need not tune any parameters, which simplifies the reconstruction process.Comment: 13 pages, to appear in Journal of Selected Topics in Signal Processin

    Hierarchy Composition GAN for High-fidelity Image Synthesis

    Full text link
    Despite the rapid progress of generative adversarial networks (GANs) in image synthesis in recent years, the existing image synthesis approaches work in either geometry domain or appearance domain alone which often introduces various synthesis artifacts. This paper presents an innovative Hierarchical Composition GAN (HIC-GAN) that incorporates image synthesis in geometry and appearance domains into an end-to-end trainable network and achieves superior synthesis realism in both domains simultaneously. We design an innovative hierarchical composition mechanism that is capable of learning realistic composition geometry and handling occlusions while multiple foreground objects are involved in image composition. In addition, we introduce a novel attention mask mechanism that guides to adapt the appearance of foreground objects which also helps to provide better training reference for learning in geometry domain. Extensive experiments on scene text image synthesis, portrait editing and indoor rendering tasks show that the proposed HIC-GAN achieves superior synthesis performance qualitatively and quantitatively.Comment: 11 pages, 8 figure

    Audio-visual foreground extraction for event characterization

    Get PDF
    This paper presents a new method able to integrate audio and visual information for scene analysis in a typical surveillance scenario, using only one camera and one monaural microphone. Visual information is analyzed by a standard visual background/foreground (BG/FG) modelling module, enhanced with a novelty detection stage, and coupled with an audio BG/FG modelling scheme. The audiovisual association is performed on-line, by exploiting the concept of synchrony. Experimental tests carrying out classification and clustering of events show all the potentialities of the proposed approach, also in comparison with the results obtained by using the single modalities

    Spatio-Temporal Filter Adaptive Network for Video Deblurring

    Full text link
    Video deblurring is a challenging task due to the spatially variant blur caused by camera shake, object motions, and depth variations, etc. Existing methods usually estimate optical flow in the blurry video to align consecutive frames or approximate blur kernels. However, they tend to generate artifacts or cannot effectively remove blur when the estimated optical flow is not accurate. To overcome the limitation of separate optical flow estimation, we propose a Spatio-Temporal Filter Adaptive Network (STFAN) for the alignment and deblurring in a unified framework. The proposed STFAN takes both blurry and restored images of the previous frame as well as blurry image of the current frame as input, and dynamically generates the spatially adaptive filters for the alignment and deblurring. We then propose the new Filter Adaptive Convolutional (FAC) layer to align the deblurred features of the previous frame with the current frame and remove the spatially variant blur from the features of the current frame. Finally, we develop a reconstruction network which takes the fusion of two transformed features to restore the clear frames. Both quantitative and qualitative evaluation results on the benchmark datasets and real-world videos demonstrate that the proposed algorithm performs favorably against state-of-the-art methods in terms of accuracy, speed as well as model size.Comment: ICCV 201
    corecore