15 research outputs found

    Spatio-Temporal Deformable Attention Network for Video Deblurring

    The key success factor of video deblurring methods is compensating for the blurry pixels of the mid-frame with the sharp pixels of the adjacent video frames. Therefore, mainstream methods align the adjacent frames based on estimated optical flows and fuse the aligned frames for restoration. However, these methods sometimes generate unsatisfactory results because they rarely consider the blur levels of pixels, which may introduce blurry pixels from the video frames. In fact, not all pixels in the video frames are sharp and beneficial for deblurring. To address this problem, we propose the spatio-temporal deformable attention network (STDANet) for video deblurring, which extracts the information of sharp pixels by considering the pixel-wise blur levels of the video frames. Specifically, STDANet is an encoder-decoder network combined with a motion estimator and a spatio-temporal deformable attention (STDA) module, where the motion estimator predicts coarse optical flows that are used as base offsets to find the corresponding sharp pixels in the STDA module. Experimental results indicate that the proposed STDANet performs favorably against state-of-the-art methods on the GoPro, DVD, and BSD datasets. Comment: ECCV 202
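    The offset-plus-attention idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name stda_fuse, the nearest-neighbor sampling, and the externally supplied residual offsets and attention scores are all simplifying assumptions (the real STDA module learns offsets and scores and operates on deep features).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def stda_fuse(mid_feat, adj_feat, coarse_flow, residual_offsets, scores):
    """Fuse features from an adjacent frame into the mid-frame.

    coarse_flow (H, W, 2) acts as the base offset; residual_offsets
    (H, W, K, 2) refine it for K sampling points; scores (H, W, K) are
    turned into attention weights. Nearest-neighbor sampling keeps the
    sketch short (a real module would sample bilinearly).
    """
    H, W, C = mid_feat.shape
    K = residual_offsets.shape[2]
    w = softmax(scores, axis=2)            # per-pixel attention over K samples
    out = np.zeros_like(mid_feat)
    for y in range(H):
        for x in range(W):
            acc = np.zeros(C)
            for k in range(K):
                dy = coarse_flow[y, x, 0] + residual_offsets[y, x, k, 0]
                dx = coarse_flow[y, x, 1] + residual_offsets[y, x, k, 1]
                sy = int(np.clip(np.rint(y + dy), 0, H - 1))
                sx = int(np.clip(np.rint(x + dx), 0, W - 1))
                acc += w[y, x, k] * adj_feat[sy, sx]   # gather a candidate sharp pixel
            out[y, x] = acc
    return out
```

    With zero offsets and a single sampling point, the fusion simply copies the adjacent frame's features, which makes the role of the learned offsets easy to see.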

    Forensic image analysis – CCTV distortion and artefacts

    © 2018 Elsevier B.V. As a result of the worldwide deployment of surveillance cameras, authorities have gained a powerful tool that captures footage of people's activities in public areas. Surveillance cameras allow continuous monitoring of an area and allow footage to be obtained for later use if a criminal or other act of interest occurs. Following this, a forensic practitioner or expert witness can be required to analyse the footage of the person of interest. The examination ultimately aims at evaluating the strength of evidence at the source and activity levels. In this paper, both source and activity levels are inferred from the trace, obtained in the form of CCTV footage. The source level alludes to features observed within the anatomy and gait of an individual, whilst the activity level relates to the activity undertaken by the individual within the footage. The strength of evidence depends on the value of the information recorded: the activity level is robust, yet the source level requires further development. It is therefore suggested that the camera and the associated distortions should be assessed first and foremost and, where possible, quantified, to determine the level of each type of distortion present within the footage. A review of ‘forensic image analysis’ is presented here. It outlines the image distortion types and details the limitations of differing surveillance camera systems. The aim is to highlight the various types of distortion present, particularly in surveillance footage, as well as to address gaps in the current literature on the assessment of CCTV distortions in tandem with gait analysis. Future work will consider anatomical assessment from surveillance footage.

    Visual tracking under motion blur

    Most existing tracking algorithms do not explicitly consider the motion blur contained in video sequences, which degrades their performance in real-world applications where motion blur often occurs. In this paper, we propose to solve the motion blur problem in visual tracking in a unified framework. Specifically, a joint blur state estimation and multi-task reverse sparse learning framework is presented, in which the closed-form solutions of the blur kernel and the sparse code matrix are obtained simultaneously. The reverse process treats the blurry candidates as dictionary elements and sparsely represents the blurred templates with the candidates. By utilizing the information contained in the sparse code matrix, an efficient likelihood model is further developed, which quickly excludes irrelevant candidates and narrows down the particle set. Experimental results on challenging benchmarks show that our method performs well against state-of-the-art trackers.
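    The reverse sparse representation above (candidates as dictionary atoms, templates as the signals to be coded) can be sketched with a plain ISTA solver. The paper derives a closed-form multi-task solution, so this single-template version, including the function name ista, the fixed step size, and the regularization weight, is an illustrative assumption rather than the authors' method:

```python
import numpy as np

def ista(D, t, lam=0.05, n_iter=500):
    """Sparse-code a blurred template t over a dictionary D of blurry
    candidates: min_a 0.5 * ||t - D a||^2 + lam * ||a||_1, solved by ISTA."""
    L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a - D.T @ (D @ a - t) / L             # gradient step on the data term
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return a
```

    Large entries of the code vector point at the candidates most consistent with the template, which is the information the likelihood model exploits to prune irrelevant particles.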

    Retina-Based Pipe-Like Object Tracking Implemented Through Spiking Neural Network on a Snake Robot

    Vision-based target tracking is crucial for bio-inspired snake robots exploring unknown environments. However, it is difficult for the traditional vision modules of snake robots to overcome the image blur resulting from periodic swings. A promising approach is to use a neuromorphic vision sensor (NVS), which mimics the biological retina to detect a target at a higher temporal frequency and over a wider dynamic range. In this study, an NVS and a spiking neural network (SNN) were deployed on a snake robot for the first time to achieve pipe-like object tracking. An SNN based on the Hough transform was designed to detect a target from the asynchronous event stream fed by the NVS. Combined with the state of snake motion analyzed from the joint position sensors, a tracking framework was proposed. The experimental results obtained from the simulator demonstrate the validity of our framework and the autonomous locomotion ability of our snake robot. Comparing the performance of the SNN model on CPUs and on GPUs, the model performed best on a GPU under a simplified, synchronous update rule, while it achieved higher precision on a CPU operating asynchronously.
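    The Hough-transform detection over an event stream can be illustrated with a simple voting scheme for a circle of known radius: each event votes for every center that could explain it. The paper's SNN realizes this with spiking neurons and asynchronous updates, so this dense numpy accumulator (function name hough_circle_events, fixed radius, synchronous loop) is only an assumed stand-in:

```python
import numpy as np

def hough_circle_events(events, shape, radius, n_angles=64):
    """Accumulate Hough votes for circle centers from (x, y) events."""
    acc = np.zeros(shape)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    for x, y in events:
        # Candidate centers lie on a circle of the same radius around the event.
        cx = np.rint(x - radius * np.cos(thetas)).astype(int)
        cy = np.rint(y - radius * np.sin(thetas)).astype(int)
        ok = (cx >= 0) & (cx < shape[1]) & (cy >= 0) & (cy < shape[0])
        np.add.at(acc, (cy[ok], cx[ok]), 1)       # one vote per candidate center
    return acc
```

    The accumulator's peak marks the detected pipe cross-section; in a spiking implementation the votes become synaptic inputs and the peak is the first neuron to reach threshold.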

    Multi-View Stereo Matching and Image Restoration Using a Unified System

    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2017. Advisor: Kyoung Mu Lee. Estimating camera poses and scene structure from seriously degraded images is a challenging problem. Most existing multi-view stereo algorithms assume high-quality input images and therefore produce unreliable results for blurred, noisy, or low-resolution images. Experimental results show that using off-the-shelf image reconstruction algorithms as independent preprocessing is generally ineffective or even sometimes counterproductive, because naive frame-wise image reconstruction methods fundamentally ignore the consistency between images, although they seem to produce visually plausible results. In this thesis, starting from the fact that the image reconstruction and multi-view stereo problems are interrelated, we present a unified framework to solve these problems jointly. The validity of this approach is empirically verified for four different problems: dense depth map reconstruction, camera pose estimation, super-resolution, and deblurring from images obtained by a single moving camera. By reflecting the physical imaging process, we cast our objective into a cost minimization problem and obtain the solution using alternating optimization techniques. Experiments show that the proposed method can restore high-quality depth maps from seriously degraded images for both synthetic and real video, as opposed to the failure of simple multi-view stereo methods. Our algorithm also produces superior super-resolution and deblurring results compared to simple preprocessing with conventional super-resolution and deblurring techniques. Moreover, we show that the proposed framework can be generalized to handle more common scenarios. First, it can solve the image reconstruction and multi-view stereo problems for multi-view single-shot images captured by a light field camera.
    By using the information of calibrated multi-view images, it recovers the motions of individual objects in the input image as well as the unknown camera motion during the shutter time. The contribution of this thesis is to propose a new perspective on the solution of existing computer vision problems from an integrated viewpoint. We show that by solving interrelated problems jointly, we can obtain physically more plausible solutions and better performance, especially when the input images are challenging. The proposed optimization algorithm also makes our approach more practical in terms of computational complexity.
    Contents: 1 Introduction (1.1 Outline of Dissertation); 2 Background; 3 Generalized Imaging Model (3.1 Camera Projection Model; 3.2 Depth and Warping Operation; 3.3 Representation of Camera Pose in SE(3); 3.4 Proposed Imaging Model); 4 Rendering Synthetic Datasets (4.1 Making Blurred Image Sequences using Depth-based Image Rendering; 4.2 Making Blurred Image Sequences using Blender); 5 A Unified Framework for Single-shot Multi-view Images (5.1 Introduction; 5.2 Related Works; 5.3 Deblurring with 4D Light Fields; 5.4 Joint Estimation; 5.5 Experimental Results; 5.6 Conclusion); 6 A Unified Framework for a Monocular Image Sequence (6.1 Introduction; 6.2 Related Works; 6.3 Modeling Imaging Process; 6.4 Unified Energy Formulation; 6.5 Optimization; 6.6 Experimental Results; 6.7 Conclusion); 7 A Unified Framework for SLAM (7.1 Motivation; 7.2 Baseline; 7.3 Proposed Method; 7.4 Experimental Results; 7.5 Conclusion); 8 Conclusion (8.1 Summary and Contribution of the Dissertation; 8.2 Future Works); Bibliography; Abstract (in Korean)
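    The alternating optimization strategy used throughout the thesis, fixing one group of unknowns (depth maps and camera poses) while solving for the other (latent images), can be illustrated on a toy 1-D blind deconvolution problem. The function names and the plain least-squares sub-steps are assumptions for illustration, not the thesis's actual energy terms:

```python
import numpy as np

def conv_matrix(v, n):
    """Matrix C such that C @ u == np.convolve(v, u) for any u of length n."""
    C = np.zeros((len(v) + n - 1, n))
    for j in range(n):
        C[j:j + len(v), j] = v
    return C

def alternate_deconv(y, n_x, n_k, n_iter=30):
    """Alternately solve least squares for the latent signal x and the
    blur kernel k so that convolve(x, k) fits the observation y."""
    k = np.ones(n_k) / n_k                        # start from a uniform kernel
    x = np.zeros(n_x)
    for _ in range(n_iter):
        x = np.linalg.lstsq(conv_matrix(k, n_x), y, rcond=None)[0]  # x-step
        k = np.linalg.lstsq(conv_matrix(x, n_k), y, rcond=None)[0]  # k-step
    return x, k
```

    Each sub-problem is convex given the other variable, so the data-fit cost decreases monotonically, which is the same argument that makes the thesis's alternating updates of latent images, depth maps, and poses tractable.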

    Visual odometry using floor texture

    Advisor: Paulo Roberto Gardel Kurka. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Mecânica. Abstract: This thesis presents the implementation of computer vision and machine learning techniques to estimate the odometry of a differential robot using only floor texture information. Seven feature detectors, one feature tracker, and one neural network structure with three different training algorithms were tested to determine the fastest and most precise methodology. The process first extracts characteristic points from the floor image using the FAST feature detector; these points are then tracked with the Lucas-Kanade-Tomasi optical flow algorithm, and the odometry is calculated in three different ways: algebraically, algebraically with a neural network refinement, and with a neural network only. The experiments were carried out in a virtual environment developed on the Blender software platform with different floor textures, and in a real environment. The results show that the presented methodology is promising for real-time application with certain textures, given an appropriate hardware configuration. Doctorate in Mechanical Engineering, Mechatronics. FAPEA
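    The "algebraic" odometry step, recovering the robot's planar motion from points tracked on the floor between two frames, can be sketched as a least-squares rigid alignment (Kabsch-style). The function name estimate_motion and the noise-free correspondences are assumptions; the thesis additionally refines such estimates with neural networks:

```python
import numpy as np

def estimate_motion(p_prev, p_curr):
    """Least-squares rigid 2-D motion (R, t) with p_curr ≈ p_prev @ R.T + t,
    estimated from N tracked floor points given as (N, 2) arrays."""
    mu_p = p_prev.mean(axis=0)
    mu_c = p_curr.mean(axis=0)
    H = (p_prev - mu_p).T @ (p_curr - mu_c)    # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_c - R @ mu_p
    return R, t
```

    Accumulating the per-frame (R, t) estimates over time yields the robot's trajectory; tracking noise then motivates the learned refinement stage.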