MULTI-FRAME OPTICAL FLOW ESTIMATION USING SPATIO-TEMPORAL TRANSFORMERS
Optical flow estimation is a computer vision problem which aims to estimate the apparent 2D motion (flow velocities) of image intensities between two or more consecutive frames in an image sequence. Optical flow information is useful for quantifying the dense motion field in numerous applications such as autonomous driving, object tracking in traffic control systems, video frame interpolation, video compression and structural biomarker development for medical diagnosis. Recent state-of-the-art learning methods for optical flow estimation are two-frame based methods, where optical flow is estimated sequentially for each image pair in an image sequence. In this work, we introduce a learning-based spatio-temporal transformer for multi-frame optical flow estimation (SSTM). SSTM is a multi-frame optical flow estimation algorithm which can learn and estimate non-linear motion dynamics in a scene from multiple sequential images of the scene. Compared to two-frame methods, SSTM can provide improved optical flow estimates in regions with object occlusions and near boundaries where objects may enter or leave the scene (out-of-boundary regions). Our method utilizes 3D Convolutional Gated Recurrent Networks (3D-ConvGRUs) and space-time attention modules to learn the recurrent space-time dynamics of input scenes and provide generalized optical flow estimates. When trained on the same training datasets, our method outperforms both existing multi-frame optical flow estimation algorithms and recent state-of-the-art two-frame methods on the Sintel benchmark dataset (based on a computer-animated movie) and the KITTI 2015 driving benchmark dataset.
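The recurrent space-time component named in the abstract is a 3D Convolutional Gated Recurrent Network (3D-ConvGRU). The sketch below shows a minimal 3D-ConvGRU cell in PyTorch for intuition only; it is not the authors' SSTM code, and the channel sizes, class name and usage example are assumptions.

```python
# Minimal 3D convolutional GRU cell -- an illustrative sketch only,
# not the authors' SSTM implementation. Layer names and sizes are assumptions.
import torch
import torch.nn as nn

class ConvGRU3DCell(nn.Module):
    """Gated recurrent cell whose gates are 3D convolutions over (T, H, W)."""

    def __init__(self, in_ch: int, hid_ch: int, kernel: int = 3):
        super().__init__()
        pad = kernel // 2
        # Update and reset gates computed from the concatenated input and hidden state.
        self.gates = nn.Conv3d(in_ch + hid_ch, 2 * hid_ch, kernel, padding=pad)
        # Candidate hidden state.
        self.cand = nn.Conv3d(in_ch + hid_ch, hid_ch, kernel, padding=pad)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # x, h: (batch, channels, T, H, W)
        z, r = torch.chunk(torch.sigmoid(self.gates(torch.cat([x, h], dim=1))), 2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

# Example: one recurrent refinement step on a short clip of feature volumes.
cell = ConvGRU3DCell(in_ch=32, hid_ch=64)
feats = torch.randn(1, 32, 4, 48, 64)        # (B, C, T, H, W) feature volume
hidden = torch.zeros(1, 64, 4, 48, 64)
hidden = cell(feats, hidden)
print(hidden.shape)                          # torch.Size([1, 64, 4, 48, 64])
```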
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges by a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion makes it possible to resolve the ambiguities of the
monocular reconstruction problem based on a low dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness and scene complexity that
can be handled.
Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 2018.
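The abstract mentions resolving monocular ambiguities through a low-dimensional trajectory subspace recovered jointly over a batch of frames. As a rough illustration of how such a subspace prior can work (not the paper's implementation), the following sketch projects per-batch joint trajectories onto a truncated DCT basis; the basis choice, number of coefficients and array shapes are assumptions.

```python
# Illustrative sketch of constraining per-batch joint trajectories to a
# low-dimensional subspace, here a truncated orthonormal DCT basis.
# This is an assumed realization of a trajectory prior, not MonoPerfCap's code.
import numpy as np

def dct_basis(num_frames: int, num_coeffs: int) -> np.ndarray:
    """Orthonormal DCT-II basis: rows are basis trajectories of length num_frames."""
    n = np.arange(num_frames)
    basis = np.array([np.cos(np.pi * k * (2 * n + 1) / (2 * num_frames))
                      for k in range(num_coeffs)])
    basis[0] *= np.sqrt(1.0 / num_frames)
    basis[1:] *= np.sqrt(2.0 / num_frames)
    return basis  # (num_coeffs, num_frames)

def project_to_subspace(traj: np.ndarray, num_coeffs: int = 8) -> np.ndarray:
    """Project per-frame joint positions (F, J, 3) onto the low-dimensional subspace."""
    frames, joints, dims = traj.shape
    B = dct_basis(frames, num_coeffs)              # (K, F), orthonormal rows
    flat = traj.reshape(frames, joints * dims)     # (F, J*3)
    coeffs = B @ flat                              # subspace coefficients
    smoothed = B.T @ coeffs                        # back-projection to per-frame positions
    return smoothed.reshape(frames, joints, dims)

# Example: smooth a noisy 50-frame trajectory of 17 joints.
noisy = np.cumsum(np.random.randn(50, 17, 3) * 0.01, axis=0)
smooth = project_to_subspace(noisy, num_coeffs=8)
print(smooth.shape)  # (50, 17, 3)
```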
Homography-based ground plane detection using a single on-board camera
This study presents a robust method for ground plane detection in vision-based systems with a non-stationary camera. The proposed method is based on the reliable estimation of the homography between ground planes in successive images. This homography is computed using a feature matching approach which, in contrast to classical approaches to on-board motion estimation, does not require explicit ego-motion calculation. Instead, a novel homography calculation method based on a linear estimation framework is presented. This framework provides predictions of the ground plane transformation matrix that are dynamically updated with new measurements. The method is especially suited for challenging environments, in particular traffic scenarios, in which the information is scarce and the homography computed from the images is often inaccurate or erroneous. The proposed estimation framework is able to remove erroneous measurements and to correct inaccurate ones, hence producing a reliable homography estimate at each instant. It is based on evaluating the difference between the predicted and the observed transformations, measured by the spectral norm of the associated matrix of differences. Moreover, an example is provided of how to use the information extracted from ground plane estimation to achieve object detection and tracking. The method has been successfully demonstrated for the detection of moving vehicles in traffic environments.
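The rejection of erroneous homography measurements is based on the spectral norm of the difference between the predicted and observed transformations. A minimal sketch of that gating idea follows, with the threshold and the scale normalization chosen for illustration rather than taken from the paper.

```python
# Sketch of the gating idea: compare a predicted and an observed ground-plane
# homography via the spectral norm of their difference and reject the
# measurement when it deviates too much. Threshold and normalization are
# assumptions for illustration; this is not the paper's implementation.
import numpy as np

def spectral_norm(M: np.ndarray) -> float:
    """Largest singular value of M (the spectral norm)."""
    return float(np.linalg.norm(M, ord=2))

def accept_measurement(H_pred: np.ndarray, H_obs: np.ndarray, thresh: float = 0.05) -> bool:
    """Accept the observed homography if it is close to the prediction."""
    # Homographies are defined up to scale, so normalize before comparing.
    H_pred = H_pred / H_pred[2, 2]
    H_obs = H_obs / H_obs[2, 2]
    return spectral_norm(H_obs - H_pred) < thresh

# Example: a small perturbation of the prediction passes, a large one does not.
H_pred = np.eye(3)
print(accept_measurement(H_pred, np.eye(3) + 0.01 * np.random.randn(3, 3)))  # likely True
print(accept_measurement(H_pred, np.eye(3) + 0.5 * np.random.randn(3, 3)))   # likely False
```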
The World of Fast Moving Objects
The notion of a Fast Moving Object (FMO), i.e. an object that moves over a
distance exceeding its size within the exposure time, is introduced. FMOs may,
and typically do, rotate with high angular speed. FMOs are very common in
sports videos, but are not rare elsewhere. In a single frame, such objects are
often barely visible and appear as semi-transparent streaks.
A method for the detection and tracking of FMOs is proposed. The method
consists of three distinct algorithms, which form an efficient localization
pipeline that operates successfully in a broad range of conditions. We show
that it is possible to recover the appearance of the object and its axis of
rotation, despite its blurred appearance. The proposed method is evaluated on a
new annotated dataset. The results show that existing trackers are inadequate
for the problem of FMO localization and a new approach is required. Two
applications of localization, temporal super-resolution and highlighting, are
presented.
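Since an FMO shows up as a semi-transparent streak that differs from both neighbouring frames, a very rough first-pass detector can be built from frame differencing. The sketch below illustrates only that intuition; the thresholds, morphology and function name are assumptions, and the paper's actual pipeline consists of three more involved algorithms.

```python
# Rough sketch of one ingredient of FMO detection: flag pixels that differ
# from both neighbouring frames, which is where a fast-moving object leaves a
# semi-transparent streak. All parameters are illustrative assumptions.
import numpy as np
import cv2

def fmo_candidate_mask(prev_f: np.ndarray, cur_f: np.ndarray, next_f: np.ndarray,
                       thresh: int = 25) -> np.ndarray:
    """Return a binary mask of streak candidates in the current grayscale frame."""
    d1 = cv2.absdiff(cur_f, prev_f)
    d2 = cv2.absdiff(cur_f, next_f)
    # A streak pixel changes with respect to both the previous and the next frame.
    mask = cv2.bitwise_and((d1 > thresh).astype(np.uint8), (d2 > thresh).astype(np.uint8))
    # Clean up isolated pixels and join fragments of the streak.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask * 255, cv2.MORPH_CLOSE, kernel)

# Example usage on three consecutive grayscale frames loaded with OpenCV:
# prev_f, cur_f, next_f = (cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in paths)
# mask = fmo_candidate_mask(prev_f, cur_f, next_f)
```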
Robust automatic target tracking based on a Bayesian ego-motion compensation framework for airborne FLIR imagery
Automatic target tracking in airborne FLIR imagery is currently a challenge due to camera ego-motion. This phenomenon distorts the spatio-temporal correlation of the video sequence, which dramatically reduces the tracking performance. Several works address this problem using ego-motion compensation strategies. They use a deterministic approach to compensate for the camera motion, assuming a specific model of geometric transformation. However, in real sequences a single geometric transformation cannot accurately describe the camera ego-motion for the whole sequence, and as a consequence the performance of the tracking stage can decrease significantly or even fail completely. The optimum transformation for each pair of consecutive frames depends on the relative depth of the elements that compose the scene and on their degree of texture. In this work, a novel Particle Filter framework is proposed to efficiently manage several hypotheses of geometric transformation: Euclidean, affine, and projective. Each type of transformation is used to compute candidate locations of the object in the current frame. Then, each candidate is evaluated by the measurement model of the Particle Filter using appearance information. This approach is able to adapt to different camera ego-motion conditions and thus to perform the tracking satisfactorily. The proposed strategy has been tested on the AMCOM FLIR dataset, showing high efficiency in tracking different types of targets in real working conditions.
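The key ingredients are frame-to-frame transformation hypotheses of different classes (Euclidean, affine, projective) and the propagation of candidate object locations under each hypothesis. The sketch below shows, under assumed names and simplified estimation with OpenCV, how such hypotheses could be computed and applied to particles; it is illustrative only and not the paper's implementation.

```python
# Sketch of maintaining multiple geometric-transformation hypotheses inside a
# particle filter. Names, the estimation shortcuts and the resampling scheme
# are simplified assumptions, not the paper's implementation.
import numpy as np
import cv2

TRANSFORMS = ("euclidean", "affine", "projective")

def estimate_transform(prev_gray, cur_gray, kind: str) -> np.ndarray:
    """Estimate a 3x3 frame-to-frame transform of the requested kind from sparse features."""
    pts0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300, qualityLevel=0.01, minDistance=7)
    pts1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts0, None)
    good0, good1 = pts0[status.ravel() == 1], pts1[status.ravel() == 1]
    if kind == "projective":
        H, _ = cv2.findHomography(good0, good1, cv2.RANSAC)
        return H
    # Euclidean (similarity) or affine 2x3 estimate, promoted to a 3x3 matrix.
    if kind == "euclidean":
        A, _ = cv2.estimateAffinePartial2D(good0, good1)
    else:
        A, _ = cv2.estimateAffine2D(good0, good1)
    return np.vstack([A, [0.0, 0.0, 1.0]])

def propagate_particles(particles: np.ndarray, transform: np.ndarray) -> np.ndarray:
    """Apply an ego-motion hypothesis to candidate object locations (N, 2) -> (N, 2)."""
    pts = cv2.perspectiveTransform(particles.reshape(-1, 1, 2).astype(np.float32), transform)
    return pts.reshape(-1, 2)
```

In a full filter, each propagated candidate would then be scored by an appearance-based measurement model and the hypotheses reweighted accordingly.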
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
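Each event in the stream carries a timestamp, pixel location and polarity sign. As a small, self-contained illustration of handling such a stream (not tied to any specific sensor or SDK), the sketch below accumulates signed events over a time window into an image; the resolution and the synthetic stream are assumptions.

```python
# Minimal sketch of working with an event stream: each event is
# (timestamp, x, y, polarity), and a simple visualization is to accumulate
# signed events into a per-pixel image. Resolution and data are assumptions.
import numpy as np

def accumulate_events(t, x, y, p, t_start, t_end, height=480, width=640):
    """Sum signed events with timestamps in [t_start, t_end) into an image."""
    frame = np.zeros((height, width), dtype=np.int32)
    sel = (t >= t_start) & (t < t_end)
    # Polarity p is +1 for a brightness increase, -1 for a decrease.
    np.add.at(frame, (y[sel], x[sel]), p[sel])
    return frame

# Example with a synthetic event stream (timestamps in microseconds).
n = 10_000
t = np.sort(np.random.randint(0, 50_000, n))
x = np.random.randint(0, 640, n)
y = np.random.randint(0, 480, n)
p = np.random.choice([-1, 1], n)
img = accumulate_events(t, x, y, p, 0, 10_000)
print(img.shape, img.min(), img.max())
```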