Sub-frame Appearance and 6D Pose Estimation of Fast Moving Objects
We propose a novel method that tracks fast moving objects, mainly non-uniform
spherical ones, in full 6 degrees of freedom, simultaneously estimating their
3D motion trajectory, 3D pose, and appearance changes with a time step that
is a fraction of the video frame exposure time. The sub-frame object
localization and appearance estimation enable realistic temporal
super-resolution and precise shape estimation. The method, called TbD-3D
(Tracking by Deblatting in 3D), relies on a novel reconstruction algorithm
that solves a piece-wise deblurring and matting problem. The 3D rotation is
estimated by minimizing the reprojection error. As a second contribution, we
present a new challenging dataset with fast moving objects that change their
appearance and distance to the camera. High-speed camera recordings with zero
lag between frame exposures were used to generate videos with different frame
rates annotated with ground-truth trajectory and pose.
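To illustrate the pose step, here is a minimal sketch of estimating a 3D rotation by minimizing reprojection error, assuming known surface points, a pinhole camera, and a known translation; all function names and parameters are illustrative, not the authors' implementation.

```python
# Hedged sketch: estimating an object's 3D rotation by minimizing
# reprojection error, in the spirit of TbD-3D. Names are illustrative.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def project(points_3d, focal=1000.0, cx=320.0, cy=240.0):
    """Pinhole projection of Nx3 camera-frame points to Nx2 pixels."""
    z = points_3d[:, 2:3]
    return focal * points_3d[:, :2] / z + np.array([cx, cy])


def residuals(rotvec, surface_points, observed_px, translation):
    """Reprojection error of rotated + translated surface points."""
    rotated = Rotation.from_rotvec(rotvec).apply(surface_points)
    return (project(rotated + translation) - observed_px).ravel()


# Toy data: points on a sphere, observed after an unknown rotation.
rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)  # unit sphere
t = np.array([0.0, 0.0, 5.0])                      # object in front of camera
true_rot = Rotation.from_rotvec([0.1, -0.2, 0.05])
observed = project(true_rot.apply(pts) + t)

fit = least_squares(residuals, x0=np.zeros(3), args=(pts, observed, t))
print("estimated rotation vector:", fit.x)  # should approach [0.1, -0.2, 0.05]
```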
Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos
We propose a method for jointly estimating the 3D motion, 3D shape, and
appearance of highly motion-blurred objects from a video. To this end, we model
the blurred appearance of a fast moving object in a generative fashion by
parametrizing its 3D position, rotation, velocity, acceleration, bounces,
shape, and texture over the duration of a predefined time window spanning
multiple frames. Using differentiable rendering, we estimate all parameters
by minimizing the pixel-wise reprojection error with respect to the input
video, backpropagating through a rendering pipeline that accounts for motion
blur by averaging the graphics output over short time intervals. For that
purpose, we also estimate the camera exposure gap time within the same
optimization. To account for abrupt motion changes such as bounces, we model
the motion trajectory as a piece-wise polynomial and estimate the specific
time of the bounce at sub-frame accuracy. Experiments on established benchmark
datasets demonstrate that our method outperforms previous methods for fast
moving object deblurring and 3D reconstruction.
Comment: CVPR 2022 camera-ready
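As a rough illustration of the generative blur model, the following sketch averages differentiable renders over sub-frame time samples and backpropagates a pixel-wise loss to recover motion parameters. It uses a toy 1-D Gaussian-blob renderer in PyTorch; the paper's pipeline is a full 3-D differentiable renderer, and all names and constants here are assumptions.

```python
# Hedged sketch: synthesize motion blur by averaging renders over the
# exposure interval, then backpropagate to recover motion parameters.
import torch

W = 128                                    # image width in pixels
xs = torch.arange(W, dtype=torch.float32)  # pixel coordinates


def render(pos):
    """Render a Gaussian blob centred at `pos` (differentiable)."""
    return torch.exp(-0.5 * ((xs - pos) / 2.0) ** 2)


def blurred_render(x0, v, exposure=1.0, samples=32):
    """Average renders over the exposure interval to model motion blur."""
    ts = torch.linspace(0.0, exposure, samples)
    return torch.stack([render(x0 + v * t) for t in ts]).mean(dim=0)


# Synthesize an observation with known motion, then recover it.
with torch.no_grad():
    target = blurred_render(torch.tensor(40.0), torch.tensor(25.0))

x0 = torch.tensor(35.0, requires_grad=True)   # initial position guess
v = torch.tensor(10.0, requires_grad=True)    # initial velocity guess
opt = torch.optim.Adam([x0, v], lr=0.5)
for _ in range(500):
    opt.zero_grad()
    loss = ((blurred_render(x0, v) - target) ** 2).mean()
    loss.backward()
    opt.step()
print(f"x0 = {x0.item():.1f}, v = {v.item():.1f}")  # should approach 40, 25
```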
Non-Causal Tracking by Deblatting
Tracking by Deblatting stands for solving an inverse problem of deblurring
and image matting for tracking motion-blurred objects. We propose non-causal
Tracking by Deblatting, which estimates continuous, complete, and accurate
object trajectories. Energy minimization by dynamic programming is used to
detect abrupt changes of motion, called bounces. High-order polynomials are
fitted to segments, which are parts of the trajectory separated by bounces.
The output is a continuous trajectory function that assigns a location to
every real-valued time stamp from zero to the number of frames. Additionally,
we show that the trajectory function enables precise physical calculations,
such as estimating the object's radius, gravity, or sub-frame velocity.
Velocity estimates are compared to high-speed camera and radar measurements.
Results show high performance of the proposed method in terms of
Trajectory-IoU, recall, and velocity estimation.
Comment: Published at GCPR 2019, oral presentation, Best Paper Honorable
Mention Award
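The trajectory representation can be sketched as follows: polynomials fitted per segment between bounces yield a continuous function of real-valued frame time, and its derivative gives sub-frame velocity. Bounce detection by dynamic programming is omitted, and the toy data and polynomial degree are assumptions rather than the paper's settings.

```python
# Hedged sketch: per-segment polynomial trajectory with sub-frame velocity
# from the fitted polynomials' derivatives.
import numpy as np

# Toy observations: (frame_time, x, y), with a bounce at t = 3.0.
t = np.linspace(0.0, 6.0, 61)
x = 10.0 * t
y = np.where(t < 3.0, 50.0 - 5.0 * t**2, 5.0 * (t - 3.0) * (6.0 - t))
bounces = [3.0]                       # assumed already detected

segments = []                         # (t_start, t_end, poly_x, poly_y)
edges = [t[0]] + bounces + [t[-1]]
for t0, t1 in zip(edges[:-1], edges[1:]):
    m = (t >= t0) & (t <= t1)
    segments.append((t0, t1,
                     np.polynomial.Polynomial.fit(t[m], x[m], deg=3),
                     np.polynomial.Polynomial.fit(t[m], y[m], deg=3)))


def velocity(ts):
    """Sub-frame speed from the derivative of the fitted trajectory."""
    for t0, t1, px, py in segments:
        if t0 <= ts <= t1:
            return np.hypot(px.deriv()(ts), py.deriv()(ts))
    raise ValueError("time stamp outside trajectory")

print(velocity(1.5))   # speed in pixels per frame at t = 1.5
```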
On Deep Image Deblurring: The Blur Factorization Approach
This thesis investigated whether the single-image deblurring problem can be factorized into the subproblems of camera shake and object motion blur removal for enhanced performance. Two deep-learning-based deblurring methods were introduced to answer this question, both following a variation of the proposed blur factorization strategy. Furthermore, a novel pipeline was developed for generating synthetic blurry images, as no existing datasets or data generation methods met the requirements of the proposed deblurring models.
The data generation pipeline produces three blurry versions of each ground truth image: one with both blur types, one with camera shake blur alone, and one with only object motion blur. The pipeline, based on mathematical models of real-world blur formation, was used to generate a dataset of 2850 triplets of blurry images, divided into a training set of 2500 and a test set of 350 triplets, plus the sharp ground truth images. These datasets were used to train and test both proposed methods.
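A minimal sketch of the triplet idea follows, assuming a toy random-walk shake kernel and a horizontal streak for object motion; the thesis's actual mathematical blur-formation models are not reproduced here.

```python
# Hedged sketch: from one sharp image, synthesize (a) object-motion-blurred,
# (b) camera-shake-blurred, and (c) doubly blurred versions of a triplet.
# The kernels and object mask are toy stand-ins.
import numpy as np
from scipy.ndimage import convolve

def shake_kernel(size=15):
    """Toy camera-shake PSF: a short random walk rasterized into a kernel."""
    k = np.zeros((size, size))
    pos = np.array([size // 2, size // 2])
    rng = np.random.default_rng(0)
    for _ in range(4 * size):
        k[tuple(pos)] += 1.0
        pos = np.clip(pos + rng.integers(-1, 2, size=2), 0, size - 1)
    return k / k.sum()

def object_motion_blur(img, mask, length=9):
    """Blur only the (moving) object region with a horizontal streak."""
    streak = np.ones((1, length)) / length
    blurred = convolve(img, streak, mode="nearest")
    return np.where(mask, blurred, img)

sharp = np.random.rand(64, 64)                 # stand-in ground truth
obj_mask = np.zeros((64, 64), bool)
obj_mask[20:40, 20:40] = True                  # stand-in object region

motion_only = object_motion_blur(sharp, obj_mask)
shake_only = convolve(sharp, shake_kernel(), mode="nearest")
both = convolve(motion_only, shake_kernel(), mode="nearest")  # one triplet
```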
The proposed methods achieved satisfactory performance. Two variations of the first method, based on strict factorization into subproblems, were tested. The variations differed in the order in which the blur types were removed. The pipeline that removed object motion blur first proved superior to the pipeline with the reverse processing order. However, both variations were still far inferior to the control test, in which both blurs were removed simultaneously.
The second method, based on joint training of two sub-models, achieved more promising test results. Two of the four tested variations outperformed the corresponding control test model, albeit by relatively small margins. The variations differed in the processing order and in the weighting of the loss functions between the sub-models. Both variations that outperformed the control test model were trained to remove object motion blur first, although the loss function weights were set so that the pipelines' main focus was on the final sharp images. The performance improvements demonstrate that the proposed blur factorization strategy had a positive impact on deblurring results. Still, even the second method can be deemed only partly successful, because a greater performance improvement was gained with an alternative strategy resulting in a model with the same number of parameters as the proposed approach.
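For intuition, a minimal sketch of the jointly trained, factorized pipeline with weighted intermediate and final losses follows; the sub-model architectures, loss weights, and tensors standing in for data batches are all placeholders, not the thesis configuration.

```python
# Hedged sketch: stage 1 removes object motion blur, stage 2 removes camera
# shake, and the loss weights the intermediate and final outputs.
import torch
import torch.nn as nn

def conv_block():
    """Stand-in deblurring sub-model (a real one would be a deep network)."""
    return nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 3, 3, padding=1))

object_deblur = conv_block()   # stage 1: remove object motion blur
camera_deblur = conv_block()   # stage 2: remove camera shake blur
opt = torch.optim.Adam(list(object_deblur.parameters()) +
                       list(camera_deblur.parameters()), lr=1e-4)
l1 = nn.L1Loss()
w_mid, w_final = 0.2, 0.8      # main focus on the final sharp image

# One illustrative step on random tensors standing in for a data batch:
blurry = torch.rand(4, 3, 64, 64)        # both blur types
shake_only = torch.rand(4, 3, 64, 64)    # camera shake blur alone
sharp = torch.rand(4, 3, 64, 64)         # ground truth

mid = object_deblur(blurry)              # intermediate: shake blur remains
final = camera_deblur(mid)               # final sharp estimate
loss = w_mid * l1(mid, shake_only) + w_final * l1(final, sharp)
opt.zero_grad()
loss.backward()
opt.step()
```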
From active to passive spatial acoustic sensing and applications
Active acoustic sensing systems emit modulated acoustic waves and analyze the reflected signals; they dominate acoustic spatial sensing. Passive acoustic sensing systems, on the other hand, receive and analyze natural sounds directly; they are good at semantic tasks but perform poorly on spatial sensing. In this dissertation, we bridge three gaps in existing systems: the gap between the assumptions of signal processing algorithms and the real acoustic environment, the gap between powerful active spatial sensing and limited passive spatial sensing, and the gap between semantic features and spatial information. We advance acoustic sensing system design and extend its functionality through three novel systems.
First, we develop a fully active spatial sensing system, DeepRange, which adapts easily to real environments. We develop an effective mechanism to generate synthetic training data that captures noise, speaker/mic distortion, and interference in the signals, removing the need to collect a large volume of real data. We then design a deep range neural network (DRNet) to estimate distance from raw acoustic signals. It is inspired by the signal processing insight that an ultra-long convolution kernel helps combat noise and interference. Although the model is trained entirely on synthetic data, it robustly achieves sub-centimeter error on real data across various environments, background noise, interference, and mobile phone models.
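A minimal sketch of a DRNet-style regressor with an ultra-long first convolution kernel over the raw signal follows; the layer sizes, kernel length, and sampling rate are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: 1-D convolutions with an ultra-long first kernel over the
# raw received signal, regressing a single distance value.
import torch
import torch.nn as nn

class RangeNet(nn.Module):
    def __init__(self, kernel=1023):
        super().__init__()
        self.features = nn.Sequential(
            # Ultra-long kernel: acts like a learned matched filter,
            # helping to suppress noise and interference.
            nn.Conv1d(1, 16, kernel_size=kernel, stride=8), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, 1)   # distance in metres

    def forward(self, signal):         # signal: (batch, 1, samples)
        return self.head(self.features(signal).squeeze(-1))

net = RangeNet()
fake_recording = torch.randn(2, 1, 48000)   # 1 s at 48 kHz, synthetic
print(net(fake_recording).shape)            # torch.Size([2, 1])
```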
Second, we develop a fused active and passive spatial sensing system for speech separation, called Spatial Aware Multi-task learning-based Separation (SAMS). We leverage both active and passive sensing to improve AoA estimation and jointly optimize the semantic and spatial tasks. SAMS simultaneously estimates the spatial location of the target user and extracts their speech during teleconferencing. We first generate fine-grained spatial embeddings from the user's voice and an inaudible tracking sound, which encode the user's position and rich multipath information. We then develop a deep neural network with multi-task learning to jointly optimize source separation and localization. We also significantly speed up inference to provide a real-time guarantee.
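A minimal sketch of the multi-task idea follows, assuming a shared recurrent encoder over spectrogram features with a separation-mask head and an AoA-classification head trained jointly; all shapes, heads, and the loss weighting are assumptions.

```python
# Hedged sketch: one shared encoder, two heads (separation mask + AoA bin),
# trained with a jointly weighted multi-task loss.
import torch
import torch.nn as nn

class SAMSNet(nn.Module):
    def __init__(self, freq_bins=257, hidden=128, aoa_classes=36):
        super().__init__()
        self.encoder = nn.GRU(freq_bins, hidden, batch_first=True)
        self.mask_head = nn.Linear(hidden, freq_bins)   # separation mask
        self.aoa_head = nn.Linear(hidden, aoa_classes)  # 10-degree bins

    def forward(self, spec):                  # (batch, frames, freq_bins)
        h, _ = self.encoder(spec)
        mask = torch.sigmoid(self.mask_head(h))
        aoa_logits = self.aoa_head(h.mean(dim=1))
        return mask, aoa_logits

net = SAMSNet()
spec = torch.rand(4, 100, 257)                # mixture magnitude spectrogram
target_spec = torch.rand(4, 100, 257)         # target user's clean speech
aoa_label = torch.randint(0, 36, (4,))        # ground-truth angle bin

mask, aoa_logits = net(spec)
loss = (nn.functional.mse_loss(mask * spec, target_spec)             # semantic
        + 0.5 * nn.functional.cross_entropy(aoa_logits, aoa_label))  # spatial
loss.backward()
```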
Finally, we deeply fuse semantic features and spatial cues to combat interference and noise in real environments and to enable depth sensing in a fully passive setup. Inspired by the "flash-to-bang" phenomenon (i.e., hearing the thunder after seeing the lightning), we propose FBDepth to measure the depth of a sound source. We formulate the problem as an audio-visual event localization task for collision events. Specifically, FBDepth first aligns correspondences between the video track and the audio track to locate the target object and target sound at a coarse granularity. Based on observations of moving objects' trajectories, it estimates the intersection of optical flow before and after the collision to localize video events in time. It then feeds the estimated timestamp of the video event, together with the other modalities, into the final depth estimation. We use a mobile phone to collect 3.6K+ video clips involving 24 different objects at ranges up to 60 m. FBDepth shows superior performance, especially at long range, compared to monocular and stereo methods.
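The underlying flash-to-bang geometry reduces to a one-line computation once the audio-visual timestamps are known; FBDepth estimates those timestamps with a neural pipeline, which is not reproduced in this sketch.

```python
# Hedged sketch: the video shows the collision essentially instantly, while
# its sound arrives later, so the audio-visual time offset times the speed
# of sound gives the depth of the sound source.
SPEED_OF_SOUND = 343.0   # m/s in air at ~20 degrees C


def flash_to_bang_depth(t_video_event: float, t_audio_event: float) -> float:
    """Depth of a sound source from its audio-visual timestamp offset."""
    dt = t_audio_event - t_video_event
    if dt < 0:
        raise ValueError("sound cannot arrive before the visual event")
    return SPEED_OF_SOUND * dt

# A collision seen at t = 2.000 s and heard at t = 2.175 s:
print(flash_to_bang_depth(2.000, 2.175))   # ~60 m
```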