Sub-frame Appearance and 6D Pose Estimation of Fast Moving Objects
We propose a novel method that tracks fast moving objects, mainly non-uniform
spherical ones, in full 6 degrees of freedom, simultaneously estimating their
3D motion trajectory, 3D pose, and appearance changes with a time step that
is a fraction of the video frame exposure time. The sub-frame object
localization and appearance estimation enable realistic temporal
super-resolution and precise shape estimation. The method, called TbD-3D
(Tracking by Deblatting in 3D), relies on a novel reconstruction algorithm
that solves a piece-wise deblurring and matting problem. The 3D rotation is
estimated by minimizing the reprojection error. As a second contribution, we
present a new challenging dataset with fast moving objects that change their
appearance and distance to the camera. High-speed camera recordings with zero
lag between frame exposures were used to generate videos with different frame
rates annotated with ground-truth trajectory and pose.
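To illustrate the pose step, here is a minimal sketch of estimating a 3D rotation by minimizing reprojection error, assuming known surface points, a pinhole camera, and a known translation; all function names and parameters are illustrative, not the authors' implementation.

```python
# Hedged sketch: estimating an object's 3D rotation by minimizing
# reprojection error, in the spirit of TbD-3D. Names are illustrative.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def project(points_3d, focal=1000.0, cx=320.0, cy=240.0):
    """Pinhole projection of Nx3 camera-frame points to Nx2 pixels."""
    z = points_3d[:, 2:3]
    return focal * points_3d[:, :2] / z + np.array([cx, cy])


def residuals(rotvec, surface_points, observed_px, translation):
    """Reprojection error of rotated + translated surface points."""
    rotated = Rotation.from_rotvec(rotvec).apply(surface_points)
    return (project(rotated + translation) - observed_px).ravel()


# Toy data: points on a sphere, observed after an unknown rotation.
rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)  # unit sphere
t = np.array([0.0, 0.0, 5.0])                      # object in front of camera
true_rot = Rotation.from_rotvec([0.1, -0.2, 0.05])
observed = project(true_rot.apply(pts) + t)

fit = least_squares(residuals, x0=np.zeros(3), args=(pts, observed, t))
print("estimated rotation vector:", fit.x)  # should approach [0.1, -0.2, 0.05]
```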
Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos
We propose a method for jointly estimating the 3D motion, 3D shape, and
appearance of highly motion-blurred objects from a video. To this end, we model
the blurred appearance of a fast moving object in a generative fashion by
parametrizing its 3D position, rotation, velocity, acceleration, bounces,
shape, and texture over the duration of a predefined time window spanning
multiple frames. Using differentiable rendering, we estimate all parameters
by minimizing the pixel-wise reprojection error with respect to the input
video, backpropagating through a rendering pipeline that accounts for motion
blur by averaging the graphics output over short time intervals. For that
purpose, we also estimate the camera exposure gap time within the same
optimization. To account for abrupt motion changes such as bounces, we model
the motion trajectory as a piece-wise polynomial and estimate the specific
time of the bounce at sub-frame accuracy. Experiments on established benchmark
datasets demonstrate that our method outperforms previous methods for fast
moving object deblurring and 3D reconstruction.
Comment: CVPR 2022 camera-ready
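As a rough illustration of the generative blur model, the following sketch averages differentiable renders over sub-frame time samples and backpropagates a pixel-wise loss to recover motion parameters. It uses a toy 1-D Gaussian-blob renderer in PyTorch; the paper's pipeline is a full 3-D differentiable renderer, and all names and constants here are assumptions.

```python
# Hedged sketch: synthesize motion blur by averaging renders over the
# exposure interval, then backpropagate to recover motion parameters.
import torch

W = 128                                    # image width in pixels
xs = torch.arange(W, dtype=torch.float32)  # pixel coordinates


def render(pos):
    """Render a Gaussian blob centred at `pos` (differentiable)."""
    return torch.exp(-0.5 * ((xs - pos) / 2.0) ** 2)


def blurred_render(x0, v, exposure=1.0, samples=32):
    """Average renders over the exposure interval to model motion blur."""
    ts = torch.linspace(0.0, exposure, samples)
    return torch.stack([render(x0 + v * t) for t in ts]).mean(dim=0)


# Synthesize an observation with known motion, then recover it.
with torch.no_grad():
    target = blurred_render(torch.tensor(40.0), torch.tensor(25.0))

x0 = torch.tensor(35.0, requires_grad=True)   # initial position guess
v = torch.tensor(10.0, requires_grad=True)    # initial velocity guess
opt = torch.optim.Adam([x0, v], lr=0.5)
for _ in range(500):
    opt.zero_grad()
    loss = ((blurred_render(x0, v) - target) ** 2).mean()
    loss.backward()
    opt.step()
print(f"x0 = {x0.item():.1f}, v = {v.item():.1f}")  # should approach 40, 25
```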
Non-Causal Tracking by Deblatting
Tracking by Deblatting stands for solving an inverse problem of deblurring
and image matting for tracking motion-blurred objects. We propose non-causal
Tracking by Deblatting, which estimates continuous, complete, and accurate
object trajectories. Energy minimization by dynamic programming is used to
detect abrupt changes of motion, called bounces. High-order polynomials are
fitted to segments, which are parts of the trajectory separated by bounces.
The output is a continuous trajectory function that assigns a location to
every real-valued time stamp from zero to the number of frames. Additionally,
we show that the trajectory function enables precise physical calculations,
such as estimating the object's radius, gravity, or sub-frame velocity.
Velocity estimates are compared to high-speed camera and radar measurements.
Results show high performance of the proposed method in terms of
Trajectory-IoU, recall, and velocity estimation.
Comment: Published at GCPR 2019, oral presentation, Best Paper Honorable
Mention Award
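The trajectory representation can be sketched as follows: polynomials fitted per segment between bounces yield a continuous function of real-valued frame time, and its derivative gives sub-frame velocity. Bounce detection by dynamic programming is omitted, and the toy data and polynomial degree are assumptions rather than the paper's settings.

```python
# Hedged sketch: per-segment polynomial trajectory with sub-frame velocity
# from the fitted polynomials' derivatives.
import numpy as np

# Toy observations: (frame_time, x, y), with a bounce at t = 3.0.
t = np.linspace(0.0, 6.0, 61)
x = 10.0 * t
y = np.where(t < 3.0, 50.0 - 5.0 * t**2, 5.0 * (t - 3.0) * (6.0 - t))
bounces = [3.0]                       # assumed already detected

segments = []                         # (t_start, t_end, poly_x, poly_y)
edges = [t[0]] + bounces + [t[-1]]
for t0, t1 in zip(edges[:-1], edges[1:]):
    m = (t >= t0) & (t <= t1)
    segments.append((t0, t1,
                     np.polynomial.Polynomial.fit(t[m], x[m], deg=3),
                     np.polynomial.Polynomial.fit(t[m], y[m], deg=3)))


def velocity(ts):
    """Sub-frame speed from the derivative of the fitted trajectory."""
    for t0, t1, px, py in segments:
        if t0 <= ts <= t1:
            return np.hypot(px.deriv()(ts), py.deriv()(ts))
    raise ValueError("time stamp outside trajectory")

print(velocity(1.5))   # speed in pixels per frame at t = 1.5
```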
On Deep Image Deblurring: The Blur Factorization Approach
This thesis investigated whether the single-image deblurring problem can be factorized into the subproblems of camera shake and object motion blur removal for enhanced performance. Two deep-learning-based deblurring methods were introduced to answer this question, both following a variation of the proposed blur factorization strategy. Furthermore, a novel pipeline was developed for generating synthetic blurry images, as no existing datasets or data generation methods met the requirements of the proposed deblurring models.
The data generation pipeline produces three blurry versions of each ground truth image: one with both blur types, one with camera shake blur alone, and one with only object motion blur. The pipeline, based on mathematical models of real-world blur formation, was used to generate a dataset of 2850 triplets of blurry images, divided into a training set of 2500 and a test set of 350 triplets, plus the sharp ground truth images. These datasets were used to train and test both proposed methods.
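A minimal sketch of the triplet idea follows, assuming a toy random-walk shake kernel and a horizontal streak for object motion; the thesis's actual mathematical blur-formation models are not reproduced here.

```python
# Hedged sketch: from one sharp image, synthesize (a) object-motion-blurred,
# (b) camera-shake-blurred, and (c) doubly blurred versions of a triplet.
# The kernels and object mask are toy stand-ins.
import numpy as np
from scipy.ndimage import convolve

def shake_kernel(size=15):
    """Toy camera-shake PSF: a short random walk rasterized into a kernel."""
    k = np.zeros((size, size))
    pos = np.array([size // 2, size // 2])
    rng = np.random.default_rng(0)
    for _ in range(4 * size):
        k[tuple(pos)] += 1.0
        pos = np.clip(pos + rng.integers(-1, 2, size=2), 0, size - 1)
    return k / k.sum()

def object_motion_blur(img, mask, length=9):
    """Blur only the (moving) object region with a horizontal streak."""
    streak = np.ones((1, length)) / length
    blurred = convolve(img, streak, mode="nearest")
    return np.where(mask, blurred, img)

sharp = np.random.rand(64, 64)                 # stand-in ground truth
obj_mask = np.zeros((64, 64), bool)
obj_mask[20:40, 20:40] = True                  # stand-in object region

motion_only = object_motion_blur(sharp, obj_mask)
shake_only = convolve(sharp, shake_kernel(), mode="nearest")
both = convolve(motion_only, shake_kernel(), mode="nearest")  # one triplet
```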
The proposed methods achieved satisfactory performance. Two variations of the first method, based on strict factorization into subproblems, were tested. The variations differed in the order in which the blur types were removed. The pipeline that removed object motion blur first proved superior to the pipeline with the reverse processing order. However, both variations were still far inferior to the control test, in which both blurs were removed simultaneously.
The second method, based on joint training of two sub-models, achieved more promising test results. Two of the four tested variations outperformed the corresponding control test model, albeit by relatively small margins. The variations differed in the processing order and in the weighting of the loss functions between the sub-models. Both variations that outperformed the control test model were trained to remove object motion blur first, although the loss function weights were set so that the pipelines' main focus was on the final sharp images. The performance improvements demonstrate that the proposed blur factorization strategy had a positive impact on deblurring results. Still, even the second method can be deemed only partly successful, because a greater performance improvement was gained with an alternative strategy resulting in a model with the same number of parameters as the proposed approach.
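For intuition, a minimal sketch of the jointly trained, factorized pipeline with weighted intermediate and final losses follows; the sub-model architectures, loss weights, and tensors standing in for data batches are all placeholders, not the thesis configuration.

```python
# Hedged sketch: stage 1 removes object motion blur, stage 2 removes camera
# shake, and the loss weights the intermediate and final outputs.
import torch
import torch.nn as nn

def conv_block():
    """Stand-in deblurring sub-model (a real one would be a deep network)."""
    return nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 3, 3, padding=1))

object_deblur = conv_block()   # stage 1: remove object motion blur
camera_deblur = conv_block()   # stage 2: remove camera shake blur
opt = torch.optim.Adam(list(object_deblur.parameters()) +
                       list(camera_deblur.parameters()), lr=1e-4)
l1 = nn.L1Loss()
w_mid, w_final = 0.2, 0.8      # main focus on the final sharp image

# One illustrative step on random tensors standing in for a data batch:
blurry = torch.rand(4, 3, 64, 64)        # both blur types
shake_only = torch.rand(4, 3, 64, 64)    # camera shake blur alone
sharp = torch.rand(4, 3, 64, 64)         # ground truth

mid = object_deblur(blurry)              # intermediate: shake blur remains
final = camera_deblur(mid)               # final sharp estimate
loss = w_mid * l1(mid, shake_only) + w_final * l1(final, sharp)
opt.zero_grad()
loss.backward()
opt.step()
```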
From active to passive spatial acoustic sensing and applications
Active acoustic sensing systems emit modulated acoustic waves and analyze the reflected signals; they dominate acoustic spatial sensing. Passive acoustic sensing systems, on the other hand, receive and analyze natural sounds directly; they are good at semantic tasks but perform poorly on spatial sensing. In this dissertation, we bridge three gaps in existing systems: the gap between the assumptions of signal processing algorithms and the real acoustic environment, the gap between powerful active spatial sensing and limited passive spatial sensing, and the gap between semantic features and spatial information. We advance acoustic sensing system design and extend its functionality through three novel systems.
First, we develop a fully active spatial sensing system, DeepRange, which adapts easily to real environments. We develop an effective mechanism to generate synthetic training data that captures noise, speaker/mic distortion, and interference in the signals, removing the need to collect a large volume of real data. We then design a deep range neural network (DRNet) to estimate distance from raw acoustic signals. It is inspired by the signal processing insight that an ultra-long convolution kernel helps combat noise and interference. Although the model is trained entirely on synthetic data, it robustly achieves sub-centimeter error on real data across various environments, background noise, interference, and mobile phone models.
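A minimal sketch of a DRNet-style regressor with an ultra-long first convolution kernel over the raw signal follows; the layer sizes, kernel length, and sampling rate are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: 1-D convolutions with an ultra-long first kernel over the
# raw received signal, regressing a single distance value.
import torch
import torch.nn as nn

class RangeNet(nn.Module):
    def __init__(self, kernel=1023):
        super().__init__()
        self.features = nn.Sequential(
            # Ultra-long kernel: acts like a learned matched filter,
            # helping to suppress noise and interference.
            nn.Conv1d(1, 16, kernel_size=kernel, stride=8), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, 1)   # distance in metres

    def forward(self, signal):         # signal: (batch, 1, samples)
        return self.head(self.features(signal).squeeze(-1))

net = RangeNet()
fake_recording = torch.randn(2, 1, 48000)   # 1 s at 48 kHz, synthetic
print(net(fake_recording).shape)            # torch.Size([2, 1])
```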
Second, we develop a fused active and passive spatial sensing system for speech separation, called Spatial Aware Multi-task learning-based Separation (SAMS). We leverage both active and passive sensing to improve AoA estimation and jointly optimize the semantic and spatial tasks. SAMS simultaneously estimates the spatial location of the target user and extracts their speech during teleconferencing. We first generate fine-grained spatial embeddings from the user's voice and an inaudible tracking sound, which encode the user's position and rich multipath information. We then develop a deep neural network with multi-task learning to jointly optimize source separation and localization. We also significantly speed up inference to provide a real-time guarantee.
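A minimal sketch of the multi-task idea follows, assuming a shared recurrent encoder over spectrogram features with a separation-mask head and an AoA-classification head trained jointly; all shapes, heads, and the loss weighting are assumptions.

```python
# Hedged sketch: one shared encoder, two heads (separation mask + AoA bin),
# trained with a jointly weighted multi-task loss.
import torch
import torch.nn as nn

class SAMSNet(nn.Module):
    def __init__(self, freq_bins=257, hidden=128, aoa_classes=36):
        super().__init__()
        self.encoder = nn.GRU(freq_bins, hidden, batch_first=True)
        self.mask_head = nn.Linear(hidden, freq_bins)   # separation mask
        self.aoa_head = nn.Linear(hidden, aoa_classes)  # 10-degree bins

    def forward(self, spec):                  # (batch, frames, freq_bins)
        h, _ = self.encoder(spec)
        mask = torch.sigmoid(self.mask_head(h))
        aoa_logits = self.aoa_head(h.mean(dim=1))
        return mask, aoa_logits

net = SAMSNet()
spec = torch.rand(4, 100, 257)                # mixture magnitude spectrogram
target_spec = torch.rand(4, 100, 257)         # target user's clean speech
aoa_label = torch.randint(0, 36, (4,))        # ground-truth angle bin

mask, aoa_logits = net(spec)
loss = (nn.functional.mse_loss(mask * spec, target_spec)             # semantic
        + 0.5 * nn.functional.cross_entropy(aoa_logits, aoa_label))  # spatial
loss.backward()
```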
Finally, we deeply fuse semantic features and spatial cues to combat interference and noise in real environments and to enable depth sensing in a fully passive setup. Inspired by the "flash-to-bang" phenomenon (i.e., hearing the thunder after seeing the lightning), we propose FBDepth to measure the depth of a sound source. We formulate the problem as an audio-visual event localization task for collision events. Specifically, FBDepth first aligns correspondences between the video track and the audio track to locate the target object and target sound at a coarse granularity. Based on observations of moving objects' trajectories, it estimates the intersection of optical flow before and after the collision to localize video events in time. It then feeds the estimated timestamp of the video event, together with the other modalities, into the final depth estimation. We use a mobile phone to collect 3.6K+ video clips involving 24 different objects at ranges up to 60 m. FBDepth shows superior performance, especially at long range, compared to monocular and stereo methods.
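The underlying flash-to-bang geometry reduces to a one-line computation once the audio-visual timestamps are known; FBDepth estimates those timestamps with a neural pipeline, which is not reproduced in this sketch.

```python
# Hedged sketch: the video shows the collision essentially instantly, while
# its sound arrives later, so the audio-visual time offset times the speed
# of sound gives the depth of the sound source.
SPEED_OF_SOUND = 343.0   # m/s in air at ~20 degrees C


def flash_to_bang_depth(t_video_event: float, t_audio_event: float) -> float:
    """Depth of a sound source from its audio-visual timestamp offset."""
    dt = t_audio_event - t_video_event
    if dt < 0:
        raise ValueError("sound cannot arrive before the visual event")
    return SPEED_OF_SOUND * dt

# A collision seen at t = 2.000 s and heard at t = 2.175 s:
print(flash_to_bang_depth(2.000, 2.175))   # ~60 m
```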