GyroFlow+: Gyroscope-Guided Unsupervised Deep Homography and Optical Flow Learning
Existing homography and optical flow methods are error-prone in challenging
scenes, such as fog, rain, night, and snow, because the basic assumptions such
as brightness and gradient constancy are broken. To address this issue, we
present an unsupervised learning approach that fuses gyroscope into homography
and optical flow learning. Specifically, we first convert gyroscope readings
into motion fields named gyro field. Second, we design a self-guided fusion
module (SGF) to fuse the background motion extracted from the gyro field with
the optical flow and guide the network to focus on motion details. Meanwhile,
we propose a homography decoder module (HD) to combine gyro field and
intermediate results of SGF to produce the homography. To the best of our
knowledge, this is the first deep learning framework that fuses gyroscope data
and image content for both deep homography and optical flow learning. To
validate our method, we propose a new dataset that covers regular and
challenging scenes. Experiments show that our method outperforms the
state-of-the-art methods in both regular and challenging scenes.
Comment: 12 pages. arXiv admin note: substantial text overlap with
arXiv:2103.1372
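The gyro field idea can be illustrated with a minimal sketch. It assumes a pure-rotation camera model, in which a rotation R between two frames induces the homography H = K R K^-1, and it integrates a single constant angular velocity over the frame interval. The intrinsics (f, cx, cy), the rotation direction convention, and every function name below are illustrative assumptions, not the authors' implementation.

```python
import math

def rotation_from_gyro(omega, dt):
    """Rodrigues rotation matrix for a constant angular velocity omega (rad/s) over dt seconds."""
    wx, wy, wz = (w * dt for w in omega)
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = wx / theta, wy / theta, wz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def gyro_field(omega, dt, f, cx, cy, width, height):
    """Per-pixel displacement (dx, dy) induced by the rotation-only homography K R K^-1."""
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    H = matmul3(matmul3(K, rotation_from_gyro(omega, dt)), K_inv)
    field = []
    for y in range(height):
        row = []
        for x in range(width):
            xh = H[0][0] * x + H[0][1] * y + H[0][2]
            yh = H[1][0] * x + H[1][1] * y + H[1][2]
            w = H[2][0] * x + H[2][1] * y + H[2][2]
            row.append((xh / w - x, yh / w - y))  # displacement of pixel (x, y)
        field.append(row)
    return field
```

Because the model ignores translation and scene depth, such a field can only describe background motion caused by camera rotation, which is consistent with the abstract's description of fusing it with image-based flow for the remaining motion details.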
GyroFlow: Gyroscope-Guided Unsupervised Optical Flow Learning
Existing optical flow methods are error-prone in challenging scenes, such as
fog, rain, and night, because the basic optical flow assumptions such as
brightness and gradient constancy are broken. To address this problem, we
present an unsupervised learning approach that fuses gyroscope into optical
flow learning. Specifically, we first convert gyroscope readings into motion
fields named gyro field. Then, we design a self-guided fusion module to fuse
the background motion extracted from the gyro field with the optical flow and
guide the network to focus on motion details. To the best of our knowledge,
this is the first deep learning-based framework that fuses gyroscope data and
image content for optical flow learning. To validate our method, we propose a
new dataset that covers regular and challenging scenes. Experiments show that
our method outperforms the state-of-the-art methods in both regular and challenging
scenes.
Unsupervised Hierarchical Domain Adaptation for Adverse Weather Optical Flow
Optical flow estimation has made great progress, but usually suffers from
degradation under adverse weather. Although semi-supervised and fully supervised
methods have made good attempts, the domain shift between synthetic and real
adverse weather images degrades their performance. To alleviate this issue,
our starting point is to transfer knowledge, without supervision, from the clean
source domain to the degraded target domain. Our key insight is that adverse weather does
not change the intrinsic optical flow of the scene, but causes a significant
difference for the warp error between clean and degraded images. In this work,
we propose the first unsupervised framework for adverse weather optical flow
via hierarchical motion-boundary adaptation. Specifically, we first employ
image translation to construct the transformation relationship between clean
and degraded domains. In motion adaptation, we utilize the flow consistency
knowledge to align the cross-domain optical flows into a motion-invariance
common space, where the optical flow from clean weather is used as the
guidance-knowledge to obtain a preliminary optical flow for adverse weather.
Furthermore, we leverage the warp error inconsistency which measures the motion
misalignment of the boundary between the clean and degraded domains, and
propose a joint intra- and inter-scene boundary contrastive adaptation to
refine the motion boundary. The hierarchical motion and boundary adaptations
jointly improve optical flow estimation in a unified framework. Extensive
quantitative and qualitative experiments verify the superiority of
the proposed method.
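The warp error mentioned in this abstract can be sketched as a photometric error between the first image and the second image sampled at flow-displaced positions. The version below is a minimal illustration that assumes grayscale images stored as nested lists and nearest-neighbour sampling; the function name and the exact error definition are assumptions, and the paper may define the quantity differently.

```python
def warp_error(img1, img2, flow):
    """Mean absolute difference between img1 and img2 sampled at flow-displaced positions.

    flow[y][x] = (dx, dy) maps pixel (x, y) in img1 to (x + dx, y + dy) in img2.
    Pixels whose target falls outside img2 are ignored.
    """
    height, width = len(img1), len(img1[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            xs, ys = int(round(x + dx)), int(round(y + dy))  # nearest-neighbour sample
            if 0 <= xs < width and 0 <= ys < height:
                total += abs(img1[y][x] - img2[ys][xs])
                count += 1
    return total / count if count else 0.0
```

With the same flow, a clean image pair that satisfies brightness constancy gives an error near zero, while a degraded pair gives a larger error. That matches the abstract's observation that adverse weather leaves the underlying flow unchanged but changes the warp error.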
An evaluation framework for stereo-based driver assistance
This is the post-print version of the article. Copyright @ 2012 Springer Verlag.
The accuracy of stereo algorithms or optical flow methods is commonly assessed by comparing the results against the Middlebury database. However, equivalent data for automotive or robotics applications rarely exist, as they are difficult to obtain. As our main contribution, we introduce an evaluation framework tailored for stereo-based driver assistance that delivers excellent performance measures while circumventing manual labeling effort. Within this framework one can combine several ways of ground-truthing and different comparison metrics, and use large image databases.
Using our framework, we show examples of several types of ground-truthing techniques: implicit ground-truthing (e.g. sequences recorded without a crash occurring), robotic vehicles with high-precision sensors, and, to a small extent, manual labeling. To show the effectiveness of our evaluation framework, we compare three different stereo algorithms at the pixel and object level. In more detail, we evaluate an intermediate representation called the Stixel World. Besides evaluating the accuracy of the Stixels, we investigate the completeness (equivalent to the detection rate) of the Stixel World vs. the number of phantom Stixels. Among many findings, using this framework enables us to reduce the number of phantom Stixels by a factor of three compared to the base parametrization. This base parametrization has already been optimized by test-driving vehicles for distances exceeding 10000 km.
ASF-Net: Robust Video Deraining via Temporal Alignment and Online Adaptive Learning
In recent times, learning-based methods for video deraining have demonstrated
commendable results. However, there are two critical challenges that these
methods are yet to address: exploiting temporal correlations among adjacent
frames and ensuring adaptability to unknown real-world scenarios. To overcome
these challenges, we explore video deraining from paradigm design through
to learning strategy construction. Specifically, we propose a new computational
paradigm, Alignment-Shift-Fusion Network (ASF-Net), which incorporates a
temporal shift module. This module is novel to this field and provides deeper
exploration of temporal information by facilitating the exchange of
channel-level information within the feature space. To fully exploit the
model's representation capability, we further construct a LArge-scale RAiny
video dataset (LARA), which also supports the development of this community. On
the basis of the newly constructed dataset, we explore the parameter learning
process by developing an innovative re-degraded learning strategy. This
strategy bridges the gap between synthetic and real-world scenes, resulting in
stronger scene adaptability. Our proposed approach exhibits superior
performance in three benchmarks and compelling visual quality in real-world
scenarios, underscoring its efficacy. The code is available at
https://github.com/vis-opt-group/ASF-Net
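A temporal shift operation of the kind described here can be sketched as follows. This is a generic illustration of shifting a fraction of feature channels along the time axis so that each frame sees channel-level information from its neighbours. The layout (one scalar per channel per frame), the `fold_div` parameter, and the zero padding at sequence ends are assumptions for illustration; the module used in ASF-Net may differ.

```python
def temporal_shift(features, fold_div=4):
    """Shift a fraction of channels along time.

    features[t][c] is the value of channel c at frame t.
    The first C // fold_div channels take their value from the previous frame,
    the next C // fold_div channels from the next frame, and the rest stay put.
    Positions with no source frame are zero-padded.
    """
    num_frames = len(features)
    num_channels = len(features[0])
    fold = num_channels // fold_div
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t - 1      # borrow from the previous frame
            elif c < 2 * fold:
                src = t + 1      # borrow from the next frame
            else:
                src = t          # unshifted channels
            if 0 <= src < num_frames:
                shifted[t][c] = features[src][c]
    return shifted
```

The shift itself has no learnable parameters, so any later layer that mixes channels within a frame also mixes information across adjacent frames.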
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
In this paper, we develop deep spatio-temporal neural networks to
sequentially count vehicles from low quality videos captured by city cameras
(citycams). Citycam videos have low resolution, low frame rate, high occlusion
and large perspective, making most existing methods lose their efficacy. To
overcome limitations of existing methods and incorporate the temporal
information of traffic video, we design a novel FCN-rLSTM network to jointly
estimate vehicle density and vehicle count by connecting fully convolutional
neural networks (FCN) with long short term memory networks (LSTM) in a residual
learning fashion. Such design leverages the strengths of FCN for pixel-level
prediction and the strengths of LSTM for learning complex temporal dynamics.
The residual learning connection reformulates the vehicle count regression as
learning residual functions with reference to the sum of densities in each
frame, which significantly accelerates the training of networks. To preserve
feature map resolution, we propose a Hyper-Atrous combination to integrate
atrous convolution in FCN and combine feature maps of different convolution
layers. FCN-rLSTM enables refined feature representation and a novel end-to-end
trainable mapping from pixels to vehicle count. We extensively evaluated the
proposed method on different counting tasks with three datasets, with
experimental results demonstrating its effectiveness and robustness. In
particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21
on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training
process is accelerated by a factor of five on average.
Comment: Accepted by International Conference on Computer Vision (ICCV), 201
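The residual formulation and the MAE metric from this abstract can be sketched in a few lines. The sketch assumes the final count for a frame is the sum of its predicted density map plus a learned residual term; the function names and the scalar residual input are illustrative assumptions, and the sketch does not model the FCN or LSTM themselves.

```python
def count_with_residual(density_map, residual):
    """Frame count as the sum of the density map plus a residual correction."""
    base_count = sum(sum(row) for row in density_map)  # integrate density over pixels
    return base_count + residual

def mean_absolute_error(predicted, actual):
    """MAE between predicted and ground-truth counts over a set of frames."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

For example, a density map summing to 2.0 with a residual of 0.5 gives a count of 2.5. Under this formulation the network only has to learn the correction to the density sum rather than the full count.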