Multi-stream CNN based Video Semantic Segmentation for Automated Driving
The majority of semantic segmentation algorithms operate on a single frame, even
in the case of videos. In this work, the goal is to exploit temporal
information within the model to leverage motion cues and temporal
consistency. We propose two simple high-level architectures based on Recurrent
FCN (RFCN) and Multi-Stream FCN (MSFCN) networks. In the case of RFCN, a recurrent
network, namely an LSTM, is inserted between the encoder and decoder. MSFCN combines
the encoders of different frames into a fused encoder via a 1x1 channel-wise
convolution. We use a ResNet50 network as the baseline encoder and construct
three networks, namely MSFCN of order 2 and 3 and RFCN of order 2. MSFCN-3
produces the best results, with mean-IoU accuracy improvements of 9% and 15% on
the Highway and New York-like city scenarios of the SYNTHIA-CVPR'16 dataset.
MSFCN-3 also yields improvements of 11% and 6% on the SegTrack V2 and DAVIS
datasets over the baseline FCN network. We also designed efficient versions
of MSFCN-2 and RFCN-2 by sharing weights between the two encoders. The
efficient MSFCN-2 provides improvements of 11% and 5% on KITTI and SYNTHIA
with a negligible increase in computational complexity compared to the baseline
version.
Comment: Accepted for Oral Presentation at VISAPP 201
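The fused-encoder idea above (concatenating per-frame encoder features and mixing them with a 1x1 channel-wise convolution) can be sketched in NumPy. This is an illustrative toy, not the paper's implementation: the function name, feature shapes, and random weights are assumptions, and a 1x1 convolution is shown as the per-pixel linear map over channels that it mathematically is.

```python
import numpy as np

def fuse_encoders_1x1(features, out_channels, rng=None):
    """Fuse per-frame encoder feature maps with a 1x1 channel-wise
    convolution (hypothetical sketch of the MSFCN fused encoder).

    features: list of arrays, each (C, H, W), one per input frame.
    A 1x1 convolution over the concatenated channels is simply a
    linear map applied independently at every spatial location.
    """
    rng = rng or np.random.default_rng(0)
    stacked = np.concatenate(features, axis=0)  # (num_frames * C, H, W)
    in_channels = stacked.shape[0]
    # Weights of the 1x1 conv: (out_channels, in_channels); random here.
    w = rng.standard_normal((out_channels, in_channels)) * 0.01
    # einsum mixes channels at each (h, w) position independently.
    fused = np.einsum('oc,chw->ohw', w, stacked)
    return fused

# Two frames' encoder outputs (toy sizes standing in for ResNet50 features)
f1 = np.ones((4, 8, 8))
f2 = np.ones((4, 8, 8))
fused = fuse_encoders_1x1([f1, f2], out_channels=4)
print(fused.shape)  # (4, 8, 8)
```

Because the fusion is purely channel-wise, the spatial resolution of the features is preserved, which is what lets the fused encoder plug into an unchanged FCN decoder.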
SemARFlow: Injecting Semantics into Unsupervised Optical Flow Estimation for Autonomous Driving
Unsupervised optical flow estimation is especially hard near occlusions and
motion boundaries and in low-texture regions. We show that additional
information such as semantics and domain knowledge can help better constrain
this problem. We introduce SemARFlow, an unsupervised optical flow network
designed for autonomous driving data that takes estimated semantic segmentation
masks as additional inputs. This additional information is injected into the
encoder and into a learned upsampler that refines the flow output. In addition,
a simple yet effective semantic augmentation module provides self-supervision
when learning flow and its boundaries for vehicles, poles, and sky. Together,
these injections of semantic information improve the KITTI-2015 optical flow
test error rate from 11.80% to 8.38%. We also show visible improvements around
object boundaries as well as a greater ability to generalize across datasets.
Code is available at
https://github.com/duke-vision/semantic-unsup-flow-release.
Comment: Accepted by ICCV-2023
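The simplest way to feed estimated segmentation masks to a flow network alongside the images is to append a one-hot encoding of the class map to the input channels. The helper below is a minimal sketch of that idea in NumPy; the function name and shapes are assumptions, and SemARFlow's actual injection into the encoder and learned upsampler is more involved than plain channel concatenation.

```python
import numpy as np

def with_semantic_input(image, seg_labels, num_classes):
    """Append one-hot semantic masks to the image channels
    (hypothetical helper illustrating semantic input injection).

    image: (3, H, W) array; seg_labels: (H, W) integer class map.
    Returns a (3 + num_classes, H, W) array.
    """
    h, w = seg_labels.shape
    one_hot = np.zeros((num_classes, h, w))
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    one_hot[seg_labels, rows, cols] = 1.0  # one_hot[c, i, j] = 1 iff label(i,j) == c
    return np.concatenate([image, one_hot], axis=0)

img = np.zeros((3, 4, 4))
labels = np.array([[0, 1, 1, 2]] * 4)  # toy class map
x = with_semantic_input(img, labels, num_classes=3)
print(x.shape)  # (6, 4, 4)
```

One-hot channels keep classes unordered (no spurious arithmetic between class IDs), which is why this encoding is the usual choice when conditioning a CNN on a discrete label map.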