69,084 research outputs found
Adaptive Temporal Encoding Network for Video Instance-level Human Parsing
Beyond the existing single-person and multiple-person human parsing tasks in
static images, this paper makes the first attempt to investigate a more
realistic video instance-level human parsing that simultaneously segments out
each person instance and parses each instance into more fine-grained parts
(e.g., head, leg, dress). We introduce a novel Adaptive Temporal Encoding
Network (ATEN) that alternatively performs temporal encoding among key frames
and flow-guided feature propagation from other consecutive frames between two
key frames. Specifically, ATEN first incorporates a Parsing-RCNN to produce the
instance-level parsing result for each key frame, which integrates both the
global human parsing and instance-level human segmentation into a unified
model. To balance between accuracy and efficiency, the flow-guided feature
propagation is used to directly parse consecutive frames according to their
identified temporal consistency with key frames. On the other hand, ATEN
leverages the convolution gated recurrent units (convGRU) to exploit temporal
changes over a series of key frames, which are further used to facilitate the
frame-level instance-level parsing. By alternatively performing direct feature
propagation between consistent frames and temporal encoding network among key
frames, our ATEN achieves a good balance between frame-level accuracy and time
efficiency, which is a common crucial problem in video object segmentation
research. To demonstrate the superiority of our ATEN, extensive experiments are
conducted on the most popular video segmentation benchmark (DAVIS) and a newly
collected Video Instance-level Parsing (VIP) dataset, which is the first video
instance-level human parsing dataset comprised of 404 sequences and over 20k
frames with instance-level and pixel-wise annotations.Comment: To appear in ACM MM 2018. Code link:
https://github.com/HCPLab-SYSU/ATEN. Dataset link: http://sysu-hcp.net/li
Predicting Future Instance Segmentation by Forecasting Convolutional Features
Anticipating future events is an important prerequisite towards intelligent
behavior. Video forecasting has been studied as a proxy task towards this goal.
Recent work has shown that to predict semantic segmentation of future frames,
forecasting at the semantic level is more effective than forecasting RGB frames
and then segmenting these. In this paper we consider the more challenging
problem of future instance segmentation, which additionally segments out
individual objects. To deal with a varying number of output labels per image,
we develop a predictive model in the space of fixed-sized convolutional
features of the Mask R-CNN instance segmentation model. We apply the "detection
head'" of Mask R-CNN on the predicted features to produce the instance
segmentation of future frames. Experiments show that this approach
significantly improves over strong baselines based on optical flow and
repurposed instance segmentation architectures
Deep Learning Techniques for Video Instance Segmentation: A Survey
Video instance segmentation, also known as multi-object tracking and
segmentation, is an emerging computer vision research area introduced in 2019,
aiming at detecting, segmenting, and tracking instances in videos
simultaneously. By tackling the video instance segmentation tasks through
effective analysis and utilization of visual information in videos, a range of
computer vision-enabled applications (e.g., human action recognition, medical
image processing, autonomous vehicle navigation, surveillance, etc) can be
implemented. As deep-learning techniques take a dominant role in various
computer vision areas, a plethora of deep-learning-based video instance
segmentation schemes have been proposed. This survey offers a multifaceted view
of deep-learning schemes for video instance segmentation, covering various
architectural paradigms, along with comparisons of functional performance,
model complexity, and computational overheads. In addition to the common
architectural designs, auxiliary techniques for improving the performance of
deep-learning models for video instance segmentation are compiled and
discussed. Finally, we discuss a range of major challenges and directions for
further investigations to help advance this promising research field
- …