Dynamic Face Video Segmentation via Reinforcement Learning
For real-time semantic video segmentation, most recent works utilised a
dynamic framework with a key scheduler to make online key/non-key decisions.
Some works used a fixed key scheduling policy, while others proposed adaptive
key scheduling methods based on heuristic strategies, both of which may lead to
suboptimal global performance. To overcome this limitation, we model the online
key decision process in dynamic video segmentation as a deep reinforcement
learning problem and learn an efficient and effective scheduling policy from
expert information about decision history and from the process of maximising
global return. Moreover, we study the application of dynamic video segmentation
on face videos, a field that has not been investigated before. By evaluating on
the 300VW dataset, we show that our reinforcement key scheduler outperforms
various baselines in terms of both effective key selections and running
speed. Further results on the Cityscapes dataset
demonstrate that our proposed method can also generalise to other scenarios. To
the best of our knowledge, this is the first work to use reinforcement learning
for online key-frame decision in dynamic video segmentation, and also the first
work on its application to face videos.
Comment: CVPR 2020. 300VW with segmentation labels is available at:
https://github.com/mapleandfire/300VW-Mas
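To make the key/non-key decision concrete, here is a minimal sketch of the two baseline families the abstract mentions: a fixed-interval scheduler and a heuristic adaptive one. It is not the paper's reinforcement-learning policy. The names `fixed_policy`, `threshold_policy` and `schedule` are hypothetical, and each frame is reduced to a single float, a toy stand-in for real image features.

```python
from typing import Callable, List, Sequence

# Hypothetical policy signature: (frames_since_key, deviation) -> is_key
Policy = Callable[[int, float], bool]


def fixed_policy(interval: int) -> Policy:
    """Fixed scheduling: declare a key frame every `interval` frames."""
    return lambda since_key, deviation: since_key >= interval


def threshold_policy(limit: float) -> Policy:
    """Heuristic scheduling: declare a key frame once the current frame
    deviates from the last key frame by more than `limit`."""
    return lambda since_key, deviation: deviation > limit


def schedule(frames: Sequence[float], policy: Policy) -> List[bool]:
    """Make one online key/non-key decision per frame.

    Assumption: each frame is a single float, so the deviation from the
    last key frame is just an absolute difference.
    """
    decisions: List[bool] = []
    last_key = 0
    for idx, frame in enumerate(frames):
        deviation = abs(frame - frames[last_key])
        # The first frame is always a key frame: there is nothing to reuse yet.
        is_key = idx == 0 or policy(idx - last_key, deviation)
        if is_key:
            last_key = idx
        decisions.append(is_key)
    return decisions
```

In a dynamic segmentation pipeline of this kind, key frames would typically go through the full, expensive network and non-key frames through a cheaper path that reuses key-frame features. A learned scheduler would take the place of the `policy` argument here.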
Learning the dynamics and time-recursive boundary detection of deformable objects
We propose a principled framework for recursively segmenting deformable objects across a sequence
of frames. We demonstrate the usefulness of this method on left ventricular segmentation across a cardiac
cycle. The approach involves a technique for learning the system dynamics together with methods of
particle-based smoothing as well as non-parametric belief propagation on a loopy graphical model capturing
the temporal periodicity of the heart. The dynamic system state is a low-dimensional representation
of the boundary, and the boundary estimation involves incorporating curve evolution into recursive state
estimation. By formulating the problem as one of state estimation, the segmentation at each particular
time is based not only on the data observed at that instant, but also on predictions based on past and future
boundary estimates. Although the paper focuses on left ventricle segmentation, the method generalizes
to temporally segmenting any deformable object.
Automating Carotid Intima-Media Thickness Video Interpretation with Convolutional Neural Networks
Cardiovascular disease (CVD) is the leading cause of mortality yet largely
preventable, but the key to prevention is to identify at-risk individuals
before adverse events. For predicting individual CVD risk, carotid intima-media
thickness (CIMT), a noninvasive ultrasound method, has proven to be valuable,
offering several advantages over CT coronary artery calcium score. However,
each CIMT examination includes several ultrasound videos, and interpreting each
of these CIMT videos involves three operations: (1) select three end-diastolic
ultrasound frames (EUF) in the video, (2) localize a region of interest (ROI)
in each selected frame, and (3) trace the lumen-intima interface and the
media-adventitia interface in each ROI to measure CIMT. These operations are
tedious, laborious, and time consuming, a serious limitation that hinders the
widespread utilization of CIMT in clinical practice. To overcome this
limitation, this paper presents a new system to automate CIMT video
interpretation. Our extensive experiments demonstrate that the suggested system
significantly outperforms the state-of-the-art methods. The superior
performance is attributable to our unified framework based on convolutional
neural networks (CNNs) coupled with our informative image representation and
effective post-processing of the CNN outputs, which are uniquely designed for
each of the above three operations.
Comment: J. Y. Shin, N. Tajbakhsh, R. T. Hurst, C. B. Kendall, and J. Liang.
Automating carotid intima-media thickness video interpretation with
convolutional neural networks. CVPR 2016, pp 2526-2535; N. Tajbakhsh, J. Y.
Shin, R. T. Hurst, C. B. Kendall, and J. Liang. Automatic interpretation of
CIMT videos using convolutional neural networks. Deep Learning for Medical
Image Analysis, Academic Press, 201
Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer
Semantic annotations are vital for training models for object recognition,
semantic segmentation or scene understanding. Unfortunately, pixelwise
annotation of images at very large scale is labor-intensive and little
labeled data is available, particularly at instance level and for street
scenes. In this paper, we propose to tackle this problem by lifting the
semantic instance labeling task from 2D into 3D. Given reconstructions from
stereo or laser data, we annotate static 3D scene elements with rough bounding
primitives and develop a model which transfers this information into the image
domain. We leverage our method to obtain 2D labels for a novel suburban video
dataset which we have collected, resulting in 400k semantic and instance image
annotations. A comparison of our method to state-of-the-art label transfer
baselines reveals that 3D information enables more efficient annotation while
at the same time resulting in improved accuracy and time-coherent labels.
Comment: 10 pages in Conference on Computer Vision and Pattern Recognition
(CVPR), 201
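The abstract does not describe the transfer model itself, so the following is only a sketch of the simplest geometric form of carrying 3D labels into the image domain: pinhole projection of labeled 3D points with a z-buffer, so that the nearest surface wins at each pixel. The function `project_labels` and its parameters (`focal`, `cx`, `cy`) are hypothetical, and points are assumed to be already expressed in camera coordinates.

```python
from typing import List, Sequence, Tuple

# (x, y, z) in camera coordinates plus an integer label id (assumed layout)
LabeledPoint = Tuple[float, float, float, int]


def project_labels(
    points: Sequence[LabeledPoint],
    focal: float,
    cx: float,
    cy: float,
    width: int,
    height: int,
) -> List[List[int]]:
    """Project labeled 3D points into a 2D label image.

    Pixels that receive no point keep the value -1 (unlabeled). When
    several points land on the same pixel, the one closest to the
    camera determines the label.
    """
    labels = [[-1] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    for x, y, z, label in points:
        if z <= 0:
            continue  # behind the camera, not visible
        u = int(round(focal * x / z + cx))  # pixel column
        v = int(round(focal * y / z + cy))  # pixel row
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            labels[v][u] = label
    return labels
```

A point-wise projection like this leaves holes and ignores the bounding primitives the abstract mentions. It only illustrates the geometric step that any 3D-to-2D transfer starts from.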