20,986 research outputs found
The Visual Centrifuge: Model-Free Layered Video Representations
True video understanding requires making sense of non-lambertian scenes where
the color of light arriving at the camera sensor encodes information about not
just the last object it collided with, but about multiple mediums -- colored
windows, dirty mirrors, smoke or rain. Layered video representations have the
potential of accurately modelling realistic scenes but have so far required
stringent assumptions on motion, lighting and shape. Here we propose a
learning-based approach for multi-layered video representation: we introduce
novel uncertainty-capturing 3D convolutional architectures and train them to
separate blended videos. We show that these models then generalize to single
videos, where they exhibit interesting abilities: color constancy, factoring
out shadows and separating reflections. We present quantitative and qualitative
results on real world videos.Comment: Appears in: 2019 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2019). This arXiv contains the CVPR Camera Ready version of
the paper (although we have included larger figures) as well as an appendix
detailing the model architectur
A deep learning framework for quality assessment and restoration in video endoscopy
Endoscopy is a routine imaging technique used for both diagnosis and
minimally invasive surgical treatment. Artifacts such as motion blur, bubbles,
specular reflections, floating objects and pixel saturation impede the visual
interpretation and the automated analysis of endoscopy videos. Given the
widespread use of endoscopy in different clinical applications, we contend that
the robust and reliable identification of such artifacts and the automated
restoration of corrupted video frames is a fundamental medical imaging problem.
Existing state-of-the-art methods only deal with the detection and restoration
of selected artifacts. However, typically endoscopy videos contain numerous
artifacts which motivates to establish a comprehensive solution.
We propose a fully automatic framework that can: 1) detect and classify six
different primary artifacts, 2) provide a quality score for each frame and 3)
restore mildly corrupted frames. To detect different artifacts our framework
exploits fast multi-scale, single stage convolutional neural network detector.
We introduce a quality metric to assess frame quality and predict image
restoration success. Generative adversarial networks with carefully chosen
regularization are finally used to restore corrupted frames.
Our detector yields the highest mean average precision (mAP at 5% threshold)
of 49.0 and the lowest computational time of 88 ms allowing for accurate
real-time processing. Our restoration models for blind deblurring, saturation
correction and inpainting demonstrate significant improvements over previous
methods. On a set of 10 test videos we show that our approach preserves an
average of 68.7% which is 25% more frames than that retained from the raw
videos.Comment: 14 page
- …