9,673 research outputs found

    A deep learning framework for quality assessment and restoration in video endoscopy

    Full text link
    Endoscopy is a routine imaging technique used for both diagnosis and minimally invasive surgical treatment. Artifacts such as motion blur, bubbles, specular reflections, floating objects and pixel saturation impede the visual interpretation and the automated analysis of endoscopy videos. Given the widespread use of endoscopy in different clinical applications, we contend that the robust and reliable identification of such artifacts and the automated restoration of corrupted video frames is a fundamental medical imaging problem. Existing state-of-the-art methods only deal with the detection and restoration of selected artifacts. However, typically endoscopy videos contain numerous artifacts which motivates to establish a comprehensive solution. We propose a fully automatic framework that can: 1) detect and classify six different primary artifacts, 2) provide a quality score for each frame and 3) restore mildly corrupted frames. To detect different artifacts our framework exploits fast multi-scale, single stage convolutional neural network detector. We introduce a quality metric to assess frame quality and predict image restoration success. Generative adversarial networks with carefully chosen regularization are finally used to restore corrupted frames. Our detector yields the highest mean average precision (mAP at 5% threshold) of 49.0 and the lowest computational time of 88 ms allowing for accurate real-time processing. Our restoration models for blind deblurring, saturation correction and inpainting demonstrate significant improvements over previous methods. On a set of 10 test videos we show that our approach preserves an average of 68.7% which is 25% more frames than that retained from the raw videos.Comment: 14 page

    Learning to Predict Image-based Rendering Artifacts with Respect to a Hidden Reference Image

    Full text link
    Image metrics predict the perceived per-pixel difference between a reference image and its degraded (e. g., re-rendered) version. In several important applications, the reference image is not available and image metrics cannot be applied. We devise a neural network architecture and training procedure that allows predicting the MSE, SSIM or VGG16 image difference from the distorted image alone while the reference is not observed. This is enabled by two insights: The first is to inject sufficiently many un-distorted natural image patches, which can be found in arbitrary amounts and are known to have no perceivable difference to themselves. This avoids false positives. The second is to balance the learning, where it is carefully made sure that all image errors are equally likely, avoiding false negatives. Surprisingly, we observe, that the resulting no-reference metric, subjectively, can even perform better than the reference-based one, as it had to become robust against mis-alignments. We evaluate the effectiveness of our approach in an image-based rendering context, both quantitatively and qualitatively. Finally, we demonstrate two applications which reduce light field capture time and provide guidance for interactive depth adjustment.Comment: 13 pages, 11 figure

    Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment

    Full text link
    We present a deep neural network-based approach to image quality assessment (IQA). The network is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression, which makes it significantly deeper than related IQA models. Unique features of the proposed architecture are that: 1) with slight adaptations it can be used in a no-reference (NR) as well as in a full-reference (FR) IQA setting and 2) it allows for joint learning of local quality and local weights, i.e., relative importance of local quality to the global quality estimate, in an unified framework. Our approach is purely data-driven and does not rely on hand-crafted features or other types of prior domain knowledge about the human visual system or image statistics. We evaluate the proposed approach on the LIVE, CISQ, and TID2013 databases as well as the LIVE In the wild image quality challenge database and show superior performance to state-of-the-art NR and FR IQA methods. Finally, cross-database evaluation shows a high ability to generalize between different databases, indicating a high robustness of the learned features

    Bridge the Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Dataset and a Deep Learning Model

    Full text link
    Omnidirectional video enables spherical stimuli with the 360×180∘360 \times 180^ \circ viewing range. Meanwhile, only the viewport region of omnidirectional video can be seen by the observer through head movement (HM), and an even smaller region within the viewport can be clearly perceived through eye movement (EM). Thus, the subjective quality of omnidirectional video may be correlated with HM and EM of human behavior. To fill in the gap between subjective quality and human behavior, this paper proposes a large-scale visual quality assessment (VQA) dataset of omnidirectional video, called VQA-OV, which collects 60 reference sequences and 540 impaired sequences. Our VQA-OV dataset provides not only the subjective quality scores of sequences but also the HM and EM data of subjects. By mining our dataset, we find that the subjective quality of omnidirectional video is indeed related to HM and EM. Hence, we develop a deep learning model, which embeds HM and EM, for objective VQA on omnidirectional video. Experimental results show that our model significantly improves the state-of-the-art performance of VQA on omnidirectional video.Comment: Accepted by ACM MM 201
    • …
    corecore