Towards Top-Down Stereoscopic Image Quality Assessment via Stereo Attention
Stereoscopic image quality assessment (SIQA) plays a crucial role in
evaluating and improving the visual experience of 3D content. Existing
methods for SIQA based on binocular properties and attention have achieved
promising performance. However, these bottom-up approaches are inadequate in
exploiting the inherent characteristics of the human visual system (HVS). This
paper presents a novel network for SIQA via stereo attention, employing a
top-down perspective to guide the quality assessment process. Our proposed
method realizes the guidance from high-level binocular signals down to
low-level monocular signals, while the binocular and monocular information can
be calibrated progressively throughout the processing pipeline. We design a
generalized Stereo AttenTion (SAT) block to implement the top-down philosophy
in stereo perception. This block utilizes the fusion-generated attention map as
a high-level binocular modulator, influencing the representation of two
low-level monocular features. Additionally, we introduce an Energy Coefficient
(EC) to account for recent findings indicating that binocular responses in the
primate primary visual cortex are less than the sum of monocular responses. The
adaptive EC can tune the magnitude of binocular response flexibly, thus
enhancing the formation of robust binocular features within our framework. To
extract the most discriminative quality information from the summation and
subtraction of the two branches of monocular features, we utilize a
dual-pooling strategy that applies min-pooling and max-pooling operations to
the respective branches. Experimental results highlight the superiority of our
top-down method in simulating the property of visual perception and advancing
the state-of-the-art in the SIQA field. The code of this work is available at
https://github.com/Fanning-Zhang/SATNet.
Comment: 13 pages, 4 figures
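The abstract's three ingredients (a fusion-generated attention map modulating the two monocular branches, an energy coefficient shrinking the binocular response, and min/max dual-pooling on the difference and summation branches) can be illustrated with a minimal numpy sketch. This is an assumed, simplified rendering of the idea, not the authors' SATNet implementation; the sigmoid gate and global pooling are illustrative choices.

```python
import numpy as np

def stereo_attention_block(feat_l, feat_r, ec=0.8):
    """Hypothetical sketch of one top-down stereo attention step.

    feat_l, feat_r: monocular feature maps, shape (C, H, W).
    ec: energy coefficient (< 1), keeping the binocular response
        below the plain sum of the monocular responses.
    """
    # High-level binocular signal: fuse the two monocular features,
    # then squash it into an attention map in (0, 1).
    fused = feat_l + feat_r
    attn = 1.0 / (1.0 + np.exp(-fused))        # sigmoid gate
    # Top-down modulation: the binocular map re-weights each
    # low-level monocular feature.
    mod_l = attn * feat_l
    mod_r = attn * feat_r
    # Binocular response scaled by the adaptive energy coefficient.
    binoc = ec * (mod_l + mod_r)
    # Dual-pooling: max-pooling on the summation branch and
    # min-pooling on the subtraction branch (global, per channel).
    sum_branch = (mod_l + mod_r).max(axis=(1, 2))
    diff_branch = (mod_l - mod_r).min(axis=(1, 2))
    return binoc, np.concatenate([sum_branch, diff_branch])
```

In the paper the block is applied inside a deep network and `ec` is learned adaptively; here it is a fixed scalar purely for illustration.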
No-Reference Quality Assessment for 360-degree Images by Analysis of Multi-frequency Information and Local-global Naturalness
360-degree/omnidirectional images (OIs) have attracted remarkable attention
due to the increasing applications of virtual reality (VR). Compared to
conventional 2D images, OIs can provide a more immersive experience to
consumers, benefiting from higher resolution and abundant fields of view
(FoVs). Moreover, OIs are usually viewed in a head-mounted display (HMD)
without references. Therefore, an efficient blind quality assessment method,
specifically designed for 360-degree images, is urgently desired. In this
paper, motivated by the characteristics of the human visual system (HVS) and
the viewing process of VR visual contents, we propose a novel and effective
no-reference omnidirectional image quality assessment (NR OIQA) algorithm by
Multi-Frequency Information and Local-Global Naturalness (MFILGN).
Specifically, inspired by the frequency-dependent property of visual cortex, we
first decompose the equirectangular projection (ERP) maps into
wavelet subbands. Then, the entropy intensities of low and high frequency
subbands are exploited to measure the multi-frequency information of OIs.
In addition to considering the global naturalness of ERP maps, and owing to
the browsed FoVs, we extract natural scene statistics features from each
viewport image as a measure of local naturalness. With the proposed
multi-frequency information measurement and local-global naturalness
measurement, we utilize support vector regression as the final image quality
regressor to train the quality evaluation model from visual quality-related
features to human ratings. To our knowledge, the proposed model is the first
no-reference quality assessment method for 360-degree images that combines
multi-frequency information and image naturalness. Experimental results on two
publicly available OIQA databases demonstrate that our proposed MFILGN
outperforms state-of-the-art approaches.
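The multi-frequency part of the pipeline (wavelet decomposition of the ERP map, then entropy of each subband as a feature) can be sketched as follows. This uses a hand-rolled one-level Haar decomposition as a stand-in for the paper's wavelet transform, and a histogram-based Shannon entropy; both are assumptions for illustration, not MFILGN's exact formulation.

```python
import numpy as np

def haar_subbands(img):
    """One-level 2-D Haar decomposition (simplified stand-in for the
    wavelet transform applied to an ERP map). img: (H, W), H and W even."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4           # low-frequency subband
    lh = (a + b - c - d) / 4           # high-frequency subbands
    hl = (a - b + c - d) / 4
    hh = (a - b - c + d) / 4
    return ll, lh, hl, hh

def subband_entropy(band, bins=64):
    """Shannon entropy (bits) of a subband's intensity histogram,
    used here as the 'entropy intensity' feature of that subband."""
    hist, _ = np.histogram(band, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

One entropy value per subband would then be concatenated with the local/global naturalness features and fed to a support vector regressor (e.g. `sklearn.svm.SVR`) trained against human ratings.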
Binocular Rivalry Oriented Predictive Auto-Encoding Network for Blind Stereoscopic Image Quality Measurement
Stereoscopic image quality measurement (SIQM) has become increasingly
important for guiding stereo image processing and communication systems due to
the widespread usage of 3D contents. Compared with conventional methods which
rely on hand-crafted features, deep learning oriented measurements have
achieved remarkable performance in recent years. However, most existing deep
SIQM evaluators are not specifically built for stereoscopic contents and
consider little prior domain knowledge of the 3D human visual system (HVS) in
network design. In this paper, we develop a Predictive Auto-encoDing Network
(PAD-Net) for blind/No-Reference stereoscopic image quality measurement. In the
first stage, inspired by the predictive coding theory that the cognition system
tries to match bottom-up visual signal with top-down predictions, we adopt the
encoder-decoder architecture to reconstruct the distorted inputs. Besides,
motivated by the binocular rivalry phenomenon, we leverage the likelihood and
prior maps generated from the predictive coding process in the Siamese
framework for assisting SIQM. In the second stage, a quality regression
network is applied to the fusion image to acquire the perceptual quality
prediction. The performance of PAD-Net has been extensively evaluated on three
benchmark databases, and its superiority has been well validated on both
symmetrically and asymmetrically distorted stereoscopic images under various
distortion types.
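The binocular-rivalry-motivated step (weighting each view by its likelihood and prior maps before fusing) can be sketched as below. This is an assumed, simplified form of rivalry-style fusion; the function name, the multiplicative weighting, and the normalization are illustrative, not PAD-Net's exact operations.

```python
import numpy as np

def rivalry_fusion(view_l, view_r, like_l, like_r, prior_l, prior_r):
    """Hypothetical binocular-rivalry-style fusion: each view is
    weighted by the product of its likelihood and prior maps (all six
    arrays share one shape), then the result is normalized so the two
    weights compete for dominance at every pixel."""
    w_l = like_l * prior_l
    w_r = like_r * prior_r
    norm = w_l + w_r + 1e-8            # avoid division by zero
    return (w_l * view_l + w_r * view_r) / norm
```

With symmetric distortion the two weight maps are similar and the fusion approaches a plain average; with asymmetric distortion the less distorted view (higher likelihood) dominates, mimicking rivalry.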
3D multiple description coding for error resilience over wireless networks
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Mobile communications have gained growing interest from both customers and service providers alike in the last one to two decades. Visual information is used in many application domains such as remote health care, video-on-demand, broadcasting and video surveillance. In order to enhance the visual effects of digital video content, depth perception needs to be provided with the actual visual content. 3D video has earned significant interest from the research community in recent years, due to the tremendous impact it leaves on viewers and its enhancement of the user's quality of experience (QoE). In the near future, 3D video is likely to be used in most video applications, as it offers a greater sense of immersion and perceptual experience. When 3D video is compressed and transmitted over error-prone channels, the associated packet loss leads to visual quality degradation. When a picture is lost or corrupted so severely that the concealment result is not acceptable, the receiver typically pauses video playback and waits for the next INTRA picture to resume decoding. Error propagation caused by employing predictive coding may degrade the video quality severely. There are several ways to mitigate the effects of such transmission errors. One widely used technique in international video coding standards is error resilience.
The motivation behind this research work is that existing schemes for 2D colour video compression, such as MPEG, JPEG and H.263, cannot be applied directly to 3D video content. 3D video signals contain depth as well as colour information and are bandwidth-demanding, as they require the transmission of multiple high-bandwidth 3D video streams. On the other hand, the capacity of wireless channels is limited, and wireless links are prone to various types of errors caused by noise, interference, fading, handoff, error bursts and network congestion. Given the maximum bit-rate budget to represent the 3D scene, the bit-rate allocation between texture and depth information should be optimised so that rendering distortion/losses are minimised. To mitigate the effect of these errors on perceptual 3D video quality, error resilience video coding needs to be investigated further to offer better quality of experience (QoE) to end users.
This research work aims at enhancing the error resilience capability of compressed 3D video, when transmitted over mobile channels, using Multiple Description Coding (MDC) in order to improve the user's quality of experience (QoE).
Furthermore, this thesis examines the sensitivity of the human visual system (HVS) when employed to view 3D video scenes. The approach used in this study is to use subjective testing in order to rate people's perception of 3D video under error-free and error-prone conditions through the use of a carefully designed bespoke questionnaire.
Petroleum Technology Development Fund (PTDF)
Deep Multi-Scale Features Learning for Distorted Image Quality Assessment
Image quality assessment (IQA) aims to estimate image visual quality as
perceived by humans. Although existing deep neural networks (DNNs) have shown
significant effectiveness in tackling the IQA problem, DNN-based quality
assessment models can still be improved by exploiting efficient multi-scale
features. In this paper, motivated by the human visual system (HVS)
combining multi-scale features for perception, we propose pyramid feature
learning to build a DNN with hierarchical multi-scale features for
distorted image quality prediction. Our model is based on both residual maps
and distorted images in the luminance domain, and the proposed network
incorporates spatial pyramid pooling and a feature pyramid in its structure.
Our proposed network is optimized end-to-end in a deeply supervised manner. To
validate the effectiveness of the proposed method, extensive experiments are
conducted on four widely-used image quality assessment databases,
demonstrating the superiority of our algorithm.
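Spatial pyramid pooling, one of the two multi-scale ingredients named above, turns a feature map of arbitrary size into a fixed-length vector by pooling over progressively finer grids. A minimal numpy sketch, assuming average pooling and the common 1x1/2x2/4x4 pyramid (the paper's exact configuration may differ):

```python
import numpy as np

def spatial_pyramid_pool(feat, levels=(1, 2, 4)):
    """Pool a (C, H, W) feature map into a fixed-length vector by
    averaging over grids of 1x1, 2x2 and 4x4 cells. The output length
    is C * (1 + 4 + 16) regardless of H and W (assumed divisible)."""
    C, H, W = feat.shape
    out = []
    for n in levels:
        hs, ws = H // n, W // n
        for i in range(n):
            for j in range(n):
                cell = feat[:, i * hs:(i + 1) * hs, j * ws:(j + 1) * ws]
                out.append(cell.mean(axis=(1, 2)))   # per-channel average
    return np.concatenate(out)
```

The fixed output size is what lets a fully-connected quality regressor sit on top of convolutional features from images of varying resolution.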
Production and Assessment of Usefulness of Interactive 2-D and Stereoscopic 3-D Videos as Tools for Anatomic Dissection Preparation and Examination Review
Laboratory is an integral part of a gross anatomy course in which students have their first in-depth dissection experience and explore structure-function relationships. Students arrive in this course, which requires acquisition of a large vocabulary and visual imagery, with scant prior knowledge. Even with extensive preparation on their part, the task is so difficult that students rely heavily on help from peers, teaching assistants, and instructors to gain the best from laboratory time. In recognition of the complexity of the learning task and the limitation on the amount of help available, this research was conducted to explore the value of educational tools that could enhance learning, make time in the laboratory more profitable, and decrease dependency on peers, teaching assistants, and instructors. Because anatomy is a highly visually based discipline, it was reasoned that interactive high-definition videos with verbal descriptions of dissections would enhance the learning process. High-definition videos of dissections were produced in 2-D and stereoscopic 3-D formats and compared with the standard dissection guide as tools for laboratory preparation. The stereoscopic 3-D format was included because of the hypothesis that the depth it provides might help students more readily grasp the relationships of structures to each other. Timing, duration, and tools provided to interact with the various formats varied with the experiment. The videos consisted of short presentations (10-14 minutes) of dissection steps or reviews of relationships of structures and were self-paced so they could be viewed more than once. Questions to encourage interaction with the materials were integrated into the videos and supplied with the Guide.
Depending on the experiment, data collected included performance on paper and practical examinations, dissection quality, and frequency of requests for help, in addition to surveys designed to assess ease of use and acceptance of the various presentation modes. Results presented in the thesis indicate that the videos were superior to the guide in helping students prepare for dissection and develop understanding of the assigned body structures and their relationships. With the reservation that the mode of 3-D delivery may play a role, 2-D videos were usually rated more positively than 3-D videos in student opinions. Both types of videos improved performance on various assessments and received more positive feedback when compared to the laboratory manual. This research confirmed the basic hypothesis that videos are effective tools for use in anatomy education and that they are worthy of significant investment of resources to help overcome some of the challenges facing anatomy educators.
Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications
Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained by the lack of essential content, i.e., stereoscopic videos. To alleviate this content shortage, an economical and practical solution is to reuse the huge media resources that are available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues like focus blur, motion and size, the quality of the resulting video may be poor, as such measurements are usually arbitrarily defined and appear inconsistent with the real scenes. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed which features i) optical-flow based occlusion reasoning in determining depth ordering, ii) object segmentation using improved region-growing from masks of determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (inside a small library of true stereo image pairs) and depth-ordinal based regularization. Comprehensive experiments have validated the effectiveness of our proposed 2D-to-3D conversion method in generating stereoscopic videos of consistent depth measurements for 3D-TV applications.
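The final step of any 2D-to-3D pipeline, synthesizing a second view from an image plus its estimated depth map, is conventionally done by depth-image-based rendering: shifting pixels horizontally by a disparity proportional to depth. A toy numpy sketch of that general technique (not this paper's specific renderer; the forward-warping and hole-keeping strategy are simplifying assumptions):

```python
import numpy as np

def render_right_view(left, depth, max_disp=8):
    """Toy depth-image-based rendering. left: (H, W) grayscale view;
    depth: (H, W) normalized to [0, 1]. Each pixel is shifted left by
    a disparity proportional to its depth to synthesize a right view;
    unfilled positions (holes) simply keep the original left pixel."""
    H, W = left.shape
    right = left.copy()                    # holes fall back to left view
    disp = (depth * max_disp).astype(int)  # per-pixel integer disparity
    for y in range(H):
        for x in range(W):
            nx = x - disp[y, x]
            if 0 <= nx < W:
                right[y, nx] = left[y, x]
    return right
```

Real renderers handle occlusions and hole-filling far more carefully; this only shows why consistent depth estimates matter, since any depth noise becomes a visible horizontal jitter in the synthesized view.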
QoE Enhancement for Stereoscopic 3D Video Quality Based on Depth and Color Transmission over IP Networks: A Review
In this review paper we focus on the enhancement of Quality of Experience (QoE) for stereoscopic 3D video based on depth information. We focus on the stereoscopic video format because it takes less bandwidth than other formats when 3D video is transmitted over an error-prone channel, but it is easily affected by network parameters such as packet loss, delay and jitter. Packet loss in 3D video has more impact on the depth information than on other 3D video factors such as comfort, motion, disparity and discomfort. Packet loss in the depth information causes undesired effects on the color and depth maps. Therefore, in order to minimize quality degradation, the application of a frame-loss concealment technique is preferred. This technique is expected to improve the QoE for end users. In this paper we also review 3D video factors and their challenges, methods of measuring the QoE, and algorithms used for packet-loss recovery.
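The frame-loss concealment mentioned above can be as simple as repeating the last good frame, or interpolating between neighbors when the next frame is already buffered. A minimal sketch of these two classic strategies (the function and its interface are illustrative, not from any specific codec):

```python
import numpy as np

def conceal_lost_frame(prev_frame, next_frame=None):
    """Simplest frame-loss concealment strategies:
    - temporal replacement: repeat the previous decoded frame;
    - temporal interpolation: average the neighboring frames when the
      next frame is already available in the playout buffer."""
    if next_frame is None:
        return prev_frame.copy()
    return (prev_frame + next_frame) / 2
```

For depth-plus-color streams, concealment would typically be applied to the color and depth planes separately, since, as the review notes, losses in the depth map are the more damaging of the two.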