181 research outputs found
Subjective Evaluation of Transmission Errors in IPTV and 3DTV
The increase of multimedia services delivered over packet-based networks has entailed greater quality expectations of the end-users. This has led to an intensive research on techniques for evaluating the quality of experience perceived by the viewers of audiovisual content, considering the different degradations that it could suffer along the broadcasting system. In this paper, a comprehensive study of the impact of transmission errors affecting video and audio in IPTV is presented. With this aim, subjective assessment tests were carried out proposing a novel methodology trying to keep as close as possible home environment viewing conditions. Also 3DTV content in side-by-side format has been used in the experiments to compare the impact of the degradations. The results provide a better understanding of the effects of transmission errors, and show that the QoE related to the first approach of 3DTV is acceptable, but the visual discomfort that it causes should be reduced
Blind Omnidirectional Image Quality Assessment with Viewport Oriented Graph Convolutional Networks
Quality assessment of omnidirectional images has become increasingly urgent
due to the rapid growth of virtual reality applications. Different from
traditional 2D images and videos, omnidirectional contents can provide
consumers with freely changeable viewports and a larger field of view covering
the spherical surface, which makes the objective
quality assessment of omnidirectional images more challenging. In this paper,
motivated by the characteristics of the human vision system (HVS) and the
viewing process of omnidirectional contents, we propose a novel Viewport
oriented Graph Convolution Network (VGCN) for blind omnidirectional image
quality assessment (IQA). Generally, observers tend to give the subjective
rating of a 360-degree image after passing and aggregating different viewports
information when browsing the spherical scenery. Therefore, in order to model
the mutual dependency of viewports in the omnidirectional image, we build a
spatial viewport graph. Specifically, the graph nodes are first defined with
selected viewports with higher probabilities to be seen, which is inspired by
the HVS that human beings are more sensitive to structural information. Then,
these nodes are connected by spatial relations to capture interactions among
them. Finally, reasoning on the proposed graph is performed via graph
convolutional networks. Moreover, we simultaneously obtain global quality using
the entire omnidirectional image without viewport sampling to boost the
performance according to the viewing experience. Experimental results
demonstrate that our proposed model outperforms state-of-the-art full-reference
and no-reference IQA metrics on two public omnidirectional IQA databases
Remote Visual Observation of Real Places Through Virtual Reality Headsets
Virtual Reality has always represented a fascinating yet powerful opportunity that has attracted studies and technology developments, especially since the latest release on the market of powerful high-resolution and wide field-of-view VR headsets. While the great potential of such VR systems is common and accepted knowledge, issues remain related to how to design systems and setups capable of fully exploiting the latest hardware advances.
The aim of the proposed research is to study and understand how to increase the perceived level of realism and sense of presence when remotely observing real places through VR headset displays. Hence, to produce a set of guidelines that give directions to system designers about how to optimize the display-camera setup to enhance performance, focusing on remote visual observation of real places. The outcome of this investigation represents unique knowledge that is believed to be very beneficial for better VR headset designs towards improved remote observation systems.
To achieve the proposed goal, this thesis presents a thorough investigation of existing literature and previous researches, which is carried out systematically to identify the most important factors ruling realism, depth perception, comfort, and sense of presence in VR headset observation. Once identified, these factors are further discussed and assessed through a series of experiments and usability studies, based on a predefined set of research questions.
More specifically, the role of familiarity with the observed place, the role of the environment characteristics shown to the viewer, and the role of the display used for the remote observation of the virtual environment are further investigated. To gain more insights, two usability studies are proposed with the aim of defining guidelines and best practices.
The main outcomes from the two studies demonstrate that test users can experience an enhanced realistic observation when natural features, higher resolution displays, natural illumination, and high image contrast are used in Mobile VR. In terms of comfort, simple scene layouts and relaxing environments are considered ideal to reduce visual fatigue and eye strain. Furthermore, sense of presence increases when observed environments induce strong emotions, and depth perception improves in VR when several monocular cues such as lights and shadows are combined with binocular depth cues.
Based on these results, this investigation then presents a focused evaluation on the outcomes and introduces an innovative eye-adapted High Dynamic Range (HDR) approach, which the author believes to be of great improvement in the context of remote observation when combined with eye-tracked VR headsets. Within this purpose, a third user study is proposed to compare static HDR and eye-adapted HDR observation in VR, to assess that the latter can improve realism, depth perception, sense of presence, and in certain cases even comfort. Results from this last study confirmed the author expectations, proving that eye-adapted HDR and eye tracking should be used to achieve best visual performances for remote observation in modern VR systems
Ultrasound-Augmented Laparoscopy
Laparoscopic surgery is perhaps the most common minimally invasive procedure for many diseases in the abdomen. Since the laparoscopic camera provides only the surface view of the internal organs, in many procedures, surgeons use laparoscopic ultrasound (LUS) to visualize deep-seated surgical targets. Conventionally, the 2D LUS image is visualized in a display spatially separate from that displays the laparoscopic video. Therefore, reasoning about the geometry of hidden targets requires mentally solving the spatial alignment, and resolving the modality differences, which is cognitively very challenging. Moreover, the mental representation of hidden targets in space acquired through such cognitive medication may be error prone, and cause incorrect actions to be performed.
To remedy this, advanced visualization strategies are required where the US information is visualized in the context of the laparoscopic video. To this end, efficient computational methods are required to accurately align the US image coordinate system with that centred in the camera, and to render the registered image information in the context of the camera such that surgeons perceive the geometry of hidden targets accurately. In this thesis, such a visualization pipeline is described. A novel method to register US images with a camera centric coordinate system is detailed with an experimental investigation into its accuracy bounds. An improved method to blend US information with the surface view is also presented with an experimental investigation into the accuracy of perception of the target locations in space
Recommended from our members
End-to-end 3D video communication over heterogeneous networks
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Three-dimensional technology, more commonly referred to as 3D technology, has revolutionised many fields including entertainment, medicine, and communications to name a few. In addition to 3D films, games, and sports channels, 3D perception has made tele-medicine a reality. By the year 2015, 30% of the all HD panels at home will be 3D enabled, predicted by consumer electronics manufacturers. Stereoscopic cameras, a comparatively mature technology compared to other 3D systems, are now being used by ordinary citizens to produce 3D content and share at a click of a button just like they do with the 2D counterparts via sites like YouTube. But technical challenges still exist, including with autostereoscopic multiview displays. 3D content requires many complex considerations--including how to represent it, and deciphering what is the best compression format--when considering transmission or storage, because of its increased amount of data. Any decision must be taken in the light of the available bandwidth or storage capacity, quality and user expectations. Free viewpoint navigation also remains partly unsolved. The most pressing issue getting in the way of widespread uptake of consumer 3D systems is the ability to deliver 3D content to heterogeneous consumer displays over the heterogeneous networks. Optimising 3D video communication solutions must consider the entire pipeline, starting with optimisation at the video source to the end display and transmission optimisation. Multi-view offers the most compelling solution for 3D videos with motion parallax and freedom from wearing headgear for 3D video perception. Optimising multi-view video for delivery and display could increase the demand for true 3D in the consumer market. This thesis focuses on an end-to-end quality optimisation in 3D video communication/transmission, offering solutions for optimisation at the compression, transmission, and decoder levels.Brunel University - Isambard Research Scholarshi
Recommended from our members
Camera positioning for 3D panoramic image rendering
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London.Virtual camera realisation and the proposition of trapezoidal camera architecture are the two broad contributions of this thesis. Firstly, multiple camera and their arrangement constitute a critical component which affect the integrity of visual content acquisition for multi-view video. Currently, linear, convergence, and divergence arrays are the prominent camera topologies adopted. However, the large number of cameras required and their synchronisation are two of prominent challenges usually encountered. The use of virtual cameras can significantly reduce the number of physical cameras used with respect to any of the known
camera structures, hence adequately reducing some of the other implementation issues. This thesis explores to use image-based rendering with and without geometry in the implementations leading to the realisation of virtual cameras. The virtual camera implementation was carried out from the perspective of depth map (geometry) and use of multiple image samples (no geometry). Prior to the virtual camera realisation, the generation of depth map was investigated using region match measures widely known for solving image point correspondence problem. The constructed depth maps have been compare with the ones generated
using the dynamic programming approach. In both the geometry and no geometry approaches, the virtual cameras lead to the rendering of views from a textured depth map, construction of 3D panoramic image of a scene by stitching multiple image samples and performing superposition on them, and computation
of virtual scene from a stereo pair of panoramic images. The quality of these rendered images were assessed through the use of either objective or subjective analysis in Imatest software. Further more, metric reconstruction of a scene was performed by re-projection of the pixel points from multiple image samples with
a single centre of projection. This was done using sparse bundle adjustment algorithm. The statistical summary obtained after the application of this algorithm provides a gauge for the efficiency of the optimisation step. The optimised data was then visualised in Meshlab software environment, hence providing the reconstructed scene. Secondly, with any of the well-established camera arrangements, all cameras are usually constrained to the same horizontal plane. Therefore, occlusion becomes an extremely challenging problem, and a robust camera set-up is required in order to resolve strongly the hidden part of any scene objects.
To adequately meet the visibility condition for scene objects and given that occlusion of the same scene objects can occur, a multi-plane camera structure is highly desirable. Therefore, this thesis also explore trapezoidal camera structure for image acquisition. The approach here is to assess the feasibility and potential
of several physical cameras of the same model being sparsely arranged on the edge of an efficient trapezoid graph. This is implemented both Matlab and Maya. The quality of the depth maps rendered in Matlab are better in Quality
Interactive Free-Viewpoint Video Generation
Background
Free-viewpoint video (FVV) is processed video content in which viewers can freely select the viewing position and angle. FVV delivers an improved visual experience and can also help synthesize special effects and virtual reality content. In this paper, a complete FVV system is proposed to interactively control the viewpoints of video relay programs through multimedia terminals such as computers and tablets.
Methods The hardware of the FVV generation system is a set of synchronously controlled cameras, and the software generates videos in novel viewpoints from the captured video using view interpolation. The interactive interface is designed to visualize the generated video in novel viewpoints and enable the viewpoint to be changed interactively.
Results
Experiments show that our system can synthesize plausible videos in intermediate viewpoints with a view range of up to 180°
- âŠ