Depth map compression via 3D region-based representation
In 3D video, view synthesis is used to create new virtual views between
encoded camera views. Errors in the coding of the depth maps introduce
geometry inconsistencies in synthesized views. In this paper, a new 3D plane
representation of the scene is presented which improves the performance of
current standard video codecs in the view synthesis domain. Two image segmentation
algorithms are proposed for generating a color and depth segmentation.
Using both partitions, depth maps are segmented into regions free of
sharp discontinuities, without having to explicitly signal all depth edges. The
resulting regions are represented using a planar model in the 3D world scene.
This 3D representation allows an efficient encoding while preserving the 3D
characteristics of the scene. The 3D planes open up the possibility to code
multiview images with a unique representation. Postprint (author's final draft)
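As an illustration of the planar region model described in the abstract above, the sketch below fits a plane z = a*x + b*y + c to the depth samples of one region by least squares. This is a minimal sketch under assumptions: the paper models planes in the 3D world scene, whereas this example works directly on (x, y, depth) triples, and the names `fit_plane` and `det3` are hypothetical, not taken from the paper.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(samples):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) samples.

    Hypothetical helper: one region of a segmented depth map is
    summarised by three coefficients instead of per-pixel depths.
    """
    n = float(len(samples))
    sx = sum(x for x, _, _ in samples)
    sy = sum(y for _, y, _ in samples)
    sz = sum(z for _, _, z in samples)
    sxx = sum(x * x for x, _, _ in samples)
    syy = sum(y * y for _, y, _ in samples)
    sxy = sum(x * y for x, y, _ in samples)
    sxz = sum(x * z for x, _, z in samples)
    syz = sum(y * z for _, y, z in samples)

    # Normal equations of the least-squares problem.
    lhs = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(lhs)
    if abs(d) < 1e-12:
        raise ValueError("samples are degenerate (e.g. collinear)")

    # Cramer's rule: replace one column at a time with the right-hand side.
    coeffs = []
    for col in range(3):
        m = [row[:] for row in lhs]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


# Synthetic region whose depth lies exactly on z = 2x - y + 5.
region = [(x, y, 2 * x - y + 5) for x in range(4) for y in range(4)]
a, b, c = fit_plane(region)
```

A region that fits well can then be signalled by its three coefficients, which is the intuition behind the compact encoding the abstract mentions.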
Video processing for panoramic streaming using HEVC and its scalable extensions
Panoramic streaming is a particular form of video streaming in which an arbitrary Region-of-Interest (RoI) is transmitted from a high-spatial-resolution video, i.e. a video covering a very “wide-angle” view (much larger than the human field-of-view, e.g. 360°). Several transport schemes for panoramic video delivery have been proposed and demonstrated within the past decade, which allow users to navigate interactively within the high-resolution videos. With the recent advances in head mounted displays, consumers may soon have immersive and sufficiently convenient end devices within reach, which could lead to an increasing demand for panoramic video experiences. The solution proposed in this paper builds upon tile-based panoramic streaming, where users receive a set of tiles that match their RoI, and consists of a low-complexity compressed-domain video processing technique using H.265/HEVC and its scalable extensions (H.265/SHVC and H.265/MV-HEVC). The proposed technique generates a single video bitstream out of the selected tiles so that a single hardware decoder can be used. It overcomes the scalability issue of previous solutions that do not use tiles and the battery consumption issue inherent to tile-based panorama streaming, where multiple parallel software decoders are used. In addition, the described technique is capable of reducing the peak streaming bitrate during changes of the RoI, which is crucial for a truly immersive and low-latency video experience. It also makes it possible to use open GOP structures without incurring any playback interruption at switching events, which provides better compression efficiency than closed GOP structures.
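The tile selection step of tile-based panoramic streaming can be sketched as follows. This is a minimal sketch under assumptions: a uniform tile grid that divides the panorama evenly, a 360° panorama that wraps horizontally, and a hypothetical function name `tiles_for_roi`. It only picks the tiles covering the RoI; the compressed-domain merging into a single HEVC bitstream described in the abstract is not shown.

```python
def tiles_for_roi(pano_w, pano_h, tile_w, tile_h, roi_x, roi_y, roi_w, roi_h):
    """Return the (column, row) indices of the tiles that cover the RoI.

    Assumes a uniform grid and horizontal wrap-around, so a RoI that
    crosses the right edge continues at tile column 0.
    """
    cols = pano_w // tile_w
    rows = pano_h // tile_h

    first_col = roi_x // tile_w
    last_col = (roi_x + roi_w - 1) // tile_w
    # Vertically the panorama does not wrap, so clamp to the grid.
    first_row = max(0, roi_y // tile_h)
    last_row = min(rows - 1, (roi_y + roi_h - 1) // tile_h)

    selected = []
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            tile = (col % cols, row)  # wrap the column index
            if tile not in selected:
                selected.append(tile)
    return selected


# 3840x1920 panorama split into 640x480 tiles (6 columns, 4 rows).
# The RoI starts near the right edge and wraps around to column 0.
tiles = tiles_for_roi(3840, 1920, 640, 480, 3500, 400, 800, 300)
```

When the RoI moves, only the tiles that enter the selection need to be newly requested, which is why the tile grid granularity affects the bitrate peaks at RoI changes.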
360-Degree Panoramic Video Coding
Virtual reality (VR) creates an immersive experience of the real world in a virtual environment through a computer interface. Owing to technological advances in recent years, VR technology is growing very fast, and its industrial use is now feasible. The technology is used in many applications, for example gaming, education and the streaming of live events. Since VR visualizes a real-world experience, the image or video content used must represent the characteristics of the whole 3D world. Omnidirectional images and videos exhibit such characteristics and hence are used in VR applications. However, this content is not suitable for conventional video coding standards, which accept only 2D image/video formats. Accordingly, the omnidirectional content is projected onto a 2D image plane using cylindrical or pseudo-cylindrical projections.
In this work, coding methods are studied for two types of projection formats that are popular for VR content: equirectangular panoramic projection and pseudo-cylindrical panoramic projection. The equirectangular projection is the most commonly used format in VR applications because of its rectangular image plane and its wide support in software development environments. However, this projection stretches the nadir and zenith areas of the panorama, which as a result contain a relatively large portion of redundant data. The redundant information causes extra bitrate and longer encoding/decoding times. Regional down-sampling (RDS) methods are used in this work to decrease the extra bitrate caused by the over-stretched polar areas. These methods are categorized into persistent regional down-sampling (P-RDS) and temporal regional down-sampling (T-RDS). In the P-RDS method, down-sampling is applied to all frames of the video, whereas in the T-RDS method only inter frames are down-sampled and the intra frames are coded at full resolution in order to maintain the highest possible quality of these frames.
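The polar stretching of the equirectangular projection can be quantified with a short calculation. On the sphere, the circle at latitude φ has a circumference proportional to cos φ, yet every row of an equirectangular image has the same width, so a row at latitude φ carries roughly 1/cos φ times more samples than needed. The sketch below is an illustration under these geometric assumptions only; the function names are hypothetical and this is not the RDS method of the thesis.

```python
import math


def row_sampling_density(height):
    """Fraction of non-redundant samples in each equirectangular row.

    Row r is mapped to the latitude of its centre, from -90 degrees at
    one pole to +90 degrees at the other; the useful fraction of the
    row is the cosine of that latitude.
    """
    return [math.cos(((r + 0.5) / height - 0.5) * math.pi)
            for r in range(height)]


def effective_fraction(height):
    """Average useful fraction over the whole image."""
    density = row_sampling_density(height)
    return sum(density) / height


# For a tall image this approaches 2/pi: roughly 64% of the samples
# carry independent information and the remainder is stretch redundancy.
fraction = effective_fraction(1024)
```

Rows near the poles have the lowest useful fraction, which is why down-sampling is applied regionally rather than to the whole picture.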
The pseudo-cylindrical projections map the 3D spherical domain to a non-rectangular 2D image plane in which the polar areas do not contain redundant information. These projection formats therefore achieve a more realistic sample distribution of the 3D world. However, because of the non-rectangular image plane, pseudo-cylindrical panoramas are poorly suited to image/video coding standards, and as a result the compression performance is inefficient. Two methods are therefore investigated for improving the intra-frame and inter-frame compression of these panorama formats. In the intra-frame coding method, border edges are smoothed by modifying the content of the image in the non-effective picture area. In the inter-frame coding method, which exploits the 360-degree property of the content, the non-effective picture area of reference frames at the border is filled with content of the effective picture area from the opposite border to improve the performance of motion compensation.
As a final contribution, quality assessment methods for VR applications are studied. Since VR content is mainly displayed on head mounted displays (HMDs), which use a 3D coordinate system, measuring the quality of decoded images/video with conventional methods does not represent the quality fairly. In this work, spherical quality metrics are investigated for measuring the quality of the proposed coding methods for omnidirectional panoramas. Moreover, a novel spherical quality metric (USS-PSNR) is proposed for evaluating the quality of VR images/video.
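To illustrate why a spherical metric differs from conventional PSNR, the sketch below computes a latitude-weighted PSNR in which each row's squared error is weighted by the cosine of its latitude, so errors in the stretched polar rows count less. This is a generic illustration and an assumption on our part: it is not the USS-PSNR metric proposed in the thesis, whose definition is not given in the abstract, and the function name is hypothetical.

```python
import math


def latitude_weighted_psnr(ref, test, peak=255.0):
    """PSNR with per-row cos(latitude) weights for equirectangular images.

    `ref` and `test` are lists of rows of luma values with equal shape.
    """
    height = len(ref)
    weighted_error = 0.0
    weight_sum = 0.0
    for r in range(height):
        # Latitude of the row centre, from -90 to +90 degrees.
        w = math.cos(((r + 0.5) / height - 0.5) * math.pi)
        for a, b in zip(ref[r], test[r]):
            weighted_error += w * (a - b) ** 2
            weight_sum += w
    wmse = weighted_error / weight_sum
    if wmse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / wmse)


# The same pixel error placed near a pole and near the equator.
ref = [[100] * 4 for _ in range(4)]
pole = [row[:] for row in ref]
pole[0][0] = 120
equator = [row[:] for row in ref]
equator[1][0] = 120

psnr_pole = latitude_weighted_psnr(ref, pole)
psnr_equator = latitude_weighted_psnr(ref, equator)
```

With these weights the error near the pole yields a higher score than the identical error near the equator, whereas conventional PSNR would rate both images the same.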
Video Quality Prediction for Video over Wireless Access Networks (UMTS and WLAN)
Transmission of video content over wireless access networks (in particular, Wireless Local Area Networks (WLAN) and Third Generation Universal Mobile Telecommunication System (3G UMTS)) is growing exponentially and gaining popularity, and is predicted to open up new revenue streams for mobile network operators. However, the success of these video applications over wireless access networks depends very much on meeting the user’s Quality of Service (QoS) requirements. Thus, it is highly desirable to be able to predict and, if appropriate, to control video quality to meet the user’s QoS requirements. Video quality is affected by distortions caused by the encoder and the wireless access network. The impact of these distortions is content dependent, but this feature has not been widely used in existing video quality prediction models.
The main aim of the project is the development of novel and efficient models for non-intrusive video quality prediction for low bitrate and low resolution videos, and the demonstration of their application in QoS-driven adaptation schemes for mobile video streaming applications. This led to the five main contributions of the thesis, as follows:
(1) A thorough understanding is provided of the relationships between video quality, wireless access network (UMTS and WLAN) parameters (e.g. packet/block loss, mean burst length and link bandwidth), encoder parameters (e.g. sender bitrate, frame rate) and content type. An understanding of the relationships and interactions between them and of their impact on video quality is important, as it provides a basis for the development of non-intrusive video quality prediction models.
(2) A new content classification method based on statistical tools was proposed, as content type was found to be the most important parameter.
(3) Efficient regression-based and artificial neural network-based learning models were developed for video quality prediction over WLAN and UMTS access networks. The models are lightweight (they can be implemented in real-time monitoring) and provide a measure of user-perceived quality without time-consuming subjective tests. The models have potential applications in several other areas, including QoS control and optimization in network planning and content provisioning for network/service providers.
(4) The applications of the proposed regression-based models were investigated in (i) the optimization of content provisioning and network resource utilization and (ii) a new fuzzy sender bitrate adaptation scheme, applied at the sender side over WLAN and UMTS access networks.
(5) Finally, Internet-based subjective tests were designed that captured the distortions caused by the encoder and the wireless access network for different types of content. The database of subjective results has been made available to the research community, as there is a lack of subjective video quality assessment databases.
Partially sponsored by EU FP7 ADAMANTIUM Project (EU Contract 214751)
Geometry-based spherical JND modeling for 360° display
360° videos have received widespread attention due to their realistic
and immersive experiences for users. To date, how to accurately model user
perception on 360° displays is still a challenging issue. In this paper,
we exploit the visual characteristics of 360° projection and display and
extend the popular just noticeable difference (JND) model to spherical JND
(SJND). First, we propose a quantitative 2D-JND model by jointly considering
spatial contrast sensitivity, luminance adaptation and the texture masking effect.
In particular, our model introduces an entropy-based region classification and
uses different parameters for different types of regions for better
modeling performance. Second, we extend our 2D-JND model to SJND by jointly
exploiting latitude projection and field of view during 360° display.
With this operation, SJND reflects the characteristics of both the human visual
system and the 360° display. Third, in subjective tests our SJND model is more
consistent with user perception, and it tolerates more distortion
at lower bit rates during 360° video compression. To
further examine the effectiveness of our SJND model, we embed it in Versatile
Video Coding (VVC) compression. Compared with state-of-the-art methods, our
SJND-VVC framework significantly reduces the bit rate with negligible loss in
visual quality.
Semantic multimedia remote display for mobile thin clients
Current remote display technologies for mobile thin clients convert practically all types of graphical content into sequences of images rendered by the client. Consequently, important information concerning the content semantics is lost. The present paper addresses this limitation by developing a semantic multimedia remote display. The principle consists of representing the graphical content as a real-time interactive multimedia scene graph. The underlying architecture features novel components for scene-graph creation and management, as well as for user interactivity handling. The experimental setup considers the Linux X Window System and BiFS/LASeR multimedia scene technologies on the server and client sides, respectively. The implemented solution was benchmarked against currently deployed solutions (VNC and Microsoft-RDP) by considering text editing and WWW browsing applications. The quantitative assessments demonstrate: (1) visual quality expressed by seven objective metrics, e.g., PSNR values between 30 and 42 dB or SSIM values larger than 0.9999; (2) downlink bandwidth gain factors ranging from 2 to 60; (3) real-time user event management expressed by network round-trip time reduction by factors of 4-6 and by uplink bandwidth gain factors from 3 to 10; (4) feasible CPU activity, larger than in the RDP case but reduced by a factor of 1.5 with respect to VNC-HEXTILE.