Do Users Behave Similarly in VR? Investigation of the User Influence on the System Design
With the overarching goal of developing user-centric Virtual Reality (VR) systems, a new wave of studies focused on understanding how users interact in VR environments has recently emerged. Despite these intense efforts, however, the current literature still does not provide the right framework to fully interpret and predict users' trajectories while navigating VR scenes. This work advances the state of the art in both the study of user behaviour in VR and user-centric system design. In more detail, we complement current datasets by presenting a publicly available dataset that provides navigation trajectories acquired for heterogeneous omnidirectional videos and different viewing platforms, namely head-mounted display, tablet, and laptop. We then present an exhaustive analysis of the collected data to better understand navigation in VR across users, content, and, for the first time, across viewing platforms. The novelty lies in the user-affinity metric, proposed in this work to investigate users' similarities when navigating within the content. The analysis reveals useful insights on the effect of device and content on navigation, which could be valuable considerations from the system design perspective. As a case study of the importance of studying user behaviour when designing VR systems, we finally propose a user-centric server optimisation. We formulate an integer linear program that seeks the best stored set of omnidirectional content, minimising encoding and storage cost while maximising the user's experience. The problem is posed taking into account network dynamics, the type of video content, and the interactivity of the user population. Experimental results show that our solution outperforms common company recommendations not only in terms of experienced quality but also in terms of encoding and storage, achieving savings of up to 70%. More importantly, we highlight a strong correlation between the storage cost and the user-affinity metric, showing the impact of the latter on the system architecture design.
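As an illustration of the kind of optimisation described above, the toy sketch below selects a stored set of representations under a storage budget via brute-force search. The candidate representations, their costs and qualities, and the search itself are hypothetical stand-ins, not the paper's actual integer linear program:

```python
from itertools import combinations

# Hypothetical candidate representations of one omnidirectional video:
# (name, storage cost, expected quality for the user population)
candidates = [
    ("4K_full",  10.0, 0.95),
    ("4K_tiled",  6.0, 0.90),
    ("HD_full",   4.0, 0.75),
    ("HD_tiled",  2.5, 0.70),
]

def best_stored_set(candidates, max_storage):
    """Exhaustively pick the subset that maximises the quality of the best
    representation it contains while respecting a storage budget."""
    best, best_score = None, -1.0
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            storage = sum(c[1] for c in subset)
            quality = max(c[2] for c in subset)
            if storage <= max_storage and quality > best_score:
                best, best_score = subset, quality
    return best, best_score

chosen, quality = best_stored_set(candidates, max_storage=7.0)
print([c[0] for c in chosen], quality)  # ['4K_tiled'] 0.9
```

A real formulation would scale this to many videos and encode the trade-off as linear constraints solved by an ILP solver rather than enumeration.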
Streaming and User Behaviour in Omnidirectional Videos
Omnidirectional videos (ODVs) have gone beyond the passive paradigm of traditional video, offering higher degrees of immersion and interaction. The revolutionary novelty of this technology is the possibility for users to interact with the surrounding environment and to feel a sense of engagement and presence in a virtual space. Users are clearly the main driving force of immersive applications, and consequently services need to be properly tailored to them. In this context, this chapter highlights the importance of the new role of users in ODV streaming applications, and thus the need to understand their behaviour while navigating within ODVs. A comprehensive overview of the research efforts aimed at advancing ODV streaming systems is also presented. In particular, the state-of-the-art solutions under examination in this chapter are distinguished into system-centric and user-centric streaming approaches: the former is a fairly straightforward extension of well-established solutions for the 2D video pipeline, while the latter benefits from an understanding of users' behaviour and enables more personalised ODV streaming.
Machine Learning for Multimedia Communications
Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. Thanks to intensive training, some impressive efficiency/accuracy improvements have been achieved across the transmission pipeline. For example, the high model capacity of learning-based architectures enables us to accurately model image and video behavior such that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategies, and even user perception modeling have widely benefited from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even though only a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed across the transmission chain, and we discuss their potential impact and the research challenges that they raise.
LiveVV: Human-Centered Live Volumetric Video Streaming System
Volumetric video has emerged as a prominent medium within the realm of
eXtended Reality (XR) with the advancements in computer graphics and depth
capture hardware. Users can fully immerse themselves in volumetric video, with
the ability to move their viewport in six degrees of freedom (DoF), including
three rotational dimensions (yaw, pitch, roll) and three translational
dimensions (X, Y, Z). Different from traditional 2D videos that are composed of
pixel matrices, volumetric videos employ point clouds, meshes, or voxels to
represent a volumetric scene, resulting in significantly larger data sizes.
While previous works have successfully achieved volumetric video streaming in
video-on-demand scenarios, the live streaming of volumetric video remains an
unresolved challenge due to the limited network bandwidth and stringent latency
constraints. In this paper, we propose, for the first time, a holistic live
volumetric video streaming system, LiveVV, which achieves multi-view capture,
scene segmentation and reuse, adaptive transmission, and rendering. LiveVV
contains multiple lightweight volumetric video capture modules that are capable
of being deployed without prior preparation. To reduce bandwidth consumption,
LiveVV processes static and dynamic volumetric content separately by reusing
static data with low disparity and decimating data with low visual saliency.
In addition, to cope with network fluctuations, LiveVV integrates a volumetric
video adaptive bitrate streaming algorithm (VABR) to enable fluent playback
with the maximum quality of experience. Extensive real-world experiments show
that LiveVV achieves live volumetric video streaming at a frame rate of 24 fps
with a latency of less than 350 ms.
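As a rough illustration of what an adaptive bitrate controller of this kind does, the toy sketch below picks an encoding rung from a hypothetical ladder based on estimated throughput and buffer level. The ladder, thresholds, and decision logic are illustrative assumptions, not LiveVV's actual VABR algorithm:

```python
# Hypothetical bitrate ladder for volumetric encodings (Mbps).
LADDER_MBPS = [2.0, 5.0, 10.0, 25.0]

def select_bitrate(throughput_mbps, buffer_s, safety=0.8, low_buffer_s=1.0):
    """Pick the highest rung sustainable at the estimated throughput,
    dropping to the lowest rung when the playback buffer runs low."""
    if buffer_s < low_buffer_s:           # latency-critical: avoid stalls
        return LADDER_MBPS[0]
    budget = throughput_mbps * safety     # leave headroom for fluctuation
    feasible = [r for r in LADDER_MBPS if r <= budget]
    return feasible[-1] if feasible else LADDER_MBPS[0]

print(select_bitrate(throughput_mbps=14.0, buffer_s=3.0))  # 10.0
print(select_bitrate(throughput_mbps=14.0, buffer_s=0.5))  # 2.0
```

A live system would additionally weigh viewport visibility and per-object saliency when distributing this budget across the volumetric content.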
Understanding user interactivity for the next-generation immersive communication: design, optimisation, and behavioural analysis
Recent technological advances have opened the door to a novel way of communicating remotely while still feeling connected. In these immersive communications, humans are at the centre of virtual or augmented reality, with a full sense of immersion and the possibility to interact with the new environment as well as with other humans virtually present. These next-generation communication systems hold a huge potential that can benefit major economic sectors. However, they also pose many new technical challenges, mainly due to the new role of the final user: from merely passive to fully active in requesting and interacting with the content. Thus, we need to go beyond traditional quality-of-experience research and develop user-centric solutions, in which the whole multimedia experience is tailored to the final interactive user. With this goal in mind, a better understanding of how people interact with immersive content is needed, and it is the focus of this thesis.
In this thesis, we study the behaviour of interactive users in immersive experiences and its impact on next-generation multimedia systems. The thesis covers a deep literature review on immersive services and user-centric solutions, before developing three main research strands. First, we implement novel tools for the behavioural analysis of users navigating in a 3-DoF Virtual Reality (VR) system. In detail, we study behavioural similarities among users by proposing a novel clustering algorithm. We also introduce information-theoretic metrics for quantifying similarities for the same viewer across contents. As a second direction, we show the impact and advantages of taking user behaviour into account in immersive systems. Specifically, we formulate optimal user-centric solutions i) from a server-side perspective and ii) through a navigation-aware adaptation logic for VR streaming platforms. We conclude by exploiting the aforementioned behavioural studies towards a more interactive immersive technology: 6-DoF VR. Overall, experimental results based on real navigation trajectories show the key advantages of understanding hidden patterns of user interactivity, to be eventually exploited in engineering user-centric solutions for immersive systems.
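To give a concrete feel for grouping viewers by navigation similarity, the toy sketch below represents each user as a sequence of per-frame fixation tiles, scores pairs by the fraction of frames spent on the same tile, and greedily groups users above a threshold. This is an illustrative stand-in, not the thesis's actual clustering algorithm or affinity metric:

```python
def similarity(a, b):
    """Fraction of frames in which two users fixate the same tile."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(trajectories, threshold=0.5):
    """Assign each user to the first cluster whose representative
    (its first member) is similar enough, else open a new cluster."""
    clusters = []
    for user, traj in trajectories.items():
        for cluster in clusters:
            rep = trajectories[cluster[0]]
            if similarity(traj, rep) >= threshold:
                cluster.append(user)
                break
        else:
            clusters.append([user])
    return clusters

trajs = {  # per-frame tile indices (hypothetical data)
    "u1": [0, 0, 1, 2, 2],
    "u2": [0, 0, 1, 2, 3],
    "u3": [5, 5, 6, 6, 7],
}
print(greedy_cluster(trajs))  # [['u1', 'u2'], ['u3']]
```

A principled variant would replace the per-frame overlap with an information-theoretic similarity and a clustering method that does not depend on user ordering.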
Understanding user experience of mobile video: Framework, measurement, and optimization
Since users have become the focus of product/service design in the last decade, the term User eXperience (UX) has been frequently used in the field of Human-Computer Interaction (HCI). Research on UX facilitates a better understanding of the various aspects of the user's interaction with the product or service. Mobile video, as a new and promising service and research field, has attracted great attention. Due to the significance of UX in the success of mobile video (Jordan, 2002), many researchers have centered on this area, examining users' expectations, motivations, requirements, and usage context. As a result, many influencing factors have been explored (Buchinger, Kriglstein, Brandt & Hlavacs, 2011; Buchinger, Kriglstein & Hlavacs, 2009). However, a general framework for the specific mobile video service is lacking for structuring such a great number of factors. To measure the user experience of multimedia services such as mobile video, quality of experience (QoE) has recently become a prominent concept. In contrast to the traditionally used concept of quality of service (QoS), QoE not only involves objectively measuring the delivered service but also takes into account the user's needs and desires when using the service, emphasizing the user's overall acceptance of the service. Many QoE metrics are able to estimate the user-perceived quality or acceptability of mobile video, but may not be accurate enough for overall UX prediction due to the complexity of UX. Only a few QoE frameworks have addressed further aspects of UX for mobile multimedia applications, and these still need to be transformed into practical measures. The challenge of optimizing UX remains adapting to resource constraints (e.g., network conditions, mobile device capabilities, and heterogeneous usage contexts) as well as meeting complicated user requirements (e.g., usage purposes and personal preferences).
In this chapter, we investigate the existing important UX frameworks, compare their similarities, and discuss some important features that fit the mobile video service. Based on previous research, we propose a simple UX framework for mobile video applications by mapping a variety of influencing factors of UX onto a typical mobile video delivery system. Each component and its factors are explored with comprehensive literature reviews. The proposed framework may benefit the user-centred design of mobile video by taking complete consideration of UX influences, and the improvement of mobile video service quality by adjusting the values of certain factors to produce a positive user experience. It may also facilitate related research by locating important issues to study, clarifying research scopes, and setting up proper study procedures. We then review a great deal of research on UX measurement, including QoE metrics and QoE frameworks for mobile multimedia. Finally, we discuss how to achieve an optimal quality of user experience by focusing on various aspects of the UX of mobile video. In the conclusion, we suggest some open issues for future study.
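Many streaming studies express session-level QoE in an additive form: a quality utility minus penalties for rebuffering and quality switches. The sketch below illustrates that common shape; the logarithmic utility and all weights are hypothetical, not taken from any specific framework discussed here:

```python
import math

def qoe(bitrates_kbps, rebuffer_s, w_quality=1.0, w_rebuf=4.0, w_switch=0.5):
    """Toy session-level QoE: per-segment log-bitrate utility, minus a
    rebuffering penalty and a penalty on bitrate switches between segments.
    All weights are illustrative assumptions."""
    quality = sum(math.log(1 + b) for b in bitrates_kbps)
    switches = sum(abs(b2 - b1)
                   for b1, b2 in zip(bitrates_kbps, bitrates_kbps[1:]))
    return (w_quality * quality
            - w_rebuf * rebuffer_s
            - w_switch * switches / 1000.0)

# A smooth high-bitrate session scores higher than an oscillating one.
print(round(qoe([3000, 3000, 3000, 3000], rebuffer_s=0.0), 2))
print(round(qoe([3000, 1500, 3000, 1500], rebuffer_s=0.5), 2))
```

The gap between such metrics and full UX prediction is exactly the point the chapter makes: contextual and personal factors are not captured by this kind of formula.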
Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer
Viewport prediction is a crucial aspect of tile-based 360 video streaming
systems. However, existing trajectory-based methods lack robustness and
oversimplify the construction and fusion of information between different
modality inputs, leading to the error accumulation problem. In this paper,
we propose a tile classification based viewport prediction method with a
Multi-modal Fusion Transformer, namely MFTR. Specifically, MFTR utilizes
transformer-based networks to extract the long-range dependencies within each
modality, and then mines intra- and inter-modality relations to capture the combined
impact of user historical inputs and video contents on future viewport
selection. In addition, MFTR categorizes future tiles into two categories,
user-interested or not, and selects the future viewport as the region that
contains the most user-interested tiles. Compared with predicting head
trajectories, choosing the future viewport based on tiles' binary
classification results exhibits better robustness and interpretability. To
evaluate our proposed MFTR, we conduct
extensive experiments on two widely used datasets, PVS-HM and Xu-Gaze. MFTR
shows superior performance over state-of-the-art methods in terms of average
prediction accuracy and overlap ratio, and also presents competitive
computational efficiency.
Comment: This paper is accepted by ACM-MM 202
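The final step the abstract describes, turning binary tile predictions into a viewport, can be sketched as a sliding window over the tile grid that keeps the position covering the most "user-interested" tiles. The grid, window size, and predictions below are illustrative; this is not the MFTR implementation (which would also handle horizontal wrap-around of the 360 grid):

```python
def best_viewport(interest, win_h, win_w):
    """Return the (row, col) of the win_h x win_w window covering the most
    tiles classified as user-interested, and that count."""
    rows, cols = len(interest), len(interest[0])
    best_pos, best_count = (0, 0), -1
    for r in range(rows - win_h + 1):
        for c in range(cols - win_w + 1):
            count = sum(interest[r + dr][c + dc]
                        for dr in range(win_h) for dc in range(win_w))
            if count > best_count:
                best_pos, best_count = (r, c), count
    return best_pos, best_count

grid = [  # 1 = tile predicted as user-interested (hypothetical output)
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
print(best_viewport(grid, 2, 2))  # ((1, 1), 4)
```

Selecting a region this way degrades gracefully when a few tiles are misclassified, which is the robustness argument made above.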