14 research outputs found

    Contextual bandit learning-based viewport prediction for 360 video

    Accurately predicting where the user of a Virtual Reality (VR) application will be looking in the near future improves the perceived quality of services such as adaptive tile-based streaming or personalized online training. However, because of the unpredictability and dissimilarity of user behavior, this remains a major challenge. In this work, we propose to use reinforcement learning, in particular contextual bandits, to solve this problem. The proposed solution tackles the prediction in two stages: (1) detection of movement; (2) prediction of direction. To prove its potential for VR services, the method was deployed on an adaptive tile-based VR streaming testbed and benchmarked against a 3D trajectory extrapolation approach. Our results showed a significant improvement in prediction error compared to the benchmark. This reduced prediction error also resulted in an enhancement of the perceived video quality.
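The two-stage idea can be sketched with a simple per-context epsilon-greedy bandit. This is an illustrative toy, not the paper's method: the context encoding, arm set, and learner below (`EpsilonGreedyBandit`, `predict_viewport`) are assumptions for demonstration only.

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Per-context epsilon-greedy bandit: one arm-value table per discrete context."""
    def __init__(self, arms, epsilon=0.1):
        self.arms = arms
        self.epsilon = epsilon
        self.values = defaultdict(lambda: {a: 0.0 for a in arms})
        self.counts = defaultdict(lambda: {a: 0 for a in arms})

    def select(self, context):
        # With probability epsilon explore a random arm, else exploit the best known.
        if random.random() < self.epsilon:
            return random.choice(self.arms)
        vals = self.values[context]
        return max(vals, key=vals.get)

    def update(self, context, arm, reward):
        # Incremental mean of observed rewards for this (context, arm) pair.
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        v = self.values[context][arm]
        self.values[context][arm] = v + (reward - v) / n

def predict_viewport(move_bandit, dir_bandit, context):
    """Two-stage prediction: (1) will the head move? (2) if so, in which direction?"""
    if move_bandit.select(context) == "still":
        return "still"
    return dir_bandit.select(context)
```

In a streaming loop, the context would summarize recent head motion, and the reward would score how well the predicted viewport matched the one actually watched.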

    Exploring New York in 8K : an adaptive tile-based virtual reality video streaming experience

    Adapting and tiling the streaming of virtual reality (VR) video content has the potential to reduce the ultra-high bandwidth requirements of this type of multimedia service. Towards that goal, a number of aspects are currently being actively researched. Novel rate adaptation heuristics, sophisticated viewport prediction algorithms and streaming protocol optimizations have proven their value in improving certain aspects of the VR streaming chain. However, the interplay between all these different optimizations, as well as their trade-offs, has not yet been explored in an experimental playground. The purpose of this demonstrator is to provide a full end-to-end adaptive tile-based VR video streaming system in which each of the optimization aspects can be tuned and its effect illustrated on-site.

    Human-centric quality management of immersive multimedia applications

    Augmented Reality (AR) and Virtual Reality (VR) multimodal systems are the latest trend within the field of multimedia. As they emulate the senses by means of omnidirectional visuals, 360-degree sound, motion tracking and touch simulation, they are able to create a strong feeling of presence and interaction with the virtual environment. These experiences can be applied to virtual training (Industry 4.0), tele-surgery (healthcare) or remote learning (education). However, given the strong time and task sensitivity of these applications, it is of great importance to sustain the end-user quality, i.e. the Quality-of-Experience (QoE), at all times. Lack of synchronization and quality degradation need to be reduced to a minimum to avoid feelings of cybersickness or loss of immersiveness and concentration. This means that quality management needs to shift from system-centered performance metrics towards a more human, QoE-centered approach. However, this requires novel techniques in the three areas of the QoE-management loop (monitoring, modelling and control). This position paper identifies open areas of research to fully enable human-centric management of immersive multimedia. To this extent, four main dimensions are put forward: (1) Task- and well-being-driven subjective assessment; (2) Real-time QoE modelling; (3) Accurate viewport prediction; (4) Machine Learning (ML)-based quality optimization and content recreation. This paper discusses the state of the art and provides possible solutions to tackle the open challenges.

    Delivery of 360° videos in edge caching assisted wireless cellular networks

    In recent years, 360° videos have become increasingly popular on commercial social platforms, and are a vital part of emerging Virtual Reality (VR) applications. However, the delivery of 360° videos requires significant bandwidth resources, which makes streaming of such data on mobile networks challenging. The bandwidth required for delivering 360° videos can be reduced by exploiting the fact that users are interested in viewing only a part of the video scene, the requested viewport. As different users may request different viewports, some parts of the 360° scenes may be more popular than others. 360° video delivery on mobile networks can be facilitated by caching popular content at edge servers and delivering it from there to the users. However, existing edge caching schemes do not exploit the full potential of the unequal popularity of different parts of a video, which renders them inefficient for caching 360° videos. Inspired by the above, in this thesis, we investigate how advanced 360° video coding tools, i.e., encoding into multiple quality layers and tiles, can be utilized to build more efficient wireless edge caching schemes for 360° videos. This encoding allows caching only the popular parts of the 360° videos in high quality. To understand how edge caching schemes can benefit from 360° video coding, we compare the caching of 360° videos encoded into multiple quality layers and tiles with layer-agnostic and tile-agnostic schemes. To cope with the fact that the content popularity distribution may be unknown, we use machine learning techniques, for both Video on Demand (VoD) and live streaming scenarios. From our findings, it is clear that taking into account the aforementioned 360° video characteristics leads to increased performance in terms of both the quality of the video delivered to the users and the usage of the backhaul links.
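The core intuition — cache popular tiles at high quality and less popular tiles at base quality within a byte budget — can be sketched as a toy greedy planner. This is not the thesis's algorithm; `plan_cache` and its popularity-sorted heuristic are assumptions for illustration only.

```python
def plan_cache(tile_popularity, layer_sizes, capacity):
    """Toy layered tile-caching sketch: visit tiles from most to least
    requested and cache each at the highest quality that still fits.

    tile_popularity: {tile_id: request count}
    layer_sizes: bytes per quality layer, base layer first
    capacity: cache budget in bytes
    Returns {tile_id: number of layers cached}.
    """
    plan, used = {}, 0
    # Most popular tiles get first pick of the budget.
    for tile, _pop in sorted(tile_popularity.items(), key=lambda kv: -kv[1]):
        # Try full quality (all layers) first, then fall back to fewer layers.
        for quality in range(len(layer_sizes), 0, -1):
            cost = sum(layer_sizes[:quality])  # layers are cumulative
            if used + cost <= capacity:
                plan[tile] = quality
                used += cost
                break
    return plan
```

With a base layer of 2 units, one enhancement layer of 3 units, and a budget of 8, a hot tile gets both layers while a lukewarm tile gets only the base layer, mirroring the unequal-popularity argument above.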

    Deep Exploration for Recommendation Systems

    Modern recommendation systems ought to benefit by probing for and learning from delayed feedback. Research has tended to focus on learning from a user's response to a single recommendation. Such work, which leverages methods of supervised and bandit learning, forgoes learning from the user's subsequent behavior. Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback. Effective exploration through probing for delayed feedback becomes particularly challenging when rewards are sparse. To address this, we develop deep exploration methods for recommendation systems. In particular, we formulate recommendation as a sequential decision problem and demonstrate benefits of deep exploration over single-step exploration. Our experiments are carried out with high-fidelity industrial-grade simulators and establish large improvements over existing algorithms.
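The gap between single-step and deep exploration can be illustrated with a classic sparse-reward chain toy problem (this is a generic illustration, not the paper's simulator; all function names and the two agent policies are assumptions). A dithering agent flips a coin at every step, so it almost never strings together the full rewarding sequence; a deep-exploration-style agent commits to one sampled plan for a whole episode.

```python
import random

def chain_reward(actions, length=5):
    """Sparse-reward chain: reward only if the agent moves 'right' every step."""
    return 1.0 if all(a == "right" for a in actions[:length]) else 0.0

def dithering_agent(episodes, length=5, epsilon=0.1, seed=0):
    """Single-step exploration: independent epsilon-random deviations from a
    greedy default ('left', since no reward has been seen). The chance of
    finding the sparse reward is epsilon**length per episode."""
    rng = random.Random(seed)
    found = 0
    for _ in range(episodes):
        acts = ["right" if rng.random() < epsilon else "left"
                for _ in range(length)]
        found += chain_reward(acts, length) > 0
    return found

def deep_exploration_agent(episodes, length=5, seed=0):
    """Deep-exploration sketch: sample one candidate plan per episode and
    commit to it throughout (temporally consistent, Thompson-style)."""
    rng = random.Random(seed)
    found = 0
    for _ in range(episodes):
        plan = rng.choice(["all_left", "all_right"])
        acts = ["right" if plan == "all_right" else "left"] * length
        found += chain_reward(acts, length) > 0
    return found
```

Over 1000 episodes the dithering agent discovers the reward with probability about 10^-5 per episode, while committing to a sampled plan finds it roughly half the time, which is the intuition behind favoring deep exploration when rewards are sparse.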

    Tile-based edge caching for 360° live video streaming

    360° video is becoming an increasingly popular technology on commercial social platforms and a vital part of emerging Virtual Reality/Augmented Reality (VR/AR) applications. However, the delivery of 360° video content in mobile networks is challenging because of its size. The encoding of 360° video into multiple quality layers and tiles, together with edge cache-assisted video delivery, has been proposed as a remedy to the excess bandwidth requirements of 360° video delivery systems. Existing works using the above tools have shown promising performance for Video-on-Demand (VoD) 360° delivery, but they cannot be straightforwardly extended to a live-streaming setup. Motivated by the above, we study edge cache-assisted 360° live video streaming to increase the overall quality of the delivered 360° videos and reduce the service cost. We employ Long Short-Term Memory (LSTM) networks to forecast the evolution of the content requests and prefetch content to caches. To further enhance the delivered video quality, users located in the overlap of the coverage areas of multiple Small Base Stations (SBSs) are allowed to receive data from any of these SBSs. We evaluate and compare the performance of our algorithm with the Least Frequently Used (LFU), Least Recently Used (LRU), and First In First Out (FIFO) algorithms. The results show the superiority of the proposed approach in terms of delivered video quality, cache hit ratio and backhaul link usage.
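Two of the baseline eviction policies named above, LRU and FIFO, can be compared with a minimal hit-count simulation (LFU is analogous with a frequency counter). This is a generic sketch of the baselines only, not the proposed LSTM-based prefetching scheme; the function names are illustrative.

```python
from collections import OrderedDict, deque

def lru_hits(requests, capacity):
    """Hit count under Least-Recently-Used eviction."""
    cache, hits = OrderedDict(), 0
    for item in requests:
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # refresh recency on a hit
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[item] = True
    return hits

def fifo_hits(requests, capacity):
    """Hit count under First-In-First-Out eviction (no recency refresh)."""
    cache, order, hits = set(), deque(), 0
    for item in requests:
        if item in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.remove(order.popleft())  # evict oldest insertion
            cache.add(item)
            order.append(item)
    return hits
```

On a request trace with temporal locality, such as repeated hits on a popular tile, LRU keeps the recently re-requested item while FIFO evicts it on schedule, which is why recency-aware baselines usually score higher hit ratios.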

    Video Caching, Analytics and Delivery at the Wireless Edge: A Survey and Future Directions

    Future wireless networks will provide high-bandwidth, low-latency, and ultra-reliable Internet connectivity to meet the requirements of different applications, ranging from mobile broadband to the Internet of Things. To this aim, mobile edge caching, computing, and communication (edge-C3) have emerged to bring network resources (i.e., bandwidth, storage, and computing) closer to end users. Edge-C3 improves network resource utilization as well as the quality of experience (QoE) of end users. Recently, several video-oriented mobile applications (e.g., live content sharing, gaming, and augmented reality) have leveraged edge-C3 in diverse scenarios involving video streaming in both the downlink and the uplink. Hence, a large number of recent works have studied the implications of video analysis and streaming through edge-C3. This article presents an in-depth survey on video edge-C3 challenges and state-of-the-art solutions in next-generation wireless and mobile networks. Specifically, it includes: a tutorial on video streaming in mobile networks (e.g., video encoding and adaptive bitrate streaming); an overview of mobile network architectures, enabling technologies, and applications for video edge-C3; video edge computing and analytics in uplink scenarios (e.g., architectures, analytics, and applications); and video edge caching, computing and communication methods in downlink scenarios (e.g., collaborative, popularity-based, and context-aware). A new taxonomy for video edge-C3 is proposed, and the major contributions of recent studies are first highlighted and then systematically compared. Finally, several open problems and key challenges for future research are outlined.

    Understanding and designing for control in camera operation

Cinematographers have traditionally used supportive tools to craft desired camera moves. Recent technological advances have added new tools to the palette, such as gimbals, drones and robots. The combination of motor-driven actuation, computer vision and machine learning in such systems has also rendered new interaction techniques possible. In particular, a content-based interaction style was introduced in addition to the established axis-based style. On the one hand, content-based co-creation between humans and automated systems made it easier to reach high-level goals. On the other hand, however, the increased use of automation also introduced negative side effects. 
Creatives usually want to feel in control while executing the camera motion and, in the end, as the authors of the recorded shots. While automation can assist experts or enable novices, it unfortunately also takes away desired control from operators. Thus, if we want to support cinematographers with new tools and interaction techniques, the following question arises: How should we design interfaces for camera motion control that, despite being increasingly automated, provide cinematographers with an experience of control? Camera control has been studied for decades, especially in virtual environments. Applying content-based interaction to physical environments opens up new design opportunities but also faces less-researched, domain-specific challenges. To suit the needs of cinematographers, designs need to be crafted with care. In particular, they must adapt to the constraints of recordings on location. This makes an interplay with established practices essential. Previous work has mainly focused on a technology-centered understanding of camera travel, which consequently influenced the design of camera control systems. In contrast, this thesis contributes to the understanding of the motives of cinematographers and how they operate on set, and provides a user-centered foundation informing cinematography-specific research and design. The contribution of this thesis is threefold: First, we present ethnographic studies on expert users and their shooting practices on location. These studies highlight the challenges of introducing automation to a creative task (assistance vs. feeling in control). Second, we report on a domain-specific prototyping toolkit for in-situ deployment. The toolkit provides open-source software for low-cost replication, enabling the exploration of design alternatives. To better inform design decisions, we further introduce an evaluation framework for estimating the resulting quality and sense of control. 
By extending established methodologies with a recent neuroscientific technique, it provides data on explicit as well as implicit levels and is designed to be applicable to other domains of HCI. Third, we present evaluations of designs based on our toolkit and framework. We explored a dynamic interplay of manual control with various degrees of automation. Further, we examined different content-based interaction styles. Here, occlusion due to graphical elements was found and addressed by exploring visual reduction strategies and mid-air gestures. Our studies demonstrate that high degrees of quality and sense of control are achievable with our tools, which also support creativity and established practices.