85 research outputs found

    DynIBaR: Neural Dynamic Image-Based Rendering

    We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene. State-of-the-art methods based on temporally varying Neural Radiance Fields (aka dynamic NeRFs) have shown impressive results on this task. However, for long videos with complex object motions and uncontrolled camera trajectories, these methods can produce blurry or inaccurate renderings, hampering their use in real-world applications. Instead of encoding the entire dynamic scene within the weights of an MLP, we present a new approach that addresses these limitations by adopting a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views in a scene-motion-aware manner. Our system retains the advantages of prior methods in its ability to model complex scenes and view-dependent effects, but also enables synthesizing photo-realistic novel views from long videos featuring complex scene dynamics with unconstrained camera trajectories. We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets, and also apply our approach to in-the-wild videos with challenging camera and object motion, where prior methods fail to produce high-quality renderings. Our project webpage is at dynibar.github.io.

    Image and Video Coding Techniques for Ultra-low Latency

    The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, or autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their tradeoffs. Standardized video coding technologies such as HEVC or VVC provide a high compression ratio, but their enormous complexity sets the scene for alternative approaches like still image, mezzanine, or texture compression in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we found inter-device memory transfers and the lack of sub-frame coding as limitations of current full-system and software-programmable implementations. Published version; peer reviewed.

    Video based dynamic scene analysis and multi-style abstraction.

    Tao, Chenjun. Thesis (M.Phil.), Chinese University of Hong Kong, 2008. Includes bibliographical references (leaves 89-97). Abstracts in English and Chinese.
    Contents:
    Abstract
    Acknowledgements
    Chapter 1. Introduction: 1.1 Window-oriented Retargeting; 1.2 Abstraction Rendering; 1.3 Thesis Outline
    Chapter 2. Related Work: 2.1 Video Migration; 2.2 Video Synopsis; 2.3 Periodic Motion; 2.4 Video Tracking; 2.5 Video Stabilization; 2.6 Video Completion
    Chapter 3. Active Window Oriented Video Retargeting: 3.1 System Model (3.1.1 Foreground Extraction; 3.1.2 Optimizing Active Windows; 3.1.3 Initialization); 3.2 Experiments; 3.3 Summary
    Chapter 4. Multi-Style Abstract Image Rendering: 4.1 Abstract Images; 4.2 Multi-Style Abstract Image Rendering (4.2.1 Multi-style Processing; 4.2.2 Layer-based Rendering; 4.2.3 Abstraction); 4.3 Experimental Results; 4.4 Summary
    Chapter 5. Interactive Abstract Videos: 5.1 Abstract Videos; 5.2 Multi-Style Abstract Video (5.2.1 Abstract Images; 5.2.2 Video Morphing; 5.2.3 Interactive System); 5.3 Interactive Videos; 5.4 Summary
    Chapter 6. Conclusions
    Chapter A. List of Publications
    Chapter B. Optical flow
    Chapter C. Belief Propagation
    Bibliography

    Quality-aware Content Adaptation in Digital Video Streaming

    User-generated video has attracted a lot of attention due to the success of video sharing sites such as YouTube and online social networks. Recently, a shift towards live consumption of these videos has become observable. The content is captured and instantly shared over the Internet using mobile devices such as smartphones. Large-scale platforms such as YouTube.Live, YouNow or Facebook.Live have arisen, which enable users to livestream to the public from their smartphones. These platforms distribute tens of thousands of low-resolution videos to remote viewers in parallel. Nonetheless, the providers are not able to guarantee efficient collection and distribution of high-quality video streams. As a result, the user experience is often degraded, and the required infrastructure installations are huge. Efficient methods are required to cope with the increasing demand for these video streams, and an understanding is needed of how to capture, process and distribute the videos to guarantee a high-quality experience for viewers. This thesis addresses the quality awareness of user-generated videos by leveraging the concept of content adaptation. Two types of content adaptation, adaptive video streaming and video composition, are discussed in this thesis. Then, a novel approach is presented for the given scenario of a live upload from mobile devices, the processing of video streams and their distribution. This thesis demonstrates that content adaptation applied to each step of this scenario, ranging from the upload to the consumption, can significantly improve the quality for the viewer. At the same time, if content adaptation is planned wisely, the data traffic can be reduced while keeping the quality for the viewers high. The first contribution of this thesis is a better understanding of the perceived quality in user-generated video and its influencing factors.
    Subjective studies are performed to understand what affects human perception, leading to first-of-their-kind quality models. The developed quality models are used for the second contribution of this work: novel quality assessment algorithms. A unique attribute of these algorithms is the usage of multiple features from different sensors. Whereas classical video quality assessment algorithms focus on the visual information, the proposed algorithms reduce the runtime by an order of magnitude when using data from other sensors in video capturing devices. Still, the scalability of quality assessment is limited when the algorithms execute on a single server. This is solved with the proposed placement and selection component. It allows the distribution of quality assessment tasks to mobile devices and thus increases the scalability of existing approaches by up to 33.71% when using the resources of only 15 mobile devices. These three contributions are required to provide a real-time understanding of the perceived quality of the video streams produced on mobile devices. The upload of video streams is the fourth contribution of this work. It relies on content and mechanism adaptation. The thesis introduces the first prototypically evaluated adaptive video upload protocol (LiViU), which transcodes multiple video representations in real time and copes with changing network conditions. In addition, a mechanism adaptation is integrated into LiViU to react to changing application scenarios such as streaming high-quality videos to remote viewers or distributing video with a minimal delay to close-by recipients. A second type of content adaptation is discussed in the fifth contribution of this work. An automatic video composition application is presented which enables live composition from multiple user-generated video streams.
    The proposed application is the first of its kind, allowing the in-time composition of high-quality video streams by inspecting the quality of individual video streams, recording locations and cinematographic rules. As a last contribution, the content-aware adaptive distribution of video streams to mobile devices is introduced by the Video Adaptation Service (VAS). The VAS analyzes the streamed video content to understand which adaptations are most beneficial for a viewer. It maximizes the perceived quality for each video stream individually and at the same time tries to produce as little data traffic as possible, achieving a data traffic reduction of more than 80%.

    Exploring Sparse, Unstructured Video Collections of Places

    The abundance of mobile devices and digital cameras with video capture makes it easy to obtain large collections of video clips that contain the same location, environment, or event. However, such an unstructured collection is difficult to comprehend and explore. We propose a system that analyses collections of unstructured but related video data to create a Videoscape: a data structure that enables interactive exploration of video collections by visually navigating, spatially and/or temporally, between different clips. We automatically identify transition opportunities, or portals. From these portals, we construct the Videoscape, a graph whose edges are video clips and whose nodes are portals between clips. Now structured, the videos can be interactively explored by walking the graph or by geographic map. Given this system, we gauge preference for different video transition styles in a user study, and generate heuristics that automatically choose an appropriate transition style. We evaluate our system using three further user studies, which allow us to conclude that Videoscapes provide significant benefits over related methods. Our system leads to previously unseen ways of interactive spatio-temporal exploration of casually captured videos, and we demonstrate this on several video collections.

    Automatic Mobile Video Remixing and Collaborative Watching Systems

    In the thesis, the implications of combining collaboration with automation for remix creation are analyzed. We first present a sensor-enhanced Automatic Video Remixing System (AVRS), which intelligently processes mobile videos in combination with mobile device sensor information. The sensor-enhanced AVRS involves certain architectural choices, which meet the key system requirements (leverage user-generated content, use sensor information, reduce end-user burden) and user experience requirements. Architecture adaptations are required to improve certain key performance parameters. In addition, certain operating parameters need to be constrained for real-world deployment feasibility. Subsequently, sensor-less cloud-based AVRS and low-footprint sensor-less AVRS approaches are presented. The three approaches exemplify the importance of operating parameter tradeoffs for system design. The approaches cover a wide spectrum, ranging from a multimodal multi-user client-server system (sensor-enhanced AVRS) to a mobile application which can automatically generate a multi-camera remix experience from a single video. Next, we present the findings from four user studies, involving 77 users, related to automatic mobile video remixing. The goal was to validate selected system design goals, provide insights for additional features and identify the challenges and bottlenecks. Topics studied include the role of automation, the value of a video remix as event memorabilia, the requirements for different types of events and the perceived user value from creating a multi-camera remix from a single video. System design implications derived from the user studies are presented. Subsequently, sport summarization, which is a specific form of remix creation, is analyzed. In particular, the role of the content capture method is analyzed with two complementary approaches.
    The first approach performs saliency detection in casually captured mobile videos; in contrast, the second one creates multi-camera summaries from role-based captured content. Furthermore, a method for interactive customization of summaries is presented. Next, the discussion is extended to include the role of users’ situational context and the consumed content in facilitating a collaborative watching experience. Mobile-based collaborative watching architectures are described, which facilitate a common shared context between the participants. The concept of movable multimedia is introduced to highlight the multi-device environment of present-day users. The thesis presents results which have been derived from end-to-end system prototypes tested in real-world conditions and corroborated with extensive user impact evaluation.

    Visual analysis and synthesis with physically grounded constraints

    The past decade has witnessed remarkable progress in image-based, data-driven vision and graphics. However, existing approaches often treat images as pure 2D signals and not as 2D projections of the physical 3D world. As a result, many training examples are required to cover sufficiently diverse appearances, and these approaches inevitably suffer from limited generalization capability. In this thesis, I propose "inference-by-composition" approaches to overcome these limitations by modeling and interpreting visual signals in terms of physical surface, object, and scene. I show how we can incorporate physically grounded constraints such as scene-specific geometry in a non-parametric optimization framework for (1) revealing the missing parts of an image due to removal of a foreground or background element, and (2) recovering high spatial frequency details that are not resolvable in low-resolution observations. I then extend the framework from 2D images to handle spatio-temporal visual data (videos). I demonstrate that we can convincingly fill spatio-temporal holes in a temporally coherent fashion by jointly reconstructing the appearance and motion. Compared to existing approaches, our technique can synthesize physically plausible content even in challenging videos. For visual analysis, I apply stereo camera constraints for discovering multiple approximately linear structures in extremely noisy videos, with an ecological application to bird migration monitoring at night. The resulting algorithms are simple and intuitive while achieving state-of-the-art performance without the need to train on an exhaustive set of visual examples.

    See the Path: Using Laban's Movement Scales to Address Spatial Awareness After Brain Injury

    This single-participant quasi-experimental thesis study was an investigation of the implementation of Rudolf Laban’s dimensional and diagonal movement scales in individual dance/movement therapy sessions with an adult female recovering from brain injury. The main research question was “How will the implementation of Laban’s movement scales affect the participant’s general awareness of space?” The Santa Barbara Sense of Direction Scale was used as a pre-test and post-test to record the participant’s self-reports of her spatial awareness. Eleven sessions were video-recorded and the participant’s movement during the scales was observed and notated by the primary researcher and a hired analyst using Laban Movement Analysis. Since the participant had limited movement capacity, the observation parameters were focused on her eye movements; she was scored as successful or unsuccessful in seeing each point in space. Mean calculations of the scores assigned for each point revealed that the participant improved overall by 16% in her execution of the scales, which could indicate an increase in spatial awareness. Her score on the post-test indicated an improvement of 3% in her self-reports of her spatial awareness. The hypothesis developed from this research is that Laban’s movement scales may augment recovery from brain injury because the client’s spatial awareness may improve when the scales are executed regularly. This research provides evidence that dance/movement therapy interventions building on Laban’s theory of Movement Harmony are valid techniques to address spatial awareness. Additional outcomes of the study include the proposal of study protocols for use in further research studies of this type.

    Guided Autonomy for Quadcopter Photography

    Photographing small objects with a quadcopter is non-trivial to perform with many common user interfaces, especially when it requires maneuvering an Unmanned Aerial Vehicle (UAV) to difficult angles in order to shoot high perspectives. The aim of this research is to employ machine learning to support better user interfaces for quadcopter photography. Human Robot Interaction (HRI) is supported by visual servoing, a specialized vision system for real-time object detection, and control policies acquired through reinforcement learning (RL). Two investigations of guided autonomy were conducted. In the first, the user directed the quadcopter with a sketch-based interface, and periods of user direction were interspersed with periods of autonomous flight. In the second, the user directed the quadcopter by taking a single photo with a handheld mobile device, and the quadcopter autonomously flew to the requested vantage point. This dissertation focuses on the following problems: 1) evaluating different user interface paradigms for dynamic photography in a GPS-denied environment; 2) learning better Convolutional Neural Network (CNN) object detection models to ensure higher precision in detecting human subjects than the currently available state-of-the-art fast models; 3) transferring learning from the Gazebo simulation into the real world; 4) learning robust control policies using deep reinforcement learning to maneuver the quadcopter to multiple shooting positions with minimal human interaction.