
    Learning Online Smooth Predictors for Realtime Camera Planning using Recurrent Decision Trees

    We study the problem of online prediction for realtime camera planning, where the goal is to predict smooth trajectories that correctly track and frame objects of interest (e.g., players in a basketball game). The conventional approach for training predictors does not directly consider temporal consistency, and often produces undesirable jitter. Although post-hoc smoothing (e.g., via a Kalman filter) can mitigate this issue to some degree, it is not ideal due to overly stringent modeling assumptions (e.g., Gaussian noise). We propose a recurrent decision tree framework that can directly incorporate temporal consistency into a data-driven predictor, as well as a learning algorithm that can efficiently learn such temporally smooth models. Our approach does not require any post-processing, making online smooth predictions much easier to generate when the noise model is unknown. We apply our approach to sports broadcasting: given noisy player detections, we learn where the camera should look based on human demonstrations. Our experiments exhibit significant improvements over conventional baselines and showcase the practicality of our approach.
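    The abstract contrasts its learned approach with post-hoc smoothing such as a Kalman filter. As a hedged illustration of that baseline only (not the authors' recurrent decision tree), the sketch below runs a causal 1-D Kalman filter with an assumed random-walk state model over jittery per-frame camera positions. The function names and the noise variances `q` and `r` are hypothetical choices, not values from the paper.

```python
from typing import List


def kalman_smooth_online(measurements: List[float], q: float = 0.01, r: float = 1.0) -> List[float]:
    """Causal (online) 1-D Kalman filter with a random-walk state model.

    q: assumed process-noise variance (how far the true camera target may move per frame).
    r: assumed measurement-noise variance (how jittery the raw per-frame predictions are).
    Both defaults are illustrative, not taken from the paper.
    """
    if not measurements:
        return []
    x = measurements[0]  # initial state estimate
    p = r                # initial estimate variance
    out = [x]
    for z in measurements[1:]:
        # Predict: a random walk keeps the mean and inflates the variance.
        p = p + q
        # Update: blend the prediction with the new measurement via the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        out.append(x)
    return out


def jitter(xs: List[float]) -> float:
    """Sum of absolute frame-to-frame changes: a simple jitter measure."""
    return sum(abs(b - a) for a, b in zip(xs, xs[1:]))


if __name__ == "__main__":
    raw = [0.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]  # hypothetical jittery pan positions
    smooth = kalman_smooth_online(raw)
    print(f"raw jitter={jitter(raw):.2f}  smoothed jitter={jitter(smooth):.2f}")
```

    Lowering `q` relative to `r` makes the output smoother but slower to follow real movement, and the filter is only optimal when the noise really is Gaussian; these are the modeling assumptions the abstract identifies as the weakness of post-hoc smoothing.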

    Multi-Clip Video Editing from a Single Viewpoint

    We propose a framework for automatically generating multiple clips suitable for video editing by simulating pan-tilt-zoom camera movements within the frame of a single static camera. Assuming important actors and objects can be localized using computer vision techniques, our method requires only minimal user input to define the subject matter of each sub-clip. The composition of each sub-clip is automatically computed in a novel L1-norm optimization framework. Our approach encodes several common cinematographic practices into a single convex cost function minimization problem, resulting in aesthetically pleasing sub-clips that can easily be edited together using off-the-shelf multi-clip video editing software. We demonstrate our approach on five video sequences of a live theatre performance by generating multiple synchronized sub-clips for each sequence.
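    To make the idea of a convex L1 composition cost concrete, here is a minimal sketch of the kind of objective the abstract describes, for a single horizontal camera coordinate. The specific terms, the weights `lam_vel` and `lam_acc`, and the function name are assumptions for illustration, not the paper's actual cost function. The sketch only evaluates the cost; a real system would minimize it with a convex (e.g., linear-programming) solver.

```python
from typing import Sequence


def l1_framing_cost(x: Sequence[float], target: Sequence[float],
                    lam_vel: float = 1.0, lam_acc: float = 1.0) -> float:
    """Illustrative convex L1 cost for a 1-D virtual-camera trajectory x.

    - data term: keep the frame centred on the (hypothetical) target position
    - velocity term: penalise camera motion (L1 favours holding still)
    - acceleration term: penalise changes in motion (L1 favours constant-speed pans)
    Every term is the absolute value of an affine function of x, so the sum is convex.
    """
    if len(x) != len(target):
        raise ValueError("x and target must have the same length")
    data = sum(abs(a - b) for a, b in zip(x, target))
    vel = [b - a for a, b in zip(x, x[1:])]
    acc = [b - a for a, b in zip(vel, vel[1:])]
    return data + lam_vel * sum(map(abs, vel)) + lam_acc * sum(map(abs, acc))


if __name__ == "__main__":
    # Hypothetical actor position: jitters around 0, then moves to about 5.
    target = [0.0, 0.2, -0.1, 0.1, 0.0, 5.0, 5.1, 4.9, 5.0]
    follow = list(target)                           # track every small jitter
    hold = [0.0] * 5 + [5.0] * 4                    # hold still, move once
    print("follow:", round(l1_framing_cost(follow, target), 3))
    print("hold:  ", round(l1_framing_cost(hold, target), 3))
```

    Although the held trajectory is slightly off-target, its total cost is lower. This reflects why L1 penalties on camera motion tend to produce steady, piecewise-constant (or, on second differences, piecewise-linear) framing rather than tracking every small jitter.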

    Interactive zoom and panning from live Panoramic video

    Panoramic video is becoming increasingly popular, and we present an end-to-end real-time system for interactively zooming and panning into high-resolution panoramic videos. Compared to existing systems that crop perspective panoramas, our approach creates a cylindrical panorama. The perspective is then corrected in real time, resulting in a better and more natural zoom. Our experimental results also indicate that such zoomed virtual views can be generated well within the per-frame time budget. Given recent trends in device development, our approach should be able to scale to a large number of concurrent users in the near future.
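    The abstract does not give the projection it uses, but the standard geometry for rendering a perspective view from a cylindrical panorama can be sketched as follows. This is a hedged illustration under stated assumptions (a full 360-degree panorama, a pan-only virtual camera, and a hypothetical function and parameter names), not the system's actual implementation.

```python
import math
from typing import Tuple


def perspective_to_cylinder(u: float, v: float, out_w: int, out_h: int,
                            focal: float, yaw: float,
                            pano_w: int, pano_h: int) -> Tuple[float, float]:
    """Map an output (virtual perspective view) pixel to panorama pixel coordinates.

    Assumptions (illustrative, not from the paper):
    - the panorama is a full 360-degree cylinder, pano_w pixels around,
      so its radius in pixels is pano_w / (2*pi);
    - the virtual camera only pans (yaw, in radians); pitch and roll are zero;
    - zoom is controlled by the virtual focal length `focal` (in pixels).
    """
    x = u - out_w / 2.0                  # horizontal offset from the view centre
    y = v - out_h / 2.0                  # vertical offset from the view centre
    radius = pano_w / (2.0 * math.pi)
    theta = yaw + math.atan2(x, focal)   # azimuth of the viewing ray
    h = y / math.hypot(x, focal)         # height where the ray meets the unit-radius cylinder
    col = (theta * radius) % pano_w      # arc length in pixels, wrapped at the 360-degree seam
    row = pano_h / 2.0 + h * radius
    return col, row
```

    To render a view, evaluate this mapping for every output pixel and sample the panorama at `(col, row)`, typically with bilinear interpolation. Increasing `focal` zooms in and changing `yaw` pans, and the vertical division by `math.hypot(x, focal)` is what keeps straight lines from bending toward the view edges.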

    An efficient telemetry system for restoring sight

    PhD Thesis. The human nervous system can be damaged by disease or trauma, causing conditions such as Parkinson’s disease. Pharmaceuticals are the primary treatment for most people, but drugs cannot treat some conditions, such as visual disorders. Such impairments can instead be treated with electronic neural prostheses. A retinal prosthesis is one example for restoring sight, but it is not efficient and only people with retinitis pigmentosa benefit from it. In such treatments, the nervous system can be stimulated by electrical or optical means. In the latter case, the nerves must be rendered light-sensitive by genetic means (optogenetics), and high-radiance photonic devices are then required to deliver light to the target tissue. Such optical approaches have the potential to be more effective while causing less harm to brain tissue. Because these devices are implanted in tissue, they must communicate wirelessly. For this, IEEE 802.15.6 or Bluetooth at 2.4 GHz are potentially compatible with most advanced electronic devices, and are also safe and secure. Wireless power delivery can also operate the implanted device. In this thesis, a fully wireless and efficient visual cortical stimulator was designed to restore sight to the blind. This system is likely to address 40% of the causes of blindness. The system can be divided into two parts, hardware and software. The hardware comprises a wireless power transfer design, the communication device, power management, a processor and control unit, and the 3D design for assembly. The software comprises image simplification, image compression, data encoding, pulse modulation, and the control system. Real-time video is processed and streamed over Bluetooth, and the data are received by the six-layer implanted LPC4330 board. After retrieving the compressed data, the board sends the processed data to the implanted electrode/optrode to stimulate the brain’s nerve cells.

    MediaSync: Handbook on Multimedia Synchronization

    This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts such as hybrid broadband broadcast (HBB) delivery and the modeling of users' perception (i.e., Quality of Experience, or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many mediasync advances have been devised and deployed, the area is receiving renewed attention as researchers tackle the remaining challenges of the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this area, its current relevance, and the many disciplines it involves, a reference book on mediasync is needed, and this book fills that gap. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space from different perspectives. MediaSync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners who want to acquire strong knowledge of this research area and to approach the challenges of ensuring the best mediated experiences by providing adequate synchronization between the media elements that constitute them.