
    Deep Reinforcement Learning with semi-expert distillation for autonomous UAV cinematography

    Unmanned Aerial Vehicles (UAVs, or drones) have revolutionized modern media production. Being rapidly deployable “flying cameras”, they can easily capture aesthetically pleasing aerial footage of static or moving filming targets/subjects. Current approaches rely either on manual UAV/gimbal control by human experts or on a combination of complex computer vision algorithms and hardware configurations for automating the flying/filming process. This paper explores an efficient Deep Reinforcement Learning (DRL) alternative, which implicitly merges the target detection and path planning steps into a single algorithm. To achieve this, a baseline DRL approach is augmented with a novel policy distillation component, which transfers knowledge from a suitable, semi-expert Model Predictive Control (MPC) controller into the DRL agent. Thus, the latter is able to autonomously execute a specific UAV cinematography task with purely visual input. Unlike the MPC controller, the proposed DRL agent does not need to know the 3D world position of the filming target during inference. Experiments conducted in a photorealistic simulator showcase superior performance and training speed compared to the baseline agent, while surpassing the MPC controller in terms of visual occlusion avoidance.
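
    The distillation component can be pictured as an auxiliary imitation term added to the agent's usual objective. Below is a minimal PyTorch sketch of this idea; the network shape, the `beta` weight, and the MSE imitation term are illustrative assumptions, not the paper's actual architecture or loss.

```python
# Sketch of a distillation-augmented policy update. All names
# (VisualPolicy, beta) are illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualPolicy(nn.Module):
    """Toy CNN mapping camera frames to continuous UAV/gimbal commands."""
    def __init__(self, n_actions=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(n_actions)  # infers input size on first call

    def forward(self, frames):
        return self.head(self.encoder(frames))

def distilled_loss(policy, frames, mpc_actions, rl_loss, beta=0.5):
    """Blend the usual RL objective with an imitation term pulling the
    agent's actions toward the semi-expert MPC controller's actions."""
    imitation = F.mse_loss(policy(frames), mpc_actions)
    return rl_loss + beta * imitation
```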

    Optimal Multi-UAV Trajectory Planning for Filming Applications

    Teams of multiple Unmanned Aerial Vehicles (UAVs) can be used to record large-scale outdoor scenarios and complementary views of several action points, making them a promising system for cinematic video recording. Generating the trajectories of the UAVs plays a key role, as it must be ensured that they comply with requirements for system dynamics, smoothness, and safety. The rise of numerical methods for nonlinear optimization is finding a flourishing field in optimization-based approaches to multi-UAV trajectory planning. In particular, these methods are rather promising for video recording applications, as they enable multiple constraints and objectives to be formulated, such as trajectory smoothness, compliance with UAV and camera dynamics, avoidance of obstacles and inter-UAV conflicts, and mutual UAV visibility. The main objective of this thesis is to plan online trajectories for multi-UAV teams in video applications, formulating novel optimization problems and solving them in real time. The thesis begins by presenting a framework for carrying out autonomous cinematography missions with a team of UAVs. This framework enables media directors to design missions involving different types of shots with one or multiple cameras, running sequentially or concurrently. Second, the thesis proposes a novel non-linear formulation for the challenging problem of computing optimal multi-UAV trajectories for cinematography, integrating UAV dynamics and collision avoidance constraints, together with cinematographic aspects such as smoothness, gimbal mechanical limits, and mutual camera visibility. Lastly, the thesis describes a method for autonomous aerial recording with distributed lighting by a team of UAVs. The multi-UAV trajectory optimization problem is decoupled into two steps in order to tackle non-linear cinematographic aspects and obstacle avoidance at separate stages. This allows the trajectory planner to perform in real time and to react online to changes in dynamic environments. It is important to note that all the methods in the thesis have been validated by means of extensive simulations and field experiments. Moreover, all the software components have been developed as open source.
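
    For a concrete, heavily simplified picture of the kind of nonlinear program such planners solve, the sketch below sets up a toy two-UAV, 2-D problem in Python with SciPy: squared acceleration as the smoothness objective, fixed start/goal positions, and a minimum inter-UAV separation constraint. All names and numbers are illustrative; the thesis's actual formulations additionally handle camera dynamics, gimbal limits, mutual visibility, and obstacles.

```python
# Toy two-UAV trajectory optimization: minimize squared acceleration
# subject to fixed endpoints and a minimum separation. Illustrative
# only; not the thesis's formulation.
import numpy as np
from scipy.optimize import minimize

T, D_MIN = 20, 2.0  # waypoints per UAV, minimum inter-UAV distance (m)
starts = np.array([[0.0, 0.0], [0.0, 5.0]])
goals = np.array([[10.0, 5.0], [10.0, 0.0]])  # crossing paths

def smoothness(x):
    """Sum of squared discrete accelerations over both trajectories."""
    p = x.reshape(2, T, 2)                       # (uav, time, xy)
    acc = p[:, 2:] - 2 * p[:, 1:-1] + p[:, :-2]  # 2nd finite difference
    return np.sum(acc ** 2)

def separation(x):
    """Inter-UAV distance margin at each timestep (feasible when >= 0)."""
    p = x.reshape(2, T, 2)
    return np.linalg.norm(p[0] - p[1], axis=1) - D_MIN

def endpoints(x):
    """Pin each trajectory to its start and goal (feasible when == 0)."""
    p = x.reshape(2, T, 2)
    return np.concatenate([(p[:, 0] - starts).ravel(),
                           (p[:, -1] - goals).ravel()])

# Straight-line initial guess: violates separation where the paths cross.
t = np.linspace(0, 1, T)[:, None]
init = np.concatenate([((1 - t) * starts[i] + t * goals[i]).ravel()
                       for i in range(2)])

res = minimize(smoothness, init, method="SLSQP",
               constraints=[{"type": "eq", "fun": endpoints},
                            {"type": "ineq", "fun": separation}])
print("converged:", res.success,
      "min distance:", separation(res.x).min() + D_MIN)
```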

    Influence of Directional Sound Cues on Users' Exploration across 360° Movie Cuts

    Virtual reality (VR) is a powerful medium for 360° storytelling, yet content creators are still in the process of developing cinematographic rules for effectively communicating stories in VR. Traditional cinematography has relied for over a century on well-established techniques for editing, and one of the most recurrent resources for this is the cinematic cut, which allows content creators to seamlessly transition between scenes. One fundamental assumption of these techniques is that the content creator can control the camera; however, this assumption breaks in VR: users are free to explore 360° around them. Recent works have studied the effectiveness of different cuts in 360° content, but the effect of directional sound cues while experiencing these cuts has been less explored. In this work, we provide the first systematic analysis of the influence of directional sound cues on users' behavior across 360° movie cuts, providing insights that can have an impact on deriving conventions for VR storytelling.

    Perceived Depth Control in Stereoscopic Cinematography

    Despite the recent explosion of interest in stereoscopic 3D (S3D) technology, widespread adoption of the S3D medium is still significantly hindered by adverse effects related to S3D viewing discomfort. This thesis attempts to improve the S3D viewing experience by investigating perceived depth control methods for stereoscopic cinematography on desktop 3D displays. The main contributions of this work are: (1) A new method was developed to carry out human factors studies identifying the practical limits of the 3D Comfort Zone on a given 3D display. Our results suggest that cinematographers need to identify the specific limits of the 3D Comfort Zone on the target 3D display, as different 3D systems have different ranges for the 3D Comfort Zone. (2) A new dynamic depth mapping approach was proposed to improve depth perception in stereoscopic cinematography. The results of a human-based experiment confirmed its advantages over existing depth mapping methods in controlling the perceived depth when viewing 3D motion pictures. (3) The practicability of employing the Depth of Field (DoF) blur technique in S3D was also investigated. Our results indicate that applying DoF blur simulation to stereoscopic content may not improve the S3D viewing experience without real-time information about what the viewer is looking at. Finally, a basic guideline for stereoscopic cinematography is introduced to summarise the new findings of this thesis alongside several well-known key factors in 3D cinematography. We expect this guideline to be of particular interest not only to 3D filmmaking but also to 3D gaming, sports broadcasting, and TV production.
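
    As a point of reference for what depth mapping does, the snippet below shows the basic static linear remap that dynamic approaches such as the one in this thesis improve upon: scene disparities are compressed into whatever comfort-zone limits were measured for the target display. The function and parameter names are illustrative assumptions.

```python
# Static linear disparity remap into a measured comfort zone. A baseline
# illustration only; the thesis proposes a *dynamic* mapping instead.
import numpy as np

def remap_disparity(d, scene_min, scene_max, zone_min, zone_max):
    """Linearly map scene disparities [scene_min, scene_max] (pixels)
    into the display's measured comfort zone [zone_min, zone_max]."""
    t = (np.asarray(d, dtype=float) - scene_min) / (scene_max - scene_min)
    return zone_min + t * (zone_max - zone_min)

# e.g. compress a [-60, 90] px scene range into a [-20, 30] px zone
print(remap_disparity([-60, 0, 90], -60, 90, -20, 30))  # [-20.  0. 30.]
```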

    From ‘hands up’ to ‘hands on’: harnessing the kinaesthetic potential of educational gaming

    Traditional approaches to distance learning and the student learning journey have focused on closing the gap between the experience of off-campus students and their on-campus peers. While many initiatives have sought to embed a sense of community, create virtual learning environments and even build collaborative spaces for team-based assessment and presentations, they are limited by technological innovation in terms of the types of learning styles they support and develop. Mainstream gaming development – such as the Xbox Kinect and Nintendo Wii – has a strong element of kinaesthetic learning, from early attempts to simulate impact, recoil, velocity and other environmental factors to more sophisticated movement-based games which create a sense of almost total immersion and allow untethered (in a technical sense) interaction with the games’ objects, characters and other players. Likewise, gamification of learning has become a critical focus for learner engagement and commercialisation, especially through products such as the Wii Fit. As this technology matures, there are strong opportunities for universities to utilise gaming consoles to embed levels of kinaesthetic learning into the student experience – a learning style that has been largely neglected in the distance education sector. This paper explores the potential impact of these technologies and broadly imagines the possibilities for future innovation in higher education.

    Towards assisting the decision-making process for content creators in cinematic virtual reality through the analysis of movie cuts and their influence on viewers' behavior

    Virtual Reality (VR) has been gaining popularity in recent years due to the commercialization of personal devices. VR is a new and exciting medium for telling stories; however, the development of Cinematic Virtual Reality (CVR) content is still in an exploratory phase. One of the main reasons is that in this medium the user now has total or partial control of the camera, so viewers create their own personal experiences by deciding what to see at every moment, which can potentially hinder the delivery of a pre-established narrative. In the particular case of transitions from one shot to another (movie cuts), viewers may not be aligned with the main elements of the scene placed by the content creator to convey the story. This can result in viewers missing key elements of the narrative. In this work, we explore recent studies that analyze viewers’ behavior during cinematic cuts in VR videos, and we discuss guidelines and methods that can help filmmakers with the decision-making process when filming and editing their movies.

    Real-time refocusing using an FPGA-based standard plenoptic camera

    Plenoptic cameras are receiving increased attention in scientific and commercial applications because they capture the entire structure of light in a scene, enabling optical transforms (such as focusing) to be applied computationally after the fact, rather than once and for all at the time a picture is taken. In many settings, real-time interactive performance is also desired, which in turn requires significant computational power due to the large amount of data needed to represent a plenoptic image. Although GPUs have been shown to provide acceptable performance for real-time plenoptic rendering, their cost and power requirements make them prohibitive for embedded uses (such as in-camera). On the other hand, the computation required for plenoptic rendering is well structured, suggesting the use of specialized hardware. Accordingly, this paper presents an array of switch-driven finite impulse response filters, implemented on an FPGA, to accomplish high-throughput spatial-domain rendering. The proposed architecture provides a power-efficient rendering hardware design suitable for full-video applications as required in broadcasting or cinematography. A benchmark assessment of the proposed hardware implementation shows that real-time performance can readily be achieved, with a one-order-of-magnitude performance improvement over a GPU implementation and a three-orders-of-magnitude improvement over a general-purpose CPU implementation.
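
    In software terms, spatial-domain plenoptic refocusing amounts to shifting each sub-aperture view in proportion to its offset from the lens-array center and averaging; the paper's FPGA filter array realizes an equivalent computation in hardware. Below is a minimal NumPy sketch under assumed conventions (the array layout and parameter names are not taken from the paper).

```python
# Shift-and-sum refocusing over sub-aperture views. Illustrative layout;
# the paper's FPGA FIR-filter design is a hardware counterpart of this.
import numpy as np

def refocus(subviews, alpha):
    """subviews: float array (U, V, H, W) of sub-aperture images.
    alpha: synthetic focus parameter; each view is shifted in
    proportion to its (u, v) offset from the array center."""
    U, V, H, W = subviews.shape
    out = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            dy = int(round(alpha * (u - U // 2)))
            dx = int(round(alpha * (v - V // 2)))
            out += np.roll(subviews[u, v], (dy, dx), axis=(0, 1))
    return out / (U * V)

# e.g. 9x9 views of a 64x64 scene, refocused at two synthetic depths
views = np.random.rand(9, 9, 64, 64)
near, far = refocus(views, 1.5), refocus(views, -1.5)
```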