4,505 research outputs found

    Shooting the lecture scene using computer-controlled cameras based on situation understanding and evaluation of video images

    In this paper, we propose a computer-controlled camera system that models professional camera work: it shoots the target scene with multiple cameras and, acting as a switcher, selects the best image among the resulting video streams. We apply the system to shooting a lecture scene. First, the system estimates the teacher's action from features of the teacher and the blackboard. Next, each camera is automatically directed to a shooting area based on that action. Finally, the system selects the best image among the camera views according to an evaluation rule. We have carried out experiments on shooting lecture scenes and confirmed the effectiveness of our approach.
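
    The abstract does not give the evaluation rule used for switching; the sketch below only illustrates the general switcher idea of scoring several simultaneous camera views and cutting to the best one. The scoring terms, the bounding-box inputs and the weights are assumptions, not the authors' rule.

```python
import numpy as np

def score_view(teacher_bbox, frame_shape):
    """Score one camera view by how well the teacher is framed.

    teacher_bbox: (x, y, w, h) of the detected teacher in this view.
    frame_shape:  (height, width) of the frame.
    Hypothetical rule: prefer views where the subject is large and centred.
    """
    x, y, w, h = teacher_bbox
    H, W = frame_shape
    size_term = (w * h) / (W * H)                      # larger subject -> higher score
    cx, cy = x + w / 2, y + h / 2
    center_term = 1.0 - np.hypot(cx - W / 2, cy - H / 2) / np.hypot(W / 2, H / 2)
    return 0.5 * size_term + 0.5 * center_term

def switch(views):
    """Return the index of the best view; `views` is a list of (bbox, shape) pairs."""
    scores = [score_view(bbox, shape) for bbox, shape in views]
    return int(np.argmax(scores))

# Example: three cameras, 720p frames, teacher detected at different positions/sizes.
views = [((100, 200, 80, 160), (720, 1280)),
         ((500, 180, 200, 400), (720, 1280)),
         ((900, 300, 60, 120), (720, 1280))]
print("selected camera:", switch(views))
```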

    RGB-D-based Action Recognition Datasets: A Survey

    Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it to provide a fair and objective comparative evaluation against state-of-the-art methods. To address this issue, this paper provides a comprehensive review of the most commonly used action recognition RGB-D video datasets, including 27 single-view datasets, 10 multi-view datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets are a useful resource for guiding an insightful selection of datasets for future research. In addition, the issues with current algorithm evaluation vis-à-vis the limitations of the available datasets and evaluation protocols are highlighted, resulting in a number of recommendations for the collection of new datasets and the use of evaluation protocols.
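
    The survey's concern with fair evaluation protocols is easiest to see with a concrete split. Below is a minimal sketch of a cross-subject protocol, a convention commonly used with RGB-D action datasets, in which no subject appears in both the training and test sets; the data layout and the subject lists are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical dataset: each sample is (feature_vector, action_label, subject_id).
rng = np.random.default_rng(0)
samples = [(rng.normal(size=16), int(rng.integers(0, 10)), subject)
           for subject in range(1, 11) for _ in range(20)]

# Cross-subject protocol: train on a fixed half of the subjects, test on the rest,
# so the same person never contributes samples to both sets.
train_subjects = {1, 2, 3, 4, 5}
train = [(x, y) for x, y, s in samples if s in train_subjects]
test = [(x, y) for x, y, s in samples if s not in train_subjects]

# A classifier would be fit on `train` and its accuracy reported on `test`.
print(len(train), "training samples,", len(test), "held-out test samples")
```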

    New metric products, movies and 3D models from old stereopairs and their application to the in situ palaeontological site of Ambrona

    This paper is based on the information gathered in the following project: LDGP_mem_006-1: "[S_Ambrona_Insitu] Levantamiento fotogramétrico del yacimiento paleontológico “Museo in situ” de Ambrona (Soria)", http://hdl.handle.net/10810/7353. 3D modelling tools based on photographic images have improved significantly in recent years. One of the most notable changes is the spread of photogrammetric systems based on algorithms referred to as Structure from Motion (SfM), in contrast with the traditional use of stereoscopic pairs. Nevertheless, the availability of large collections of stereoscopic records gathered in past decades invites us to explore the possibilities of re-using these photographs to generate new multimedia products, especially because many of the documented elements have been heavily altered or have even disappeared. This article analyses an example of such re-use, centred on a collection of photographs from the palaeontological site of Ambrona (Soria, Spain). More specifically, different pieces of software based on SfM algorithms are tested for the generation of 3D models with photographic textures, and derived products such as orthoimages, video and Augmented Reality (AR) applications are presented.
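
    The abstract does not name the SfM packages it compares; as a rough illustration of the underlying idea (recovering relative camera geometry and a sparse point cloud from a single stereopair), here is a minimal two-view sketch using OpenCV. The image file names and the camera matrix K are placeholder assumptions; a full SfM pipeline would add many more views, dense matching, bundle adjustment and texture mapping.

```python
import cv2
import numpy as np

# Placeholder inputs: a scanned stereopair and an assumed camera matrix.
img1 = cv2.imread("ambrona_left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("ambrona_right.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[1500.0, 0, 960], [0, 1500.0, 540], [0, 0, 1]])

# 1. Detect and match local features between the two photographs.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 2. Estimate the relative camera pose from the essential matrix.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 3. Triangulate a sparse 3D point cloud (homogeneous -> Euclidean).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points3d = (pts4d[:3] / pts4d[3]).T
print(points3d.shape[0], "sparse points reconstructed")
```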

    Virtual Reality to Simulate Visual Tasks for Robotic Systems

    Virtual reality (VR) can be used as a tool to analyze the interactions between the visual system of a robotic agent and the environment, with the aim of designing the algorithms needed to solve the visual tasks required to behave properly in the 3D world. The novelty of our approach lies in using VR as a tool to simulate the behavior of vision systems. The visual system of a robot (e.g., an autonomous vehicle, an active vision system, or a driving assistance system) and its interplay with the environment can be modeled through the geometrical relationships between the virtual stereo cameras and the virtual 3D world. Unlike conventional applications, where VR is used for the perceptual rendering of visual information to a human observer, in the proposed approach a virtual world is rendered to simulate the actual projections on the cameras of a robotic system. In this way, machine vision algorithms can be quantitatively validated using the ground truth data provided by knowledge of both the structure of the environment and the vision system.
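
    As a rough illustration of rendering a known 3D world into virtual stereo cameras to obtain ground truth (not the authors' system), the sketch below projects synthetic 3D points into a parallel-axis stereo pair and checks that the resulting disparities match the analytic value d = f·B/Z. The focal length, baseline and scene are assumptions.

```python
import numpy as np

# Assumed virtual rig: identical pinhole cameras, parallel axes, baseline along x.
f, cx, cy = 800.0, 320.0, 240.0      # focal length and principal point (pixels)
baseline = 0.1                        # metres between left and right camera centres
K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1]])

# Synthetic 3D world: random points 2 to 10 metres in front of the rig.
rng = np.random.default_rng(1)
points = rng.uniform([-1, -1, 2], [1, 1, 10], size=(100, 3))

def project(points, K, t):
    """Project 3D points into a camera translated by t (no rotation)."""
    cam = points - t                       # world -> camera coordinates
    uvw = (K @ cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]        # perspective division

left = project(points, K, np.zeros(3))
right = project(points, K, np.array([baseline, 0, 0]))

# Ground-truth disparity follows directly from the known geometry.
disparity = left[:, 0] - right[:, 0]
assert np.allclose(disparity, f * baseline / points[:, 2])
print("mean ground-truth disparity:", disparity.mean(), "pixels")
```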

    Interactive Fiction in Cinematic Virtual Reality: Epistemology, Creation and Evaluation

    This dissertation presents Interactive Fiction in Cinematic Virtual Reality (IFcVR), an interactive digital narrative (IDN) that brings together cinematic virtual reality (cVR) and the creation of virtual environments through 360° video within an interactive fiction (IF) structure. The work is structured in three components: an epistemological approach to this kind of narrative and media hybrid; the creation process of IFcVR, from development to postproduction; and user evaluation of IFcVR. In order to set the foundations for the creation of interactive VR fiction films, I dissect the IFcVR by investigating the aesthetic, narratological and interactive notions that converge and diverge in it, proposing a medium-conscious narratology for this kind of artefact. This analysis led to the production of a functional IFcVR prototype: “ZENA”, the first interactive VR film shot in Genoa. ZENA's creation process is reported, and some guidelines for interactive and immersive film-makers are proposed. In order to evaluate the effectiveness of IFcVR as an entertaining narrative form and as a vehicle for diverse types of messages, this study also proposes a methodology to measure User Experience (UX) in IFcVR. The full evaluation protocol gathers both qualitative and quantitative data through ad hoc instruments. The proposed protocol is illustrated through its pilot application to ZENA. Findings show interactors' positive acceptance of IFcVR as an entertaining experience.

    Understanding and designing for control in camera operation

    Cinematographers often use supportive tools to craft desired camera moves. Recent technological advances have added new tools to the palette, such as gimbals, drones and robots. The combination of motor-driven actuation, computer vision and machine learning in such systems has also made new interaction techniques possible. In particular, a content-based interaction style was introduced in addition to the established axis-based style. On the one hand, content-based co-creation between humans and automated systems made it easier to reach high-level goals. On the other hand, the increased use of automation also introduced negative side effects. Creatives usually want to feel in control while executing the camera motion and, in the end, to be the authors of the recorded shots. While automation can assist experts or enable novices, it also takes away desired control from operators. Thus, if we want to support cinematographers with new tools and interaction techniques, the following question arises: how should we design interfaces for camera motion control that, despite being increasingly automated, provide cinematographers with an experience of control? Camera control has been studied for decades, especially in virtual environments. Applying content-based interaction to physical environments opens up new design opportunities but also faces less-researched, domain-specific challenges. To suit the needs of cinematographers, designs need to be crafted with care; in particular, they must adapt to the constraints of recording on location, which makes an interplay with established practices essential. Previous work has mainly focused on a technology-centered understanding of camera travel, which consequently influenced the design of camera control systems. In contrast, this thesis contributes to an understanding of the motives of cinematographers and how they operate on set, and provides a user-centered foundation informing cinematography-specific research and design. The contribution of this thesis is threefold. First, we present ethnographic studies of expert users and their shooting practices on location; these studies highlight the challenges of introducing automation to a creative task (assistance vs. feeling in control). Second, we report on a domain-specific prototyping toolkit for in-situ deployment; the toolkit provides open-source software for low-cost replication, enabling the exploration of design alternatives. To better inform design decisions, we further introduce an evaluation framework for estimating the resulting quality and sense of control; by extending established methodologies with a recent neuroscientific technique, it provides data on explicit as well as implicit levels and is designed to be applicable to other domains of HCI. Third, we present evaluations of designs based on our toolkit and framework. We explored a dynamic interplay of manual control with various degrees of automation and examined different content-based interaction styles; occlusion caused by graphical elements was identified and addressed by exploring visual reduction strategies and mid-air gestures. Our studies demonstrate that high degrees of quality and sense of control are achievable with our tools, which also support creativity and established practices.
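
    The abstract mentions a dynamic interplay of manual control with various degrees of automation. Purely as an illustration of what such shared control can look like (this is not the thesis's toolkit), the sketch below blends an operator's pan command with an automated framing controller using an automation weight; all names and parameters are hypothetical.

```python
def blend_pan(manual_rate, framing_error, automation=0.5, gain=2.0):
    """Shared control of a pan axis (illustrative only).

    manual_rate:   operator's commanded pan rate (deg/s), e.g. from a joystick.
    framing_error: horizontal offset of the tracked subject from frame centre (-1..1).
    automation:    0 = fully manual, 1 = fully automated framing.
    """
    auto_rate = gain * framing_error              # simple proportional framing controller
    return (1.0 - automation) * manual_rate + automation * auto_rate

# Example: operator pans left slowly while the subject drifts right of centre.
for a in (0.0, 0.5, 1.0):
    print(f"automation={a}: pan rate = {blend_pan(-3.0, 0.4, automation=a):.2f} deg/s")
```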

    Single View Modeling and View Synthesis

    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D using mature technologies, but there is a strong desire to record and re-explore the 3D world in 3D. Current approaches to this goal usually rely on a camera array, which suffers from tedious setup and calibration processes as well as a lack of portability, limiting its application to lab experiments. In this thesis, I try to produce 3D content using a single camera, making the process as simple as shooting pictures. It requires a new front-end capture device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences at 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instant, the partial surfaces are assembled into a complete 3D model by a novel warping algorithm. Inspired by the success of single-view 3D modeling, I extended this exploration to 2D-to-3D video conversion that does not use a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much of the depth-inference work as possible from the user to the computer. I developed two new methods that analyze optical flow to provide additional qualitative depth constraints, and the automatically extracted depth information is presented in the user interface to assist the user's labeling work. Depending on the input data, my algorithms can build high-fidelity 3D models of dynamic and deformable objects when depth maps are provided; otherwise, they can turn video clips into stereoscopic videos.
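
    The 2D-to-3D conversion step synthesises a second view from colour plus inferred depth. As a rough illustration (not the thesis's implementation), the sketch below performs minimal depth-image-based rendering: it shifts pixels horizontally by a depth-derived disparity to produce a right-eye view. The focal length, baseline and depth map are assumptions, and real systems also need hole filling and occlusion handling.

```python
import numpy as np

def synthesize_right_view(color, depth, f=700.0, baseline=0.06):
    """Forward-warp a left view into a right view using per-pixel depth.

    color: (H, W, 3) uint8 image, depth: (H, W) in metres. Illustrative only:
    no hole filling or occlusion handling.
    """
    H, W = depth.shape
    disparity = np.round(f * baseline / depth).astype(int)   # pixels to shift
    right = np.zeros_like(color)
    cols = np.arange(W)
    for y in range(H):
        new_cols = cols - disparity[y]                        # shift towards the left
        valid = (new_cols >= 0) & (new_cols < W)
        right[y, new_cols[valid]] = color[y, cols[valid]]
    return right

# Tiny synthetic example: a gradient image over a two-plane depth map.
H, W = 120, 160
color = np.dstack([np.tile(np.linspace(0, 255, W, dtype=np.uint8), (H, 1))] * 3)
depth = np.full((H, W), 5.0)
depth[:, W // 2:] = 2.0                                       # nearer plane shifts more
right_view = synthesize_right_view(color, depth)
print(right_view.shape)
```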