24 research outputs found

    A Virtual Cinematographer for Presenter Tracking in 4K Lecture Videos

    Lecture recording has become an important part of providing accessible tertiary education, and good autonomous recording and processing systems are necessary to make it feasible. In this work, we develop and evaluate a video processing framework that uses 4K video to track the lecturer and frame him/her in a way that simulates a human camera operator. We also investigate general issues pertaining to blackboard usage and its influence on cinematography decisions. We found that post-processing produced better tracking and framing results than some real-time approaches. Furthermore, the entire pipeline can run on a commodity PC and completes within the suggested time budget of 300% of the input video length. In fact, our testing showed that 60% of the total processing time can be ascribed to I/O operations; removing redundant reads and writes would reduce this proportion. Finally, some algorithms can be remapped to parallel versions that exploit multicore CPUs or GPUs where these are available.
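
    As an illustration of why an offline pass can outperform real-time framing, the minimal Python/NumPy sketch below smooths a virtual camera path using future as well as past frames, something a causal, real-time tracker cannot do. The function name, window size, and the assumption that per-frame presenter centres come from an upstream detector are illustrative, not taken from the paper.

        import numpy as np

        def smooth_crop_centres(centres, window=61):
            # Non-causal moving-average smoothing of per-frame presenter centres.
            # Because the whole recording is available, the smoother may look
            # ahead, which is one reason post-processed framing can be steadier
            # than a purely real-time (causal) tracker.
            centres = np.asarray(centres, dtype=float)   # shape (N, 2): (x, y) per frame
            kernel = np.ones(window) / window
            pad = window // 2
            padded = np.pad(centres, ((pad, pad), (0, 0)), mode="edge")
            return np.stack(
                [np.convolve(padded[:, i], kernel, mode="valid") for i in range(2)],
                axis=1,
            )

        # Toy usage: a jittery presenter track becomes a steadier virtual-camera path.
        raw = np.cumsum(np.random.randn(500, 2) * 5, axis=0) + np.array([1920.0, 1080.0])
        smooth_path = smooth_crop_centres(raw)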

    Automated lecture video recording, post-processing, and viewing system that utilizes multimodal inputs to provide a dynamic student experience

    Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from the PDF version of the thesis. Includes bibliographical references (page 59). This thesis describes the design, implementation, and evaluation of EduCase: an inexpensive automated lecture video recording, post-processing, and viewing system. The EduCase recording system consists of three devices, one per lecture hall board. Each recording device records color, depth, skeletal, and audio inputs. The Post-Processor automatically processes the recordings to produce an output file usable by the Viewer, which provides a more dynamic student experience than traditional video playback systems. In particular, it allows students to flip back to view a previous board while the lecture continues to play in the background. It also allows students to toggle the professor's visibility to see the board they might be blocking. The system was successfully evaluated in blackboard-heavy lectures at MIT and Harvard. We hope that EduCase will be the quickest, most inexpensive, and most student-friendly lecture capture system, and contribute to our overarching goal of education for all. By Sara T. Itani. M. Eng.
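
    The "toggle the professor out" feature can be sketched as a per-pixel temporal median taken over the frames in which a presenter mask says the pixel is unoccluded. This is an illustrative approach only, assuming per-frame presenter masks are derived from the depth/skeletal input; it is not necessarily how EduCase implements it, and all names below are hypothetical.

        import numpy as np

        def board_behind_presenter(frames, person_masks):
            # frames:       array of shape (T, H, W, 3), uint8 colour frames
            # person_masks: array of shape (T, H, W), True where the presenter is
            frames = np.asarray(frames, dtype=np.float32)
            masks = np.asarray(person_masks, dtype=bool)
            # Hide presenter pixels, then take a per-pixel median over time.
            masked = np.where(masks[..., None], np.nan, frames)
            board = np.nanmedian(masked, axis=0)
            # Pixels that were never unoccluded fall back to the plain temporal median.
            fallback = np.median(frames, axis=0)
            board = np.where(np.isnan(board), fallback, board)
            return board.astype(np.uint8)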

    Low Resource, Post-processed Lecture Recording from 4K Video Streams

    Many universities are using lecture recording technology to expand the reach of their teaching programs, and to continue instruction when face-to-face lectures are not possible. Increasingly, high-resolution 4K cameras are used, since they allow for easy reading of board/screen content. Unfortunately, while 4K cameras are now quite affordable, the back-end computing infrastructure to process and distribute a multitude of recorded 4K streams can be costly. Furthermore, the bandwidth requirements for a 4K stream are exorbitant, running to over 2GB for a 45-60 minute lecture. These factors militate against the use of such technology in a low-resource environment, and motivated our investigation into methods to reduce resource requirements for both the institution and students. We describe the design and implementation of a low-resource 4K lecture recording solution, which addresses these problems through a computationally efficient video processing pipeline. The pipeline consists of a front-end, which segments presenter motion and writing/board surfaces from the stream, and a back-end, which serves as a virtual cinematographer (VC), combining this contextual information to draw attention to the lecturer and relevant content. The bandwidth saving is realized by defining a smaller fixed-size, context-sensitive ‘cropping window’ and generating a new video from the crop regions. The front-end utilises computationally cheap temporal frame differencing at its core: this does not require expensive GPU hardware and also limits the memory required for processing. The VC receives a small set of motion/content bounding boxes and applies established framing heuristics to determine which region to extract from the full 4K frame. Performance results coupled with a user survey show that the system is fit for purpose: it is able to produce good presenter framing/context over a range of challenging lecture venue layouts and lighting conditions, within a time that is acceptable for lecture video processing.
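
    To make the front-end/virtual-cinematographer split concrete, the sketch below (Python/OpenCV) shows the general shape of such a pipeline: cheap temporal frame differencing yields a motion bounding box, and a fixed-size crop window is centred on it and clamped to the frame. The crop size, threshold, and function names are assumptions for illustration, not the authors' implementation.

        import cv2
        import numpy as np

        CROP_W, CROP_H = 1920, 1080   # fixed-size output window cut from the 4K frame

        def motion_bbox(prev_gray, gray, thresh=25):
            # Cheap temporal frame differencing: bounding box of changed pixels.
            diff = cv2.absdiff(prev_gray, gray)
            _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
            mask = cv2.dilate(mask, None, iterations=2)
            pts = cv2.findNonZero(mask)
            return None if pts is None else cv2.boundingRect(pts)   # (x, y, w, h)

        def crop_window(bbox, frame_w, frame_h):
            # Centre the fixed-size crop on the motion region, clamped to the frame.
            x, y, w, h = bbox
            cx, cy = x + w // 2, y + h // 2
            left = int(np.clip(cx - CROP_W // 2, 0, frame_w - CROP_W))
            top = int(np.clip(cy - CROP_H // 2, 0, frame_h - CROP_H))
            return left, top, CROP_W, CROP_H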

    Multi-Clip Video Editing from a Single Viewpoint

    We propose a framework for automatically generating multiple clips suitable for video editing by simulating pan-tilt-zoom camera movements within the frame of a single static camera. Assuming important actors and objects can be localized using computer vision techniques, our method requires only minimal user input to define the subject matter of each sub-clip. The composition of each sub-clip is automatically computed in a novel L1-norm optimization framework. Our approach encodes several common cinematographic practices into a single convex cost function minimization problem, resulting in aesthetically pleasing sub-clips which can easily be edited together using off-the-shelf multi-clip video editing software. We demonstrate our approach on five video sequences of a live theatre performance by generating multiple synchronized sub-clips for each sequence.
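
    As a toy illustration of the convex-composition idea, the sketch below reduces the problem to one dimension: choose the left edge of a fixed-width crop per frame so that it always contains the actor, while L1 penalties on velocity and acceleration favour static shots joined by smooth pans. The use of cvxpy, the variable names, and the simplified cost are assumptions for illustration; the paper's formulation is richer than this.

        import cvxpy as cp
        import numpy as np

        def virtual_pan(actor_left, actor_right, crop_w, frame_w, lam=10.0):
            # actor_left/actor_right: per-frame horizontal extent of the tracked actor.
            n = len(actor_left)
            left = cp.Variable(n)                 # left edge of the crop, per frame
            vel = left[1:] - left[:-1]
            acc = vel[1:] - vel[:-1]
            # L1 penalties favour piecewise-constant (static shot) and
            # piecewise-linear (constant-speed pan) camera paths.
            cost = lam * cp.norm1(vel) + cp.norm1(acc)
            constraints = [left <= np.asarray(actor_left, dtype=float),
                           left + crop_w >= np.asarray(actor_right, dtype=float),
                           left >= 0,
                           left + crop_w <= frame_w]
            cp.Problem(cp.Minimize(cost), constraints).solve()
            return left.value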

    Movie Editing and Cognitive Event Segmentation in Virtual Reality Video

    Traditional cinematography has relied for over a century on a well-established set of editing rules, called continuity editing, to create a sense of situational continuity. Despite massive changes in visual content across cuts, viewers in general experience no trouble perceiving the discontinuous flow of information as a coherent set of events. However, Virtual Reality (VR) movies are intrinsically different from traditional movies in that the viewer controls the camera orientation at all times. As a consequence, common editing techniques that rely on camera orientations, zooms, etc., cannot be used. In this paper we investigate key questions to understand how well traditional movie editing carries over to VR. To do so, we rely on recent cognition studies and event segmentation theory, which states that our brains segment continuous actions into a series of discrete, meaningful events. We first replicate one of these studies to assess whether the predictions of this theory apply to VR. We next gather gaze data from viewers watching VR videos containing different edits with varying parameters, and provide the first systematic analysis of viewers' behavior and the perception of continuity in VR. From this analysis we make a series of relevant findings; for instance, our data suggests that predictions from cognitive event segmentation theory are useful guides for VR editing; that different types of edits are equally well understood in terms of continuity; and that spatial misalignments between regions of interest at the edit boundaries favor a more exploratory behavior even after viewers have fixated on a new region of interest. In addition, we propose a number of metrics to describe viewers' attentional behavior in VR. We believe the insights derived from our work can be useful as guidelines for VR content creation.
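
    Purely as an illustration of the kind of attentional measure involved in such an analysis (the paper's own metrics are not reproduced here), the snippet below computes the fraction of gaze samples that fall inside the post-cut region of interest during a short window after the edit. All names, the angular ROI width, and the window length are assumptions.

        import numpy as np

        def roi_convergence(gaze_yaw, roi_yaw, roi_halfwidth=20.0, fps=60, window_s=3.0):
            # gaze_yaw: per-frame horizontal gaze direction (degrees) after the cut
            # roi_yaw:  yaw of the new region of interest (degrees)
            n = int(window_s * fps)
            samples = np.asarray(gaze_yaw[:n], dtype=float)
            # Angular difference wrapped to [-180, 180).
            delta = (samples - roi_yaw + 180.0) % 360.0 - 180.0
            return float(np.mean(np.abs(delta) <= roi_halfwidth))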

    Simulating a Smartboard by Real-Time Gesture Detection in Lecture Videos


    Human-Centered Webcasting of Interactive-Whiteboard Lectures

    In our system for recording and transmitting lectures over the Internet, the board content is transmitted as vector graphics, thus producing a high-quality image, while the video of the lecturer is sent as a separate stream. It is easy for the viewer to read the board, but the lecturer appears in a separate window. As a result, two areas of the screen compete for the viewer's attention, causing the widely known split-attention effect. To eliminate this problem, the lecturer is extracted from the video stream and his or her image is pasted onto the board image at video stream rates. The lecturer can be dimmed from opaque to semitransparent, or even transparent. The article presents a detailed analysis of the underlying psychological problems and explains the multimedia techniques that are applied to achieve the solution.
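
    Dimming the lecturer from opaque to transparent is, in essence, alpha compositing of the segmented lecturer over the vector-rendered board. The snippet below is a minimal sketch of that blend; the function and parameter names are illustrative and not taken from the system described above.

        import numpy as np

        def composite_lecturer(board_rgb, lecturer_rgb, lecturer_mask, opacity=0.5):
            # opacity: 1.0 = opaque lecturer, 0.0 = board only;
            # intermediate values give the semi-transparent presenter.
            board = board_rgb.astype(np.float32)
            lecturer = lecturer_rgb.astype(np.float32)
            alpha = lecturer_mask.astype(np.float32)[..., None] * opacity
            out = alpha * lecturer + (1.0 - alpha) * board
            return out.astype(np.uint8)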