
    Unsupervised Learning from Narrated Instruction Videos

    We address the problem of automatically learning the main steps to complete a certain task, such as changing a car tire, from a set of narrated instruction videos. The contributions of this paper are three-fold. First, we develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method solves two clustering problems, one in text and one in video, applied one after the other and linked by joint constraints to obtain a single coherent sequence of steps in both modalities. Second, we collect and annotate a new challenging dataset of real-world instruction videos from the Internet. The dataset contains about 800,000 frames for five different tasks that include complex interactions between people and objects, and are captured in a variety of indoor and outdoor settings. Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos. Comment: Appears in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). 21 pages

    Developmental Stages of Perception and Language Acquisition in a Perceptually Grounded Robot

    The objective of this research is to develop a system for language learning based on a minimum of pre-wired, language-specific functionality that is compatible with observations of perceptual and language capabilities in the human developmental trajectory. In the proposed system, meaning (in terms of descriptions of events and spatial relations) is extracted from video images based on detection of position, motion, physical contact, and their parameters. Mapping of sentence form to meaning is performed by learning grammatical constructions that are retrieved from a construction inventory based on the constellation of closed-class items uniquely identifying the target sentence structure. The resulting system displays robust acquisition behavior that reproduces certain observations from developmental studies, with very modest "innate" language specificity.

    Movie Description

    Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs that are temporally aligned to full-length movies. In addition, we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total, the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First, we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown, rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description Challenge (LSMDC)" at ICCV 2015.

    Movie101: A New Movie Understanding Benchmark

    To help the visually impaired enjoy movies, automatic movie narrating systems are expected to narrate accurate, coherent, and role-aware plots when there are no speaking lines of actors. Existing works benchmark this challenge as a normal video captioning task via simplifications such as removing role names and evaluating narrations with n-gram-based metrics, which makes it difficult for automatic systems to meet the needs of real application scenarios. To narrow this gap, we construct a large-scale Chinese movie benchmark named Movie101. Closer to real scenarios, the Movie Clip Narrating (MCN) task in our benchmark asks models to generate role-aware narration paragraphs for complete movie clips where no actors are speaking. External knowledge, such as role information and movie genres, is also provided for better movie understanding. In addition, we propose a new metric called Movie Narration Score (MNScore) for movie narrating evaluation, which achieves the best correlation with human evaluation. Our benchmark also supports the Temporal Narration Grounding (TNG) task to investigate clip localization given text descriptions. For both tasks, our proposed methods leverage external knowledge well and outperform carefully designed baselines. The dataset and codes are released at https://github.com/yuezih/Movie101. Comment: Accepted to ACL 202

    How are film endings shaped by their socio-historical context? Part 2

    This article explores an aspect of filmic narratology that has long been neglected in cinema and media studies: endings. Richard Neupert's The End - Narration and Closure in the Cinema (1995), a rare work on this topic, is examined, and its theory is tested on Picnic at Hanging Rock (Peter Weir, 1975), a film that does not easily fit Neupert's framework. This film has raised controversial views about whether it has an open or a closed ending. To shed light on this debate, Picnic at Hanging Rock is examined a second time through a new model that relates the ending to the context in which the film was made.

    Open Access Metadata for Journals in Directory of Open Access Journals: Who, How, and What Scheme?

    Open access (OA) is a form of publication that allows some level of free access to scholarly publications. The Directory of Open Access Journals (DOAJ) is a repository to which OA journals may apply and upload content to increase discoverability. OA also refers to metadata that is freely available for harvesting. In making metadata open access, standards for schemes and protocols are needed to facilitate interoperability. For open access journals, such as those listed in the DOAJ, providing open access metadata in a form that promotes interoperability is essential for the discoverability of their content. This paper investigates what standards exist or are emerging, who within journals is creating the metadata for DOAJ journals, and how those journals and DOAJ share the metadata for articles. Moreover, since creating metadata requires the specialized knowledge of both librarians and programmers, it is imperative that journals wanting to publish with OA metadata formulate plans to coordinate these experts and to be sure their efforts are compatible with current standards and protocols.
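    The abstract above concerns metadata schemes and harvesting protocols; in practice, OA repositories commonly expose records as unqualified Dublin Core (oai_dc) over OAI-PMH. As a minimal sketch of what consuming such a record looks like (the sample record below is hypothetical, not taken from DOAJ), one can parse the Dublin Core fields with Python's standard library:

    ```python
    import xml.etree.ElementTree as ET

    # Hypothetical oai_dc record, similar in shape to what an
    # OAI-PMH harvester would receive from a repository.
    RECORD = """<oai_dc:dc
        xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Sample Article</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:identifier>https://doi.org/10.1234/example</dc:identifier>
    </oai_dc:dc>"""

    def parse_dc(xml_text):
        """Collect simple Dublin Core elements into a dict of lists
        (Dublin Core elements are repeatable, e.g. multiple creators)."""
        root = ET.fromstring(xml_text)
        fields = {}
        for elem in root:
            # Strip the namespace URI: '{http://...}title' -> 'title'
            tag = elem.tag.split("}", 1)[-1]
            fields.setdefault(tag, []).append(elem.text)
        return fields

    fields = parse_dc(RECORD)
    print(fields["title"])  # ['Sample Article']
    ```

    Interoperability here comes from the shared scheme: any harvester that understands oai_dc can index records from any compliant journal without per-journal mapping code.
    
    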