8 research outputs found

    Unsupervised Learning from Narrated Instruction Videos

    We address the problem of automatically learning the main steps needed to complete a task, such as changing a car tire, from a set of narrated instruction videos. The contributions of this paper are three-fold. First, we develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method solves two clustering problems, one in text and one in video, applied one after the other and linked by joint constraints to obtain a single coherent sequence of steps in both modalities. Second, we collect and annotate a new challenging dataset of real-world instruction videos from the Internet. The dataset contains about 800,000 frames for five different tasks that include complex interactions between people and objects and are captured in a variety of indoor and outdoor settings. Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos. Comment: Appears in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). 21 page

    Finding Actors and Actions in Movies

    We address the problem of learning a joint model of actors and actions in movies using weak supervision provided by scripts. Specifically, we extract actor/action pairs from the script and use them as constraints in a discriminative clustering framework. The corresponding optimization problem is formulated as a quadratic program under linear constraints. People in video are represented by automatically extracted and tracked faces together with corresponding motion features. First, we apply the proposed framework to the task of learning names of characters in the movie and demonstrate significant improvements over previous methods used for this task. Second, we explore the joint actor/action constraint and show its advantage for weakly supervised action learning. We validate our method in the challenging setting of localizing and recognizing characters and their actions in the feature-length movies Casablanca and American Beauty.

    Learning from narrated instruction videos

    Automatic assistants could guide a person or a robot in performing new tasks, such as changing a car tire or repotting a plant. Creating such assistants, however, is non-trivial and requires understanding the visual and verbal content of a video. Towards this goal, we address the problem of automatically learning the main steps of a task from a set of narrated instruction videos. We develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method sequentially clusters textual and visual representations of a task, where the two clustering problems are linked by joint constraints to obtain a single coherent sequence of steps in both modalities. To evaluate our method, we collect and annotate a new challenging dataset of real-world instruction videos from the Internet. The dataset contains videos for five different tasks with complex interactions between people and objects, captured in a variety of indoor and outdoor settings. We experimentally demonstrate that the proposed method can automatically discover, learn and localize the main steps of a task in input videos.

    Weakly Supervised Action Labeling in Videos Under Ordering Constraints

    We are given a set of video clips, each one annotated with an ordered list of actions, such as “walk” then “sit” then “answer phone”, extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip as well as to learn a discriminative classifier for each action. We formulate the problem as a weakly supervised temporal assignment with ordering constraints. Each video clip is divided into small time intervals, and each time interval of each video clip is assigned one action label while respecting the order in which the action labels appear in the given annotations. We show that the action label assignment can be determined together with learning a classifier for each action in a discriminative manner. We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787,720 frames containing sequences of 16 different actions from 69 Hollywood movies.
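
    The ordering constraint described in this abstract can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the method from the paper: it assumes the per-interval costs are already given (in the paper the assignment is learned jointly with the classifiers), that every annotated action occurs at least once, and that labels may only stay the same or advance to the next annotated action. The function name `ordered_assignment` and the cost values are hypothetical.

```python
from typing import List


def ordered_assignment(cost: List[List[float]]) -> List[int]:
    """Assign each time interval to one position in an ordered action list.

    cost[t][k] is an assumed, precomputed cost of giving interval t the k-th
    action of the annotation. The returned labels are non-decreasing, start
    at 0, end at K-1, and minimize the total cost.
    """
    T, K = len(cost), len(cost[0])
    if T < K:
        raise ValueError("need at least one interval per annotated action")
    INF = float("inf")
    best = [[INF] * K for _ in range(T)]  # best[t][k]: min cost ending at (t, k)
    back = [[0] * K for _ in range(T)]    # back[t][k]: label used at interval t-1
    best[0][0] = cost[0][0]
    for t in range(1, T):
        for k in range(min(t + 1, K)):
            stay = best[t - 1][k]                        # keep the same action
            advance = best[t - 1][k - 1] if k > 0 else INF  # move to next action
            if advance < stay:
                best[t][k] = advance + cost[t][k]
                back[t][k] = k - 1
            else:
                best[t][k] = stay + cost[t][k]
                back[t][k] = k
    # Backtrack from the last interval, which must carry the last action.
    labels = [0] * T
    k = K - 1
    for t in range(T - 1, -1, -1):
        labels[t] = k
        k = back[t][k]
    return labels


# Five intervals, three ordered actions (e.g. "walk", "sit", "answer phone").
example_cost = [
    [0.1, 0.9, 0.9],
    [0.2, 0.8, 0.9],
    [0.9, 0.1, 0.8],
    [0.9, 0.2, 0.3],
    [0.9, 0.9, 0.1],
]
example_labels = ordered_assignment(example_cost)
```

    In this toy example the first two intervals receive the first action, the next two the second, and the last interval the third. The order of the annotation is enforced even when a cheaper out-of-order labeling exists.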

    Das Dekarbonisierungspotenzial Von Pensionskassenportfolios (The Decarbonization Potential of Pension Insurance Fund Portfolios)


    Mortality of treated HIV-1 positive individuals according to viral subtype in Europe and Canada: collaborative cohort analysis

    Objectives: To estimate prognosis by viral subtype in HIV-1-infected individuals from start of antiretroviral therapy (ART) and after viral failure. Design: Collaborative analysis of data from eight European and three Canadian cohorts. Methods: Adults (N > 20,000) who started triple ART between 1996 and 2012 and had data on viral subtype were followed for mortality. We estimated crude and adjusted (for age, sex, regimen, CD4+ cell count, and AIDS at baseline, period of starting ART, stratified by cohort, region of origin and risk group) mortality hazard ratios (MHR) by subtype. We estimated MHR subsequent to viral failure, defined as two HIV-RNA measurements greater than 500 copies/ml after achieving viral suppression. Results: The most prevalent subtypes were B (15,419; 74%), C (2,091; 10%), CRF02_AG (1,057; 5%), A (873; 4%), CRF01_AE (506; 2.4%), G (359; 1.7%), and D (232; 1.1%). Subtypes were strongly patterned by region of origin and risk group. During 104,649 person-years of observation, 1,172 of 20,784 patients died. Compared with subtype B, mortality was higher for subtype A, but similar for all other subtypes. MHR for A versus B was 1.13 (95% confidence interval 0.85, 1.50) when stratified by cohort, increased to 1.78 (1.27, 2.51) on stratification by region and risk, and attenuated to 1.59 (1.14, 2.23) on adjustment for covariates. MHR for A versus B was 2.65 (1.64, 4.28) and 0.95 (0.57, 1.57) for patients who started ART with CD4+ cell count below, or more than, 100 cells/µl, respectively. There was no difference in mortality between subtypes A, B and C after viral failure. Conclusion: Patients with subtype A had worse prognosis, an observation which may be confounded by socio-demographic factors. Copyright (C) 2016 Wolters Kluwer Health, Inc. All rights reserved.