
12 research outputs found

Multimodal Visual Concept Learning with Weakly Supervised Techniques

Author: Bouritsas Giorgos
Koutras Petros
Maragos Petros
Zlatintsi Athanasia
Publication date: 04/04/2018

Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language in order to automatically recognize video concepts. Towards this goal, in this paper we use textual cues as a means of supervision, introducing two weakly supervised techniques that extend the Multiple Instance Learning (MIL) framework: the Fuzzy Sets Multiple Instance Learning (FSMIL) and the Probabilistic Labels Multiple Instance Learning (PLMIL). The former encodes the spatio-temporal imprecision of the linguistic descriptions with Fuzzy Sets, while the latter models different interpretations of each description's semantics with Probabilistic Labels; both are formulated through a convex optimization algorithm. In addition, we provide a novel technique, based on semantic similarity computations, to extract weak labels in the presence of complex semantics. We evaluate our methods on two distinct problems, namely face and action recognition, in the challenging and realistic setting of movies accompanied by their screenplays, contained in the COGNIMUSE database. We show that, on both tasks, our method considerably outperforms a state-of-the-art weakly supervised approach, as well as other baselines.

Comment: CVPR 201

arXiv.org e-Print Archive

Crossref

A Dataset for Movie Description

Author: Rohrbach Anna
Rohrbach Marcus
Schiele Bernt
Tandon Niket
Publication date: 01/01/2015

Descriptive video service (DVS) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset that contains transcribed DVS, temporally aligned to full-length HD movies. In addition, we collected the aligned movie scripts that have been used in prior work and compare the two sources of descriptions. In total, the Movie Description dataset contains a parallel corpus of over 54,000 sentences and video snippets from 72 HD movies. We characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing DVS to scripts, we find that DVS is far more visual and describes precisely what is shown rather than what should happen according to the scripts created prior to movie production.

arXiv.org e-Print Archive

CiteSeerX

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe

Finding Actors and Actions in Movies

Author: Bach Francis
Bojanowski Piotr
Laptev Ivan
Ponce Jean
Schmid Cordelia
Sivic Josef
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2013

We address the problem of learning a joint model of actors and actions in movies using weak supervision provided by scripts. Specifically, we extract actor/action pairs from the script and use them as constraints in a discriminative clustering framework. The corresponding optimization problem is formulated as a quadratic program under linear constraints. People in video are represented by automatically extracted and tracked faces together with corresponding motion features. First, we apply the proposed framework to the task of learning names of characters in the movie and demonstrate significant improvements over previous methods used for this task. Second, we explore the joint actor/action constraint and show its advantage for weakly supervised action learning. We validate our method in the challenging setting of localizing and recognizing characters and their actions in the feature-length movies Casablanca and American Beauty.

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

A Discriminative Graph-Based Parser for the Abstract Meaning Representation

Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2014

Crossref

Movie Description

Author: Courville Aaron
Larochelle Hugo
Pal Christopher
Rohrbach Anna
Rohrbach Marcus
Schiele Bernt
Tandon Niket
Torabi Atousa
Publication date: 12/05/2016

Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset that contains transcribed ADs, temporally aligned to full-length movies. In addition, we collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total, the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First, we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description Challenge (LSMDC)" at ICCV 2015.

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Crossref

Springer - Publisher Connector

PolyPublie

MPG.PuRe

Estonian football specific corpora automatic semantic role labeling with football specific Framenet

Author: Tammeveski Lauri
Publication date: 01/01/2014

This work investigates, and attempts to solve, the problem of automatic semantic role labeling (frame annotation) of Estonian text. A general Estonian Framenet is still in its starting phase, but a complete football-specific Framenet is available. We use it to test the hypothesis that morphological and syntactic information alone is enough for automatic semantic role labeling in football-related corpora. The hypothesis was not confirmed, because there are too many different ways to express a sentence with the same meaning. In addition, we supplemented Wordnet, the largest Estonian lexical-semantic database, with football-related words.

DSpace at Tartu University Library

An Exact Dual Decomposition Algorithm for Shallow Semantic Parsing with Constraints

Author: Andre F.T. Martins
Dipanjan Das
Noah A. Smith
Publication date: 29/06/2018

We present a novel technique for jointly predicting semantic arguments for lexical predicates. The task is to find the best matching between semantic roles and sentential spans, subject to structural constraints that come from expert linguistic knowledge (e.g., in the FrameNet lexicon). We formulate this task as an integer linear program (ILP); instead of using an off-the-shelf tool to solve the ILP, we employ a dual decomposition algorithm, which we adapt for exact decoding via a branch-and-bound technique. Compared to a baseline that makes local predictions, we achieve better argument identification scores and avoid all structural violations. Runtime is nine times faster than a proprietary ILP solver.

FigShare