    The CAMOMILE collaborative annotation platform for multi-modal, multi-lingual and multi-media documents

    In this paper, we describe the organization and implementation of the CAMOMILE collaborative annotation framework for multimodal, multimedia, multilingual (3M) data. Given the versatile nature of the analyses that can be performed on 3M data, the structure of the server was kept intentionally simple to preserve its genericity, relying on standard Web technologies. Layers of annotations, defined as data associated with a media fragment from the corpus, are stored in a database and can be managed through standard interfaces with authentication. Interfaces tailored to the task at hand can then be developed in an agile way, relying on simple but reliable services for the management of the centralized annotations. We then present our implementation of an active learning scenario for person annotation in video, relying on the CAMOMILE server; during a dry-run experiment, the manual annotation of 716 speech segments was propagated to 3,504 labeled tracks. The code of the CAMOMILE framework is distributed as open source.
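
    As a minimal sketch of how a client might talk to such a REST-style annotation server, the snippet below authenticates and posts one annotation to a layer. The base URL, endpoint paths, and payload fields are illustrative assumptions, not the actual CAMOMILE API.

    ```python
    # Hypothetical client for a CAMOMILE-style REST annotation server.
    # Endpoint paths, payload fields, and credentials are assumptions
    # for illustration; consult the real CAMOMILE API for actual use.
    import requests

    BASE = "http://localhost:3000"  # assumed server address

    session = requests.Session()
    # Standard interface with authentication.
    session.post(f"{BASE}/login", json={"username": "annotator", "password": "secret"})

    # An annotation is data attached to a media fragment from the corpus.
    annotation = {
        "fragment": {"start": 12.3, "end": 15.7},  # time interval in the medium
        "data": {"speaker": "Jane Doe"},           # free-form annotation payload
    }
    resp = session.post(f"{BASE}/layer/speaker_layer_id/annotation", json=annotation)
    resp.raise_for_status()
    print(resp.json())
    ```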

    Overview of VideoCLEF 2009: New perspectives on speech-based multimedia content enrichment

    VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment. For each task, video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts were provided. The Subject Classification Task involved automatically tagging videos with subject theme labels. The best performance was achieved by approaching subject tagging as an information retrieval task and using both speech recognition transcripts and archival metadata. Alternatively, classifiers were trained using either the training data provided or data collected from Wikipedia or via general Web search. The Affect Task involved detecting narrative peaks, defined as points where viewers perceive heightened dramatic tension. The task was carried out on the “Beeldenstorm” collection, containing 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience-directed speech. Other approaches used topic changes, elevated speaking pitch, increased speaking intensity and radical visual changes. The Linking Task, also called “Finding Related Resources Across Languages,” involved linking video to material on the same subject in a different language. Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language “Beeldenstorm” collection and were expected to return target pages drawn from English-language Wikipedia. The best-performing methods used the transcript of the speech spoken during the multimedia anchor to build a query against an index of the Dutch-language Wikipedia; the Dutch Wikipedia pages returned were then used to identify related English pages. Participants also experimented with pseudo-relevance feedback, query translation and methods targeting proper names.
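
    The best-performing subject tagging approach, cast as information retrieval, could be realized roughly as below: index the labeled transcripts with TF-IDF and tag a new transcript with the labels of its most similar neighbors. The toy data, labels, and use of scikit-learn are illustrative assumptions, not the participants' actual systems.

    ```python
    # Minimal sketch of subject tagging cast as an information retrieval task:
    # retrieve the most similar labeled transcripts and reuse their labels.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    train_transcripts = [
        "... speech transcript of a history documentary ...",
        "... speech transcript of a visual arts documentary ...",
    ]
    train_labels = [["history"], ["visual arts"]]

    vectorizer = TfidfVectorizer()
    index = vectorizer.fit_transform(train_transcripts)

    def tag(transcript, k=1):
        """Return the subject labels of the k most similar indexed transcripts."""
        sims = cosine_similarity(vectorizer.transform([transcript]), index).ravel()
        top = sims.argsort()[::-1][:k]
        return [label for i in top for label in train_labels[i]]

    print(tag("... transcript of an unseen documentary ..."))
    ```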

    A large-scale annotation experiment: the OTIM project

    In this presentation, we review a large-scale annotation effort carried out within the OTIM project. As part of this project, we built a large audio-visual corpus of spontaneous speech comprising 8 hours of dialogue (102,457 words corresponding to 6,611 distinct word forms), fully transcribed, aligned and richly annotated across all domains and modalities. We were thus confronted with the main problems raised by the annotation of this type of resource. This presentation describes the recommendations and techniques we used to achieve these goals.

    An exchange format for multimodal annotations

    This paper presents the results of a joint effort by a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation of multimodality. We propose a multimodal annotation exchange format, based on the annotation graph formalism, which is supported by import and export routines in the respective tools.
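
    As a minimal sketch of the annotation graph formalism the exchange format builds on, the snippet below models nodes anchored to points on a timeline and labeled arcs between them. The class and field names are illustrative, not the exchange format's actual schema.

    ```python
    # Toy annotation graph: nodes (optionally) anchored to a timeline,
    # labeled arcs between nodes carrying the annotation content.
    from __future__ import annotations
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        id: str
        offset: float | None = None  # anchor time in seconds; None if unanchored

    @dataclass
    class Arc:
        source: str   # id of the start node
        target: str   # id of the end node
        tier: str     # e.g. "words", "gesture-phase"
        label: str    # the annotation content

    @dataclass
    class AnnotationGraph:
        nodes: dict[str, Node] = field(default_factory=dict)
        arcs: list[Arc] = field(default_factory=list)

        def add_arc(self, src: Node, dst: Node, tier: str, label: str) -> None:
            self.nodes[src.id] = src
            self.nodes[dst.id] = dst
            self.arcs.append(Arc(src.id, dst.id, tier, label))

    g = AnnotationGraph()
    g.add_arc(Node("n1", 0.50), Node("n2", 0.82), "words", "hello")
    ```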

    Timing Relationships between Speech and Co-Verbal Gestures in Spontaneous French

    Several studies have described the links between gesture and speech in terms of timing, most of them concentrating on the production of hand gestures during speech or during pauses (Beattie & Aboudan, 1994; Nobe, 2000). Other studies have focused on the anticipation, synchronization or delay of gestures with respect to co-occurring speech (Schegloff, 1984; McNeill, 1992, 2005; Kipp, 2003; Loehr, 2004; Chui, 2005; Kida & Faraco, 2008; Leonard & Cummins, 2009), and we would like to contribute to this debate in the present paper. We studied the timing relationships between iconic gestures and their lexical affiliates (Kipp, Neff et al., 2001) in a corpus of French conversational speech involving 6 speakers, annotated in both Praat (Boersma & Weenink, 2009) and Anvil (Kipp, 2001). The timing relationships we observed concern the position of the gesture stroke relative to that of the lexical affiliate and the Intonation Phrase, as well as the position of the Gesture Phrase relative to that of the Intonation Phrase. The main results show that although gesture and speech co-occur, gestures generally start before the related speech segment.
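
    A minimal sketch of the kind of timing measurement described above: compare the onset of each gesture stroke with the onset of its lexical affiliate, given intervals exported from Praat or Anvil as (start, end) tuples in seconds. The toy data and the one-to-one pairing are illustrative assumptions.

    ```python
    # Onset lag between gesture strokes and their lexical affiliates,
    # assuming the tiers have already been paired one-to-one.
    stroke_intervals = [(1.20, 1.65), (4.02, 4.40)]      # gesture strokes
    affiliate_intervals = [(1.35, 1.80), (4.10, 4.55)]   # lexical affiliates

    for (s_start, _), (a_start, _) in zip(stroke_intervals, affiliate_intervals):
        lead = a_start - s_start  # positive: the stroke starts before the affiliate
        print(f"stroke leads affiliate by {lead * 1000:.0f} ms")
    ```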

    Automatic annotation of head velocity and acceleration in Anvil

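    Although this entry carries no abstract, a minimal sketch of deriving head velocity and acceleration from tracked head positions by finite differences might look as follows; the frame rate, data, and use of NumPy are illustrative assumptions.

    ```python
    # Velocity and acceleration of a tracked head centre, by finite differences.
    import numpy as np

    fps = 25.0  # assumed video frame rate
    xy = np.array([[100.0, 50.0], [102.0, 51.0], [106.0, 53.0]])  # head centre per frame

    velocity = np.gradient(xy, 1.0 / fps, axis=0)  # pixels per second
    speed = np.linalg.norm(velocity, axis=1)
    acceleration = np.gradient(speed, 1.0 / fps)   # pixels per second^2
    print(speed, acceleration)
    ```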

    Multimodal Annotations and Categorization for Political Debates

    The paper introduces an annotation scheme for a political debate dataset, mainly in the form of video and audio annotations. The annotations contain information ranging from general linguistic to domain-specific information; some are produced with automatic tools and some are created manually. One of the goals is to use this information to predict the categories of the speaker's answers to disruptions. A typology of such answers is proposed, and an automatic categorization system based on a multimodal parametrization performs successfully.
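
    A minimal sketch of such a categorization from a multimodal parametrization: concatenate features drawn from the audio, video, and linguistic annotations into one vector per answer and train a classifier. The feature names, toy data, and scikit-learn classifier are illustrative assumptions, not the paper's actual parametrization.

    ```python
    # Toy multimodal classifier for answer categories.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each row: [pitch_mean, gesture_rate, word_count] for one answer (toy values).
    X = np.array([[180.0, 0.5, 24],
                  [220.0, 1.2, 10],
                  [150.0, 0.1, 40]])
    y = ["ignore", "counter-attack", "justify"]  # hypothetical answer categories

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict([[200.0, 0.8, 15]]))
    ```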