Generation of multi-modal dialogue for a net environment
In this paper, an architecture and a special-purpose markup language for simulated affective face-to-face communication are presented. In systems based on this architecture, users will be able to watch embodied conversational agents interact with each other in virtual locations on the internet. The markup language, or Rich Representation Language (RRL), has been designed to provide an integrated representation of speech, gesture, posture and facial animation.
Towards responsive Sensitive Artificial Listeners
This paper describes work in the recently started project SEMAINE, which aims to build a set of Sensitive Artificial Listeners – conversational agents designed to sustain an interaction with a human user despite limited verbal skills, through robust recognition and generation of non-verbal behaviour in real time, both while the agent is speaking and while it is listening. We report on data collection and on the design of a system architecture in view of real-time responsiveness.
The CorDis Corpus Mark-up and Related Issues
CorDis is a large, XML, TEI-conformant, POS-tagged, multimodal, multigenre corpus representing a significant portion of the political and media discourse on the 2003 Iraqi conflict. It was generated from different sub-corpora which had been assembled by various research groups, ranging from official transcripts of Parliamentary sessions, both in the US and the UK, to the transcripts of the Hutton Inquiry, from American and British newspaper coverage of the conflict to White House press briefings and to transcriptions of American and British TV news programmes. The heterogeneity of the data, the specificity of the genres and the diverse discourse analytical purposes of different groups had led to a wide range of coding strategies being employed to make textual and meta-textual information retrievable.
The main purpose of this paper is to show the process of harmonisation and integration whereby a loose collection of texts has become a stable architecture. The TEI proved a valid instrument to achieve standardisation of mark-up. The guidelines provide for a hierarchical organisation which gives the corpus a sound structure, favouring replicability and enhancing the reliability of research. In discussing some examples of the problems encountered in the annotation, we will deal with issues like consistency and re-usability, and will examine the constraints imposed on data handling by specific research objectives. Examples include the choice to code the same speakers in different ways depending on the various (institutional) roles they may assume throughout the corpus, the distinction between quotations of spoken or written discourse and quotations read aloud in the course of a spoken text, and the segmentation of portions of news according to participants' interaction and use of camera/voiceover.
Combining Language and Vision with a Multimodal Skip-gram Model
We extend the SKIP-GRAM model of Mikolov et al. (2013a) by taking visual information into account. Like SKIP-GRAM, our multimodal models (MMSKIP-GRAM) build vector-based word representations by learning to predict linguistic contexts in text corpora. However, for a restricted set of words, the models are also exposed to visual representations of the objects they denote (extracted from natural images), and must predict linguistic and visual features jointly. The MMSKIP-GRAM models achieve good performance on a variety of semantic benchmarks. Moreover, since they propagate visual information to all words, we use them to improve image labeling and retrieval in the zero-shot setup, where the test concepts are never seen during model training. Finally, the MMSKIP-GRAM models discover intriguing visual properties of abstract words, paving the way to realistic implementations of embodied theories of meaning.
Comment: accepted at NAACL 2015, camera-ready version, 11 pages
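The joint linguistic-and-visual objective described in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the toy vocabulary, embedding dimension, margin, and random "visual" vectors are all invented for the example, and the max-margin visual term is one common way to realise "predict visual features jointly".

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
vocab = ["cat", "sat", "mat"]

# Hypothetical toy embeddings: target (W) and context (C) vectors per word.
W = {w: rng.normal(scale=0.1, size=dim) for w in vocab}
C = {w: rng.normal(scale=0.1, size=dim) for w in vocab}

# Only a restricted set of words has visual vectors (here: two of three).
visual = {w: rng.normal(scale=0.1, size=dim) for w in ("cat", "mat")}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def linguistic_loss(target, context):
    # Standard skip-gram term: predict a context word from the target word.
    return -np.log(sigmoid(W[target] @ C[context]))

def visual_loss(target, margin=0.5):
    # Max-margin term: the word vector should score higher with its own
    # image vector than with the image vectors of other words.
    if target not in visual:
        return 0.0  # words without images get a purely linguistic update
    pos = W[target] @ visual[target]
    return sum(max(0.0, margin - pos + W[target] @ visual[w])
               for w in visual if w != target)

# One training example: the word "cat" seen with context "sat" and an image.
loss = linguistic_loss("cat", "sat") + visual_loss("cat")
print(loss)
```

Because the linguistic term updates every word while the visual term fires only for words with images, visual information still propagates to image-less words through shared contexts, which is what enables the zero-shot labeling the abstract describes.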
Multimodal Visual Concept Learning with Weakly Supervised Techniques
Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language in order to automatically recognize video concepts. Towards this goal, in this paper we use textual cues as a means of supervision, introducing two weakly supervised techniques that extend the Multiple Instance Learning (MIL) framework: Fuzzy Sets Multiple Instance Learning (FSMIL) and Probabilistic Labels Multiple Instance Learning (PLMIL). The former encodes the spatio-temporal imprecision of the linguistic descriptions with Fuzzy Sets, while the latter models different interpretations of each description's semantics with Probabilistic Labels; both are formulated through a convex optimization algorithm. In addition, we provide a novel technique to extract weak labels in the presence of complex semantics, based on semantic similarity computations. We evaluate our methods on two distinct problems, namely face and action recognition, in the challenging and realistic setting of movies accompanied by their screenplays, contained in the COGNIMUSE database. We show that, on both tasks, our method considerably outperforms a state-of-the-art weakly supervised approach, as well as other baselines.
Comment: CVPR 2018
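The bag-level weak supervision the abstract builds on can be illustrated with a minimal sketch. This is a hedged simplification, not the paper's method: the bag features, probabilistic labels, and linear instance scorer are invented, and a cross-entropy objective against soft labels stands in for the paper's exact convex formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: each "bag" is a scene containing instance feature
# vectors (e.g. face tracks); the screenplay yields only a probabilistic
# bag-level label p = P(the target character appears in the scene).
bags = [rng.normal(size=(4, 8)) for _ in range(3)]
bag_probs = [0.9, 0.1, 0.6]  # weak, probabilistic labels (assumed values)

w = rng.normal(scale=0.1, size=8)  # linear instance scorer (toy model)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bag_score(bag):
    # Standard MIL assumption: a bag is positive iff at least one instance
    # is positive, so the bag score is the max over instance scores.
    return sigmoid(np.max(bag @ w))

def weak_loss():
    # Cross-entropy against the probabilistic label instead of a hard 0/1
    # target, in the spirit of PLMIL (illustrative only).
    total = 0.0
    for bag, p in zip(bags, bag_probs):
        q = bag_score(bag)
        total += -(p * np.log(q) + (1 - p) * np.log(1 - q))
    return total

print(weak_loss())
```

Minimising such a loss over `w` would push the best instance in confidently positive scenes towards the target concept while suppressing all instances in confidently negative ones, which is the essence of learning from bag-level screenplay cues.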