A multimodal corpus of simulated consultations between a patient and multiple healthcare professionals
Language resources for studying doctor–patient interaction are rare, primarily due to the ethical issues related to recording real medical consultations. Rarer still are resources that involve more than one healthcare professional in consultation with a patient, despite many chronic conditions requiring multiple areas of expertise for effective treatment. In this paper, we present the design, construction and output of the Patient Consultation Corpus, a multimodal corpus of simulated consultations between a patient portrayed by an actor and at least two healthcare professionals with different areas of expertise. In addition to the transcribed text, the corpus contains audio and video for each consultation: the audio consists of individual tracks for each participant, allowing for clear identification of speakers; the video consists of two framings for each participant – upper-body and face – allowing for close analysis of behaviours and gestures. Having presented the design and construction of the corpus, we then briefly describe how the multi-modal nature of the corpus allows it to be analysed from several different perspectives.
Building a sign language corpus for use in machine translation
In recent years data-driven methods of machine translation (MT) have overtaken rule-based approaches as the predominant means of automatically translating between languages. A pre-requisite for such an approach is a parallel corpus of the source and target languages. Technological developments in sign language (SL) capturing, analysis and processing tools now mean that SL corpora are
becoming increasingly available. With transcription and language analysis tools being mainly designed and used for linguistic purposes, we describe the process of creating a multimedia parallel corpus specifically for the purposes of English to Irish Sign Language (ISL) MT. As part of our larger project on localisation, our research is focussed on developing assistive technology for patients with limited English in the domain of healthcare. Focussing on the first point of contact a patient has with a GP's office, the medical secretary, we sought to develop a corpus from the dialogue between the two parties when scheduling an appointment. Throughout the development process we have created one parallel corpus in six different modalities from this initial dialogue. In this paper we discuss the multi-stage process of developing this parallel corpus as individual and interdependent entities, both for our own MT purposes and for its usefulness in the wider MT and SL research domains.
On the perspectivization of a recipient role - cross-linguistic results from a speech production experiment on GET-passives in German, Dutch and Luxembourgish
The focus of this paper is the perspectivization of thematic roles generally and the recipient role specifically. Whereas perspective is defined here as the representation of something for someone from a given position (Sandig 1996: 37), perspectivization refers to the verbalization of a situation in the speech generation process (Storrer 1996: 233). In a prototypical act of giving, for example, the focus of perception (the attention of the external observer) may be on the person who gives (agent), the transferred object (patient) or the person who receives the transferred object (recipient). The languages of the world provide differing linguistic means to perspectivize such an act of giving, or better: to perspectivize the participants of such an action. In this article, the linguistic means of three selected continental West Germanic languages – German, Dutch and Luxembourgish – will be taken into consideration, with an emphasis on the perspectivization of the recipient role.
Saying What You're Looking For: Linguistics Meets Video Search
We present an approach to searching large video corpora for video clips which
depict a natural-language query in the form of a sentence. This approach uses
compositional semantics to encode subtle meaning that is lost in other systems,
such as the difference between two sentences which have identical words but
entirely different meaning: "The person rode the horse" vs. "The horse
rode the person". Given a video-sentence pair and a natural-language parser,
along with a grammar that describes the space of sentential queries, we produce
a score which indicates how well the video depicts the sentence. We produce
such a score for each video clip in a corpus and return a ranked list of clips.
Furthermore, this approach addresses two fundamental problems simultaneously:
detecting and tracking objects, and recognizing whether those tracks depict the
query. Because both tracking and object detection are unreliable, this approach uses
knowledge about the intended sentential query to focus the tracker on the
relevant participants and ensures that the resulting tracks are described by
the sentential query. While earlier work was limited to single-word queries
which correspond to either verbs or nouns, we show how one can search for
complex queries which contain multiple phrases, such as prepositional phrases,
and modifiers, such as adverbs. We demonstrate this approach by searching for
141 queries involving people and horses interacting with each other in 10
full-length Hollywood movies.Comment: 13 pages, 8 figure
FML: Face Model Learning from Videos
Monocular image-based 3D reconstruction of faces is a long-standing problem
in computer vision. Since image data is a 2D projection of a 3D face, the
resulting depth ambiguity makes the problem ill-posed. Most existing methods
rely on data-driven priors that are built from limited 3D face scans. In
contrast, we propose multi-frame video-based self-supervised training of a deep
network that (i) learns a face identity model both in shape and appearance
while (ii) jointly learning to reconstruct 3D faces. Our face model is learned
using only corpora of in-the-wild video clips collected from the Internet. This
virtually endless source of training data enables learning of a highly general
3D face model. In order to achieve this, we propose a novel multi-frame
consistency loss that ensures consistent shape and appearance across multiple
frames of a subject's face, thus minimizing depth ambiguity. At test time we
can use an arbitrary number of frames, so that we can perform both monocular as
well as multi-frame reconstruction. (Comment: CVPR 2019, Oral. Video: https://www.youtube.com/watch?v=SG2BwxCw0lQ;
Project Page: https://gvv.mpi-inf.mpg.de/projects/FML19)
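One simple way to realise a multi-frame consistency objective of the kind the abstract describes is to penalize each frame's predicted identity parameters for deviating from their across-frame mean. This is a minimal sketch under that assumption, not the paper's actual loss:

```python
import numpy as np

def multi_frame_consistency_loss(per_frame_identity):
    # per_frame_identity: (num_frames, num_params) array of per-frame
    # shape/appearance estimates for one subject (hypothetical layout).
    # Penalizing deviation from the across-frame mean pushes the network
    # toward a single consistent identity per subject.
    params = np.asarray(per_frame_identity, dtype=float)
    mean = params.mean(axis=0, keepdims=True)
    return float(np.mean(np.sum((params - mean) ** 2, axis=1)))

# Identical estimates across frames incur zero loss; divergent ones do not.
print(multi_frame_consistency_loss([[1.0, 2.0], [1.0, 2.0]]))  # 0.0
```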
Video Data Visualization System: Semantic Classification And Personalization
We present in this paper an intelligent video data visualization tool, based
on semantic classification, for retrieving and exploring a large scale corpus
of videos. Our work is based on semantic classification resulting from semantic
analysis of video. The obtained classes are then projected into the visualization
space. The graph is represented by nodes and edges: the nodes are the keyframes
of video documents, and the edges are the relations between documents and
document classes. Finally, we construct the user's profile, based on the
interaction with the system, to make the system better match the user's
preferences.
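The node/edge construction described above can be sketched as follows; the `doc_id -> {"keyframes", "class"}` mapping is a hypothetical data model, since the abstract does not specify one:

```python
def build_visualization_graph(documents):
    # documents: doc_id -> {"keyframes": [...], "class": str}.
    # Nodes are the keyframes of video documents; edges link each
    # document's keyframes to the semantic class the document received.
    nodes, edges = [], []
    for doc_id, info in documents.items():
        for keyframe in info["keyframes"]:
            nodes.append(keyframe)
            edges.append((keyframe, info["class"]))
    return nodes, edges

docs = {
    "doc1": {"keyframes": ["kf1", "kf2"], "class": "sports"},
    "doc2": {"keyframes": ["kf3"], "class": "news"},
}
nodes, edges = build_visualization_graph(docs)
print(nodes)  # ['kf1', 'kf2', 'kf3']
print(edges)  # [('kf1', 'sports'), ('kf2', 'sports'), ('kf3', 'news')]
```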
A new multi-modal dataset for human affect analysis
In this paper we present a new multi-modal dataset of spontaneous three-way human interactions. Participants were recorded in an unconstrained environment at various locations during a sequence of debates in a video-conference, Skype-style arrangement. An additional depth modality was introduced, which permitted the capture of 3D information in addition to the video and audio signals. The dataset consists of 16 participants and is subdivided into 6 unique sections. The dataset was manually annotated on a continuous scale across 5 different affective dimensions, including arousal, valence, agreement, content and interest.
The annotation was performed by three human annotators, with the ensemble average calculated for use in the dataset. The corpus enables the analysis of human affect during conversations in a real-life scenario. We first briefly review the existing affect datasets and the methodologies
related to affect dataset construction, then detail how our unique dataset was constructed.
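Taking the ensemble average of the three annotators' continuous traces can be sketched as a per-timestep mean. This is a minimal illustration of the idea, not the authors' exact annotation pipeline:

```python
def ensemble_average(traces):
    # traces: one list of per-timestep ratings per annotator, all the
    # same length. Returns the per-timestep mean across annotators.
    return [sum(values) / len(values) for values in zip(*traces)]

# Three hypothetical annotators rating one affective dimension
# (e.g. arousal) over four timesteps.
arousal = [
    [0.1, 0.4, 0.6, 0.5],
    [0.3, 0.2, 0.6, 0.7],
    [0.2, 0.3, 0.6, 0.6],
]
gold_trace = ensemble_average(arousal)
```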
A usage based approach into the acquisition of relative clauses
ABSTRACT: Previous research has shown that cross-linguistically relative clauses are acquired late and are considered as a signal of linguistic complexity. This study adapts a usage-based account of relative clause acquisition in Turkish. A corpus based on three databases including 170 recordings of naturalistic mother-child interaction was analysed. The age of children in these three databases are 02;00-03;06, 01;00-02;04 and 00;09-02;09, respectively. The analyses revealed that the use of relative clauses in both the childrenâs productions and in child-directed speech were extremely scarce. Though previous research underlined the linguistic complexity of
relative clauses as a reason for late acquisition, the results of this study point out that scarcity of input should also be regarded as a powerful predictor. The study underlines the availability of other constructions that are functionally parallel to relative clauses. The findings suggest that such structures which are syntactically and morphologically less complex than relative clauses are common in both child directed speech and in childrenâs productions