    A multimodal corpus of simulated consultations between a patient and multiple healthcare professionals

    Language resources for studying doctor–patient interaction are rare, primarily because of the ethical issues involved in recording real medical consultations. Rarer still are resources that involve more than one healthcare professional in consultation with a patient, even though many chronic conditions require multiple areas of expertise for effective treatment. In this paper, we present the design, construction and output of the Patient Consultation Corpus, a multimodal corpus of simulated consultations between a patient, portrayed by an actor, and at least two healthcare professionals with different areas of expertise. Alongside the transcribed text of each consultation, the corpus also contains audio and video, where for each consultation: the audio consists of an individual track for each participant, allowing clear identification of speakers; the video consists of two framings for each participant, upper body and face, allowing close analysis of behaviours and gestures. Having presented the design and construction of the corpus, we briefly describe how its multimodal nature allows it to be analysed from several different perspectives.
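    The per-participant audio tracks and dual video framings described above suggest a simple record layout per consultation. The sketch below is purely illustrative; all file names and keys are assumptions, not the corpus's actual schema.

```python
# Illustrative record for one simulated consultation: a transcript plus,
# per participant, one audio track and two video framings (upper body, face).
# All names are hypothetical placeholders.
consultation = {
    "transcript": "consultation_01.txt",
    "participants": {
        "patient": {
            "audio": "patient.wav",
            "video": {"upper_body": "patient_body.mp4",
                      "face": "patient_face.mp4"},
        },
        "gp": {
            "audio": "gp.wav",
            "video": {"upper_body": "gp_body.mp4",
                      "face": "gp_face.mp4"},
        },
    },
}

# Each participant contributes exactly one audio track and two framings.
for name, p in consultation["participants"].items():
    assert set(p["video"]) == {"upper_body", "face"}
```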

    Building a sign language corpus for use in machine translation

    In recent years, data-driven methods of machine translation (MT) have overtaken rule-based approaches as the predominant means of automatically translating between languages. A prerequisite for such an approach is a parallel corpus of the source and target languages. Technological developments in sign language (SL) capturing, analysis and processing tools now mean that SL corpora are becoming increasingly available. Since transcription and language analysis tools are mainly designed and used for linguistic purposes, we describe the process of creating a multimedia parallel corpus specifically for the purposes of English to Irish Sign Language (ISL) MT. As part of our larger project on localisation, we focus on developing assistive technology for patients with limited English in the domain of healthcare. Focussing on the first point of contact a patient has with a GP’s office, the medical secretary, we sought to develop a corpus from the dialogue between the two parties when scheduling an appointment. Throughout the development process we have created one parallel corpus in six different modalities from this initial dialogue. In this paper we discuss the multi-stage process of developing this parallel corpus as individual and interdependent entities, both for our own MT purposes and for their usefulness in the wider MT and SL research domains.

    On the perspectivization of a recipient role - cross-linguistic results from a speech production experiment on GET-passives in German, Dutch and Luxembourgish

    The focus of this paper is the perspectivization of thematic roles generally and the recipient role specifically. Whereas perspective is defined here as the representation of something for someone from a given position (Sandig 1996: 37), perspectivization refers to the verbalization of a situation in the speech generation process (Storrer 1996: 233). In a prototypical act of giving, for example, the focus of perception (the attention of the external observer) may be on the person who gives (agent), the transferred object (patient) or the person who receives the transferred object (recipient). The languages of the world provide differing linguistic means to perspectivize such an act of giving or, better, to perspectivize the participants of such an action. In this article, the linguistic means of three selected continental West Germanic languages (German, Dutch and Luxembourgish) will be taken into consideration, with an emphasis on the perspectivization of the recipient role.

    Saying What You're Looking For: Linguistics Meets Video Search

    We present an approach to searching large video corpora for video clips which depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences which have identical words but entirely different meanings: "The person rode the horse" vs. "The horse rode the person". Given a video-sentence pair and a natural-language parser, along with a grammar that describes the space of sentential queries, we produce a score which indicates how well the video depicts the sentence. We produce such a score for each video clip in a corpus and return a ranked list of clips. Furthermore, this approach addresses two fundamental problems simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, this approach uses knowledge about the intended sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While earlier work was limited to single-word queries which correspond to either verbs or nouns, we show how one can search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 141 queries involving people and horses interacting with each other in 10 full-length Hollywood movies.
    Comment: 13 pages, 8 figures
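    The retrieval loop described above, score every clip against the sentential query and return a ranked list, can be sketched as follows. The `score_clip` function here is a toy word-overlap placeholder standing in for the paper's sentence-tracker score, not the actual model.

```python
# Sketch of score-then-rank retrieval. score_clip is a hypothetical toy
# stand-in: it counts query words appearing in a clip's annotation tags.
def score_clip(clip, query):
    words = set(query.lower().split())
    return sum(1 for tag in clip["tags"] if tag in words)

def rank_clips(corpus, query):
    # Score every clip, then sort descending by score.
    scored = [(score_clip(clip, query), clip["name"]) for clip in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]

corpus = [
    {"name": "clip1", "tags": ["person", "rode", "horse"]},
    {"name": "clip2", "tags": ["horse", "ran"]},
]
print(rank_clips(corpus, "The person rode the horse"))  # ['clip1', 'clip2']
```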

    FML: Face Model Learning from Videos

    Monocular image-based 3D reconstruction of faces is a long-standing problem in computer vision. Since image data is a 2D projection of a 3D face, the resulting depth ambiguity makes the problem ill-posed. Most existing methods rely on data-driven priors built from limited 3D face scans. In contrast, we propose multi-frame video-based self-supervised training of a deep network that (i) learns a face identity model in both shape and appearance while (ii) jointly learning to reconstruct 3D faces. Our face model is learned using only corpora of in-the-wild video clips collected from the Internet. This virtually endless source of training data enables learning of a highly general 3D face model. To achieve this, we propose a novel multi-frame consistency loss that ensures consistent shape and appearance across multiple frames of a subject's face, thus minimizing depth ambiguity. At test time we can use an arbitrary number of frames, so that we can perform both monocular and multi-frame reconstruction.
    Comment: CVPR 2019 (Oral). Video: https://www.youtube.com/watch?v=SG2BwxCw0lQ, Project Page: https://gvv.mpi-inf.mpg.de/projects/FML19
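    One simple way to realize the multi-frame consistency idea above, not the paper's actual loss, is to penalize the deviation of per-frame identity estimates from their mean, so that all frames of a subject are pushed toward a single shared identity. A minimal sketch under that assumption:

```python
# Hypothetical consistency penalty: mean squared deviation of per-frame
# parameter vectors from their across-frame mean. Identical estimates for
# every frame give a loss of zero.
def consistency_loss(frame_params):
    # frame_params: list of per-frame parameter vectors (lists of floats)
    n = len(frame_params)
    dim = len(frame_params[0])
    mean = [sum(p[d] for p in frame_params) / n for d in range(dim)]
    return sum(
        (p[d] - mean[d]) ** 2
        for p in frame_params
        for d in range(dim)
    ) / n

# Two frames agreeing perfectly incur no penalty; disagreement is penalized.
print(consistency_loss([[1.0, 2.0], [1.0, 2.0]]))  # 0.0
```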

    Video Data Visualization System: Semantic Classification And Personalization

    We present in this paper an intelligent video data visualization tool, based on semantic classification, for retrieving and exploring a large-scale corpus of videos. Our work is based on semantic classification resulting from semantic analysis of video. The obtained classes are projected into the visualization space. The graph is represented by nodes and edges: the nodes are the keyframes of video documents and the edges are the relations between documents and document classes. Finally, we construct the user's profile, based on the user's interaction with the system, to better adapt the system to the user's preferences.
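    The node-and-edge structure described above can be sketched as a small graph builder: keyframes and classes become nodes, and an edge links each document's keyframe to its assigned classes. Function and field names are illustrative assumptions, not the tool's API.

```python
# Illustrative builder for the visualization graph: keyframe nodes, class
# nodes, and edges from each document's keyframe to its semantic classes.
def build_graph(documents):
    # documents: list of {"keyframe": str, "classes": [str, ...]}
    nodes, edges = set(), set()
    for doc in documents:
        nodes.add(doc["keyframe"])
        for cls in doc["classes"]:
            nodes.add(cls)
            edges.add((doc["keyframe"], cls))
    return nodes, edges

docs = [
    {"keyframe": "kf_news_01", "classes": ["news"]},
    {"keyframe": "kf_match_07", "classes": ["sports", "news"]},
]
nodes, edges = build_graph(docs)
print(len(nodes), len(edges))  # 4 3
```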

    A new multi-modal dataset for human affect analysis

    In this paper we present a new multi-modal dataset of spontaneous three-way human interactions. Participants were recorded in an unconstrained environment at various locations during a sequence of debates in a video-conference, Skype-style arrangement. An additional depth modality was introduced, which permitted the capture of 3D information in addition to the video and audio signals. The dataset consists of 16 participants and is subdivided into 6 unique sections. The dataset was manually annotated on a continuous scale across 5 different affective dimensions, including arousal, valence, agreement, content and interest. The annotation was performed by three human annotators, with the ensemble average calculated for use in the dataset. The corpus enables the analysis of human affect during conversations in a real-life scenario. We first briefly review existing affect datasets and the methodologies related to affect dataset construction, then detail how our unique dataset was constructed.
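    The ensemble averaging mentioned above amounts to a per-timestep mean over the three annotators' continuous traces. A minimal sketch, with invented rating values for illustration:

```python
# Per-timestep average of several annotators' continuous ratings for one
# affective dimension. The traces below are made-up example values.
def ensemble_average(annotations):
    # annotations: list of equal-length rating sequences, one per annotator
    n = len(annotations)
    return [sum(values) / n for values in zip(*annotations)]

arousal_traces = [
    [0.2, 0.4, 0.6],  # annotator 1
    [0.1, 0.5, 0.7],  # annotator 2
    [0.3, 0.3, 0.8],  # annotator 3
]
print(ensemble_average(arousal_traces))
```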

    A usage based approach into the acquisition of relative clauses

    ABSTRACT: Previous research has shown that, cross-linguistically, relative clauses are acquired late and are considered a signal of linguistic complexity. This study adopts a usage-based account of relative clause acquisition in Turkish. A corpus based on three databases comprising 170 recordings of naturalistic mother-child interaction was analysed. The ages of the children in these three databases are 02;00-03;06, 01;00-02;04 and 00;09-02;09, respectively. The analyses revealed that the use of relative clauses in both the children’s productions and in child-directed speech was extremely scarce. Though previous research underlined the linguistic complexity of relative clauses as a reason for late acquisition, the results of this study indicate that scarcity of input should also be regarded as a powerful predictor. The study underlines the availability of other constructions that are functionally parallel to relative clauses. The findings suggest that such structures, which are syntactically and morphologically less complex than relative clauses, are common in both child-directed speech and in children’s productions.