117 research outputs found

    A pipeline for the creation of multimodal corpora from YouTube videos

    This paper introduces an open-source pipeline for the creation of multimodal corpora from YouTube videos. It minimizes storage and bandwidth requirements because the videos themselves need not be downloaded and can remain on YouTube’s servers, and it minimizes processing requirements by using YouTube’s automatically generated subtitles, thus avoiding a computationally expensive automatic speech recognition step. The pipeline combines standard tools and outputs a corpus file in the industry-standard vertical format used by many corpus managers. It is straightforwardly extensible with further levels of annotation and can be adapted to languages other than English.
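    As a rough illustration of the idea, the sketch below pulls a video's automatically generated captions and emits a vertical-format file (one token per line, XML-like structural tags). The youtube-transcript-api package and the naive whitespace tokenisation are assumptions of the sketch, not the paper's documented tooling.

```python
# Sketch: fetch YouTube's auto-generated captions and write them out in
# vertical format. youtube-transcript-api and get_transcript() are
# assumptions for illustration, not necessarily the paper's actual tools.
from youtube_transcript_api import YouTubeTranscriptApi

def video_to_vertical(video_id: str) -> str:
    # Only caption text and timestamps are downloaded; the video itself
    # stays on YouTube's servers.
    segments = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
    lines = [f'<doc id="{video_id}">']
    for seg in segments:
        lines.append(f'<s start="{seg["start"]:.2f}">')  # one <s> per caption
        for token in seg["text"].split():                # naive tokenisation
            lines.append(token)
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines)

if __name__ == "__main__":
    print(video_to_vertical("dQw4w9WgXcQ"))  # any public video ID with captions
```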

    Evaluation of Efficiency-Enhancing Measures Using Optimization Algorithms for Fuel Cell Vehicles

    Efficiency-enhancing measures are evaluated for a series hybrid fuel cell vehicle over a drive cycle. The powertrain under consideration consists of a fuel cell system, battery, DC-DC converter, inverter, and electrical machine. Within the fuel cell system, the air supply is the largest parasitic load; to minimize this dissipation, different air-compression architectures are optimized with a scaling algorithm and compared. Phase switching reduces DC-DC converter losses, and a variable DC-link voltage increases the efficiency of the electrical machine and inverter. Dynamic Programming (DP) is used to evaluate these measures, with the DP formulation extended by the start-up and shut-down energy of the fuel cell system to model realistic cycle consumption. Taken together, these efficiency-enhancing measures reduce the energy consumption of the series hybrid fuel cell vehicle by 6.4% over the drive cycle.
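    To make the DP evaluation concrete, here is a minimal backward-recursion sketch over a battery state-of-charge grid, including a start-up energy penalty of the kind described above. Every number (cycle demand, grid sizes, efficiencies, penalties) is an illustrative assumption, not a value from the paper.

```python
# Minimal DP sketch for a series hybrid energy split with a fuel cell
# start-up penalty. All parameters are illustrative assumptions.
import numpy as np

P_DEMAND = np.array([5.0, 20.0, 35.0, 10.0, 0.0, 25.0])  # kW demand per step
FC_LEVELS = np.array([0.0, 10.0, 20.0, 30.0])            # kW fuel cell setpoints
SOC_GRID = np.linspace(0.3, 0.8, 51)                     # state-of-charge grid
DT = 0.05          # step length in hours (3 min, coarse for readability)
CAP_KWH = 5.0      # battery capacity
FC_EFF = 0.5       # hydrogen-to-electric efficiency
STARTUP_KWH = 0.3  # penalty when the fuel cell switches on

T = len(P_DEMAND)
STEP = SOC_GRID[1] - SOC_GRID[0]
# cost[t, i, on] = minimal hydrogen energy from step t onward, given
# SOC_GRID[i] and whether the fuel cell is currently on.
cost = np.zeros((T + 1, len(SOC_GRID), 2))
for t in range(T - 1, -1, -1):
    for i, soc in enumerate(SOC_GRID):
        for was_on in (0, 1):
            best = np.inf
            for p_fc in FC_LEVELS:
                on = int(p_fc > 0)
                p_batt = P_DEMAND[t] - p_fc           # battery covers the rest
                soc_next = soc - p_batt * DT / CAP_KWH
                if not (SOC_GRID[0] <= soc_next <= SOC_GRID[-1]):
                    continue                          # infeasible transition
                j = int(round((soc_next - SOC_GRID[0]) / STEP))
                h2 = p_fc * DT / FC_EFF               # hydrogen energy spent
                if on and not was_on:
                    h2 += STARTUP_KWH                 # start-up extension
                best = min(best, h2 + cost[t + 1, j, on])
            cost[t, i, was_on] = best

print(f"Minimal H2 energy from SOC 0.55: {cost[0, 25, 0]:.2f} kWh")
```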

    The development and implementation of a coding scheme to analyse interview dynamics in the British Household Panel Survey

    The study of interviewer-respondent interaction during an interview can give very useful insights into the cognitive process of answering questions, the social dynamics that develop in an interview context, and the way these dynamics ultimately affect data quality. Behaviour coding is a technique used to code such interactions. Despite its long-standing use, little has been written about the procedures to follow when developing a coding scheme. This paper provides a practical background on the development and implementation of the behaviour coding scheme adopted to explore interview dynamics in the framework of dependent interviewing. The scheme was used to code approximately 150 previously transcribed interviews from the British Household Panel Survey Wave 16 pilot. Coding strategies and procedures, coder recruitment and training, reliability assessments, as well as timetable and costs, are documented and discussed.
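    For the reliability assessments mentioned above, a standard check is Cohen's kappa between two coders. The sketch below computes it; the behaviour codes are invented purely for illustration and are not the paper's actual coding scheme.

```python
# Sketch: Cohen's kappa, a chance-corrected agreement measure used in
# coder reliability assessments. Codes below are hypothetical.
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Agreement between two coders, corrected for chance agreement."""
    assert len(codes_a) == len(codes_b)
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

coder1 = ["AQ", "AA", "PR", "AA", "DI", "AA"]  # hypothetical behaviour codes
coder2 = ["AQ", "AA", "PR", "DI", "DI", "AA"]
print(f"kappa = {cohens_kappa(coder1, coder2):.2f}")
```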

    World futures through RT’s eyes: multimodal dataset and interdisciplinary methodology

    There is a need to develop new interdisciplinary approaches suitable for a more complete analysis of multimodal data. Such approaches need to go beyond case studies and leverage technology to allow for statistically valid analysis of the data. Our study addresses this need by engaging with the research question of how humans communicate about the future for persuasive and manipulative purposes, and how they do so multimodally. It introduces a new methodology for computer-assisted multimodal analysis of video data, along with the resulting dataset, featuring annotations for speech (textual and acoustic modalities) and for gesticulation and corporeal behaviour (visual modality). To analyse and annotate the data and develop the methodology, the study draws on 23 twenty-six-minute episodes of the show ‘SophieCo Visionaries’, broadcast by RT (formerly ‘Russia Today’).
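    A dataset of this kind implies time-aligned annotation tiers per modality. As a loose sketch (tier names and fields are assumptions, not the published schema), one might represent and query the tiers like this:

```python
# Sketch of a tiered, time-aligned annotation structure and a query for
# cross-modal co-occurrence. Field and tier names are assumptions.
from dataclasses import dataclass

@dataclass
class Annotation:
    tier: str      # e.g. "text", "prosody", "gesture"
    start: float   # seconds into the episode
    end: float
    label: str

def co_occurring(annotations, tier_a, tier_b):
    """Pairs of annotations from two tiers whose time spans overlap."""
    a_items = [a for a in annotations if a.tier == tier_a]
    b_items = [b for b in annotations if b.tier == tier_b]
    return [(a, b) for a in a_items for b in b_items
            if a.start < b.end and b.start < a.end]

anns = [Annotation("text", 12.0, 13.1, "will increase"),
        Annotation("gesture", 12.3, 13.0, "right-hand upward stroke")]
print(co_occurring(anns, "text", "gesture"))
```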

    Gesture retrieval and its application to the study of multimodal communication

    Comprehending communication depends on analyzing the different modalities of conversation, including audio, visual, and others. This is a natural process for humans, but in digital libraries, where preservation and dissemination of digital information are crucial, it is a complex task. A rich conversational model, encompassing all modalities and their co-occurrences, is required to effectively analyze and interact with digital information. Currently, the analysis of co-speech gestures in videos is done through manual annotation by linguistic experts based on textual searches, an approach that is limited and does not fully exploit the visual modality of gestures. This paper proposes a visual gesture retrieval method using a deep learning architecture to extend current research in this area. The method is based on body keypoints and uses an attention mechanism to focus on specific keypoint groups. Experiments were conducted on a subset of the NewsScape dataset, which presents challenges such as multiple people in frame, changes of camera perspective, and occlusions. A user study assessed the usability of the results, establishing a baseline for future gesture retrieval methods on real-world video collections. The results demonstrate the high potential of the proposed method for multimodal communication research and highlight the significance of visual gesture retrieval in enhancing interaction with video content. Integrating visual similarity search for gestures into the open-source multimedia retrieval stack vitrivr can contribute substantially to computational linguistics. This research advances the understanding of the role of the visual modality in co-speech gestures and highlights the need for further development in this area.
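    As a rough sketch of the retrieval idea (keypoint-based embeddings with attention, ranked by similarity), the following PyTorch fragment shows one plausible shape. The layer sizes, keypoint count, and mean pooling are assumptions, not the paper's actual architecture.

```python
# Sketch: embed pose keypoints with self-attention and rank stored
# gestures by cosine similarity. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GestureEncoder(nn.Module):
    def __init__(self, n_keypoints=17, d_model=64):
        super().__init__()
        self.proj = nn.Linear(2, d_model)             # (x, y) per keypoint
        self.attn = nn.MultiheadAttention(d_model, num_heads=4,
                                          batch_first=True)

    def forward(self, pose):                          # pose: (B, K, 2)
        x = self.proj(pose)                           # (B, K, d_model)
        x, _ = self.attn(x, x, x)                     # attend across keypoints
        return F.normalize(x.mean(dim=1), dim=-1)     # one unit vector/pose

encoder = GestureEncoder()
query = encoder(torch.randn(1, 17, 2))                # query gesture pose
index = encoder(torch.randn(100, 17, 2))              # stored gesture poses
scores = query @ index.T                              # cosine similarities
print("best match:", scores.argmax().item())
```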

    The Role of Email Communications in Determining Response Rates and Mode of Participation in a Mixed-mode Design

    This article is concerned with the extent to which the propensity to participate in a web/face-to-face sequential mixed-mode survey is influenced by the ability to communicate with sample members by email in addition to mail. Researchers may be able to collect email addresses for sample members and subsequently use them to send survey invitations and reminders, but there is little evidence regarding the value of doing so. This makes it difficult to decide what effort should be made to collect such information and how to use it efficiently. Using evidence from a randomized experiment within a large mixed-mode national survey, we find that using a respondent-supplied email address to send additional survey invitations and reminders does not affect the survey response rate but is associated with an increased proportion of responses by web rather than face to face and, hence, with lower survey costs.
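    The headline finding (no effect on response rate) rests on comparing response proportions between experimental arms. A textbook two-proportion z-test of the kind involved is sketched below, with invented counts; it is not the paper's analysis code.

```python
# Sketch: two-proportion z-test comparing response rates between two
# experimental arms. Counts are hypothetical.
from math import sqrt
from statistics import NormalDist

def two_prop_ztest(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value

# email-invitation arm vs mail-only arm (hypothetical counts)
z, p = two_prop_ztest(x1=612, n1=1000, x2=598, n2=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```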

    Co-Speech Gesture Detection through Multi-phase Sequence Labeling

    Gestures are integral components of face-to-face communication. They unfold over time, often following predictable movement phases of preparation, stroke, and retraction. Yet the prevalent approach to automatic gesture detection treats the problem as binary classification, labeling a segment as either containing a gesture or not, and thus fails to capture gestures' inherently sequential and contextual nature. To address this, we introduce a novel framework that reframes the task as a multi-phase sequence labeling problem rather than binary classification. Our model processes sequences of skeletal movements over time windows, uses Transformer encoders to learn contextual embeddings, and leverages Conditional Random Fields to perform sequence labeling. We evaluate the proposal on a large dataset of diverse co-speech gestures in task-oriented face-to-face dialogues. The results consistently demonstrate that our method significantly outperforms strong baseline models in detecting gesture strokes, and that applying Transformer encoders to learn contextual embeddings from movement sequences substantially improves gesture-unit detection. These results highlight the framework's capacity to capture the fine-grained dynamics of co-speech gesture phases, paving the way for more nuanced and accurate gesture detection and analysis.
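    To make the pipeline shape concrete, here is a minimal sketch of the encoder-plus-sequence-decoder idea: a Transformer encoder emits per-frame scores over phase labels, and Viterbi decoding with a learned transition matrix stands in for the CRF layer. The feature dimension, layer sizes, and the exact label inventory are assumptions, not the paper's model.

```python
# Sketch: multi-phase sequence labeling of skeletal movement frames.
# Transformer encoder -> per-frame emission scores -> Viterbi decoding
# with a transition matrix (a stand-in for the CRF). Sizes are assumed.
import torch
import torch.nn as nn

LABELS = ["outside", "preparation", "stroke", "retraction"]

class PhaseLabeler(nn.Module):
    def __init__(self, n_features=54, d_model=64, n_labels=len(LABELS)):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)    # skeletal features/frame
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.emit = nn.Linear(d_model, n_labels)      # per-frame label scores
        self.trans = nn.Parameter(torch.zeros(n_labels, n_labels))

    def viterbi(self, emissions):                     # emissions: (T, L)
        score = emissions[0]
        back = []
        for t in range(1, emissions.shape[0]):
            # total[i, j] = score of being in i at t-1 and moving to j at t
            total = score.unsqueeze(1) + self.trans + emissions[t]
            score, idx = total.max(dim=0)
            back.append(idx)
        path = [int(score.argmax())]
        for idx in reversed(back):                    # trace best path back
            path.append(int(idx[path[-1]]))
        return [LABELS[i] for i in reversed(path)]

    def forward(self, frames):                        # frames: (1, T, F)
        h = self.encoder(self.proj(frames))
        return self.viterbi(self.emit(h)[0])

model = PhaseLabeler()
print(model(torch.randn(1, 30, 54)))                  # 30 frames -> 30 labels
```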

    Studying time conceptualisation via speech, prosody, and hand gesture: interweaving manual and computational methods of analysis

    This paper presents a new interdisciplinary methodology for the analysis of future conceptualisations in big, messy media data. More specifically, it focuses on depictions of post-Covid futures by RT during the pandemic, i.e. on data which are of interest not only from the perspective of academic research but also from that of policy engagement. The methodology has been developed to support the scaling up of fine-grained, data-driven analysis of discourse utterances larger than individual lexical units and centred around ‘will’ + the infinitive. It relies on the genuine integration of manual analytical and computational methods and tools in researching three modalities: textual, prosodic, and gestural. The paper describes the process of building a computational infrastructure for the collection and processing of video data, which aims to empower the manual analysis, and shows how manual analysis can in turn motivate the development of computational tools. Individual computational tools are presented to demonstrate how the combination of human and machine approaches to analysis can reveal new manifestations of cohesion between gesture and prosody. To illustrate the latter, the paper shows how the boundaries of prosodic units can help determine the boundaries of gestural units for future conceptualisations.
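    The final point, using prosodic boundaries to help delimit gestural units, can be illustrated with a toy alignment step: snap each candidate gesture boundary to the nearest prosodic boundary if one lies within a tolerance. The times and the 0.2 s tolerance below are invented for illustration.

```python
# Sketch: snap candidate gesture-unit boundaries to nearby prosodic
# boundaries. All times and the tolerance are illustrative assumptions.
def snap_to_prosody(gesture_bounds, prosodic_bounds, tol=0.2):
    snapped = []
    for g in gesture_bounds:
        nearest = min(prosodic_bounds, key=lambda p: abs(p - g))
        snapped.append(nearest if abs(nearest - g) <= tol else g)
    return snapped

prosodic = [1.20, 2.85, 4.10, 5.60]         # prosodic unit boundaries (s)
gestural = [1.33, 2.70, 4.55]               # candidate gesture boundaries (s)
print(snap_to_prosody(gestural, prosodic))  # -> [1.2, 2.85, 4.55]
```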

    Analysis of continuous neuronal activity evoked by natural speech with computational corpus linguistics methods

    In the field of the neurobiology of language, neuroimaging studies are generally based on stimulation paradigms consisting of at least two different conditions. Designing such paradigms can be very time-consuming, and this traditional approach is necessarily data-limited. In contrast, analyses in computational and corpus linguistics are often based on large text corpora, which allow a vast variety of hypotheses to be tested by repeatedly re-evaluating the same data set, and which also permit exploratory data analysis for generating new hypotheses. Drawing on the advantages of both fields, neuroimaging and computational corpus linguistics, we here present a unified approach that combines continuous natural speech and MEG to generate a corpus of speech-evoked neuronal activity.
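    The core alignment step in such a design is cutting the continuous recording into word-evoked epochs at onsets taken from the time-aligned corpus. A synthetic-data sketch of that step (signal, sampling rate, and onsets are placeholders, not the study's data):

```python
# Sketch: epoch a continuous neural signal at word onsets drawn from a
# time-aligned corpus. All data here are synthetic placeholders.
import numpy as np

FS = 1000                                  # sampling rate (Hz)
signal = np.random.randn(2, 60 * FS)       # 2 channels, 60 s of "MEG"
word_onsets = [1.52, 2.10, 3.47, 5.03]     # seconds, from the aligned corpus

def epochs_at(signal, onsets, fs, pre=0.2, post=0.8):
    """Cut fixed windows around onsets: (n_words, channels, samples)."""
    out = []
    for t in onsets:
        a, b = int((t - pre) * fs), int((t + post) * fs)
        if 0 <= a and b <= signal.shape[1]:
            out.append(signal[:, a:b])
    return np.stack(out)

erp = epochs_at(signal, word_onsets, FS).mean(axis=0)  # word-evoked average
print(erp.shape)                                       # (2, 1000)
```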