Correlating Visual Speaker Gestures with Measures of Audience Engagement to Aid Video Browsing
In this thesis, we argue that in the domains of educational lectures and political debates, speaker gestures can be a source of semantic cues for video browsing. We hypothesize that certain human gestures, which can be automatically identified through techniques of computer vision, can convey significant information that is correlated with audience engagement. We present a joint-angle descriptor derived from an automatic upper body pose estimation framework to train an SVM which identifies point and spread poses in extracted video frames of an instructor giving a lecture. Ground-truth is collected in the form of 2500 manually annotated frames covering 20 minutes of a video lecture. Cross validation on the ground-truth data showed classifier F-scores of 0.54 and 0.39 for point and spread poses, respectively. We also derive an attribute for gestures which measures the angular variance of the arm movements from this system (analogous to arm waving). We present a method for tracking hands which succeeds even when left and right hands are clasping and occluding each other. We evaluate on a ground-truth dataset of 698 images with 1301 annotated left and right hands, mostly clasped. Our method performs better than baseline on recall (0.66 vs. 0.53) without sacrificing precision (0.65 for both) toward the goal of recognizing clasped hands. For tracking, it results in an improvement over a baseline method with an F-score of 0.59 vs. 0.48. From this, we are able to derive hand motion-based gesture attributes such as velocity, direction change and extremal pose. In ground-truth studies, we manually annotate and analyze the gestures of two instructors, each in a 75-minute computer science lecture using a 14-bit pose vector. We observe "pedagogical" gestures of punctuation and encouragement in addition to traditional classes of gestures such as deictic and metaphoric.
We also introduce a tool to facilitate the manual annotations of gestures in video and present results on their frequencies and co-occurrences. In particular, we find that 5 poses represent 80% of the variation in the annotated ground truth. We demonstrate a correlation between the angular variance of arm movements and the presence of those conjunctions that are used to contrast connected clauses ("but", "neither", etc.) in the accompanying speech. We do this by training an AdaBoost-based binary classifier using decision trees as weak learners. On a ground-truth database of 4243 video clips totaling 3.83 hours, each with subtitles, training on sets of conjunctions indicating contrast produces classifiers capable of achieving 55% accuracy on a balanced test set. We study two different presentation methods: an attribute graph which shows a normalized measure of the visual attributes across an entire video, as well as emphasized subtitles, where individual words are emphasized (resized) based on their accompanying gestures. Results from 12 subjects show supportive ratings given for the browsing aids in the task of providing keywords for video under time constraints. Subjects' keywords are also compared to independent ground-truth, resulting in precisions from 0.50-0.55, even when given less than half real time to view the video. We demonstrate a correlation between gesture attributes and a rigorous method of measuring audience engagement: electroencephalography (EEG). Our 20 subjects watch 61 minutes of video of the 2012 U.S. Presidential Debates while under observation through EEG. After discarding corrupted recordings, we retain 47 minutes worth of EEG data for each subject. The subjects are examined in aggregate and in subgroups according to gender and political affiliation. We find statistically significant correlations between gesture attributes (particularly extremal pose) and our feature of engagement derived from EEG. 
For all subjects watching all videos, we see a statistically significant correlation between gesture and engagement with a Spearman rank correlation of rho = 0.098 with p < 0.05, Bonferroni corrected. For some stratifications, correlations reach as high as rho = 0.297. From these results, we conclude that gestures can be used to measure engagement.
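As an illustration of the statistic reported above, the following is a minimal sketch of a Spearman rank correlation with a Bonferroni-adjusted significance threshold, written in plain Python. It is not the thesis's implementation: the function names, the tie-handling choice (average ranks), and the number of tests passed to the correction are all assumptions made for this example, and no data from the study is reproduced.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical per-segment values, for illustration only.
gesture_attribute = [0.2, 0.5, 0.1, 0.9, 0.4]
engagement_feature = [1.1, 1.4, 1.0, 1.3, 1.6]
rho = spearman_rho(gesture_attribute, engagement_feature)
threshold = bonferroni_alpha(0.05, n_tests=10)  # n_tests is an assumed value
```

Under a Bonferroni correction, each individual p-value is compared against `threshold` rather than against 0.05 directly, which is why a small rho can still be reported as significant only when the sample is large.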
Multimodal engagement strategies in science dissemination: A case study of TED talks and YouTube science videos
The growing interest in science dissemination offers new opportunities to communicate science openly to various audiences, but also brings the challenge of adapting to an audience that does not share the same academic background. This adaptation has been referred to as recontextualization. In the case of the formats that concern this study, that is, TEDx Talks and YouTube science dissemination videos, their multimodal nature suggests that recontextualization, and therefore engagement as a crucial aspect of this process, is likely to go well beyond purely linguistic aspects. The aim of this study is to unveil how engagement strategies in two science dissemination formats (a face-to-face talk and an online video) are realized through complex multimodal ensembles, and to highlight differences across them. In order to fulfill this aim, two talks by the same presenter and dealing with similar content were selected for analysis: a TEDx talk and a YouTube science dissemination video from the channel PBS Space Time. The recordings were annotated using the software Multimodal Video Analysis. The annotation included engagement strategies; embodied modes, that is, modes carried out using the body; and, in the case of the YouTube video, filmic modes, that is, modes triggered by the editing process of the recorded video. Our results show that the role of both embodied and filmic modes is paramount in the realization of engagement strategies. Our findings also bring to the fore significant differences in the ways in which the two distinct audiences are engaged, concerning the frequency and use of both semiotic modes and engagement strategies.
A Review on Recent Advances in Video-based Learning Research: Video Features, Interaction, Tools, and Technologies
Human learning is shifting more strongly than ever towards online settings, and especially towards video platforms. There is an abundance of tutorials and lectures covering diverse topics, from fixing a bike to particle physics. While it is advantageous that learning resources are freely available on the Web, the quality of the resources varies a lot. Given the number of available videos, users need algorithmic support in finding helpful and entertaining learning resources.
In this paper, we present a review of the recent research literature (2020-2021) on video-based learning. We focus on publications that examine the characteristics of video content, analyze frequently used features and technologies, and, finally, derive conclusions on trends and possible future research directions.
A Multimodal Approach to Metadiscourse as an Organizational Tool in Lectures
This thesis explores the uses of organizational metadiscourse in lectures from a multimodal perspective, thus providing a holistic view of its use. Moreover, this study explores how the use of organizational metadiscourse, both at a linguistic and at a non-verbal level, is influenced by the lecturing style chosen by the lecturers (conversational, rhetorical or reading styles). Programa de Doctorat en Llengües Aplicades, Literatura i Traducci
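Conceitos e métodos para apoio ao desenvolvimento e avaliação de colaboração remota utilizando realidade aumentada (Concepts and methods to support the development and evaluation of remote collaboration using augmented reality)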
Remote Collaboration using Augmented Reality (AR) shows great
potential to establish a common ground in physically distributed
scenarios where team-members need to achieve a shared goal.
However, most research efforts in this field have been devoted to
experimenting with the enabling technology and proposing methods to
support its development. As the field evolves, evaluation and
characterization of the collaborative process become an essential,
but difficult, endeavor to better understand the contributions of AR.
In this thesis, we conducted a critical analysis to identify the main
limitations and opportunities of the field, while situating its maturity
and proposing a roadmap of important research actions. Next, a
human-centered design methodology was adopted, involving
industrial partners to probe how AR could support their needs
during remote maintenance. These outcomes were combined with
literature methods into an AR-prototype and its evaluation was
performed through a user study. From this, the need for deeper
reflection became clear, in order to better understand the dimensions
that influence, and should be considered in, Collaborative AR. Hence,
a conceptual model and a human-centered taxonomy were proposed to
foster systematization of perspectives. Based on the proposed model,
an evaluation framework for contextualized data gathering and analysis
was developed, supporting the design and performance of
distributed evaluations in a more informed and complete manner.
To instantiate this vision, the CAPTURE toolkit was created,
providing an additional perspective based on selected dimensions
of collaboration and pre-defined measurements to obtain “in situ”
data about them, which can be analyzed using an integrated
visualization dashboard. The toolkit successfully supported
evaluations of several team-members during tasks of remote
maintenance mediated by AR, showing its versatility and
potential in eliciting a comprehensive characterization of the added
value of AR in real-life situations and establishing itself as a general-purpose
solution, potentially applicable to a wider range of
collaborative scenarios. Programa Doutoral em Engenharia Informátic
Metadiscourse analysis of digital interpersonal interactions in academic settings in Turkey
Rapid technological advances, efficiency and easy access have firmly established emailing as a vital medium of communication in the last decades. Nowadays, all around the world, particularly in educational settings, the medium is one of the most widely used modes of interaction between students and university lecturers. Despite their important role in academic life, very little is known about the metadiscursive characteristics of these e-messages and, as far as the author is aware, there is no study that has examined metadiscourse in request emails in Turkish. This study aims to contribute to filling this gap by focusing on the following two research questions: (i) How many and what type of interpersonal metadiscourse markers are used in request emails sent by students to their lecturers? (ii) Where are they placed and how are they combined with other elements in the text? In order to answer these questions, a corpus of unsolicited request emails in Turkish was compiled. The data collection started in January 2010 and continued until March 2018. A total of 353 request emails sent from university students to their lecturers were collected. The data were first transcribed in CLAN CHILDES format and analysed using the interpersonal model. The metadiscourse categories that aimed to involve readers in the email were identified and classified. Next, their places in the text were determined and described in detail. Findings of the study show that request emails include a wide array of multifunctional interpersonal metadiscourse markers which are intricately combined and employed by the writers to reach their aims. The results also showed that there is a close relation between the “weight of the request” and the number of interpersonal metadiscourse markers in request emails.
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and thus the constraints imposed on the type of video
that each technique is able to address. Making the hypotheses and
constraints explicit makes the framework particularly useful to select a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area. Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
Switching Partners: Dancing with the Ontological Engineers
Ontologies are today being applied in almost every field to support the alignment and retrieval of data of distributed provenance. Here we focus on new ontological work on dance and on related cultural phenomena belonging to what UNESCO calls the “intangible heritage.” Currently data and information about dance, including video data, are stored in an uncontrolled variety of ad hoc ways. This serves not only to prevent retrieval, comparison and analysis of the data, but may also impinge on our ability to preserve the data that already exist. Here we explore recent technological developments that are designed to counteract such problems by allowing information to be retrieved across disciplinary, cultural, linguistic and technological boundaries. Software applications such as the ones envisaged here will enable speedier recovery of data and facilitate its analysis in ways that will assist both archiving of and research on dance.