Video interaction using pen-based technology
Dissertation submitted for the degree of Doctor in Informatics
Video can be considered one of the most complete and complex media, and its manipulation is still a difficult and tedious task. This research applies pen-based technology to video manipulation, with the goal of improving this interaction. Despite human familiarity with pen-based devices, how they can be used in video interaction to make it more natural while at the same time fostering the user's creativity remains an open question.
Two types of interaction with video were considered in this work: video annotation and video editing. Each interaction type allows the study of one of the modes of using pen-based technology: indirectly, through digital ink, or directly, through pen gestures or pressure. This research contributes two approaches for pen-based video interaction: pen-based video annotations and video as ink.
The first combines pen-based annotations with motion-tracking algorithms in order to augment video content with sketches or handwritten notes. It studies how pen-based technology can be used to annotate moving objects and how to maintain the association between a pen-based annotation and the annotated moving object.
The second concept replaces digital ink with video content, studying how pen gestures and pressure can be used in video editing and what kind of changes are needed in the interface, in order to provide a more familiar and creative interaction in this usage context.

This work was partially funded by the UT Austin – Portugal Digital Media Program
(Ph.D. grant: SFRH/BD/42662/2007 - FCT/MCTES); by the HP Technology for Teaching
Grant Initiative 2006; by the project "TKB - A Transmedia Knowledge Base for contemporary
dance" (PTDC/EAT/AVP/098220/2008 funded by FCT/MCTES); and by CITI/DI/FCT/UNL (PEst-OE/EEI/UI0527/2011).
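The "video as ink" concept could, for example, map pen pressure to how quickly a stroke advances through the clip, so that each stroke sample paints a video frame along the ink path. The sketch below is our own illustration of such a mapping, not the thesis's actual technique; all names are hypothetical.

```python
def frames_for_stroke(stroke, clip_len, min_step=1, max_step=5):
    """Hypothetical 'video as ink' mapping: each stroke sample paints one
    video frame at its (x, y) position; pen pressure in [0, 1] controls how
    fast the stroke advances through the clip, so light strokes spread a few
    nearby frames while firm strokes sweep through more of the video."""
    frames, cursor = [], 0.0
    for (x, y, pressure) in stroke:
        frames.append(((x, y), int(cursor) % clip_len))
        cursor += min_step + pressure * (max_step - min_step)
    return frames

# Three stroke samples over a 100-frame clip, light then firm pressure:
print(frames_for_stroke([(0, 0, 0.0), (10, 0, 1.0), (20, 0, 1.0)], clip_len=100))
# → [((0, 0), 0), ((10, 0), 1), ((20, 0), 6)]
```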
Modeling of Performance Creative Evaluation Driven by Multimodal Affective Data
Performance creativity evaluation can be achieved through affective data, and the use of affective features to evaluate performance creativity is a new research trend. This paper proposes a "Performance Creative—Multimodal Affective (PC-MulAff)" model based on multimodal affective features for performance creativity evaluation. Multimedia data acquisition equipment is used to collect physiological data from the audience, including multimodal affective data such as facial expression, heart rate and eye movement. Affective features of the multimodal data are calculated in combination with director annotations, and a "Performance Creative—Affective Acceptance (PC-Acc)" measure is defined on the basis of the multimodal affective features to evaluate the quality of performance creativity. This paper verifies the PC-MulAff model on different performance data sets. The experimental results show that the PC-MulAff model achieves high evaluation quality across different performance forms. In the creative evaluation of dance performance, the accuracy of the model is 7.44% and 13.95% higher than that of single-text and single-video evaluation, respectively.
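A hypothetical sketch of how an acceptance score like PC-Acc could fuse per-modality affective features. The feature names and weights below are purely illustrative and are not taken from the paper.

```python
def pc_acc(features, weights):
    """Illustrative 'affective acceptance' score: a weighted average of
    per-modality affective features, each assumed normalised to [0, 1]."""
    assert features.keys() == weights.keys()
    total_w = sum(weights.values())
    return sum(weights[m] * features[m] for m in features) / total_w

# Illustrative modality scores (facial expression, heart rate, eye movement):
score = pc_acc(
    {"face": 0.8, "heart_rate": 0.6, "gaze": 0.7},
    {"face": 0.5, "heart_rate": 0.2, "gaze": 0.3},
)
print(round(score, 3))  # → 0.73
```

In the paper the weighting presumably also incorporates the director annotations; here that step is left out for brevity.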
People and object tracking for video annotation
Dissertation submitted for the degree of Master in Computer Engineering
Object tracking is a thoroughly researched problem, with a body of associated literature dating at least as far back as the late 1970s. However, and despite the development of some satisfactory real-time trackers, it has not yet seen widespread use. This is not due to a lack of applications for the technology, since several interesting ones exist. In this document, it is postulated that this status quo is due, at least in part, to a lack of easy-to-use software libraries supporting object tracking. An overview of the problems associated with object tracking is presented, and the process of developing one such library is documented. This discussion includes how to overcome problems such as heterogeneities in object representations and requirements for training or initial object position hints.
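One way such a library can hide heterogeneous object representations behind a single interface is sketched below. The `Tracker` facade and `CentroidTracker` names are our own illustration of the design problem, not the library's actual API.

```python
from abc import ABC, abstractmethod

class Tracker(ABC):
    """Uniform facade over heterogeneous trackers: whatever representation a
    concrete tracker uses internally, it reports an axis-aligned bounding box."""

    @abstractmethod
    def init(self, frame, bbox):
        """Start tracking from an (x, y, w, h) initial position hint."""

    @abstractmethod
    def update(self, frame):
        """Return the tracked (x, y, w, h) bounding box for the new frame."""

class CentroidTracker(Tracker):
    """Toy tracker that internally stores a centre point (a different
    representation) but still exposes the common bbox interface."""
    def init(self, frame, bbox):
        x, y, w, h = bbox
        self._size = (w, h)
        self._center = (x + w / 2, y + h / 2)
    def update(self, frame):
        # A real tracker would locate the object in `frame`;
        # here we simply report the stored position as a bbox.
        cx, cy = self._center
        w, h = self._size
        return (cx - w / 2, cy - h / 2, w, h)

t = CentroidTracker()
t.init(frame=None, bbox=(10, 20, 40, 30))
print(t.update(frame=None))  # → (10.0, 20.0, 40, 30)
```

Trackers that need training data or an initial hint can receive them through `init`, keeping the caller's code identical across implementations.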
Video annotation is the process of associating data with a video's content. Associating data with a video has numerous applications, ranging from making large video archives or long videos searchable, to enabling discussion about and augmentation of the video's content. Object tracking is presented as a valid approach to both automatic and manual video annotation, and the integration of the developed object tracking library into an existing video annotator, running on a tablet computer, is described. The challenges involved in designing an interface to support the association of video annotations with tracked objects in real time are also discussed. In particular, we discuss our interaction approaches to handle moving-object selection on live video, which we have called "Hold and Overlay" and "Hold and Speed Up". In addition, the results of a set of preliminary tests are reported.

Project "TKB – A Transmedia Knowledge Base for contemporary dance" (PTDC/EAT/AVP/098220/2008, funded by FCT/MCTES), the UT Austin – Portugal Digital Media Program (SFRH/BD/42662/2007, FCT/MCTES) and CITI/DI/FCT/UNL (PEst-OE/EEI/UI0527/2011).
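A minimal state sketch of the "Hold and Overlay" idea mentioned above, under our own assumptions about its behaviour (the class and method names are hypothetical): while the pen is held down, the frame shown at pen-down is kept on screen as a still overlay so a moving object can be selected without chasing it; on pen-up the selection point is handed off and live playback is shown again.

```python
class HoldAndOverlay:
    """Hypothetical sketch of a freeze-while-selecting interaction:
    pen down freezes the displayed frame; pen up resumes live video."""
    def __init__(self):
        self.frozen_frame = None
    def pen_down(self, current_frame):
        self.frozen_frame = current_frame       # freeze this frame for selection
    def frame_to_show(self, live_frame):
        # Show the frozen still while selecting, otherwise the live video.
        return self.frozen_frame if self.frozen_frame is not None else live_frame
    def pen_up(self, selection_point):
        self.frozen_frame = None                # resume live display
        return selection_point                   # hand the selection to the tracker

ui = HoldAndOverlay()
ui.pen_down("frame_120")                 # pen touches: freeze this frame
print(ui.frame_to_show("frame_125"))     # → frame_120 (still frozen)
ui.pen_up((340, 210))                    # selection handed off
print(ui.frame_to_show("frame_130"))     # → frame_130 (live again)
```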
The Role of Eye Gaze and Body Movements in Turn-Taking during a Contemporary Dance Improvisation
This paper intends to contribute to the multimodal turn-taking literature by presenting data collected in an improvisation session in the context of the performing arts, together with its quali-quantitative analysis, focusing on how gaze and the full body participate in the interaction. Five expert performers joined the Portuguese contemporary choreographer João Fiadeiro in practicing his Real Time Composition Method during an improvisation session, which was recorded and annotated for this study. A micro-analysis of portions of the session was conducted using ELAN. We found that intersubjectivity was avoided during this performance, both in the performers' bodily movements and in their mutual gaze; we extrapolate that peripheral vision was chiefly deployed as a regulating strategy by these experts to coordinate turn-taking. A macro-analysis comparing the data with analogous data obtained from non-performers provides the context for a discussion on multimodality and decision-making.
FrameNet annotation for multimodal corpora: devising a methodology for the semantic representation of text-image interactions in audiovisual productions
Multimodal analyses have been growing in importance within several approaches to Cognitive Linguistics and applied fields such as Natural Language Understanding. Nonetheless, fine-grained semantic representations of multimodal objects are still lacking, especially in terms of integrating areas such as Natural Language Processing and Computer Vision, which are key for the implementation of multimodality in Computational Linguistics. In this dissertation, we propose a methodology for extending FrameNet annotation to the multimodal domain, since FrameNet can provide fine-grained semantic representations, particularly with a database enriched by Qualia and other interframal and intraframal relations, as is the case with FrameNet Brasil. To make FrameNet Brasil able to conduct multimodal analysis, we outlined the
hypothesis that, similarly to the way in which words in a sentence evoke frames and organize their elements in the syntactic locality accompanying them, visual elements in video shots may also evoke frames and organize their elements on the screen, or work complementarily with the frame-evocation patterns of the sentences narrated simultaneously with their appearance on screen, providing different profiling and perspective options for meaning construction. The corpus
annotated for testing the hypothesis is composed of episodes of a Brazilian TV Travel Series
critically acclaimed as an exemplar of good practices in audiovisual composition. The TV genre
chosen also configures a novel experimental setting for research on integrated image and text
comprehension, since, in this corpus, text is not a direct description of the image sequence but
correlates with it indirectly in a myriad of ways. The dissertation also reports on an eye-tracker
experiment conducted to validate the proposed text-oriented annotation approach. The experiment demonstrated that it is not possible to determine that text impacts gaze directly, and this was taken as reinforcement for the approach of valuing the combination of modes. Last, we present
the Frame2 dataset, the product of the annotation task carried out for the corpus following both
the methodology and guidelines proposed. The results achieved demonstrate that, at least for
this TV genre but possibly also for others, a fine-grained semantic annotation tackling the
diverse correlations that take place in a multimodal setting provides a new perspective on multimodal comprehension modeling. Moreover, multimodal annotation also enriches the
development of FrameNets, to the extent that correlations found between modalities can attest to
the modeling choices made by those building frame-based resources.

CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
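The hypothesis above suggests annotation units in which a frame can be evoked textually, visually, or by both modes in correlation. A minimal sketch of such a record follows; the field names and example values are our own illustration, not the actual Frame2 schema.

```python
from dataclasses import dataclass, field
from typing import Optional, Dict, Tuple

@dataclass
class FrameAnnotation:
    """One multimodal annotation unit: a semantic frame may be evoked by a
    word in the narration, by a visual element in the shot, or by both."""
    frame: str                      # FrameNet frame name, e.g. "Travel"
    evoker_text: Optional[str]      # lexical unit in the narration, if any
    evoker_visual: Optional[str]    # visual element on screen, if any
    shot_span: Tuple[float, float]  # (start_s, end_s) of the video shot
    elements: Dict[str, str] = field(default_factory=dict)  # frame element -> filler

# Illustrative unit where text and image evoke the same frame in correlation:
a = FrameAnnotation(
    frame="Travel",
    evoker_text="viajar",
    evoker_visual="airplane on screen",
    shot_span=(12.0, 15.5),
    elements={"Traveler": "host", "Goal": "Salvador"},
)
print(a.frame, a.shot_span)
```

Units with `evoker_text=None` would capture purely visual frame evocation, which is the case the dissertation argues existing text-only FrameNet annotation cannot represent.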
ARTeFACT Movement Thesaurus
The ARTeFACT Movement Thesaurus is a continuation of the ARTeFACT project, which was developed at the University of Virginia as a means of enabling research into movement-based arts, specifically dance. The Movement Thesaurus is a major step toward providing access to movement-derived data. By using motion-capture technologies, we plan to provide a sophisticated, open-source tool that can help make film searchable for single movements and movement phrases. The ARTeFACT Movement Thesaurus will contain over 100 codified dance movements derived from Western concert dance genres and styles, from which we can develop algorithms for automatic search capabilities in film. By bringing together engineers, movement specialists, and mathematicians, we will forge ahead to break new ground in movement research and take one step closer to the creation of an automated means of mining danced texts and filmed movement.
The body beyond movement: (Missed) opportunities to engage with contemporary dance in HCI
This paper argues that a significant paradigm change in contemporary dance can offer further opportunities for HCI researchers interested in embodied interaction and interactive system design. Based on the analysis of 42 HCI papers in our data set, resulting from searches in two computing research libraries, we suggest seven thematic categories that reflect how HCI researchers have been engaging with contemporary dance. Moreover, we propose a standardized usage of contemporary dance terminology in HCI literature, and discuss the current state of engagement with publications from the field of performance theory. We identify three opportunities for HCI, which can arise through further engagement with the knowledge produced in contemporary dance and performance: to engage with the field of embodied interaction from the perspective of performance research and theory; to employ contemporary dance methods and practices in HCI research; and to integrate contemporary dance choreographers and performers as researchers in interdisciplinary projects.
A Formal and Functional Analysis of Gaze, Gestures, and Other Body Movements in a Contemporary Dance Improvisation Performance
UID/FIL/00183/2019
PTDC/FER‐FIL/28278/2017

This study presents a microanalysis of what information performers "give" and "give off" to each other via their bodies during a contemporary dance improvisation. We compare what expert performers and non-performers (sufficiently trained to successfully perform) do with their bodies during a silent, multiparty improvisation exercise, in order to identify any differences and to provide insight into nonverbal communication in a less conventional setting. The coordinated collaboration of the participants (two groups of six) was examined in a frame-by-frame analysis focusing on all body movements, including gaze shifts as well as the formal and functional movement units produced in the head–face, upper-, and lower-body regions. The Methods section describes the annotation process and inter-rater agreement in detail. The results of this study indicate that expert performers during the improvisation are in "performance mode" and have embodied social cognitive strategies and skills (e.g., endogenous orienting, gaze avoidance, greater motor control) that the non-performers do not have available. Expert performers avoid using intentional communication, relying on information to be inferentially communicated in order to coordinate collaboratively, with silence and stillness being construed as meaningful in that social practice and context. The information that expert performers produce is quantitatively less (i.e., they produce fewer body movements) and qualitatively more inferential than intentional compared to a control group of non-performers, which affects the quality of the performance.
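Inter-rater agreement for categorical annotations of this kind is commonly measured with Cohen's kappa; the abstract does not state which statistic was used, so the sketch below is illustrative, with made-up movement labels.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators
    who labelled the same sequence of items."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    p_obs = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    p_exp = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)

# Two raters labelling six movement units (illustrative categories):
a = ["gaze", "gaze", "gesture", "still", "gaze", "gesture"]
b = ["gaze", "gesture", "gesture", "still", "gaze", "gesture"]
print(round(cohens_kappa(a, b), 3))  # → 0.739
```

Here the raters agree on 5 of 6 items (p_obs ≈ 0.833) but would agree on about 0.361 by chance, giving kappa ≈ 0.739, conventionally read as substantial agreement.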