
    Video interaction using pen-based technology

    Dissertation submitted for the degree of Doctor in Informatics. Video can be considered one of the most complete and complex media, and manipulating it is still a difficult and tedious task. This research applies pen-based technology to video manipulation, with the goal of improving this interaction. Despite human familiarity with pen-based devices, how they can be used in video interaction to improve it, making it more natural while also fostering the user’s creativity, remains an open question. Two types of interaction with video were considered in this work: video annotation and video editing. Each interaction type allows the study of one of the modes of using pen-based technology: indirectly, through digital ink, or directly, through pen gestures or pressure. This research contributes two approaches to pen-based video interaction: pen-based video annotations and video as ink. The first uses pen-based annotations combined with motion tracking algorithms to augment video content with sketches or handwritten notes. It aims to study how pen-based technology can be used to annotate moving objects and how to maintain the association between a pen-based annotation and the annotated moving object. The second concept replaces digital ink with video content, studying how pen gestures and pressure can be used in video editing and what changes are needed in the interface to provide a more familiar and creative interaction in this usage context.
    This work was partially funded by the UTAustin-Portugal Digital Media Program (Ph.D. grant: SFRH/BD/42662/2007 - FCT/MCTES); by the HP Technology for Teaching Grant Initiative 2006; by the project "TKB - A Transmedia Knowledge Base for contemporary dance" (PTDC/EAT/AVP/098220/2008, funded by FCT/MCTES); and by CITI/DI/FCT/UNL (PEst-OE/EEI/UI0527/2011).
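
    A minimal sketch of one way an annotation could stay attached to a moving object, assuming a tracker that reports an axis-aligned bounding box for each frame: the ink stroke is stored in coordinates relative to the box at the moment it is drawn, then re-projected onto each new box. The `Box` and `AnchoredAnnotation` names are hypothetical and do not come from the dissertation, which may maintain the association differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box reported by a tracker (hypothetical format)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width (assumed > 0)
    h: float  # height (assumed > 0)


@dataclass
class AnchoredAnnotation:
    """Ink stroke stored in the tracked box's normalized frame: (0, 0) is the
    top-left corner and (1, 1) the bottom-right corner of the box."""
    rel_points: list

    @classmethod
    def attach(cls, stroke, box):
        # Convert absolute pen coordinates into box-relative coordinates.
        return cls([((px - box.x) / box.w, (py - box.y) / box.h) for px, py in stroke])

    def render(self, box):
        # Re-project the stored stroke onto the box reported for the current frame,
        # so the annotation follows both translation and scaling of the object.
        return [(box.x + rx * box.w, box.y + ry * box.h) for rx, ry in self.rel_points]


# Usage: a stroke drawn while the object occupies one box, re-rendered after the
# tracker reports that the object moved 30 px right and 10 px down.
note = AnchoredAnnotation.attach([(100, 50), (120, 90)], Box(100, 50, 40, 80))
print(note.render(Box(130, 60, 40, 80)))  # [(130.0, 60.0), (150.0, 100.0)]
```

    Storing relative rather than absolute coordinates is what lets the same stroke follow the object when it moves or changes apparent size; the trade-off is that rotation and deformation are not captured by a box alone.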

    Modeling of Performance Creative Evaluation Driven by Multimodal Affective Data

    Creative evaluation of performances can be achieved through affective data, and the use of affective features to evaluate performance creativity is a new research trend. This paper proposes a “Performance Creative—Multimodal Affective (PC-MulAff)” model based on multimodal affective features for performance creative evaluation. Multimedia data acquisition equipment is used to collect physiological data from the audience, including multimodal affective data such as facial expression, heart rate, and eye movement. Affective features are computed from the multimodal data, combined with director annotations, and a “Performance Creative—Affective Acceptance (PC-Acc)” measure is defined on these features to evaluate the quality of performance creativity. The paper verifies the PC-MulAff model on different performance data sets. The experimental results show that the PC-MulAff model achieves high evaluation quality across different performance forms. In the creative evaluation of dance performance, the accuracy of the model is 7.44% and 13.95% higher than that of text-only and video-only evaluation, respectively.

    People and object tracking for video annotation

    Dissertation submitted for the degree of Master in Informatics Engineering. Object tracking is a thoroughly researched problem, with a body of associated literature dating back at least to the late 1970s. However, despite the development of some satisfactory real-time trackers, it has not yet seen widespread use. This is not due to a lack of applications for the technology, since several interesting ones exist. This document postulates that the status quo is due, at least in part, to a lack of easy-to-use software libraries supporting object tracking. An overview of the problems associated with object tracking is presented, and the process of developing one such library is documented. This discussion includes how to overcome problems such as heterogeneous object representations and requirements for training or initial object position hints. Video annotation is the process of associating data with a video’s content. Associating data with a video has numerous applications, ranging from making large video archives or long videos searchable to enabling discussion about, and augmentation of, the video’s content. Object tracking is presented as a valid approach to both automatic and manual video annotation, and the integration of the developed object tracking library into an existing video annotator running on a tablet computer is described. The challenges involved in designing an interface that supports associating video annotations with tracked objects in real time are also discussed. In particular, we discuss our interaction approaches for selecting moving objects in live video, which we call “Hold and Overlay” and “Hold and Speed Up”. In addition, the results of a set of preliminary tests are reported.
    Funded by the project “TKB – A Transmedia Knowledge Base for contemporary dance” (PTDC/EAT/AVP/098220/2008, funded by FCT/MCTES), the UTAustin – Portugal, Digital Media Program (SFRH/BD/42662/2007, FCT/MCTES), and CITI/DI/FCT/UNL (PEst-OE/EEI/UI0527/2011).
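
    One way to absorb heterogeneous object representations behind a single library interface is to convert every tracker's output to one common form. The sketch below assumes three hypothetical representations (a bounding box, a centroid with a known object size, and a binary mask) and normalizes them all to a bounding box; none of these names come from the library described above.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass(frozen=True)
class Centroid:
    cx: float
    cy: float
    w: float  # object size must be known, e.g. from the initial selection
    h: float


@dataclass(frozen=True)
class Mask:
    pixels: Sequence[Sequence[int]]  # binary mask: rows of 0/1 values


Representation = Union[Box, Centroid, Mask]


def to_box(rep: Representation) -> Box:
    """Normalize any supported (hypothetical) tracker output to a bounding box."""
    if isinstance(rep, Box):
        return rep
    if isinstance(rep, Centroid):
        # Center the known object size on the reported centroid.
        return Box(rep.cx - rep.w / 2, rep.cy - rep.h / 2, rep.w, rep.h)
    if isinstance(rep, Mask):
        # Tight bounding box around all nonzero pixels (x = column, y = row).
        coords = [(c, r) for r, row in enumerate(rep.pixels)
                  for c, v in enumerate(row) if v]
        if not coords:
            raise ValueError("empty mask: object not found")
        xs = [c for c, _ in coords]
        ys = [r for _, r in coords]
        return Box(min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
    raise TypeError(f"unsupported representation: {type(rep).__name__}")


print(to_box(Centroid(10, 20, 4, 6)))
print(to_box(Mask([[0, 0, 0], [0, 1, 1], [0, 1, 0]])))
```

    Code that consumes tracking results (for instance, an annotator drawing overlays) then depends only on `Box`, so a new tracker needs only a conversion, not changes to the interface layer.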

    The Role of Eye Gaze and Body Movements in Turn-Taking during a Contemporary Dance Improvisation

    This paper intends to contribute to the multimodal turn-taking literature by presenting data collected in an improvisation session in the context of the performing arts, together with its quali-quantitative analysis, focusing on how gaze and the full body participate in the interaction. Five expert performers joined the Portuguese contemporary choreographer João Fiadeiro in practicing his Real Time Composition Method during an improvisation session, which was recorded and annotated for this study. A micro-analysis of portions of the session was conducted using ELAN. We found that intersubjectivity was avoided during this performance, both in the performers' bodily movements and in mutual gaze; we extrapolate that these experts chiefly deployed peripheral vision as a regulating strategy to coordinate turn-taking. A macro-analysis comparing these data with analogous data obtained from non-performers provides the context for a discussion of multimodality and decision-making.

    FrameNet annotation for multimodal corpora: devising a methodology for the semantic representation of text-image interactions in audiovisual productions

    Multimodal analyses have been growing in importance within several approaches to Cognitive Linguistics and in applied fields such as Natural Language Understanding. Nonetheless, fine-grained semantic representations of multimodal objects are still lacking, especially in terms of integrating areas such as Natural Language Processing and Computer Vision, which are key to implementing multimodality in Computational Linguistics. In this dissertation, we propose a methodology for extending FrameNet annotation to the multimodal domain, since FrameNet can provide fine-grained semantic representations, particularly with a database enriched by Qualia and other interframal and intraframal relations, as is the case with FrameNet Brasil. To enable FrameNet Brasil to conduct multimodal analysis, we formulated the hypothesis that, just as words in a sentence evoke frames and organize their elements in the syntactic locality accompanying them, visual elements in video shots may also evoke frames and organize their elements on screen, or work complementarily with the frame-evocation patterns of the sentences narrated simultaneously with their appearance on screen, providing different profiling and perspective options for meaning construction. The corpus annotated to test the hypothesis is composed of episodes of a Brazilian TV travel series critically acclaimed as an exemplar of good practice in audiovisual composition. The chosen TV genre also provides a novel experimental setting for research on integrated image and text comprehension, since, in this corpus, the text is not a direct description of the image sequence but correlates with it indirectly in a myriad of ways. The dissertation also reports on an eye-tracker experiment conducted to validate the proposed approach to text-oriented annotation. The experiment showed that it is not possible to establish that text directly affects gaze, and this result was taken as reinforcing the approach of valuing the combination of modes. Last, we present the Frame2 dataset, the product of the annotation task carried out on the corpus following the proposed methodology and guidelines. The results demonstrate that, at least for this TV genre and possibly for others, a fine-grained semantic annotation that addresses the diverse correlations occurring in a multimodal setting provides a new perspective on multimodal comprehension modeling. Moreover, multimodal annotation also enriches the development of FrameNets, to the extent that correlations found between modalities can attest to the modeling choices made by those building frame-based resources.
    Funded by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior).

    ARTeFACT Movement Thesaurus

    The ARTeFACT Movement Thesaurus is a continuation of the ARTeFACT project, which was developed at the University of Virginia as a means of enabling research into movement-based arts, specifically dance. The Movement Thesaurus is a major step toward providing access to movement-derived data. By using motion capture technologies, we plan to provide a sophisticated, open-source tool that can help make film searchable for single movements and movement phrases. The ARTeFACT Movement Thesaurus will contain over 100 codified dance movements derived from Western concert dance genres and styles, from which we can develop algorithms for automatic search in film. By bringing together engineers, movement specialists, and mathematicians, we aim to break new ground in movement research and take a step closer to an automated means of mining danced texts and filmed movement.
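
    Searching film for a codified movement could, for example, be framed as finding the best time-warped match of a movement template inside a longer motion capture stream, since the same movement may be performed faster or slower. The sketch below uses subsequence dynamic time warping (DTW) over a hypothetical one-dimensional per-frame feature, such as a single joint angle; DTW is an assumption chosen for illustration, not an algorithm named by the ARTeFACT project.

```python
import math
from typing import Sequence, Tuple


def subsequence_dtw(query: Sequence[float], series: Sequence[float]) -> Tuple[int, float]:
    """Find where `query` best matches inside `series` under time warping.

    Returns (end_index, cost): the 0-based index in `series` where the best
    alignment ends, and its accumulated absolute-difference cost.
    """
    n, m = len(query), len(series)
    if n == 0 or m == 0:
        raise ValueError("query and series must be non-empty")
    # D[i][j]: cost of aligning query[:i] with a segment of series ending at j - 1.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        D[0][j] = 0.0  # the match may start anywhere in the series
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - series[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # query advances, series frame repeats
                                 D[i][j - 1],      # series advances, query frame repeats
                                 D[i - 1][j - 1])  # both advance
    best_j = min(range(1, m + 1), key=lambda j: D[n][j])
    return best_j - 1, D[n][best_j]


# A template performed more slowly inside a longer stream still matches exactly.
print(subsequence_dtw([0, 1, 2], [9, 0, 0, 1, 1, 2, 9]))
```

    A real search would compare multi-joint feature vectors per frame and report every window below a cost threshold rather than only the single best one.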

    A Formal and Functional Analysis of Gaze, Gestures, and Other Body Movements in a Contemporary Dance Improvisation Performance

    This study presents a microanalysis of what information performers “give” and “give off” to each other via their bodies during a contemporary dance improvisation. We compare what expert performers and non-performers (sufficiently trained to perform successfully) do with their bodies during a silent, multiparty improvisation exercise, in order to identify any differences and to provide insight into nonverbal communication in a less conventional setting. The coordinated collaboration of the participants (two groups of six) was examined in a frame-by-frame analysis focusing on all body movements, including gaze shifts as well as the formal and functional movement units produced in the head–face, upper-body, and lower-body regions. The Methods section describes the annotation process and inter-rater agreement in detail. The results indicate that expert performers are in “performance mode” during the improvisation and have embodied other social-cognitive strategies and skills (e.g., endogenous orienting, gaze avoidance, greater motor control) that are not available to the non-performers. Expert performers avoid intentional communication, relying instead on information being communicated inferentially in order to coordinate collaboratively, with silence and stillness construed as meaningful in that social practice and context. Compared to a control group of non-performers, the information that expert performers produce is quantitatively less (i.e., they produce fewer body movements) and qualitatively more inferential than intentional, which affects the quality of the performance.
    Project references: UID/FIL/00183/2019; PTDC/FER-FIL/28278/2017.
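
    The abstract does not name the agreement statistic used; a common choice for two annotators assigning categorical labels to the same items is Cohen's kappa, sketched below as an assumption. The movement-unit labels in the usage example are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labeled independently with their own label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout: perfect agreement
    return (p_o - p_e) / (1 - p_e)


# Two annotators labeling the same four frames with hypothetical movement units.
print(cohens_kappa(["gaze", "gaze", "head", "head"],
                   ["gaze", "head", "head", "head"]))
```

    Here the raters agree on 3 of 4 frames (p_o = 0.75), but their label frequencies alone would produce 50% agreement by chance (p_e = 0.5), so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.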