
    Video browsing interfaces and applications: a review

    We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption and the inherent characteristics of video data—which, if presented in its raw format, is rather unwieldy and costly—have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare them against each other

    Audio-visual football video analysis, from structure detection to attention analysis

    Sport video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academia. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. Specific techniques are needed for content-based sports video analysis to exploit these characteristics. For an efficient and effective sports video analysis system, there are three fundamental questions: (1) what are the key stories in sports videos; (2) what attracts viewers' interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approach them from two different perspectives and present three research contributions: replay detection, attack temporal structure decomposition, and attention-based highlight identification. Replay segments convey the most important contents in sports videos, so detecting replay segments is an efficient way to collect game highlights. However, replay is an artefact of editing, which improves with advances in video editing tools. The composition of a replay is complex and includes logo transitions, slow motion, viewpoint switches and normal-speed video clips. Since logo transition clips are pervasive in the game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement for replay detection. A two-pass system was developed, comprising a five-layer AdaBoost classifier and logo template matching throughout an entire video. The five-layer AdaBoost classifier uses shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions to filter out logo transition candidates.
Subsequently, a logo template is constructed and employed to find all transition logo sequences. The precision and recall of this system in replay detection are both 100% on a five-game evaluation collection. An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as of other sports videos. We review the literature on content-based temporal structures, such as the play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely play, focus, replay and break, were identified from low-level visual features. A four-state hidden Markov model was trained to simulate the transition processes among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repetitive substring in the label sequence of shot class transitions. The occurrences of this substring are regarded as the kernel of an attack hidden Markov process. The decomposition of attack structure therefore becomes a boundary likelihood comparison between two Markov chains. Highlights are what attract notice. Attention is a psychological measurement of “notice”. A brief survey of the psychological background of attention, attention estimation from visual and auditory signals, and multi-modality attention fusion is presented. We propose two attention models for sports video analysis: the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the perception structure while watching video. This model removes reflection bias among modality salient signals and combines these signals by reflectors. The multiresolution autoregressive framework (MAR) treats salient signals as a group of smooth random processes, which follow a similar trend but are filled with noise.
This framework tries to estimate a noise-free signal from these coarse noisy observations by multiple-resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real-time event detection. The experiments show that these attention-based approaches can find goal events with high precision. Moreover, results of MAR-based highlight detection on the final games of FIFA 2002 and 2006 are highly similar to highlights professionally labelled by BBC and FIFA
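
The longest-repetitive-substring step named in this abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract proposes a suffix tree, whereas the sketch below swaps in a simpler sorted-suffix comparison that returns the same substring on short sequences. The one-letter labels (P = play, F = focus, R = replay, B = break) and the function name are assumptions made for illustration only.

```python
def longest_repeated_substring(labels):
    """Return the longest substring that occurs at least twice in `labels`.

    Assumption: `labels` is a string with one character per shot class,
    e.g. P = play, F = focus, R = replay, B = break (hypothetical encoding).
    Occurrences are allowed to overlap.
    """
    n = len(labels)
    # Sort suffix start positions lexicographically (a naive suffix array).
    suffixes = sorted(range(n), key=lambda i: labels[i:])
    best = ""
    # The longest repeat is the longest common prefix of two neighbouring
    # suffixes in sorted order.
    for a, b in zip(suffixes, suffixes[1:]):
        k = 0
        while a + k < n and b + k < n and labels[a + k] == labels[b + k]:
            k += 1
        if k > len(best):
            best = labels[a:a + k]
    return best


if __name__ == "__main__":
    # Two attack-like runs "PFRB" separated by an extra break shot.
    print(longest_repeated_substring("PFRBBPFRBP"))
```

The naive suffix sort is quadratic or worse in the sequence length, which is acceptable for the label sequence of a single match; a suffix tree, as in the abstract, would be the choice for long sequences.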

    "How Did They Come Across?" Lessons Learned from Continuous Affective Ratings

    Social distance, or perception of the other, is recognized as a dynamic dimension of an interaction, but has yet to be widely explored or understood. Through CORAE, a novel web-based open-source tool for COntinuous Retrospective Affect Evaluation, we collected retrospective ratings of interpersonal perceptions between 12 participant dyads. In this work, we explore how different aspects of these interactions are reflected in the ratings collected, through a discourse analysis of the individual and social behavior of the interactants. We found that different events observed in the ratings can be mapped to complex interaction phenomena, shedding light on relevant interaction features that may play a role in interpersonal understanding and grounding. This paves the way for better, more seamless human-robot interactions, where affect is interpreted as highly dynamic and contingent on interaction history. Comment: arXiv admin note: substantial text overlap with arXiv:2306.1662

    CORAE: A Tool for Intuitive and Continuous Retrospective Evaluation of Interactions

    This paper introduces CORAE, a novel web-based open-source tool for COntinuous Retrospective Affect Evaluation, designed to capture continuous affect data about interpersonal perceptions in dyadic interactions. Grounded in behavioral ecology perspectives of emotion, this approach replaces valence as the relevant rating dimension with approach and withdrawal, reflecting the degree to which behavior is perceived as increasing or decreasing social distance. We conducted a study to experimentally validate the efficacy of our platform with 24 participants. The tool's effectiveness was tested in the context of dyadic negotiation, revealing insights about how interpersonal dynamics evolve over time. We find that the continuous affect rating method is consistent with individuals' perception of the overall interaction. This paper contributes to the growing body of research on affective computing and offers a valuable tool for researchers interested in investigating the temporal dynamics of affect and emotion in social interactions

    General highlight detection in sport videos

    Attention is a psychological measurement of human reaction to stimuli. We propose a general framework for highlight detection that compares attention intensity during the watching of sports videos. Three steps are involved: adaptive selection of salient features, unified attention estimation and highlight identification. Adaptive selection computes feature correlation to decide an optimal set of salient features. Unified estimation combines these features using the multi-resolution autoregressive (MAR) technique and thus creates a temporal curve of attention intensity. We rank the intensity of attention to discriminate the boundaries of highlights. Such a framework alleviates semantic uncertainty around sport highlights and leads to efficient and effective highlight detection. The advantages are as follows: (1) the capability of using data at coarse temporal resolutions; (2) robustness against noise caused by modality asynchronism, perception uncertainty and feature mismatch; (3) the employment of Markovian constraints on content presentation; and (4) multi-resolution estimation of attention intensity, which enables the precise location of event boundaries
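
The final step described above, ranking attention intensity to find highlight boundaries, can be sketched as follows. This is a simplified illustration, not the paper's method: a plain moving average stands in for the MAR estimator, and the function names, the `window` size and the `top_fraction` cut-off are all assumptions introduced for the example.

```python
def moving_average(signal, window):
    """Smooth `signal` with a centred moving average (stand-in for MAR)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def highlight_segments(attention, window=3, top_fraction=0.3):
    """Return (start, end) index pairs where smoothed attention ranks highest.

    Assumption: `attention` is a list of per-time-step intensity values.
    The threshold is the smoothed value at the `top_fraction` rank.
    """
    if not attention:
        return []
    smooth = moving_average(attention, window)
    ranked = sorted(smooth, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    threshold = ranked[k - 1]
    segments, start = [], None
    for i, value in enumerate(smooth):
        if value >= threshold and start is None:
            start = i                      # a highlight segment opens
        elif value < threshold and start is not None:
            segments.append((start, i - 1))  # the segment closes
            start = None
    if start is not None:
        segments.append((start, len(smooth) - 1))
    return segments


if __name__ == "__main__":
    curve = [0, 0, 0, 9, 9, 9, 0, 0, 0, 0]
    print(highlight_segments(curve))
```

Ranking rather than using a fixed absolute threshold keeps the cut-off relative to each video's own attention curve, which matches the abstract's emphasis on comparing intensities within a video.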

    Earth as Interface: Exploring chemical senses with Multisensory HCI Design for Environmental Health Communication

    As environmental problems intensify, the chemical senses, that is, smell and taste, are the most relevant senses for evidencing them. Environmental exposure vectors that can reach human beings comprise air, food, soil and water [1]. Within this context, understanding the link between environmental exposures and health [2] is crucial to making informed choices, protecting the environment and adapting to new environmental conditions [3]. Smell and taste therefore lead to multi-sensorial experiences which convey multi-layered information about local and global events [4]. However, these senses are usually absent when those problems are represented in digital systems. The multisensory HCI design framework investigates the inclusion of chemical senses in digital systems [5]. Ongoing efforts tackle the digitalization of smell and taste for digital delivery, transmission or substitution [6]. Although experiments have proved technological feasibility, its dissemination depends on the development of relevant applications [7]. This thesis aims to fill those gaps by demonstrating how chemical senses provide the means to link environment and health based on scientific and geolocation narratives [8], [9], [10]. We present a Multisensory HCI design process which accomplished a symbolic display of smell and taste and led us to a new multi-sensorial interaction system presented herein. We describe the conceptualization, design and evaluation of Earthsensum, an exploratory case study project. Earthsensum offered the 16 participants in the study environmental smell and taste experiences about real geolocations. These experiences were represented digitally using mobile virtual reality (MVR) and mobile augmented reality (MAR). These technologies bridge the real and digital worlds through digital representations where we can reproduce the multi-sensorial experiences. Our study findings showed that the proposed interaction system is intuitive and can lead to a better understanding not only of smell and taste perception but also of environmental problems. Participants' comprehension of the link between environmental exposures and health was successful, and they would recommend this system as an education tool. Our conceptual design approach was validated and further developments were encouraged. In this thesis, we demonstrate how to apply Multisensory HCI methodology to design with chemical senses. We conclude that the presented symbolic representation model of smell and taste allows these experiences to be communicated on digital platforms. Because they are context-dependent, MVR and MAR platforms are adequate technologies for this purpose. Future developments will explore the conceptual approach further, centred on using the system in the hope of inducing behaviour change. This thesis opens up new application possibilities for digital chemical sense communication, Multisensory HCI Design and environmental health communication

    Effects of mediated social touch on affective experiences and trust

    This study investigated whether communication via mediated hand pressure during a remotely shared experience (watching an amusing video) can (1) enhance recovery from sadness, (2) enhance the affective quality of the experience, and (3) increase trust towards the communication partner. To this end, participants first watched a sad movie clip to elicit sadness, followed by a funny one to stimulate recovery from sadness. While watching the funny clip they signaled a hypothetical fellow participant every time they felt amused. In the experimental condition the participants responded by pressing a hand-held two-way mediated touch device (a Frebble), which also provided haptic feedback via simulated hand squeezes. In the control condition they responded by pressing a button and received abstract visual feedback. Objective (heart rate, galvanic skin conductance, number and duration of joystick or Frebble presses) and subjective (questionnaires) data were collected to assess the emotional reactions of the participants. The subjective measurements confirmed that the sad movie successfully induced sadness while the funny movie evoked more positive feelings. Although their ranking agreed with the subjective measurements, the physiological measurements confirmed this conclusion only for the funny movie. The results show that recovery from movie-induced sadness, the affective experience of the amusing movie, and trust towards the communication partner did not differ between the two conditions. Hence, feedback via mediated hand touching did not enhance any of these factors compared to visual feedback.
Further analysis of the data showed that participants scoring low on Extraversion (i.e., persons who are more introverted) or low on Touch Receptivity (i.e., persons who do not like to be touched by others) felt better understood by their communication partner when receiving mediated touch feedback instead of visual feedback, while the opposite was found for participants scoring high on these factors. The implications of these results for further research are discussed, and some suggestions for follow-up experiments are presented