793 research outputs found

    Interactive Video Search

    As the amount of video data in our daily lives grows, so does the need for content-based search in videos. Although much research has gone into video retrieval tools and methods that support automatic search through content-based queries, the performance of automatic video retrieval is still far from optimal. In this tutorial we discussed (i) proposed solutions for improved video content navigation, (ii) typical interaction with content-based querying features, and (iii) advanced video content visualization methods. We also discussed interactive video search systems and ways to evaluate their performance.

    Analysis of user behavior with different interfaces in 360-degree videos and virtual reality

    Virtual reality and related technologies are used for many kinds of content, such as virtual environments and 360-degree videos. Omnidirectional, interactive multimedia is consumed on a variety of devices, such as computers, mobile devices, or specialized virtual reality gear. Studies of user behavior with computer interfaces are an important part of human-computer interaction research, used in, e.g., studies on usability, user experience, or the improvement of streaming techniques. User behavior in these environments has drawn the attention of the field, but little attention has been paid to comparing behavior across the different devices used to play virtual environments or 360-degree videos. We introduce an interactive system that we used to create and play virtual reality environments and experiences based on 360-degree videos, and that automatically collects users’ behavior so that it can be analyzed. We studied the behavior collected during playback of a virtual reality environment with this system and found significant differences between users of an interface based on the Oculus Rift and users of another based on a mobile VR headset similar to the Google Cardboard: different times between interactions, likely due to the need to perform a gesture in the first interface; differences in spatial exploration, as users of the first interface chose a particular area of the environment to stay in; and differences in head orientation, as Oculus users tended to look towards physical objects in the experiment setup, while mobile users seemed to be influenced by the initial orientation values of their browsers.
A second study used data collected with this system while it played a hypervideo production made of 360-degree videos. We compared users’ behavior across four interfaces (two based on immersive devices and two based on non-immersive devices) and two categories of videos. We found significant differences in spatiotemporal exploration, in the dispersion of users’ orientations, in the movement of these orientations, and in the clustering of their trajectories, especially between video types but also between devices. In some cases, behavior with immersive devices was similar because of similar constraints in the interface, constraints that are not present in non-immersive devices such as a computer mouse or the touchscreen of a smartphone. Finally, we report a model based on a recurrent neural network that classifies these 360-degree video sessions by video type and interface with an accuracy of more than 90% using only four seconds of orientation data. Another deep learning model was implemented to predict orientations up to two seconds into the future from the last seconds of orientation; its results were improved by up to 19% by a comparable model that also uses the video type and the device used to play the video.
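The abstract above does not say how the dispersion of users' orientations was measured. As an illustration only, one common statistic for angular data is the circular dispersion (one minus the mean resultant length). The sketch below assumes yaw angles in degrees; the function name `circular_dispersion` is hypothetical and is not taken from the work described.

```python
import math


def circular_dispersion(yaw_degrees):
    """Return 1 - R, where R is the mean resultant length of the angles.

    0.0 means every sample points the same way; values near 1.0 mean the
    samples are spread evenly around the circle. Working on unit vectors
    avoids the wrap-around problem at 0/360 degrees.
    """
    if not yaw_degrees:
        raise ValueError("at least one orientation sample is required")
    angles = [math.radians(a) for a in yaw_degrees]
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    return 1.0 - math.hypot(mean_cos, mean_sin)
```

Under this measure, a viewer whose yaw samples are 350 and 10 degrees gets a small dispersion, because both headings sit close to 0 degrees, whereas a plain standard deviation of the raw numbers would report a large spread.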

    09251 Abstracts Collection -- Scientific Visualization

    From 06-14-2009 to 06-19-2009, the Dagstuhl Seminar 09251 "Scientific Visualization" was held in Schloss Dagstuhl - Leibniz Center for Informatics. During the seminar, over 50 international participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general.

    Fish4Knowledge: Collecting and Analyzing Massive Coral Reef Fish Video Data

    This book gives a start-to-finish overview of the whole Fish4Knowledge project in 18 short chapters, each describing one aspect of the project. The Fish4Knowledge project explored the possibilities of big video data, in this case from undersea video. By recording and analyzing 90 thousand hours of video from ten camera locations, the project gives a three-year view of fish abundance in several tropical coral reefs off the coast of Taiwan. The research system built a remote recording network, over 100 Tb of storage, supercomputer processing, video target detection and

    Self-supervised learning to detect key frames in videos

    © 2020 by the authors. Licensee MDPI, Basel, Switzerland. Detecting key frames in videos is a common problem in many applications such as video classification, action recognition and video summarization. These tasks can be performed more efficiently using only a handful of key frames rather than the full video. Existing key frame detection approaches are mostly designed for supervised learning and require manual labelling of key frames in a large corpus of training data to train the models. Labelling requires human annotators from different backgrounds to annotate key frames in videos, which is not only expensive and time-consuming but also prone to subjective errors and inconsistencies between the labelers. To overcome these problems, we propose an automatic self-supervised method for detecting key frames in a video. Our method comprises a two-stream ConvNet and a novel automatic annotation architecture able to reliably annotate key frames in a video for self-supervised learning of the ConvNet. The proposed ConvNet learns deep appearance and motion features to detect frames that are unique. The trained network is then able to detect key frames in test videos. Extensive experiments on the UCF101 human action and VSUMM video summarization datasets demonstrate the effectiveness of our proposed method.

    E3: Emotions, Engagement, and Educational Digital Games

    The use of educational digital games as a method of instruction for science, technology, engineering, and mathematics has increased in the past decade. While these games provide successfully implemented interactive and fun interfaces, they are not designed to respond to or remedy students’ negative affect towards the game dynamics or their educational content. Therefore, this exploratory study investigated the frequent patterns of student emotional and behavioral response to educational digital games. To unveil the sequential occurrence of these affective states, students were assigned to play the game for nine class sessions. During these sessions, their affective and behavioral responses were recorded to uncover possible underlying patterns of affect (particularly confusion, frustration, and boredom) and behavior (disengagement). In addition, these affect and behavior frequency pattern data were combined with students’ gameplay data in order to identify patterns of emotions that led to better performance in the game. The results provide information on possible affect and behavior patterns that could be used in further research on affect and behavior detection in such open-ended digital game environments. In particular, the findings show that students experience a considerable amount of confusion, frustration, and boredom. Another finding highlights the need for remediation via embedded help, as the students often turned to peer help during their gameplay. However, possibly because of the low quality of the help received, students seemed to become frustrated or disengaged with the environment. Finally, the findings suggest the importance of the decay rate of confusion; students’ gameplay performance was associated with the length of time students remained confused or frustrated. Overall, these findings show that there are interesting patterns related to students who experience relatively negative emotions during their gameplay.

    A Computational Framework to Support the Automated Analysis of Routine Electroencephalographic Data

    Epilepsy is a condition in which a patient has multiple unprovoked seizures which are not precipitated by another medical condition. It is a common neurological disorder that afflicts 1% of the population of the US, and is sometimes hard to diagnose if seizures are infrequent. Routine electroencephalography (rEEG), where the electrical potentials of the brain are recorded on the scalp of a patient, is one of the main tools for diagnosis because rEEG can reveal indicators of epilepsy when patients are in a non-seizure state. Interpretation of rEEG is difficult, and studies have shown that 20-30% of patients at specialized epilepsy centers are misdiagnosed. An improved ability to interpret rEEG could decrease the misdiagnosis rate of epilepsy. The difficulty in diagnosing epilepsy from rEEG stems from the large quantity, low signal-to-noise ratio (SNR), and variability of the data. A common point of error for a clinician interpreting rEEG data is the misinterpretation of paroxysmal EEG events (PEEs): short bursts of electrical activity of high amplitude relative to the surrounding signals, with a duration of approximately 0.1 to 2 seconds. Clinical interpretation of PEEs could be improved with the development of an automated system to detect and classify PEE activity in an rEEG dataset. Systems that have attempted to automatically classify PEEs in the past have had varying degrees of success. These efforts have been hampered to a large extent by the absence of a 'gold standard' data set that EEG researchers could use. In this work we present a distributed, web-based collaborative system for collecting and creating a gold standard dataset for the purpose of evaluating spike detection software. We hope to advance spike detection research by creating a performance standard that facilitates comparisons between the approaches of disparate research groups.
Further, this work endeavors to create a new, high-performance parallel implementation of independent component analysis (ICA), a potential preprocessing step for PEE classification. We also demonstrate tools for visualization and analysis to support the initial phases of spike detection research. These tools will first help to develop a standardized rEEG dataset of expert EEG interpreter opinion with which automated analysis can be trained and tested. Second, they will help to create a new framework for interdisciplinary research that will improve our understanding of PEEs in rEEG. These improvements could ultimately advance the nuanced art of rEEG interpretation and decrease the misdiagnosis rate that leads to patients suffering inappropriate treatment.
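The abstract above defines a PEE by two properties: high amplitude relative to the surrounding signal and a duration of roughly 0.1 to 2 seconds. The following is a minimal sketch of that definition as a candidate filter on a single channel, not the detection system described in the work. The function name `find_candidate_bursts`, the use of the median absolute amplitude as the "surrounding" level, and the `amp_ratio` default are all assumptions made for illustration.

```python
import numpy as np


def find_candidate_bursts(signal, fs, amp_ratio=3.0, min_dur=0.1, max_dur=2.0):
    """Return (start_s, end_s) spans whose amplitude and duration fit the definition.

    signal: 1-D array of samples from one channel.
    fs: sampling rate in Hz.
    amp_ratio: hypothetical threshold, as a multiple of the median |amplitude|.
    """
    magnitude = np.abs(np.asarray(signal, dtype=float))
    baseline = np.median(magnitude)  # assumed proxy for the surrounding signal
    above = magnitude > amp_ratio * baseline

    # Pad with False so runs touching either end still produce two edges.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[::2], edges[1::2]  # end index is exclusive

    spans = []
    for start, end in zip(starts, ends):
        duration = (end - start) / fs
        if min_dur <= duration <= max_dur:
            spans.append((float(start / fs), float(end / fs)))
    return spans
```

Because real EEG oscillates around zero, a sample-wise threshold like this one would split a single burst into fragments. A practical detector would threshold a smoothed envelope instead, and would still need expert-labelled data of the kind the abstract calls for in order to be evaluated.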