22 research outputs found

    Supporting Arbitrary Zoom in Zoomable Video

    Master of Science thesis

    Automatic Mobile Video Remixing and Collaborative Watching Systems

    This thesis analyzes the implications of combining collaboration with automation for remix creation. We first present a sensor-enhanced Automatic Video Remixing System (AVRS), which intelligently processes mobile videos in combination with mobile device sensor information. The sensor-enhanced AVRS involves certain architectural choices that meet the key system requirements (leverage user-generated content, use sensor information, reduce end-user burden) and user experience requirements. Architecture adaptations are required to improve certain key performance parameters, and certain operating parameters need to be constrained to make real-world deployment feasible. Subsequently, a sensor-less cloud-based AVRS and a low-footprint sensor-less AVRS are presented. The three approaches exemplify the importance of operating-parameter trade-offs in system design, and they cover a wide spectrum, ranging from a multimodal multi-user client-server system (sensor-enhanced AVRS) to a mobile application that can automatically generate a multi-camera remix experience from a single video. Next, we present the findings from four user studies, involving 77 users in total, related to automatic mobile video remixing. The goal was to validate selected system design goals, provide insights for additional features, and identify challenges and bottlenecks. Topics studied include the role of automation, the value of a video remix as event memorabilia, the requirements for different types of events, and the perceived user value of creating a multi-camera remix from a single video. System design implications derived from the user studies are presented. Subsequently, sports summarization, a specific form of remix creation, is analyzed. In particular, the role of the content capture method is analyzed with two complementary approaches: the first performs saliency detection in casually captured mobile videos, while the second creates multi-camera summaries from role-based captured content. Furthermore, a method for interactive customization of summaries is presented. Next, the discussion is extended to the role of users' situational context and the consumed content in facilitating a collaborative watching experience. Mobile-based collaborative watching architectures are described, which facilitate a common shared context between the participants. The concept of movable multimedia is introduced to highlight the multi-device environment of present-day users. The thesis presents results derived from end-to-end system prototypes tested in real-world conditions and corroborated with extensive user impact evaluation.
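
    To illustrate the kind of trade-off such a system makes, the following Python sketch shows how device sensor data might complement content analysis when selecting cuts for an automatic multi-camera remix. It is a minimal, hypothetical example: the segment attributes, scoring weights, and same-camera penalty are illustrative assumptions, not the thesis's AVRS implementation.

        from dataclasses import dataclass

        @dataclass
        class Segment:
            camera_id: str
            start: float          # segment start, in seconds
            end: float            # segment end, in seconds
            gyro_variance: float  # motion jitter reported by the device gyroscope
            audio_level: float    # normalized loudness, 0..1

        def segment_score(seg: Segment) -> float:
            """Prefer steady shots (low gyro variance) during loud moments."""
            steadiness = 1.0 / (1.0 + seg.gyro_variance)
            return 0.7 * steadiness + 0.3 * seg.audio_level

        def pick_next_cut(candidates: list[Segment], current_camera: str) -> Segment:
            """Pick the best-scoring segment, penalizing same-camera cuts
            so the remix alternates between viewpoints."""
            def adjusted(seg: Segment) -> float:
                penalty = 0.2 if seg.camera_id == current_camera else 0.0
                return segment_score(seg) - penalty
            return max(candidates, key=adjusted)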

    Understanding Visual Content through Joint Analysis of Content and Usage

    This thesis focuses on understanding visual content, whether images, videos, or 3D content. By understanding we mean the ability to infer semantic information about the visual content. The goal of our work is to study methods that combine two approaches: 1) automatic content analysis and 2) analysis of how humans interact with the content (in short, usage analysis). We start by reviewing the state of the art from both the Computer Vision and Multimedia communities. Twenty years ago, the dominant approach aimed at fully automatic image understanding. Today, this approach leaves more room for various forms of human intervention, whether through the construction of annotated training sets, through interactive problem solving (e.g. detection or segmentation), or through the implicit collection of information derived from content usage. Rich and complex links exist between human supervision of automatic algorithms and the adaptation of human contributions through automatic algorithms, and these links give rise to modern research questions: how to motivate human contributors? How to design interactive scenarios in which the interactions contribute to understanding the manipulated content? How to check or ensure the quality of the collected traces? How to aggregate usage data? How to fuse usage data with the more traditional outputs of automatic content analysis? Our literature review addresses these questions and allows us to position the contributions of this thesis, which fall into two main parts. The first part revisits the detection of important (or salient) regions through implicit feedback from users who view or capture visual content. In 2D, we design several interactive video interfaces (in particular zoomable video) to coordinate content-based analyses with usage-based ones. We generalize these results to 3D by introducing a new salient-region detector built on simultaneous video recordings of the same public artistic performance (dance or singing shows, etc.) by many users. The second contribution of our work targets semantic understanding of still images. We exploit data collected through Ask’nSeek, a game we created. Elementary interactions (such as clicks) and the textual data entered by players are, as before, combined with automatic image analysis. In particular, we show the value of interactions that reveal spatial relations between different objects detectable in the same scene. After detecting the objects of interest in a scene, we also address the more ambitious problem of segmentation.
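
    As a rough sketch of the usage-analysis side, the following Python fragment accumulates implicit user interactions (e.g. zoom or click targets collected from a zoomable-video interface) into a saliency heatmap and fuses it linearly with a content-based map. The data formats and the fusion weight are assumptions for illustration, not the thesis's actual method.

        import numpy as np

        def usage_saliency(points: list[tuple[int, int]],
                           shape: tuple[int, int],
                           sigma: float = 15.0) -> np.ndarray:
            """Accumulate user interaction points into a smoothed heatmap."""
            h, w = shape
            heat = np.zeros((h, w), dtype=np.float64)
            ys, xs = np.mgrid[0:h, 0:w]
            for (x, y) in points:
                heat += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            return heat / heat.max() if heat.max() > 0 else heat

        def fuse(content_map: np.ndarray, usage_map: np.ndarray,
                 alpha: float = 0.5) -> np.ndarray:
            """Linear fusion of content-based and usage-based saliency."""
            return alpha * content_map + (1 - alpha) * usage_map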

    Activity Report 2003


    Annotated Interactive Non-linear Videos: Software Suite, Download and Cache Management

    Modern Web technology makes the dream of fully interactive and enriched video come true. It is now possible to organize videos non-linearly, so that they play in a sequence not known in advance. Furthermore, additional information can be added to the video, ranging from short descriptions to animated images and further videos. This calls for an easy and efficient authoring tool that can manage the individual media objects and clearly display the links between the parts. Tools of this kind are rare and mostly do not provide the full range of required functions. While the Web player provides an interactive experience to the viewer, parallel plot sequences and additional information increase the download volume. This may cause pauses during playback while elements that are displayed with the video are still being downloaded. A good quality of experience for these videos requires short waiting times and playback without interruptions. This work presents the SIVA Suite for creating such annotated interactive non-linear videos. We propose a video model for interactivity, non-linearity, and annotations, which is implemented in an XML format, an authoring tool, and a player. Video is the main medium, and individual scenes are linked into a scene graph. Time-controlled additional content, called annotations (text, images, audio files, or videos), is added to the scenes. The user navigates the scene graph by selecting a button on a button panel; further navigational elements include a table of contents and a keyword search. Besides the SIVA Suite, this thesis presents algorithms and strategies for download and cache management that provide a good quality of experience while watching annotated interactive non-linear videos. To this end, we implemented a standard-independent player framework. Integrated into a simulation environment, the framework allows us to evaluate algorithms and strategies for calculating start-up times and for selecting elements to prefetch into and delete from the cache, and to analyze their interaction during the playback of non-linear video content. These algorithms and strategies can be used to minimize interruptions in the video flow after user interactions. Our extensive evaluation showed that our techniques yield faster start-up times and fewer interruptions in the video flow than those of other players. Knowledge of the structure of an interactive non-linear video can be used to minimize the start-up time at the beginning of a video while avoiding an increase in the overall download volume.
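
    As an illustration of the prefetching problem, the following Python sketch plans downloads for a non-linear video: starting from the current scene, reachable successor scenes are prefetched breadth-first until a cache budget is exhausted. The graph representation and the budget policy are assumptions made for illustration; the SIVA strategies themselves are more elaborate.

        from collections import deque

        def prefetch_plan(scene_graph: dict[str, list[str]],
                          sizes_mb: dict[str, float],
                          current: str,
                          budget_mb: float) -> list[str]:
            """Return scenes to prefetch, nearest successors first."""
            plan, used = [], 0.0
            seen = {current}
            queue = deque(scene_graph.get(current, []))
            while queue:
                scene = queue.popleft()
                if scene in seen:
                    continue
                seen.add(scene)
                if used + sizes_mb[scene] <= budget_mb:
                    plan.append(scene)
                    used += sizes_mb[scene]
                    queue.extend(scene_graph.get(scene, []))
            return plan

        # Example: from "intro" the viewer may branch to scene "a" or "b".
        graph = {"intro": ["a", "b"], "a": ["end"], "b": ["end"], "end": []}
        print(prefetch_plan(graph, {"a": 4.0, "b": 6.0, "end": 8.0}, "intro", 12.0))
        # -> ['a', 'b']: both immediate branches fit the budget, the shared successor does not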

Redefining the audio editor

    This thesis describes new design principles for audio editing software. Software of this kind, also called an audio editor, is the digital cutting table for sound and music production, in which audio can be loaded or recorded, then selected and edited. First, an understanding of the audio editor is established. Then a new approach to audio editing software design is developed, based on research into current software. This new approach consists of a set of design principles that aim to improve coherency, flexibility, and creativity in the audio editing process. These principles are formed by carefully rethinking core elements of audio editing such as audio representation, selection and manipulation, editing flexibility, automation, and personalisation. As an artefact of this research, a concept audio editor called OFFline is presented in a second section. This audio editor demonstrates a possible implementation of the new design principles.
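
    One way to make the principle of editing flexibility concrete is a non-destructive edit model, sketched below in Python: edits are stored as operations over the untouched source audio rather than applied destructively, so they remain reorderable and reversible. This is purely illustrative and not the OFFline design.

        from dataclasses import dataclass, field

        @dataclass
        class Edit:
            start: float   # selection start, in seconds
            end: float     # selection end, in seconds
            op: str        # e.g. "gain", "fade_in", "cut"
            amount: float = 0.0

        @dataclass
        class Session:
            source: str                      # path to the untouched recording
            edits: list[Edit] = field(default_factory=list)

            def apply(self, edit: Edit) -> None:
                self.edits.append(edit)      # the source file is never modified

            def undo(self) -> None:
                if self.edits:
                    self.edits.pop()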

    Scalable exploration of highly detailed and annotated 3D models

    With the widespread availability of mobile graphics terminals and WebGL-enabled browsers, 3D graphics over the Internet is thriving. Thanks to recent advances in 3D acquisition and modeling systems, high-quality 3D models are becoming increasingly common and are now potentially available for ubiquitous exploration. In current 3D repositories, such as Blend Swap, 3D Café, or Archive3D, 3D models available for download are mostly presented through a few user-selected static images. Online exploration is limited to simple orbiting and/or low-fidelity exploration of simplified models, since photorealistic rendering of complex synthetic environments is still hardly achievable within the real-time constraints of interactive applications, especially on low-powered mobile devices or script-based Internet browsers. Moreover, navigating inside 3D environments, especially on the now pervasive touch devices, is a non-trivial task, and usability is consistently improved by employing assisted navigation controls. In addition, 3D annotations are often used to integrate and enhance the visual information by providing spatially coherent contextual information, typically at the expense of introducing visual clutter. In this thesis, we focus on efficient representations for interactive exploration and understanding of highly detailed 3D meshes on common 3D platforms. To this end, we present several approaches that exploit constraints on the data representation to improve streaming and rendering performance, and camera movement constraints to provide scalable navigation methods for interactive exploration of complex 3D environments. Furthermore, we study visualization and interaction techniques that improve the exploration and understanding of complex 3D models by exploiting guided motion control to help the user discover contextual information while avoiding visual clutter. We demonstrate the effectiveness and scalability of our approaches both in large-screen museum installations and on mobile devices, by performing interactive exploration of models ranging from 9M to 940M triangles.
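
    As a small illustration of assisted navigation, the following Python sketch clamps an orbiting camera's spherical coordinates to a safe viewing region, so users can neither lose the model nor flip below the ground plane. The limits are illustrative assumptions, not the constraints used in the thesis.

        import math

        def clamp(value: float, lo: float, hi: float) -> float:
            return max(lo, min(hi, value))

        def constrained_orbit(azimuth: float, elevation: float, distance: float,
                              min_dist: float = 1.0, max_dist: float = 50.0,
                              max_elev: float = math.radians(85)) -> tuple[float, float, float]:
            """Clamp spherical camera coordinates to a safe viewing region."""
            azimuth = azimuth % (2 * math.pi)               # wrap horizontally without limits
            elevation = clamp(elevation, 0.0, max_elev)     # stay above the ground plane
            distance = clamp(distance, min_dist, max_dist)  # keep the model framed
            return azimuth, elevation, distance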