10 research outputs found

    Associating characters with events in films

    The work presented here combines the analysis of a film's audiovisual features with the analysis of an accompanying audio description. Specifically, we describe a technique for semantic-based indexing of feature films that associates character names with meaningful events. The technique fuses the results of event detection based on audiovisual features with the inferred on-screen presence of characters, based on an analysis of an audio description script. In an evaluation with 215 events from 11 films, the technique performed the character detection task with Precision = 93% and Recall = 71%. We then go on to show how novel access modes to film content are enabled by our analysis. The specific examples illustrated include video retrieval via a combination of event type and character name, and our first steps towards visualizing narrative and character interplay based on characters' occurrence and co-occurrence in events.
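    As an illustration of the fusion step described above, the following sketch associates characters mentioned in an audio description script with detected event windows. It is not the authors' implementation: the event boundaries, mention timestamps, and overlap tolerance are made up.

```python
# Illustrative sketch: associate character names inferred from an audio
# description (AD) script with detected event windows. All data is hypothetical.
from dataclasses import dataclass

@dataclass
class Event:
    label: str
    start: float  # seconds
    end: float

# Hypothetical detector output and AD-script character mentions (time, name).
events = [Event("dialogue", 120.0, 145.0), Event("exciting", 300.0, 330.0)]
ad_mentions = [(118.5, "Alice"), (130.2, "Bob"), (305.0, "Alice")]

def characters_for_event(event, mentions, tolerance=5.0):
    """Return characters whose AD mentions fall inside the event window,
    extended by a small tolerance to absorb alignment error."""
    return sorted({name for t, name in mentions
                   if event.start - tolerance <= t <= event.end + tolerance})

for ev in events:
    print(ev.label, ev.start, ev.end, characters_for_event(ev, ad_mentions))
```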

    Semantic Virtual Environments with Adaptive Multimodal Interfaces

    We present a system for real-time configuration of multimodal interfaces to Virtual Environments (VE). The flexibility of our tool is supported by a semantics-based representation of VEs. Semantic descriptors are used to define interaction devices and the virtual entities under control. We use portable (XML) descriptors to define the I/O channels of a variety of interaction devices. The semantic description of virtual objects turns them into reactive entities with which the user can communicate in multiple ways. This article gives details on the semantics-based representation and presents some examples of multimodal interfaces created with our system, including gesture-based and PDA-based interfaces, among others.
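    A minimal sketch of the idea of device descriptors, assuming a hypothetical XML format and channel-to-action bindings; the element names and schema are illustrative, not the system's actual representation.

```python
# Hypothetical XML descriptor declaring the I/O channels of an interaction
# device, parsed and mapped onto commands understood by a reactive virtual entity.
import xml.etree.ElementTree as ET

DEVICE_XML = """
<device name="pda">
  <channel id="stylus.x" type="float" range="0 1"/>
  <channel id="stylus.y" type="float" range="0 1"/>
  <channel id="button.select" type="bool"/>
</device>
"""

# Hypothetical binding from device channels to an entity's reactions.
bindings = {"button.select": "toggle_light", "stylus.x": "set_intensity"}

root = ET.fromstring(DEVICE_XML)
print(f"Device: {root.get('name')}")
for channel in root.findall("channel"):
    cid = channel.get("id")
    action = bindings.get(cid, "(unbound)")
    print(f"  channel {cid} ({channel.get('type')}) -> {action}")
```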

    Live Broadcasting of High Definition Audiovisual Content Using HDTV over Broadband IP Networks

    This paper focuses on validating an implementation of a state-of-the-art audiovisual (AV) technology setup for live broadcasting of cultural shows via broadband Internet. The main objective of the work was to study, configure, and set up dedicated audio-video equipment for capturing, processing, and transmitting extended-resolution, high-fidelity AV content in order to increase realism and achieve maximum audience sensation. The Internet2 and GEANT broadband telecommunication networks were selected as the most suitable technology to deliver such traffic loads. Validation procedures were conducted in combination with metric-based quality of service (QoS) and quality of experience (QoE) evaluation experiments for the quantification and perceptual interpretation of the quality achieved during content reproduction. The implemented system was successfully applied in real-world settings, such as the transmission of cultural events from the Thessaloniki Concert Hall throughout Greece, as well as the reproduction of Philadelphia Orchestra performances (USA) over the Internet2 and GEANT backbones.
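    As a small illustration of the kind of metric-based QoS measurement mentioned above, the sketch below computes a packet loss ratio and RFC 3550-style interarrival jitter from made-up (sequence, send time, receive time) samples; the paper's actual metrics and tooling are not specified here.

```python
# Toy QoS computation over hypothetical packet samples: (seq, sent_s, received_s).
samples = [
    (1, 0.000, 0.040), (2, 0.020, 0.061), (3, 0.040, 0.079), (5, 0.080, 0.125),
]

expected = samples[-1][0] - samples[0][0] + 1      # packets expected by sequence range
loss_ratio = 1 - len(samples) / expected           # fraction that never arrived

jitter = 0.0
for (s0, t0, r0), (s1, t1, r1) in zip(samples, samples[1:]):
    d = abs((r1 - r0) - (t1 - t0))                 # transit-time difference
    jitter += (d - jitter) / 16.0                  # RFC 3550 exponential smoothing

print(f"loss: {loss_ratio:.1%}, interarrival jitter: {jitter * 1000:.2f} ms")
```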

    Maximum Energy Subsampling: A General Scheme For Multi-resolution Image Representation And Analysis

    Image descriptors play an important role in image representation and analysis. Multi-resolution image descriptors can effectively characterize complex images and extract their hidden information. Wavelet descriptors have been widely used in multi-resolution image analysis; however, making the wavelet transform shift- and rotation-invariant introduces redundancy and requires complex matching processes. Other multi-resolution descriptors usually depend on additional theory or information, such as filtering functions or prior domain knowledge, which not only increases computational complexity but also introduces errors. We propose a novel multi-resolution scheme that can transform any image descriptor into a multi-resolution structure with high accuracy and efficiency. Our scheme is based on sub-sampling an image into an odd-even image tree; applying image descriptors to this tree yields the corresponding multi-resolution image descriptors. Multi-resolution analysis is based on downsampling expansion with maximum energy extraction, followed by upsampling reconstruction. Since the maximum energy is usually retained in the lowest-frequency coefficients, we perform maximum energy extraction by keeping the lowest coefficients at each resolution level. Our scheme can analyze images recursively and effectively without introducing artifacts or changes to the original images, produce multi-resolution representations, obtain higher-resolution images using only information from lower resolutions, compress data, filter noise, extract effective image features, and be implemented in parallel.
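    A minimal sketch of an odd-even subsampling tree with maximum-energy selection, under assumptions not stated in the abstract: energy is taken as the sum of squared pixel values, and only the single maximum-energy child is kept per level.

```python
# Illustrative odd-even subsampling pyramid with maximum-energy child selection.
import numpy as np

def odd_even_children(img):
    """Four sub-images obtained by taking even/odd rows and columns."""
    return [img[r::2, c::2] for r in (0, 1) for c in (0, 1)]

def max_energy_pyramid(img, levels=3):
    """List of progressively smaller sub-images, each being the
    maximum-energy child of the previous level."""
    pyramid = [img]
    for _ in range(levels):
        children = odd_even_children(pyramid[-1])
        pyramid.append(max(children,
                           key=lambda x: float((x.astype(float) ** 2).sum())))
    return pyramid

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64))
for level, sub in enumerate(max_energy_pyramid(image)):
    print(f"level {level}: shape {sub.shape}")
```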

    Automatic event detection for tennis broadcasting

    Within the digital image processing framework, this thesis is situated in the field of automatic content indexing. Specifically, different methods and techniques are developed in order to achieve event detection in broadcast tennis videos. Audiovisual indexing consists of generating descriptive tags from the existing audiovisual data; these tags are then used to search the desired material efficiently. Broadcasters and other entities are looking to improve operational efficiency, so content indexing is a key factor and is expected to become automatic in the near future. In sports such as football, where the demand for video highlights is high, considerable financial and human resources are available for this task. In other sports such as tennis, where financial resources are lower, automation becomes an important issue. The report starts by describing the method used to separate the useful frames, i.e. those that provide information because the camera is focused on the tennis court. The next step is locating the court and the players, with the aim of translating their perspective coordinates into real-world coordinates. Depending on the distance travelled by the players, their movements, and the duration of the shot, the implemented algorithm can distinguish between different kinds of shots, such as aces, baseline rallies, or net approaches. The code has been developed in the Matlab programming language. The program has been tested on three tennis videos from different surfaces: hard court, grass, and clay. The results in terms of event detection and computing times are detailed at the end of the report.
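    A rough sketch of the final classification step, assuming a homography from image to court coordinates has already been estimated from the court corners; the matrix, thresholds, and shot categories below are illustrative only (the thesis itself uses Matlab, the sketch is in Python).

```python
# Map player pixel positions to court coordinates and label the point by
# distance covered and duration. H, the track, and the thresholds are made up.
import numpy as np

H = np.array([[0.02, 0.0, -5.0],      # hypothetical image->court homography
              [0.0, 0.05, -12.0],
              [0.0, 0.0001, 1.0]])

def to_court(pixel_xy):
    """Apply the homography and dehomogenise to get court coordinates (metres)."""
    x, y, w = H @ np.array([pixel_xy[0], pixel_xy[1], 1.0])
    return np.array([x / w, y / w])

def classify_shot(pixel_track, duration_s):
    """Very rough shot typing from distance covered and point duration."""
    pts = np.array([to_court(p) for p in pixel_track])
    dist = float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())
    if duration_s < 3 and dist < 2:
        return "ace / service winner"
    if dist > 8:
        return "net approach"
    return "baseline rally"

track = [(420, 630), (450, 640), (480, 650)]   # hypothetical player positions
print(classify_shot(track, duration_s=6.0))
```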

    Overview of the MPEG-7 Standard and of Future Challenges for Visual Information Analysis

    This paper presents an overview of the MPEG-7 standard: the Multimedia Content Description Interface. It focuses on visual information description, including low-level visual Descriptors and Segment Description Schemes. The paper also discusses some challenges in visual information analysis that will have to be faced in the future to allow efficient MPEG-7-based applications.
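    Purely as an illustration of what a low-level visual descriptor can look like, the toy example below quantises an RGB frame into a coarse, comparable colour histogram. It is in the spirit of the standard's visual descriptors but is not the normative extraction of any MPEG-7 descriptor.

```python
# Toy colour descriptor: normalised histogram over a coarsely quantised RGB cube.
import numpy as np

def coarse_colour_histogram(rgb, bins_per_channel=4):
    """Normalised histogram over a (bins x bins x bins) quantised RGB cube."""
    q = (rgb.astype(np.uint16) * bins_per_channel) // 256        # per-pixel bin index
    flat = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(flat.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()

rng = np.random.default_rng(1)
frame = rng.integers(0, 256, size=(120, 160, 3))                 # stand-in video frame
descriptor = coarse_colour_histogram(frame)
print(descriptor.shape, round(float(descriptor.sum()), 3))
```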

    Image segmentation and object tracking in videos: approaches based on estimation, feature selection, and active contours

    This thesis addresses two of the most important and most complex problems in computer vision: image segmentation and object tracking in videos. We propose several approaches to these two problems, based on variational (active contour) and statistical modelling, which aim to overcome various theoretical and practical (algorithmic) limitations. First, we address the problem of automating level-set active contour segmentation and its generalization to the multi-region case. To this end, we propose a model that estimates region information automatically and adaptively to the image content. This model uses no prior information about the regions and handles both colour and texture images with an arbitrary number of regions. We then introduce a statistical approach for estimating and integrating feature relevance and semantics into the segmentation of objects of interest. Second, we address the problem of object tracking in videos using active contours, for which we propose two different models. The first assumes that the photometric properties of the tracked objects are invariant over time, yet it can track objects in the presence of noise and against non-static, cluttered video backgrounds; this is achieved by integrating region, boundary, and shape information about the tracked objects. The second model handles photometric variations of the tracked objects by using a statistical model that adapts to their appearance. Finally, we propose a new statistical model, based on the generalized Gaussian distribution, for the efficient representation of noisy, high-dimensional data in segmentation. This model is used to ensure robust segmentation of noisy colour images, as well as of moving objects in videos (acquired with static cameras) containing shadows and/or sudden illumination changes.
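    For reference, the generalized Gaussian density referred to above has the standard univariate form (the thesis may use a multivariate or otherwise adapted variant):

$$ p(x) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\,\exp\!\left(-\left(\frac{|x-\mu|}{\alpha}\right)^{\beta}\right) $$

    where $\mu$ is the location, $\alpha > 0$ the scale, and $\beta > 0$ the shape parameter; $\beta = 2$ recovers the Gaussian and $\beta = 1$ the Laplacian, while smaller $\beta$ gives heavier tails and hence more robustness to noise and outliers.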