204 research outputs found
Video Indexing and Retrieval Techniques Using Novel Approaches to Video Segmentation, Characterization, and Similarity Matching
Multimedia applications are spreading at an ever-increasing rate, presenting the research community with a number of challenging problems. The most significant and influential among them is effective access to stored data. Despite the popularity of keyword-based search in alphanumeric databases, it is inadequate for multimedia data because of their unstructured nature. On the other hand, a number of content-based access techniques have been developed in the context of image indexing and retrieval; meanwhile, video retrieval systems are starting to gain wide attention. This work proposes a number of techniques constituting a fully content-based system for retrieving video data. These techniques primarily target the efficiency, reliability, scalability, extensibility, and effectiveness requirements of such applications. First, an abstract representation of the video stream, known as the DC sequence, is extracted. Second, to deal with the problem of video segmentation, an efficient neural network model is introduced. The novel use of the neural network improves reliability, while efficiency is achieved through the instantaneous use of the recall phase to identify shot boundaries. Third, the problem of key frame extraction is addressed using two efficient algorithms that adapt their selection decisions to the amount of activity found in each video shot, enabling the selection of a near-optimal, expressive set of key frames. Fourth, the developed system employs an indexing scheme that supports two low-level features, color and texture, to represent video data. Finally, we propose, in the retrieval stage, a novel model for the video data matching task that integrates a number of human-based similarity factors. All our software implementations are in Java, which enables them to be used across heterogeneous platforms.
The retrieval system's performance has been evaluated, yielding a very good retrieval rate and accuracy, which demonstrates the effectiveness of the developed system.
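As a rough illustration of what a DC sequence contains: the DC coefficient of an 8x8 DCT block is proportional to the mean intensity of that block, so a DC image is essentially a frame reduced by a factor of eight in each direction. The sketch below is an assumption-laden stand-in, not the thesis's Java implementation: it works on already decoded grayscale pixels, whereas a compressed-domain system would read the coefficients from the stream, and the name `dc_image` is hypothetical.

```python
def dc_image(frame, block=8):
    """Return the block-mean ("DC") image of a grayscale frame.

    frame: list of rows, each a list of pixel intensities.
    Partial blocks at the right and bottom edges are ignored.
    """
    rows = len(frame) // block
    cols = len(frame[0]) // block if frame else 0
    out = []
    for by in range(rows):
        out_row = []
        for bx in range(cols):
            total = 0
            # Sum every pixel of the current block.
            for y in range(by * block, (by + 1) * block):
                for x in range(bx * block, (bx + 1) * block):
                    total += frame[y][x]
            out_row.append(total / (block * block))
        out.append(out_row)
    return out
```

Applying `dc_image` to every frame of a decoded clip yields a sequence of small images that can stand in for the DC sequence when experimenting with segmentation or key frame selection.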
Audiovisual processing for sports-video summarisation technology
In this thesis a novel audiovisual feature-based scheme is proposed for the automatic summarization of sports-video content. The scope of operability of the scheme is designed to encompass the wide variety of sports genres that come under the description "field-sports". Given the assumption that, in terms of conveying the narrative of a field-sports-video, score-update events constitute the most significant moments, it is proposed that their detection should yield a favourable summarisation solution. To this end, a generic methodology is proposed for the automatic identification of score-update events in field-sports-video content. The scheme is based on the development of robust extractors for a set of critical features, which are shown to reliably indicate the locations of these events. The evidence gathered by the feature extractors is combined and analysed using a Support Vector Machine (SVM), which performs the event detection process. An SVM is chosen on the basis that its underlying technology represents an implementation of the latest generation of machine learning algorithms, based on recent advances in statistical learning. Effectively, an SVM offers a solution to optimising the classification performance of a decision hypothesis inferred from a given set of training data. Via a learning phase that utilizes a 90-hour field-sports-video training corpus, the SVM infers a score-update event model by observing patterns in the extracted feature evidence. Using a similar but distinct 90-hour evaluation corpus, the effectiveness of this model is then tested generically across multiple genres of field-sports-video including soccer, rugby, field hockey, hurling, and Gaelic football. The results suggest that, in terms of the summarization task, both high event retrieval and high content rejection statistics are achievable.
An object-based approach to retrieval of image and video content
Promising new directions have been opened up for content-based visual retrieval in recent years. Object-based retrieval, which allows users to manipulate video objects as part of their searching and browsing interaction, is one of these. It is the purpose of this thesis to constitute itself as a part of a larger stream of research that investigates visual objects as a possible approach to advancing the use of semantics in content-based visual retrieval.
The notion of using objects in video retrieval has been seen as desirable for some years, but only very recently has technology started to allow even very basic object-location functions on video. The main hurdles to greater use of objects in video retrieval are the overhead of object segmentation on large amounts of video and the issue of whether objects can actually be used efficiently for multimedia retrieval. Despite this, there are already some examples of work which supports retrieval based on video objects.
This thesis investigates an object-based approach to content-based visual retrieval. The main research contributions of this work are a study of shot boundary detection on compressed domain video where a fast detection approach is proposed and evaluated, and a study on the use of objects in interactive image retrieval. An object-based retrieval framework is developed in order to investigate object-based retrieval on a corpus of natural image and video. This framework contains the entire processing chain required to analyse, index and interactively retrieve images and video via object-to-object matching. The experimental results indicate that object-based searching consistently outperforms image-based search using low-level features. This result goes some way towards validating the approach of allowing users to select objects as a basis for searching video archives when the information need dictates it as appropriate.
Construction of super-resolution mosaics from low-resolution video. Application to video summarization and transmission error concealment.
The digitization of existing videos, together with the explosive growth of multimedia services over networks such as digital television broadcasting and mobile communications, has produced an enormous quantity of compressed video. This calls for efficient indexing and navigation tools, but indexing before encoding is not common practice. The usual approach is to decode these videos completely and then build indexes. This is very costly and therefore not feasible in real time. Moreover, important information such as motion, lost during decoding, is re-estimated even though it is already present in the compressed stream. Our goal in this thesis is therefore to reuse the data already present in the compressed MPEG stream for indexing and fast navigation. More precisely, we extract DC coefficients and motion vectors. Within this thesis, we focus in particular on the construction of mosaics from the DC images extracted from I-frames. A mosaic is built by registering and merging all the images of a video sequence into a single coordinate system. This coordinate system is generally aligned with one of the images of the sequence, the reference image. The result is a single image that gives a global view of the sequence. We thus propose a complete system for constructing mosaics from the MPEG-1/2 stream that takes into account various problems arising in real video sequences, such as moving objects or illumination changes. An essential task in mosaic construction is estimating the motion between each image of the sequence and the reference image. Our method relies on a robust estimation of the global camera motion from the motion vectors of P-frames.
However, the global camera motion estimated for a P-frame may be incorrect, because it depends strongly on the accuracy of the encoded vectors. We detect the affected P-frames by taking into account the DC coefficients of the associated encoded error, and we propose two methods for correcting these motions. A mosaic built from DC images has a very low resolution and suffers from aliasing effects due to the nature of DC images. To increase its resolution and improve its visual quality, we apply a super-resolution method based on iterative back-projection. Super-resolution methods are likewise based on registering and merging the images of a video sequence, but they are accompanied by image restoration. In this context, we have developed a new method for estimating the blur due to camera motion, together with a corresponding spectral restoration method. Spectral restoration handles blur globally, but local blur appears in the case of objects whose motion is independent of the camera motion. We therefore propose a new super-resolution algorithm, derived from the iterative spatial restoration of Van Cittert and Jansson, that can restore local blur. Relying on a segmentation of moving objects, we restore the background mosaic and the foreground objects separately. We have adapted our blur estimation method accordingly. First, we applied our method to the construction of video summaries, with the aim of fast mosaic-based navigation in compressed video. Next, we establish how reusing the intermediate results serves other indexing tasks, notably shot change detection for I-frames and the characterization of camera motion.
Finally, we explored the field of transmission error recovery. Our approach consists of building a mosaic while a shot is being decoded; in the event of data loss, the missing information can be concealed using this mosaic.
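For readers unfamiliar with the Van Cittert iteration mentioned in this abstract, the following is a minimal one-dimensional sketch of the classical scheme, f_{k+1} = f_k + beta * (g - h * f_k), where g is the observed blurred signal and h the blur kernel. It is not the thesis's algorithm: the Jansson refinement and the local-blur, super-resolution extension are not shown, and the function names, the edge-replication boundary handling, and the default parameters are all assumptions chosen for illustration.

```python
def convolve_same(signal, kernel):
    """Convolve with an odd-length kernel; samples beyond the ends repeat the edge value."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Index of the input sample paired with kernel tap k, clamped to the signal.
            j = min(max(i - k + half, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def van_cittert(blurred, kernel, iterations=20, beta=1.0):
    """Classical Van Cittert deblurring: repeatedly add back the re-blurring residual."""
    estimate = list(blurred)  # f_0 = g
    for _ in range(iterations):
        reblurred = convolve_same(estimate, kernel)
        estimate = [f + beta * (g - r)
                    for f, g, r in zip(estimate, blurred, reblurred)]
    return estimate
```

With a smooth, normalized kernel such as `[0.25, 0.5, 0.25]` and `beta=1.0`, each iteration moves the estimate closer to the unblurred signal; on noisy data the iteration count usually has to be limited, since the scheme also amplifies noise.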
A video summarisation system for post-production
Post-production facilities deal with large amounts of digital video, which presents difficulties when tracking, managing and searching this material. Recent research work in image and video analysis promises to offer help in these tasks, but there is a gap between what these systems can provide and what users actually need. In particular the popular research models for indexing and retrieving visual data do not fit well with how users actually work. In this thesis we explore how image and video analysis can be applied to an online video collection to assist users in reviewing and searching for material faster, rather than purporting to do it for them.
We introduce a framework for automatically generating static 2-dimensional storyboards from video sequences. The storyboard consists of a series of frames, one for each shot in the sequence, showing the principal objects and motions of the shot. The storyboards are rendered as vector images in a familiar comic book style, allowing them to be quickly viewed and understood. The process consists of three distinct steps: shot-change detection, object segmentation, and presentation.
The nature of the video material encountered in a post-production facility is quite different from other material such as television programmes. Video sequences such as commercials and music videos are highly dynamic with very short shots, rapid transitions and ambiguous edits. Video is often heavily manipulated, causing difficulties for many video processing techniques.
We study the performance of a variety of published shot-change detection algorithms on the type of highly dynamic video typically encountered in post-production work. Finding their performance disappointing, we develop a novel algorithm for detecting cuts and fades that operates directly on Motion-JPEG compressed video, exploiting the DCT coefficients to save computation. The algorithm shows superior performance on highly dynamic material while performing comparably to previous algorithms on other material.
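To make the idea of cut detection on DCT data concrete, here is a deliberately simple baseline, not the novel algorithm described in this abstract: it compares the DC (block-mean) values of consecutive frames and declares a cut when the mean absolute difference exceeds a threshold. The function name `detect_cuts` and the `threshold` parameter are hypothetical, the DC values are assumed to have been extracted already, and gradual transitions such as fades are not handled.

```python
def detect_cuts(dc_frames, threshold):
    """Return indices i where a cut is declared between frame i-1 and frame i.

    dc_frames: list of equally sized 2-D lists of DC (block-mean) values.
    threshold: tuning parameter applied to the mean absolute difference.
    """
    cuts = []
    for i in range(1, len(dc_frames)):
        prev, cur = dc_frames[i - 1], dc_frames[i]
        n = sum(len(row) for row in cur)
        # Sum of absolute differences over all co-located DC values.
        diff = sum(abs(a - b)
                   for row_p, row_c in zip(prev, cur)
                   for a, b in zip(row_p, row_c))
        if n and diff / n > threshold:
            cuts.append(i)
    return cuts
```

A fixed threshold of this kind tends to struggle on exactly the highly dynamic material the abstract describes, which is one reason more elaborate detectors are developed.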
Multimedia
The ubiquitous and effortless digital data capture and processing capabilities now offered by the majority of devices have led to an unprecedented penetration of multimedia content in our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content require constant re-evaluation and adaptation of multimedia methodologies in order to meet the relentless change of requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out such important aspects. Some of the main topics that this book deals with include: multimedia management in peer-to-peer structures and wireless networks, security characteristics in multimedia, semantic gap bridging for multimedia content, and novel multimedia applications.
Novel Methods and Algorithms for Presenting 3D Scenes
In recent years, improvements in the acquisition and creation of 3D models gave rise to an increasing availability of 3D content and to a widening of the audience such content is created for, which brought into focus the need for effective ways to visualize and interact with it.
Until recently, the task of virtual inspection of a 3D object or navigation inside a 3D scene was carried out by using human machine interaction (HMI) metaphors controlled through mouse and keyboard events. However, this interaction approach may be cumbersome for the general audience. Furthermore, the inception and spread of touch-based mobile devices, such as smartphones and tablets, redefined the interaction problem entirely, since neither mouse nor keyboard is available anymore. The problem is made even worse by the fact that these devices are typically less powerful than desktop machines, while high-quality rendering is a computationally intensive task.
In this thesis, we present a series of novel methods for the easy presentation of 3D content, both when it is already available in a digitized form and when it must be acquired from the real world by image-based techniques. In the first case, we propose a method which takes as input the 3D scene of interest and an example video, and automatically produces a video of the input scene that resembles the given video example. In other words, our algorithm allows the user to replicate an existing video, for example, a video created by a professional animator, on a different 3D scene.
In the context of image-based techniques, exploiting the inherent spatial organization of photographs taken for the 3D reconstruction of a scene, we propose an intuitive interface for the smooth stereoscopic navigation of the acquired scene, providing an immersive experience without the need for a complete 3D reconstruction.
Finally, we propose an interactive framework for improving low-quality 3D reconstructions obtained through image-based reconstruction algorithms. Using a few strokes on the input images, the user can specify high-level geometric hints to improve incomplete or noisy reconstructions, which are caused by various quite common conditions often arising for objects such as buildings, streets and numerous other human-made functional elements.
- …