1,798 research outputs found
Indexing, browsing and searching of digital video
Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a "piece" of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter. In modern society, video is ver
Advanced content-based semantic scene analysis and information retrieval: the SCHEMA project
The aim of the SCHEMA Network of Excellence is to bring together a critical mass of universities, research centers, industrial partners and end users, in order to design a reference system for content-based semantic scene analysis, interpretation and understanding. Relevant research areas include: content-based multimedia analysis and automatic annotation of semantic multimedia content, combined textual and multimedia information retrieval, the semantic web, the MPEG-7 and MPEG-21 standards, user interfaces and human factors. In this paper, recent advances in content-based analysis, indexing and retrieval of digital media within the SCHEMA Network are presented. These advances will be integrated in the SCHEMA module-based, expandable reference system.
Highly efficient low-level feature extraction for video representation and retrieval.
PhD. Witnessing the omnipresence of digital video media, the research community has
raised the question of its meaningful use and management. Stored in immense
multimedia databases, digital videos need to be retrieved and structured in an
intelligent way, relying on the content and the rich semantics involved. Current
Content Based Video Indexing and Retrieval systems face the problem of the semantic
gap between the simplicity of the available visual features and the richness of user
semantics.
This work focuses on the issues of efficiency and scalability in video indexing and
retrieval to facilitate a video representation model capable of semantic annotation. A
highly efficient algorithm for temporal analysis and key-frame extraction is developed.
It is based on the prediction information extracted directly from the compressed domain
features and the robust scalable analysis in the temporal domain. Furthermore,
a hierarchical quantisation of the colour features in the descriptor space is presented.
Derived from the extracted set of low-level features, a video representation model that
enables semantic annotation and contextual genre classification is designed.
Results demonstrate the efficiency and robustness of the temporal analysis algorithm
that runs in real time maintaining the high precision and recall of the detection task.
Adaptive key-frame extraction and summarisation achieve a good overview of the
visual content, while the colour quantisation algorithm efficiently creates a hierarchical
set of descriptors. Finally, the video representation model, supported by the genre
classification algorithm, achieves excellent results in an automatic annotation system by
linking the video clips with a limited lexicon of related keywords.
Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.
Dublin City University video track experiments for TREC 2001
Dublin City University participated in the interactive search task and the Shot Boundary Detection task of the TREC Video Track. In the interactive search task experiment, thirty people used three different digital video browsers to find video segments matching the given topics. Each user was under a time constraint of six minutes for each topic assigned to them. The purpose of this experiment was to compare video browsers, and so a method was developed for combining independent users' results for a topic into one set of results. Collated results based on thirty users are available herein, though individual users' and browsers' results are currently unavailable for comparison. Our purpose in participating in this TREC track was to create the ground truth within the TREC framework, which will allow us to do direct browser performance comparisons.
MAC-REALM: A video content feature extraction and modelling framework
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. A consequence of the "data deluge" is the exponential increase in digital video footage, while the ability to find relevant video clips diminishes. Traditional text-based search engines are no longer optimal for searching, as they cannot provide a granular search of the content inside video footage. To be able to search the video in a content-based manner, the content features of the video need to be extracted and modelled into a content model, which can then act as a searchable proxy for the video content. This thesis focuses on the extraction of syntactic and semantic content features and content modelling, using machine-driven processes, with either little or no user interaction. Our abstract framework design extracts syntactic and semantic content features and compiles them into an integrated content model. The framework integrates a four-plane strategy that consists of a pre-processing plane that removes redundant data and filters the media to improve the feature extraction properties of the media; a syntactic feature extraction plane that extracts low-level syntactic features and mid-level syntactic features that have semantic attributes; a semantic relationship analysis and linkage plane, where the spatial and temporal relationships of all the content features are defined; and finally a content modelling stage where the syntactic and semantic content features are integrated into a content model. Each of the four planes can be split into three layers, namely the content layer, where the content to be processed is stored; the application layer, where the content is converted into content descriptions; and the MPEG-7 layer, where content descriptions are serialised. Using MPEG-7 standards to produce the content model will provide wide-ranging interoperability, while facilitating granular multi-content type searches.
The framework aims to "bridge" the semantic gap by integrating the syntactic and semantic content features from extraction through to modelling. The design of the framework has been implemented in a prototype called MAC-REALM, which has been tested and evaluated for its effectiveness in extracting and modelling content features. Conclusions are drawn about the research output as a whole and whether it has met the objectives. Finally, future work is presented on how concept detection and crowd sourcing can be used with MAC-REALM.
Classifying Cinematographic Shot Types
In film-making, the distance from the camera to the subject greatly affects the narrative power of a shot. By the alternate use of Long shots, Medium shots and Close-ups, the director is able to provide emphasis on key passages of the filmed scene. In this work we investigate five different inherent characteristics of single shots which contain indirect information about camera distance, without the need to recover the 3D structure of the scene. Specifically, 2D scene geometric composition, frame colour intensity properties, motion distribution, spectral amplitude and shot content are considered for classifying shots into three main categories. In the experimental phase, we demonstrate the validity of the framework and effectiveness of the proposed descriptors by classifying a significant dataset of movie shots using C4.5 Decision Trees and Support Vector Machines. After comparing the performance of the statistical classifiers using the combined descriptor set, we test the ability of each single feature in distinguishing shot types. Published online Nov. 2011; print publication Jan. 2013. Canini L.; Benini S.; Leonardi R.
Content-based Digital Video Processing. Digital Videos Segmentation, Retrieval and Interpretation.
Recent research approaches in semantics based video content analysis require shot boundary detection as the first step to divide video sequences into sections. Furthermore, with the advances in networking and computing capability, efficient retrieval of multimedia data has become an important issue. Content-based retrieval technologies have been widely implemented to protect intellectual property rights (IPR). In addition, automatic recognition of highlights from videos is a fundamental and challenging problem for content-based indexing and retrieval applications.
In this thesis, a paradigm is proposed to segment, retrieve and interpret digital videos. Five algorithms are presented to solve the video segmentation task. Firstly, a simple shot cut detection algorithm is designed for real-time implementation. Secondly, a systematic method is proposed for shot detection using content-based rules and FSM (finite state machine). Thirdly, the shot detection is implemented using local and global indicators. Fourthly, a context awareness approach is proposed to detect shot boundaries. Fifthly, a fuzzy logic method is implemented for shot detection. Furthermore, a novel analysis approach is presented for the detection of video copies. It is robust to complicated distortions and capable of locating the copy of segments inside original videos. Then, objects and events are extracted from MPEG sequences for video highlights indexing and retrieval. Finally, a human fighting detection algorithm is proposed for movie annotation.
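The abstract above does not say how its "simple shot cut detection algorithm" works. As an illustration only, the following Python sketch shows one common baseline for the task: it compares grey-level histograms of consecutive frames and declares a cut when they differ by more than a threshold. The function names, the 16-bin histogram and the 0.5 threshold are assumptions made for this example and are not taken from the thesis.

```python
from typing import List, Sequence

# Assumption: a frame is a flat sequence of grey levels in the range 0..255.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalised grey-level histogram of one frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = float(len(frame)) or 1.0  # avoid dividing by zero on empty frames
    return [c / total for c in counts]


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5,
                bins: int = 16) -> List[int]:
    """Return every index i where a cut is declared between frames i-1 and i."""
    cuts: List[int] = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame, bins)
        if previous is not None:
            # Half the L1 distance between two normalised histograms is in [0, 1].
            distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
            if distance > threshold:
                cuts.append(index)
        previous = current
    return cuts


# Hypothetical toy video: three dark frames followed by two bright frames.
dark = [10] * 64
bright = [240] * 64
video = [dark, dark, dark, bright, bright]
print(detect_cuts(video))
```

A fixed global threshold like this one reacts to abrupt cuts but tends to miss gradual transitions such as fades and dissolves, which is one reason more elaborate rule-based, context-aware or fuzzy approaches are studied.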
Construction of super-resolution mosaics from low-resolution video. Application to video summarization and transmission error concealment.
The digitization of existing videos, together with the explosive growth of multimedia services over networks such as digital television broadcasting or mobile communications, has produced an enormous quantity of compressed video. This calls for efficient indexing and browsing tools, but indexing before encoding is not usual. The common approach is to decode these videos completely and then build indexes. This is very costly and therefore not feasible in real time. Moreover, important information such as motion, which is lost during decoding, is re-estimated even though it is already present in the compressed stream. Our goal in this thesis is therefore to reuse the data already present in the MPEG compressed stream for indexing and fast browsing. More precisely, we extract DC coefficients and motion vectors. Within this thesis, we were particularly interested in the construction of mosaics from the DC images extracted from I-frames. A mosaic is built by registering and fusing all the images of a video sequence into a single coordinate system. This coordinate system is generally aligned with one of the images of the sequence: the reference image. The result is a single image that gives a global view of the sequence. We therefore propose in this thesis a complete system for constructing mosaics from the MPEG-1/2 stream that takes into account various problems arising in real video sequences, such as moving objects or illumination changes. An essential task in constructing a mosaic is estimating the motion between each image of the sequence and the reference image. Our method is based on a robust estimation of the global camera motion from the motion vectors of P-frames.
However, the global camera motion estimated for a P-frame may be incorrect, because it depends heavily on the accuracy of the encoded vectors. We detect the affected P-frames by taking into account the DC coefficients of the associated encoded error, and we propose two methods to correct these motions. A mosaic built from DC images has a very low resolution and suffers from aliasing effects due to the nature of DC images. To increase its resolution and improve its visual quality, we apply a super-resolution method based on iterative back-projections. Super-resolution methods are also based on registering and fusing the images of a video sequence, but are accompanied by image restoration. In this context, we developed a new method for estimating the blur due to camera motion, as well as a corresponding spectral restoration method. Spectral restoration handles blur globally, but for objects whose motion is independent of the camera motion, local blurs appear. This is why we propose a new super-resolution algorithm, derived from the iterative spatial restoration of Van Cittert and Jansson, that can restore local blurs. Based on a segmentation of moving objects, we restore the background mosaic and the foreground objects separately. We adapted our blur estimation method accordingly. First, we applied our method to the construction of video summaries, with the aim of fast mosaic-based browsing in compressed video. Then, we established how reusing intermediate results serves other indexing tasks, notably shot change detection for I-frames and camera motion characterization.
Finally, we explored the field of transmission error recovery. Our approach consists of building a mosaic while a shot is being decoded; in case of data loss, the missing information can be concealed using this mosaic.
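The abstract above names the iterative spatial restoration of Van Cittert as the starting point of its super-resolution algorithm. As a rough illustration, the classical Van Cittert iteration refines an estimate f of the sharp signal from the blurred observation g and the blur kernel h as f_{k+1} = f_k + beta * (g - h * f_k), starting from f_0 = g, where * denotes convolution. The Python sketch below applies that textbook iteration to a 1-D signal. It is not the thesis's algorithm: the function names, the edge-replicating convolution, the 3-tap kernel and the value of beta are assumptions chosen for the example.

```python
from typing import List, Sequence


def convolve_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Convolve with a symmetric odd-length kernel, replicating edge samples."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            k = min(max(i + j - half, 0), n - 1)  # clamp the index to the signal
            acc += weight * signal[k]
        out.append(acc)
    return out


def van_cittert(observed: Sequence[float], kernel: Sequence[float],
                beta: float = 1.0, iterations: int = 20) -> List[float]:
    """Classical Van Cittert iteration: f <- f + beta * (g - h * f)."""
    estimate = list(observed)  # f_0 = g
    for _ in range(iterations):
        reblurred = convolve_same(estimate, kernel)
        estimate = [f + beta * (g - b)
                    for f, g, b in zip(estimate, observed, reblurred)]
    return estimate


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equally long signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical test signal: a box blurred by a small smoothing kernel.
sharp = [0.0] * 4 + [1.0] * 4 + [0.0] * 4
kernel = [0.25, 0.5, 0.25]
blurred = convolve_same(sharp, kernel)
restored = van_cittert(blurred, kernel, beta=1.0, iterations=50)
print(squared_error(blurred, sharp), squared_error(restored, sharp))
```

In this noise-free setting the restored signal ends up closer to the sharp one than the blurred observation is. With noisy data the plain iteration amplifies noise as it proceeds, so practical variants limit the number of iterations or add constraints.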
Detection and Generalization of Spatio-temporal Trajectories for Motion Imagery
In today's world of vast information availability, users often confront large, unorganized amounts of data with limited tools for managing them. Motion imagery datasets have become an increasingly popular means for exposing and disseminating information. Commonly, moving objects are of primary interest in modeling such datasets. Users may require different levels of detail, mainly for visualization and further processing purposes, according to the application at hand. In this thesis we exploit the geometric attributes of objects for dataset summarization by using a series of image processing and neural network tools. In order to form data summaries, we select representative time instances through the segmentation of an object's spatio-temporal trajectory lines. High movement variation instances are selected through a new hybrid self-organizing map (SOM) technique to describe a single spatio-temporal trajectory. Multiple objects move in diverse yet classifiable patterns. In order to group corresponding trajectories, we utilize an abstraction mechanism that investigates a vague moving relevance between the data in space and time. Thus, we introduce the spatio-temporal neighborhood unit as a variable generalization surface. By altering the unit's dimensions, scaled generalization is accomplished. Common complications in tracking applications, which include occlusion, noise, information gaps and unconnected segments of data sequences, are addressed through the hybrid-SOM analysis. Nevertheless, entangled data sequences, where there is no information on which data entry belongs to each corresponding trajectory, are frequently evident. A multidimensional classification technique that combines geometric and backpropagation neural network implementation is used to distinguish between trajectory data. Furthermore, modeling and summarization of two-dimensional phenomena evolving in time brings forward the novel concept of spatio-temporal helixes as compact event representations.
The phenomena models consist of SOM movement nodes (spines) and cardinality shape-change descriptors (prongs). While we focus on the analysis of MI datasets, the framework can be generalized to function with other types of spatio-temporal datasets. Multiple scale generalization is allowed in a dynamic significance-based scale rather than a constant one. The constructed summaries are not just a visualization product; they also support further processing for metadata creation, indexing, and querying. Experimentation, comparisons and error estimations for each technique support the analyses discussed.