MAC-REALM: A video content feature extraction and modelling framework
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. A consequence of the ‘data deluge’ is the exponential increase in digital video footage, while the ability to find relevant video clips diminishes. Traditional text-based search engines are no longer optimal for searching, as they cannot provide a granular search of the content inside video footage. To be able to search the video in a content-based manner, the content features of the video need to be extracted and modelled into a content model, which can then act as a searchable proxy for the video content. This thesis focuses on the extraction of syntactic and semantic content features and content modelling, using machine-driven processes, with either little or no user interaction. Our abstract framework design extracts syntactic and semantic content features and compiles them into an integrated content model. The framework integrates a four-plane strategy that consists of a pre-processing plane that removes redundant data and filters the media to improve the feature extraction properties of the media; a syntactic feature extraction plane that extracts low-level syntactic features and mid-level syntactic features that have semantic attributes; a semantic relationship analysis and linkage plane, where the spatial and temporal relationships of all the content features are defined; and finally a content modelling plane, where the syntactic and semantic content features are integrated into a content model. Each of the four planes can be split into three layers, namely the content layer, where the content to be processed is stored; the application layer, where the content is converted into content descriptions; and the MPEG-7 layer, where content descriptions are serialised. Using MPEG-7 standards to produce the content model will provide wide-ranging interoperability, while facilitating granular multi-content type searches.
The framework aims to ‘bridge’ the semantic gap by integrating the syntactic and semantic content features from extraction through to modelling. The design of the framework has been implemented in a prototype called MAC-REALM, which has been tested and evaluated for its effectiveness in extracting and modelling content features. Conclusions are drawn about the research output as a whole and whether it has met the objectives. Finally, future work is presented on how concept detection and crowd sourcing can be used with MAC-REALM.
Video annotation for studying the brain in naturalistic settings
Studying the brain in naturalistic settings is a recent trend in neuroscience. Traditional brain imaging experiments have relied on highly simplified and artificial stimuli, but recently efforts have been put into studying the human brain in conditions closer to real life. The methodology used in these studies involves imitating naturalistic stimuli with a movie.
Because of the complexity of the naturalistic stimulus, a simplified model of it is needed to handle it computationally. This model is obtained by making annotations: collecting information on salient features of the movie to form a data structure. This data is compared with the brain activity evolving in time to search for possible correlations. Not all features of a movie can be reliably annotated automatically: semantic features of a movie require manual annotation, which is on some occasions problematic due to the various cinematic techniques adopted. Understanding these techniques helps in analyzing and annotating movies.
The movie Match Factory Girl (Aki Kaurismäki, 1990) was used as a stimulus in studying the brain in naturalistic settings. To help the analysis of the acquired data, the salient visual features of the movie were annotated. In this work, existing annotation approaches and available technologies for annotation were reviewed.
Annotations help organize information, so they are nowadays found everywhere. Different annotation tools and technologies are being developed constantly. Furthermore, the development of automatic video analysis methods is going to provide more meaningful annotations in the future.
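The comparison between annotation data and time-dependent brain activity described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that an annotated feature and a brain signal have both been sampled in the same time windows and that a plain Pearson correlation is the comparison used; the abstract does not state the actual method, and the names `pearson`, `face_on_screen` and `brain_signal` as well as all values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical annotation: 1 = a face is on screen in that time window, 0 = not.
face_on_screen = [0, 0, 1, 1, 1, 0, 0, 1]
# Hypothetical brain signal sampled in the same time windows (made-up values).
brain_signal = [0.1, 0.2, 0.9, 1.1, 1.0, 0.3, 0.2, 0.8]

r = pearson(face_on_screen, brain_signal)
print(f"correlation between annotation and signal: {r:.3f}")
```

A value of `r` close to 1 would suggest that the signal rises when the annotated feature is present. Real analyses would likely also need to account for factors such as response delay and multiple comparisons, which this sketch ignores.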
Handling temporal heterogeneous data for content-based management of large video collections
Video document retrieval is now an active part of the domain of multimedia retrieval. However, unlike for other media, the management of a collection of video documents adds the problem of efficiently handling an overwhelming volume of temporal data. Challenges include balancing efficient content modeling and storage against fast access at various levels. In this paper, we detail the framework we have built to accommodate our developments in content-based multimedia retrieval. We show that our framework not only facilitates the development of processing and indexing algorithms but also opens the way to several other possibilities, such as rapid interface prototyping or retrieval algorithm benchmarking. Here, we discuss our developments in relation to wider contexts such as MPEG-7 and the TREC Video Track.
Highly efficient low-level feature extraction for video representation and retrieval.
Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current Content Based Video Indexing and Retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics.
This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed domain features and the robust scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed.
Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining high precision and recall in the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords.
Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.
Adaptive video delivery using semantics
The diffusion of network appliances such as cellular phones, personal digital assistants and hand-held computers has created the need to personalize the way media content is delivered to the end user. Moreover, recent devices, such as digital radio receivers with graphics displays, and new applications, such as intelligent visual surveillance, require novel forms of video analysis for content adaptation and summarization. To cope with these challenges, we propose an automatic method for the extraction of semantics from video, and we present a framework that exploits these semantics in order to provide adaptive video delivery. First, an algorithm that relies on motion information to extract multiple semantic video objects is proposed. The algorithm operates in two stages. In the first stage, a statistical change detector produces the segmentation of moving objects from the background. This process is robust with regard to camera noise and does not need manual tuning along a sequence or for different sequences. In the second stage, feedback between an object partition and a region partition is used to track individual objects along the frames. These interactions allow us to cope with multiple, deformable objects, occlusions, splitting, appearance and disappearance of objects, and complex motion. Subsequently, semantics are used to prioritize visual data in order to improve the performance of adaptive video delivery. The idea behind this approach is to organize the content so that a particular network or device does not inhibit the main content message. Specifically, we propose two new video adaptation strategies. The first strategy combines semantic analysis with a traditional frame-based video encoder. Background simplifications resulting from this approach do not penalize overall quality at low bitrates. The second strategy uses metadata to efficiently encode the main content message.
The metadata-based representation of objects' shape and motion suffices to convey the meaning and action of a scene when the objects are familiar. The impact of different video adaptation strategies is then quantified with subjective experiments. We ask a panel of human observers to rate the quality of adapted video sequences on a normalized scale. From these results, we further derive an objective quality metric, the semantic peak signal-to-noise ratio (SPSNR), that accounts for different image areas and for their relevance to the observer in order to reflect the focus of attention of the human visual system. Finally, we determine the adaptation strategy that provides maximum value for the end user by maximizing the SPSNR for given client resources at the time of delivery. By combining semantic video analysis and adaptive delivery, the solution presented in this dissertation permits the distribution of video in complex media environments and supports a large variety of content-based applications.
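The idea of a quality metric that weights image areas by their relevance to the observer can be illustrated with a minimal sketch. The abstract does not give the SPSNR formula, so the code below is not the dissertation's definition: it assumes one plausible reading, a PSNR computed from a relevance-weighted mean squared error. The function name `weighted_psnr`, the per-pixel `weights` and all pixel values are hypothetical.

```python
import math

def weighted_psnr(reference, adapted, weights, peak=255.0):
    """PSNR computed from a weighted mean squared error (illustrative only).

    reference, adapted: flat lists of pixel intensities of equal length.
    weights: non-negative relevance weight per pixel (assumption: higher
             for semantically important areas, lower for background).
    """
    if not (len(reference) == len(adapted) == len(weights)):
        raise ValueError("inputs must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    wmse = sum(
        w * (r - a) ** 2 for r, a, w in zip(reference, adapted, weights)
    ) / total_weight
    if wmse == 0:
        return math.inf  # identical images in all weighted areas
    return 10.0 * math.log10(peak ** 2 / wmse)

# Hypothetical 4-pixel image: first two pixels are background, last two foreground.
reference = [100, 100, 100, 100]
weights = [1, 1, 4, 4]
background_error = [110, 100, 100, 100]  # same error size, placed in background
foreground_error = [100, 100, 100, 110]  # same error size, placed in foreground

score_bg = weighted_psnr(reference, background_error, weights)
score_fg = weighted_psnr(reference, foreground_error, weights)
print(f"error in background: {score_bg:.2f} dB, in foreground: {score_fg:.2f} dB")
```

Under these assumptions, an error of the same magnitude lowers the score more when it falls in a highly weighted area, and with uniform weights the function reduces to the ordinary PSNR.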