
    Semantic multimedia modelling & interpretation for annotation

    The emergence of multimedia-enabled devices, particularly the incorporation of cameras into mobile phones, together with rapid advances in low-cost storage, has drastically increased the rate of multimedia data production. Witnessing such ubiquity of digital images and videos, the research community has raised the issue of their effective utilisation and management. Stored in monumental multimedia corpora, digital data need to be retrieved and organised intelligently, drawing on the rich semantics they involve. Using these image and video collections demands proficient image and video annotation and retrieval techniques, and the multimedia research community has recently been shifting its emphasis towards the personalisation of these media. The main impediment in image and video analysis is the semantic gap: the discrepancy between a user's high-level interpretation of an image or video and its low-level computational interpretation. Content-based image and video annotation systems are remarkably susceptible to the semantic gap because they rely on low-level visual features to describe semantically rich image and video content. Visual similarity, however, is not semantic similarity, so this dilemma demands an alternative approach. The semantic gap can be narrowed by incorporating high-level and user-generated information into the annotation. High-level descriptions of images and videos are better at capturing the semantic meaning of multimedia content, but it is not always possible to collect this information, and it is commonly agreed that the problem of high-level semantic annotation of multimedia is still far from solved. This dissertation puts forward approaches for intelligent multimedia semantic extraction for high-level annotation, intending to bridge the gap between visual features and semantics.
It proposes a framework for annotation enhancement and refinement for object/concept-annotated image and video datasets. The overall scheme is first to purify the datasets of noisy keywords and then to expand the concepts lexically and commonsensically, filling the vocabulary and lexical gaps to achieve high-level semantics for the corpus. The dissertation also explores a novel approach for propagating high-level semantics (HLS) through image corpora. HLS propagation exploits semantic intensity (SI), the concept-dominance factor within an image, together with annotation-based semantic similarity between images. Since an image is a combination of various concepts, some of which are more dominant than others, the semantic similarity of two images is computed from the SI and the concept-level semantic similarity of the pair. Moreover, HLS propagation uses clustering to group similar images, so a single effort by a human expert to assign a high-level semantic to a randomly selected image is propagated to the other images in its cluster. The investigation was carried out on the LabelMe image and LabelMe video datasets. Experiments exhibit that the proposed approaches make a noticeable improvement towards bridging the semantic gap and reveal that the proposed system outperforms traditional systems
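The SI-based propagation idea described above can be sketched as follows. This is a minimal illustrative sketch, not the dissertation's actual implementation: the function names, the overlap-based similarity measure, and the single-seed threshold rule are all assumptions standing in for the thesis's clustering-based method.

```python
# Hypothetical sketch: propagate a high-level semantic (HLS) label through an
# image corpus using semantic intensity (concept dominance) of annotations.
# All names and the similarity measure are illustrative assumptions.
from collections import Counter

def semantic_intensity(annotations):
    """Concept dominance: the fraction of an image's annotations that each concept occupies."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {concept: n / total for concept, n in counts.items()}

def similarity(si_a, si_b):
    """Overlap of two SI distributions: shared concepts, weighted by dominance."""
    shared = set(si_a) & set(si_b)
    return sum(min(si_a[c], si_b[c]) for c in shared)

def propagate_hls(images, seed_id, hls_label, threshold=0.5):
    """Assign an expert-provided HLS label to the seed image, then propagate it
    to every image whose SI-based similarity to the seed meets the threshold."""
    si = {img_id: semantic_intensity(anns) for img_id, anns in images.items()}
    labels = {seed_id: hls_label}
    for img_id in images:
        if img_id != seed_id and similarity(si[seed_id], si[img_id]) >= threshold:
            labels[img_id] = hls_label
    return labels
```

Under this sketch, a single expert labelling of one image spreads to every image whose dominant concepts sufficiently overlap the seed's, which is the economy the abstract describes.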

    Automatic object classification for surveillance videos.

    PhD thesis. The recent popularity of surveillance video systems, especially in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems is automatic object classification, which remains an open research problem and the keystone for the development of more specific applications. Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings routinely categorise objects according to their behaviour. The gap between the features a computer extracts automatically, such as appearance-based features, and the behavioural concepts human beings perceive effortlessly but machines cannot attain is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring machine and human understanding together for object classification. A Surveillance Media Management framework is proposed that automatically detects and classifies objects by analysing both the physical properties inherent in their appearance (machine understanding) and the behaviour patterns that require a higher level of understanding (human understanding). Finally, a probabilistic multimodal fusion algorithm bridges the gap by performing automatic classification that considers both machine and human understanding. The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments demonstrated that combining machine and human understanding substantially enhances object classification performance, and that including human reasoning and understanding provides the essential information to bridge the semantic gap towards smart surveillance video systems
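The probabilistic multimodal fusion step can be illustrated with a small sketch. The product rule under a conditional-independence assumption is used here purely for illustration; the thesis's actual fusion model is not specified in the abstract and may differ.

```python
# Illustrative sketch of probabilistic multimodal fusion: combining per-class
# posteriors from an appearance-based ("machine") classifier with those from a
# behaviour-based ("human-level") classifier. The naive product rule below is
# an assumption, not the thesis's published algorithm.

def fuse(appearance_probs, behaviour_probs):
    """Fuse two posterior distributions over object classes by the product
    rule (assuming conditional independence), then renormalise."""
    classes = set(appearance_probs) | set(behaviour_probs)
    joint = {c: appearance_probs.get(c, 0.0) * behaviour_probs.get(c, 0.0)
             for c in classes}
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()} if total else joint

# Example: appearance weakly favours "person", but the motion pattern
# (behaviour modality) strongly favours "car"; fusion resolves the conflict.
appearance = {"person": 0.6, "car": 0.4}
behaviour = {"person": 0.3, "car": 0.7}
fused = fuse(appearance, behaviour)
best = max(fused, key=fused.get)
```

The design point the abstract makes is exactly this complementarity: either modality alone can be ambiguous, while their combination sharpens the decision.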

    A semantic concept for the mapping of low-level analysis data to high-level scene descriptions

    Along with the growing need for security, an increasing amount of surveillance content is being created. To enable fast and reliable search across the recordings of hundreds or thousands of surveillance sensors installed in a single facility, indexing this content in advance is indispensable. To this end, the concept of Smart Indexing & Retrieval (SIR) enables cost-efficient searches through the generation of high-level metadata. Since it is becoming ever more difficult to generate these data manually at acceptable cost and in acceptable time, the metadata must be generated automatically on the basis of low-level analysis data. While previous approaches are strongly domain-dependent, this work presents a generic concept for mapping the results of low-level analysis data to semantic scene descriptions. The constituent elements of this approach and the notions underlying them are introduced, and an introduction to their application is given. The main contributions of the presented approach are its generality and the early stage at which the step from the low-level to the high-level representation is taken. Inference in the metadata domain is performed within small time windows, while reasoning about more complex scenes is carried out in the semantic domain. Using this approach, even unsupervised self-assessment of the analysis results is possible

    Event Detection and Modelling for Security Application

    PhD thesis. This thesis focuses on the design and implementation of a novel security-domain surveillance system framework that incorporates multimodal information sources to assist event detection from video and social media sources. The comprehensive framework consists of four modules: Data Source, Content Extraction, Parsing, and Semantic Knowledge. A security-domain ontology conceptual model is proposed for event representation, tailored to the elementary aspects of event description. The adoption of the DOLCE foundational ontology promotes the flexibility for heterogeneous ontologies to interoperate. The proposed mapping method, based on an eXtensible Stylesheet Language Transformation (XSLT) stylesheet, allows ontology enrichment and instance population to be executed efficiently. The dataset for visual semantic analysis uses video footage of the 2011 London Riots obtained from Scotland Yard. The concepts person, face, police, car, fire, running, kicking and throwing were chosen for analysis, and the visual semantic analysis results demonstrate successful detection of persons, actions and events in the video footage of riot events. For social semantic analysis, a collection of tweets from Twitter channels that were actively reporting during the 2011 London Riots was compiled into a Twitter corpus. The annotated data are mapped into the ontology based on six concepts: token, location, organization, sentence, verb, and noun. Several keywords related to the event, present in both the visual and social media sources, were chosen to examine the correlation between the two sources and to draw supplementary information about the event. The chosen keywords describe the actions running, throwing, and kicking; the activities attack, smash and loot; the event fire; and the locations Hackney and Croydon. An experiment on concept-noun relations was also conducted.
The ontology-based visual and social media analysis yields promising results in analysing long surveillance videos and lengthy text corpora of social media user-generated content. By adopting an ontology-based approach, the proposed novel security-domain surveillance system framework enables large amounts of visual and social media data to be analysed systematically and automatically, and promotes a better method for event detection and understanding

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from both a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval

    Semantic multimedia modelling & interpretation for search & retrieval

    The revolution in multimedia-equipped devices has culminated in a proverbial proliferation of image and video data. Owing to this omnipresence, these data have become part of our daily life, yet this overwhelming production rate is accompanied by the predicament that it surpasses our capacity to absorb the data. Perhaps one of the most prevailing problems of the digital era is information plethora. Until now, progress in image and video retrieval research has achieved only restrained success, owing to its interpretation of an image or video in terms of primitive features, whereas humans generally access multimedia assets in terms of semantic concepts. The retrieval of digital images and videos is impeded by the semantic gap: the discrepancy between a user's high-level interpretation of an image and the information that can be extracted from the image's physical properties. Content-based image and video retrieval systems are explicitly vulnerable to the semantic gap because of their dependence on low-level visual features for describing image and video content. The semantic gap can be narrowed by including high-level features, since high-level descriptions of images and videos are better at apprehending the semantic meaning of their content. It is generally understood that the problem of image and video retrieval is still far from solved. This thesis proposes an approach for intelligent multimedia semantic extraction for search and retrieval, intending to bridge the gap between visual features and semantics. It proposes a Semantic Query Interpreter (SQI) for images and videos, which selects the pertinent terms from the user query and analyses them lexically and semantically, reducing both the semantic and the vocabulary gap between users and the machine.
This thesis also explores a novel ranking strategy for image search and retrieval. SemRank is a novel system that incorporates Semantic Intensity (SI) when exploring the semantic relevancy between the user query and the available data. Semantic Intensity captures the concept-dominance factor of an image: an image is a combination of various concepts, and some of them are more dominant than others. SemRank ranks the retrieved images on the basis of Semantic Intensity. The investigations were made on the LabelMe image and LabelMe video datasets. Experiments show that the proposed approach succeeds in bridging the semantic gap, and reveal that the proposed system outperforms traditional image retrieval systems
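The SI-based ranking idea can be sketched in a few lines. This is an illustrative sketch in the spirit of SemRank, not its actual implementation: the scoring rule (summed dominance of the query concepts in each image's annotations) and all names are assumptions.

```python
# Hypothetical sketch of SI-based ranking in the spirit of SemRank: score each
# retrieved image by the semantic intensity (concept dominance) of the query
# concepts it contains, so images where the query concept dominates rank first.
# The scoring rule and names are illustrative assumptions.
from collections import Counter

def semantic_intensity(annotations):
    """Concept dominance: the fraction of an image's annotations that each concept occupies."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {concept: n / total for concept, n in counts.items()}

def semrank(query_concepts, corpus):
    """corpus: {image_id: [annotated concepts]}. Returns image ids sorted by
    the summed SI of the query concepts, highest first."""
    scores = {}
    for img_id, annotations in corpus.items():
        si = semantic_intensity(annotations)
        scores[img_id] = sum(si.get(q, 0.0) for q in query_concepts)
    return sorted(scores, key=scores.get, reverse=True)
```

For a query like "dog", an image annotated mostly with dog regions would outrank one where dog is a minor concept among many, which is the relevance intuition the abstract attributes to Semantic Intensity.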

    Recent Developments in Video Surveillance

    With surveillance cameras installed everywhere and continuously streaming thousands of hours of video, how can that huge amount of data be analyzed or even be useful? Is it possible to search those countless hours of videos for subjects or events of interest? Shouldn’t the presence of a car stopped at a railroad crossing trigger an alarm system to prevent a potential accident? In the chapters selected for this book, experts in video surveillance provide answers to these questions and other interesting problems, skillfully blending research experience with practical real life applications. Academic researchers will find a reliable compilation of relevant literature in addition to pointers to current advances in the field. Industry practitioners will find useful hints about state-of-the-art applications. The book also provides directions for open problems where further advances can be pursued

    Describing Human Activities in Video Streams
