188 research outputs found
Semantic Restructuring of Natural Language Image Captions to Enhance Image Retrieval
semantic, multimedia, information retrieval
Semantic multimedia modelling & interpretation for annotation
The emergence of multimedia-enabled devices, particularly the incorporation of cameras in mobile phones, and rapid advances in low-cost storage have drastically increased the rate of multimedia data production. Witnessing such ubiquity of digital images and videos, the research community has been raising the issue of their effective utilization and management. Stored in monumental multimedia corpora, digital data need to be retrieved and organized in an intelligent way, leaning on the rich semantics involved. The utilization of these image and video collections demands proficient image and video annotation and retrieval techniques. Recently, the multimedia research community has been progressively shifting its emphasis to the personalization of these media. The main impediment in image and video analysis is the semantic gap, which is the discrepancy between a user’s high-level interpretation of an image or video and its low-level computational interpretation. Content-based image and video annotation systems are remarkably susceptible to the semantic gap due to their reliance on low-level visual features for describing semantically rich image and video content. However, visual similarity is not semantic similarity, so an alternative way through this dilemma is needed. The semantic gap can be narrowed by including high-level and user-generated information in the annotation. High-level descriptions of images and videos are more capable of capturing the semantic meaning of multimedia content, but it is not always feasible to collect this information. It is commonly agreed that the problem of high-level semantic annotation of multimedia is still far from being solved. This dissertation puts forward approaches for intelligent multimedia semantic extraction for high-level annotation and intends to bridge the gap between visual features and semantics.
It proposes a framework for annotation enhancement and refinement for object/concept-annotated image and video datasets. The central idea is first to purge the datasets of noisy keywords and then to expand the concepts lexically and with commonsense knowledge, filling the vocabulary and lexical gap to achieve high-level semantics for the corpus. This dissertation also explores a novel approach for high-level semantic (HLS) propagation through image corpora. HLS propagation takes advantage of the semantic intensity (SI), which is the concept dominancy factor in the image, and of annotation-based semantic similarity between images. An image is a combination of various concepts, some of which are more dominant than others, and the semantic similarity of a pair of images is based on the SI and on the semantic similarity of their concepts. Moreover, HLS propagation exploits clustering techniques to group similar images, so that a human expert needs to assign high-level semantics to only one randomly selected image, from which they are propagated to the other images in the cluster. The investigation was carried out on the LabelMe image and LabelMe video datasets. Experiments show that the proposed approaches yield a noticeable improvement towards bridging the semantic gap and that the proposed system outperforms traditional systems.
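The clustering-based HLS propagation described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the weighted-Jaccard similarity, the greedy clustering, the threshold value, and all function names below are assumptions chosen for illustration, and the semantic intensity is simply taken as a given number per concept.

```python
from typing import Dict, List, Optional

# concept -> assumed semantic intensity in [0, 1]; how SI is actually
# computed is not stated in the abstract, so these values are placeholders.
Concepts = Dict[str, float]


def similarity(a: Concepts, b: Concepts) -> float:
    """Weighted Jaccard overlap of two concept-intensity maps (an assumed measure)."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    total = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return shared / total if total else 0.0


def cluster(images: Dict[str, Concepts], threshold: float) -> List[List[str]]:
    """Greedy single-pass clustering: join the first cluster whose seed image is similar enough."""
    clusters: List[List[str]] = []
    for image_id, concepts in images.items():
        for members in clusters:
            if similarity(images[members[0]], concepts) >= threshold:
                members.append(image_id)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([image_id])
    return clusters


def propagate(clusters: List[List[str]],
              expert_labels: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Copy the label of the first expert-labelled member to every image in its cluster."""
    result: Dict[str, Optional[str]] = {}
    for members in clusters:
        label = next((expert_labels[m] for m in members if m in expert_labels), None)
        for m in members:
            result[m] = label
    return result
```

Under these assumptions, one expert label per cluster is enough to annotate every image in that cluster; clusters without any labelled image keep `None`.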
Semantic multimedia modelling & interpretation for search & retrieval
The revolution in multimedia-equipped devices has culminated in a proliferation of image and video data. Owing to this omnipresence and growth, these data have become part of our daily life. This overwhelming rate of data production brings with it the predicament of exceeding our capacity to take in the data. Perhaps one of the most prevalent problems of this digital era is information overload.
Until now, progress in image and video retrieval research has met with limited success owing to its interpretation of images and videos in terms of primitive features. Humans generally access multimedia assets in terms of semantic concepts. The retrieval of digital images and videos is impeded by the semantic gap: the discrepancy between a user’s high-level interpretation of an image and the information that can be extracted from the image’s physical properties. Content-based image and video retrieval systems are particularly vulnerable to the semantic gap due to their dependence on low-level visual features for describing image and video content. The semantic gap can be narrowed by including high-level features. High-level descriptions of images and videos are more capable of capturing the semantic meaning of image and video content.
It is generally understood that the problem of image and video retrieval is still far from being solved. This thesis proposes an approach for intelligent multimedia semantic extraction for search and retrieval, with the aim of bridging the gap between visual features and semantics. It proposes a Semantic Query Interpreter (SQI) for images and videos. The SQI selects the pertinent terms from the user query and analyses them lexically and semantically, reducing both the semantic and the vocabulary gap between users and the machine. This thesis also explores a novel ranking strategy for image search and retrieval. SemRank is a novel system that incorporates the Semantic Intensity (SI) in exploring the semantic relevancy between the user query and the available data. The Semantic Intensity captures the concept dominancy factor of an image: an image is a combination of various concepts, some of which are more dominant than others. SemRank ranks the retrieved images on the basis of their Semantic Intensity.
The investigations were made on the LabelMe image and LabelMe video datasets. Experiments show that the proposed approach is successful in bridging the semantic gap and that the proposed system outperforms traditional image retrieval systems.
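To make the idea of ranking by Semantic Intensity concrete, here is a minimal sketch. It is not SemRank itself: the abstract does not say how SI is computed, so the sketch assumes SI is the fraction of the image area covered by a concept, and the names `semantic_intensity_score` and `rank_images` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout: image id -> {concept: fraction of image area covered}.
# Treating area fraction as the semantic intensity is an assumption made only
# for this illustration.
Annotations = Dict[str, Dict[str, float]]


def semantic_intensity_score(concepts: Dict[str, float], query_terms: List[str]) -> float:
    """Sum the assumed intensity of every query concept present in the image."""
    return sum(concepts.get(term, 0.0) for term in query_terms)


def rank_images(annotations: Annotations, query_terms: List[str]) -> List[Tuple[str, float]]:
    """Return (image_id, score) pairs with a non-zero score, highest score first."""
    scored = [(image_id, semantic_intensity_score(concepts, query_terms))
              for image_id, concepts in annotations.items()]
    scored = [(image_id, score) for image_id, score in scored if score > 0.0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    corpus = {
        "img1": {"car": 0.60, "road": 0.30, "tree": 0.10},
        "img2": {"car": 0.05, "building": 0.70},
        "img3": {"tree": 0.80, "sky": 0.20},
    }
    for image_id, score in rank_images(corpus, ["car"]):
        print(image_id, score)
```

With this scoring, an image in which the queried concept dominates is ranked above one where it appears only marginally, which is the behaviour the abstract attributes to ranking by Semantic Intensity.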
Semantic multimedia analysis using knowledge and context
The difficulty of semantic multimedia analysis can be attributed to the
extended diversity in form and appearance exhibited by the majority of
semantic concepts and the difficulty of expressing them using a finite
number of patterns. In meeting this challenge there has been a scientific
debate on whether the problem should be addressed from the perspective of
using overwhelming amounts of training data to capture all possible
instantiations of a concept, or from the perspective of using explicit
knowledge about the concepts’ relations to infer their presence. In this
thesis we address three problems of pattern recognition and propose
solutions that combine the knowledge extracted implicitly from training
data with the knowledge provided explicitly in structured form. First, we
propose a Bayesian network (BN) modeling approach that defines a conceptual
space where both domain-related evidence and evidence derived from content
analysis can be jointly considered to support or disprove a hypothesis. The
use of this space leads to significant gains in performance compared to
analysis methods that cannot handle combined knowledge. Then, we present an
unsupervised method that exploits the collective nature of social media to
automatically obtain large amounts of annotated image regions. By proving
that the quality of the obtained samples can be almost as good as manually
annotated images when working with large datasets, we significantly
contribute towards scalable object detection. Finally, we introduce a
method that treats images, visual features and tags as the three observable
variables of an aspect model and extracts a set of latent topics that
incorporates the semantics of both the visual and the tag information
space. By showing that the cross-modal dependencies of tagged images can be
exploited to increase the semantic capacity of the resulting space, we
advocate the use of all existing information facets in the semantic
analysis of social media.
The Role of Visual Rhetoric in Semantic Multimedia: Strategies for Decision Making in Times of Crisis
As semantic multimedia approaches the mainstream, even the great improvements that can be seen in its classic schools, like the data mining inspired Information Retrieval based on metadata analysis, or Computer Vision, might not be enough. We identify a new group that is gaining traction in the semantic multimedia community and that takes developments from psychology and visual communication as its starting point. For the purposes of this article we restrict our domain to visual rhetoric, as we consider it to hold the greatest potential for future developments. Living in times when the periods between crises seem to be shorter and shorter, we look at how developments in semantic multimedia can be used for predicting and overcoming crises. We analyze two aspects related to this: using information visualization to understand the evolution of crises, and creating multi-layered semantic multimedia technologies that can easily be adapted to use a variety of sources and solve problems from different domains. In both cases we show how techniques inspired by visual rhetoric (information linking, framing, composition), in conjunction with named entity recognition, offer substantial benefits. The section related to multi-layered semantic multimedia technologies also draws on the lessons learned while designing a prototype application aimed at improving the tourism decision-making process. The article ends with a discussion of evaluation methods for applications of multi-layered semantic technologies. We look at how to evaluate them on both levels: mechanisms (information linking versus raw named entity recognition when generating visuals, for example) and decision-making strategies (do such systems actually solve real problems related to crises, create jobs, or at least can they be repurposed to solve problems other than the one with which we started?).
A Novel Approach to Multimedia Ontology Engineering for Automated Reasoning over Audiovisual LOD Datasets
Multimedia reasoning, which is suitable for, among others, multimedia content
analysis and high-level video scene interpretation, relies on the formal and
comprehensive conceptualization of the represented knowledge domain. However,
most multimedia ontologies are not exhaustive in terms of role definitions, and
do not incorporate complex role inclusions and role interdependencies. In fact,
most multimedia ontologies do not have a role box at all, and implement only a
basic subset of the available logical constructors. Consequently, their
application in multimedia reasoning is limited. To address the above issues,
VidOnt, the very first multimedia ontology with SROIQ(D) expressivity and a
DL-safe ruleset has been introduced for next-generation multimedia reasoning.
In contrast to the common practice, the formal grounding has been set in one of
the most expressive description logics, and the ontology validated with
industry-leading reasoners, namely HermiT and FaCT++. This paper also presents
best practices for developing multimedia ontologies, based on my ontology
engineering approach.