Multi modal multi-semantic image retrieval
PhD
The rapid growth in the volume of visual information, e.g. images and video, can
overwhelm users’ ability to find and access the specific visual information of interest
to them. In recent years, ontology knowledge-based (KB) image information retrieval
techniques have been adopted in an attempt to extract knowledge from these
images, enhancing the retrieval performance. A KB framework is presented to
promote semi-automatic annotation and semantic image retrieval using multimodal
cues (visual features and text captions). In addition, a hierarchical structure for the KB
allows metadata to be shared that supports multi-semantics (polysemy) for concepts.
The framework builds up an effective knowledge base pertaining to a domain specific
image collection, e.g. sports, and is able to disambiguate and assign high level
semantics to ‘unannotated’ images.
Local feature analysis of visual content, namely using Scale Invariant Feature
Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’
model (BVW) as an effective method to represent visual content information and to
enhance its classification and retrieval. Local features are more useful than global
features, e.g. colour, shape or texture, as they are invariant to image scale, orientation
and camera angle. An innovative approach is proposed for the representation,
annotation and retrieval of visual content using a hybrid technique based upon the use
of an unstructured visual word and upon a (structured) hierarchical ontology KB
model. The structural model facilitates the disambiguation of unstructured visual
words and a more effective classification of visual content, compared to a vector
space model, through exploiting local conceptual structures and their relationships.
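The Bag of Visual Words representation described above can be illustrated with a minimal sketch. It assumes a codebook of visual words (cluster centroids) already exists and uses plain nearest-centroid assignment; it is not the SLAC algorithm or the ontology-based disambiguation proposed in the thesis. The toy 2-D descriptors and the function names `nearest_word` and `bvw_histogram` are hypothetical.

```python
import math
from collections import Counter


def nearest_word(descriptor, codebook):
    """Return the index of the codebook centroid closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))


def bvw_histogram(descriptors, codebook):
    """Quantise local descriptors and return a normalised visual-word histogram."""
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    total = len(descriptors)
    return [counts[i] / total if total else 0.0 for i in range(len(codebook))]


# Toy 2-D "descriptors" for illustration (real SIFT descriptors are 128-dimensional).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (4.8, 5.1)]
histogram = bvw_histogram(descriptors, codebook)
print(histogram)
```

Each image is thereby reduced to a fixed-length vector of visual-word frequencies, which is the unstructured representation that the structured ontology model is then meant to disambiguate.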
The key contributions of this framework in using local features for image
representation include: first, a method to generate visual words using the semantic
local adaptive clustering (SLAC) algorithm which takes term weight and spatial
locations of keypoints into account. Consequently, the semantic information is
preserved. Second, a technique is used to detect the domain specific ‘non-informative
visual words’ which are ineffective at representing the content of visual data and
degrade its categorisation ability. Third, a method to combine an ontology model with
a visual word model to resolve synonym (visual heterogeneity) and polysemy
problems, is proposed. The experimental results show that this approach can discover
semantically meaningful visual content descriptions and recognise specific events,
e.g., sports events, depicted in images efficiently.
Since discovering the semantics of an image is an extremely challenging problem, one
promising approach to enhance visual content interpretation is to use any associated
textual information that accompanies an image, as a cue to predict the meaning of an
image, by transforming this textual information into a structured annotation for an
image, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct
types of information representation and modality, there are some strong, invariant,
implicit, connections between images and any accompanying text information.
Semantic analysis of image captions can be used by image retrieval systems to
retrieve selected images more precisely. To do this, Natural Language Processing
(NLP) is first used to extract concepts from image captions. Next, an
ontology-based knowledge model is deployed in order to resolve natural language
ambiguities. To deal with the accompanying text information, two methods to extract
knowledge from textual information have been proposed. First, metadata can be
extracted automatically from text captions and restructured with respect to a semantic
model. Second, the use of Latent Semantic Indexing (LSI) in relation to a domain-specific ontology-based
knowledge model enables the combined framework to tolerate ambiguities and
variations (incompleteness) of metadata. The use of the ontology-based knowledge
model allows the system to find indirectly relevant concepts in image captions and
thus leverage these to represent the semantics of images at a higher level.
Experimental results show that the proposed framework significantly enhances image
retrieval and leads to a narrowing of the semantic gap between lower level machine-derived
and higher level human-understandable conceptualisation.
Living Knowledge
Diversity, especially manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity as an asset rather than a problem. Within the project, foundational ideas emerged from the synergistic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies, and flowed into concrete diversity-aware applications such as the Future Predictor and the Media Content Analyser, providing users with better structured information while coping with Web scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA), which operates from a social sciences perspective; Multimodal Genre Analysis (MGA), which operates from a semiotic perspective; and Facet Analysis (FA), which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and the way they interact. In particular, the conceptual architecture has been implemented with the Media Content Analyser application. The scientific and technological results obtained are described in the following
Video Data Visualization System: Semantic Classification And Personalization
We present in this paper an intelligent video data visualization tool, based
on semantic classification, for retrieving and exploring a large scale corpus
of videos. Our work is based on semantic classification resulting from semantic
analysis of video. The obtained classes will be projected in the visualization
space. The graph is represented by nodes and edges: the nodes are the keyframes
of video documents and the edges are the relations between documents and the
classes of documents. Finally, we construct the user's profile, based on the
interaction with the system, to make the system better suited to the user's
preferences.
Comment: graphic
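As a rough illustration of the graph described in this abstract, the sketch below links keyframes to the semantic classes assigned to them and looks up keyframes that share a class. The keyframe identifiers, class labels and function names are hypothetical; the paper's actual data model and layout algorithm are not given in the abstract.

```python
from collections import defaultdict


def build_graph(classified_keyframes):
    """Return class -> set of keyframes, i.e. the document/class edges."""
    edges = defaultdict(set)
    for keyframe, classes in classified_keyframes.items():
        for cls in classes:
            edges[cls].add(keyframe)
    return edges


def related_keyframes(edges, keyframe):
    """Return all other keyframes that share at least one class with `keyframe`."""
    related = set()
    for members in edges.values():
        if keyframe in members:
            related |= members
    related.discard(keyframe)
    return related


# Hypothetical output of a semantic classifier: keyframe -> assigned classes.
classified = {
    "video1_kf": {"sport"},
    "video2_kf": {"sport", "news"},
    "video3_kf": {"news"},
}
graph = build_graph(classified)
print(sorted(related_keyframes(graph, "video2_kf")))
```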
Complete Semantics to empower Touristic Service Providers
The tourism industry has a significant impact on the world's economy,
contributing 10.2% of the world's gross domestic product in 2016. It has become a
very competitive industry, where having a strong online presence is an
essential aspect for business success. To achieve this goal, the proper usage
of the latest Web technologies, particularly schema.org annotations, is crucial. In
this paper, we present our effort to improve the online visibility of touristic
service providers in the region of Tyrol, Austria, by creating and deploying a
substantial amount of semantic annotations according to schema.org, a widely
used vocabulary for structured data on the Web. We started our work from
Tourismusverband (TVB) Mayrhofen-Hippach and all touristic service providers in
the Mayrhofen-Hippach region and applied the same approach to other TVBs and
regions, as well as other use cases. The rationale for doing this is
straightforward. Having schema.org annotations enables search engines to
understand the content better, and provide better results for end users, as
well as enables various intelligent applications to utilize them. As a direct
consequence, the region of Tyrol and its touristic services increase their
online visibility and decrease their dependency on intermediaries, i.e. Online
Travel Agencies (OTAs).
Comment: 18 pages, 6 figures
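To make the idea of a schema.org annotation concrete, the following sketch builds a JSON-LD description of a hotel and wraps it in the script tag that web pages typically use to embed such data. The hotel name, address and URL are invented placeholders, and the helper `hotel_annotation` is hypothetical; only the schema.org types and properties (`Hotel`, `PostalAddress`, `name`, `url`, `address`) are taken from the public vocabulary. It does not reproduce the annotations created in the paper.

```python
import json


def hotel_annotation(name, street, locality, region, country, url):
    """Build a schema.org Hotel description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Hotel",
        "name": name,
        "url": url,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "addressRegion": region,
            "addressCountry": country,
        },
    }


# Placeholder values for illustration only.
annotation = hotel_annotation(
    name="Hotel Example",
    street="Hauptstrasse 1",
    locality="Mayrhofen",
    region="Tyrol",
    country="AT",
    url="https://example.com/hotel",
)
json_ld = json.dumps(annotation, indent=2)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

A search engine crawling a page that contains this tag can read the provider's type, name and address directly, without having to infer them from the surrounding text.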
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and differences, and recognize their merits and limitations. For a
head-to-head comparison of the state of the art, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.
Comment: to appear in ACM Computing Surveys
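One simple way to realise a tag relevance function of the kind this abstract describes is neighbour voting: a tag is considered relevant to an image if it occurs among the image's visual neighbours more often than it would by chance. The sketch below is an illustrative assumption, not necessarily one of the eleven methods evaluated in the survey; the toy feature vectors, tags and the function name `tag_relevance` are hypothetical.

```python
import math


def tag_relevance(tag, query_features, collection, k=3):
    """Votes for `tag` among the k visual neighbours, minus the votes expected by chance."""
    neighbours = sorted(
        collection,
        key=lambda img: math.dist(query_features, img["features"]),
    )[:k]
    votes = sum(tag in img["tags"] for img in neighbours)
    prior = sum(tag in img["tags"] for img in collection) / len(collection)
    return votes - k * prior


# Toy socially tagged collection with 2-D visual features.
collection = [
    {"features": (0.0, 0.1), "tags": {"beach", "sea"}},
    {"features": (0.1, 0.0), "tags": {"beach"}},
    {"features": (0.2, 0.2), "tags": {"sea", "sunset"}},
    {"features": (5.0, 5.0), "tags": {"city"}},
    {"features": (5.1, 4.9), "tags": {"city", "beach"}},
    {"features": (4.9, 5.2), "tags": {"city"}},
]
query = (0.05, 0.05)
beach_score = tag_relevance("beach", query, collection)
city_score = tag_relevance("city", query, collection)
print(beach_score, city_score)
```

Subtracting the prior keeps very frequent tags from scoring highly merely because they appear everywhere in the collection.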
Legal knowledge acquisition and multimedia applications
Search, retrieval, and management of multimedia contents are challenging tasks for users and researchers alike. The aim of the e-sentencias project is to develop a software-hardware system for the global management of the multimedia contents produced by the Spanish Civil Courts. We apply technologies such as the Semantic Web, ontologies, NLP techniques, audio-video segmentation and IR. The ultimate goal is to obtain an automatic classification of images and segments of the audiovisual records that, coupled with textual semantics, allows efficient navigation and retrieval of judicial documents and additional legal sources.
Cognitive visual tracking and camera control
Cognitive visual tracking is the process of observing and understanding the behaviour of a moving person. This paper presents an efficient solution to extract, in real-time, high-level information from an observed scene, and generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario. Such a high-level feedback control loop, which is the main novelty of our work, will serve to reduce uncertainties in the observed scene and to maximize the amount of information extracted from it. It is implemented with a distributed camera system using SQL tables as virtual communication channels, and Situation Graph Trees for knowledge representation, inference and high-level camera control. A set of experiments in a surveillance scenario show the effectiveness of our approach and its potential for real applications of cognitive vision
A Semantic Web approach to ontology-based system: integrating, sharing and analysing IoT health and fitness data
With the rapid development of the fitness industry, Internet of Things (IoT) technology is becoming one of the most popular trends in the health and fitness areas. IoT technologies have revolutionised the fitness and sport industry by giving users the ability to monitor their health status and keep track of their training sessions. More and more sophisticated wearable devices, fitness trackers, smart watches and health mobile applications will appear in the near future. These systems collect data non-stop from sensors and upload them to the Cloud. However, from a data-centric perspective, the landscape of IoT fitness devices and wellness appliances is characterised by a plethora of representation and serialisation formats. The high heterogeneity of IoT data representations and the lack of commonly accepted standards keep data isolated within each single system, preventing users and health professionals from having an integrated view of the various information collected. Moreover, in order to fully exploit the potential of the large amounts of data, it is also necessary to enable advanced analytics over it, thus achieving actionable knowledge. Therefore, given the above situation, the aim of this thesis project is to design and implement an ontology-based system to (1) allow data interoperability among heterogeneous IoT fitness and wellness devices, (2) facilitate the integration and the sharing of information and (3) enable advanced analytics over the collected data (Cognitive Computing). The novelty of the proposed solution lies in exploiting Semantic Web technologies to formally describe the meaning of the data collected by the IoT devices and define a common communication strategy for information representation and exchange.
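The interoperability goal in this abstract can be sketched as mapping each device's own field names onto one shared vocabulary and storing the readings as subject-predicate-object triples. Everything in the sketch is an assumption made for illustration: the device names, the field names and the `ex:` vocabulary terms are invented, and the thesis's actual ontology is not described in the abstract.

```python
# Hypothetical per-device field names mapped onto one shared, made-up vocabulary.
FIELD_MAPPINGS = {
    "watch_a": {"hr": "ex:heartRate", "steps": "ex:stepCount"},
    "band_b": {"heart_rate_bpm": "ex:heartRate", "step_total": "ex:stepCount"},
}


def to_triples(device, user, record):
    """Translate one raw device record into (subject, predicate, object) triples.

    Fields without a mapping (e.g. firmware versions) are skipped.
    """
    mapping = FIELD_MAPPINGS[device]
    return [(user, mapping[field], value)
            for field, value in record.items()
            if field in mapping]


triples = (
    to_triples("watch_a", "ex:user1", {"hr": 72, "steps": 4000})
    + to_triples("band_b", "ex:user1", {"heart_rate_bpm": 75, "firmware": "1.2"})
)

# With a common predicate, readings from both devices can be queried together.
heart_rates = [value for _, predicate, value in triples
               if predicate == "ex:heartRate"]
print(heart_rates)
```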