5 research outputs found
Automatic Image Annotation for Semantic Image Retrieval
This paper addresses the challenge of automatically annotating images for semantic image retrieval. In this research, we aim to identify visual features that are suitable for semantic annotation tasks. We propose an image classification system that combines MPEG-7 visual descriptors and support vector machines, and apply it to annotate cityscape and landscape images. For this task, our analysis shows that the colour structure and edge histogram descriptors perform best among a wide range of MPEG-7 visual descriptors. On a dataset of 7200 landscape and cityscape images of varied, real-life quality and resolution, the MPEG-7 colour structure descriptor and edge histogram descriptor achieve classification rates of 82.8% and 84.6%, respectively. By combining these two features, we achieve a classification rate of 89.7%. Our results demonstrate that combining salient features can significantly improve image classification.
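For illustration only, here is a minimal sketch of this kind of pipeline, assuming OpenCV and scikit-learn are available: a coarse HSV colour histogram and a gradient-orientation histogram act as rough stand-ins for the MPEG-7 Colour Structure and Edge Histogram descriptors (which are not part of these libraries), the two vectors are concatenated, and an SVM is trained on the result. Function names, bin counts and kernel settings are assumptions, not the paper's implementation.

```python
import cv2
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def colour_feature(img_bgr, bins=8):
    """Coarse HSV colour histogram (stand-in for the MPEG-7 Colour Structure descriptor)."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [bins] * 3, [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, None).flatten()

def edge_feature(img_bgr, bins=16):
    """Gradient-orientation histogram (stand-in for the MPEG-7 Edge Histogram descriptor)."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 360), weights=mag)
    return hist / (hist.sum() + 1e-9)

def combined_feature(img_bgr):
    # Early fusion: concatenate the two descriptors into a single feature vector.
    return np.concatenate([colour_feature(img_bgr), edge_feature(img_bgr)])

# Hypothetical usage with a labelled image list (0 = landscape, 1 = cityscape):
# X = np.stack([combined_feature(img) for img in images])
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
# clf.fit(X, labels)
```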
Seeing the Intangible: Surveying Automatic High-Level Visual Understanding from Still Images
The field of Computer Vision (CV) was born with the single grand goal of complete image understanding: providing a complete semantic interpretation of an input image. What exactly this goal entails is not immediately straightforward, but theoretical hierarchies of visual understanding point towards a top level of full semantics, within which sits the most complex and subjective information humans can detect from visual data. In particular, non-concrete concepts including emotions, social values and ideologies appear to be the protagonists of this "high-level" visual semantic understanding. While such "abstract concepts" are critical tools for image management and retrieval, their automatic recognition remains a challenge, precisely because they sit at the top of the "semantic pyramid": the well-known semantic gap problem is worsened by their lack of unique perceptual referents and their reliance on less specific features than concrete concepts. Given that there is very little explicit work within CV on the task of abstract social concept (ASC) detection, and that many recent works discuss similar non-concrete entities under different terminology, in this survey we provide a systematic review of CV work that explicitly or implicitly approaches the problem of abstract (specifically social) concept detection from still images. Specifically, this survey performs and provides: (1) a study and clustering of high-level visual understanding semantic elements from a multidisciplinary perspective (computer science, visual studies, and cognitive perspectives); (2) a study and clustering of high-level visual understanding computer vision tasks dealing with the identified semantic elements, so as to identify current CV work that implicitly deals with abstract concept detection.
Trademark image retrieval by local features
The challenge of abstract trademark image retrieval as a test of machine vision algorithms has attracted considerable research interest in the past decade. Current
operational trademark retrieval systems involve manual annotation of the images
(the current ‘gold standard’). Accordingly, current systems require a substantial
amount of time and labour to access, and are therefore expensive to operate. This
thesis focuses on the development of algorithms that mimic aspects of human
visual perception in order to retrieve similar abstract trademark images
automatically. A significant category of trademark images is typically highly
stylised, comprising a collection of distinctive graphical elements that often
include geometric shapes. Therefore, in order to compare the similarity of such
images, the principal aim of this research has been to develop a method for solving
the partial matching and shape perception problem.
There are few useful techniques for partial shape matching in the context of
trademark retrieval, because those existing techniques tend not to support multi-component
retrieval. When this work was initiated, most trademark image
retrieval systems represented images by means of global features, which are not
suited to solving the partial matching problem. Instead, the author has
investigated the use of local image features as a means to finding similarities
between trademark images that only partially match in terms of their sub-components.
During the course of this work, it has been established that the
Harris and Chabat detectors could potentially perform sufficiently well to serve as
the basis for local feature extraction in trademark image retrieval. Early findings
in this investigation indicated that the well-established SIFT (Scale Invariant
Feature Transform) local features, based on the Harris detector, could potentially
serve as an adequate underlying local representation for matching trademark
images.
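As a rough illustration of such local-feature matching (not the thesis's own code), the sketch below uses OpenCV's SIFT implementation with Lowe's ratio test to score how strongly two trademark images share sub-components; the ratio threshold and the scoring rule are assumptions.

```python
import cv2

def match_local_features(path_a, path_b, ratio=0.75):
    """Score partial similarity between two trademark images via SIFT matches."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(img_a, None)
    kp_b, desc_b = sift.detectAndCompute(img_b, None)

    # Lowe's ratio test on the two nearest neighbours keeps only distinctive matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(desc_a, desc_b, k=2)
    good = [pair[0] for pair in pairs
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]

    # Illustrative partial-match score: fraction of query keypoints with a good match.
    return len(good) / max(len(kp_a), 1), good
```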
There are few researchers who have used mechanisms based on human
perception for trademark image retrieval, implying that the shape representations
utilised in the past to solve this problem do not necessarily reflect the shapes
contained in these images, as characterised by human perception. In response, a
practical approach to trademark image retrieval by perceptual grouping has been
developed based on defining meta-features that are calculated from the spatial
configurations of SIFT local image features. This new technique measures certain
visual properties of the appearance of images containing multiple graphical
elements and supports perceptual grouping by exploiting the non-accidental
properties of their configuration.
Our validation experiments indicated that we were indeed able to capture
and quantify the differences in the global arrangement of sub-components evident
when comparing stylised images in terms of their visual appearance properties.
Such visual appearance properties, measured using 17 of the proposed meta-features,
include relative sub-component proximity, similarity, rotation and
symmetry. Similar work on meta-features, based on the above Gestalt proximity,
similarity, and simplicity groupings of local features, had not been reported in the
current computer vision literature at the time of undertaking this work.
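The 17 meta-features themselves are not enumerated in this abstract, so the following sketch is purely hypothetical: it shows how Gestalt-style quantities of this flavour (proximity, similarity of scale, rotational alignment) might be summarised from the spatial configuration of SIFT keypoints.

```python
import numpy as np

def config_meta_features(keypoints):
    """Summarise the spatial configuration of cv2.KeyPoint objects (hypothetical features).

    Assumes at least two keypoints have been detected in the image.
    """
    pts = np.array([kp.pt for kp in keypoints], dtype=float)
    scales = np.array([kp.size for kp in keypoints], dtype=float)
    angles = np.deg2rad([kp.angle for kp in keypoints])

    # Proximity: mean pairwise distance, normalised by the overall spread of the layout.
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    proximity = dists.mean() / (pts.std() + 1e-9)

    # Similarity: dispersion of feature scales (similarly sized elements score low).
    similarity = scales.std() / (scales.mean() + 1e-9)

    # Rotation: circular concentration of dominant orientations (1.0 = all aligned).
    rotation = np.abs(np.exp(1j * angles).mean())

    return np.array([proximity, similarity, rotation])
```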
We decided to adopt relevance feedback to allow the visual appearance
properties of relevant and non-relevant images returned in response to a query to
be determined by example. Since limited training data is available when
constructing a relevance classifier by means of user-supplied relevance feedback,
the intrinsically non-parametric machine learning algorithm ID3 (Iterative
Dichotomiser 3) was selected to construct decision trees by means of dynamic
rule induction. We believe that the above approach to capturing high-level visual
concepts, encoded by means of meta-features specified by example through
relevance feedback and decision tree classification, supports flexible trademark
image retrieval and is wholly novel.
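A minimal sketch of that feedback loop follows, under stated assumptions: scikit-learn's entropy-criterion decision tree stands in for ID3 (which strictly operates on discretised, categorical attributes), so this approximates rather than reproduces the thesis's classifier.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def refine_ranking(feedback_feats, feedback_labels, all_feats):
    """Re-rank the collection from user relevance feedback.

    feedback_feats : meta-feature vectors of images the user has judged
    feedback_labels: 1 = relevant, 0 = non-relevant (both classes assumed present)
    all_feats      : meta-feature vectors of every image in the collection
    """
    # Entropy criterion = information-gain splits, the same split rule ID3 uses.
    tree = DecisionTreeClassifier(criterion="entropy")
    tree.fit(feedback_feats, feedback_labels)
    # Score each image by the tree's estimated probability of relevance and sort.
    scores = tree.predict_proba(all_feats)[:, 1]
    return np.argsort(-scores)
```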
The retrieval performance of the above system was compared with that of two other
state-of-the-art trademark image retrieval systems: Artisan, developed by Eakins
(Eakins et al., 1998) and a system developed by Jiang (Jiang et al., 2006). Using
relevance feedback, our system achieves higher average normalised precision
than either of the systems developed by Eakins or Jiang. However, while our
trademark image query and database set is based on an image dataset used by
Eakins, we employed different numbers of images. It was not possible to access
the same query set and image database used in the evaluation of Jiang’s trademark
image retrieval system. Despite these differences in evaluation
methodology, our approach would appear to have the potential to improve
retrieval effectiveness.
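The abstract does not define the normalised precision measure used in this comparison; one common choice in the trademark-retrieval literature is Salton's normalised precision, sketched below purely as an assumed reference point.

```python
import math

def normalised_precision(relevant_ranks, n_total):
    """Salton's normalised precision (assumed definition, not necessarily the thesis's).

    relevant_ranks: 1-based ranks at which the relevant images were retrieved
    n_total: total number of images in the collection
    """
    r = len(relevant_ranks)
    if r == 0 or r >= n_total:
        raise ValueError("need 0 < len(relevant_ranks) < n_total")
    ideal = sum(math.log(i) for i in range(1, r + 1))        # best case: ranks 1..R
    actual = sum(math.log(rank) for rank in relevant_ranks)  # observed ranks
    worst = math.log(math.comb(n_total, r))                  # normaliser: worst-case spread
    return 1.0 - (actual - ideal) / worst

# Example: 3 relevant images retrieved at ranks 1, 4 and 10 in a 100-image collection.
print(round(normalised_precision([1, 4, 10], 100), 3))
```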
Producing Informative Text Alternatives for Images
A picture may be worth a thousand words, but what might those words be? How do we go about finding those words? Images are often used to convey information, supplement textual content, and/or add visual appeal to documents. Unless the user can see the image and properly interpret it, the user may not receive the same information. While containers exist for providing text alternatives in various types of electronic documents (including Web pages), they are rarely used. When they are used, the text alternatives are not informative. While guidance currently exists regarding which containers to use in order to provide text alternatives, there is little guidance available regarding what information to include in these containers and how to compose text alternatives. The purpose of this work is to establish a procedure for identifying the information being communicated within an image and provide guidance on how to produce informative text alternatives.
Based on related information in the areas of Web accessibility, library cataloguing, captioning and audio description, image retrieval and indexing, art description, and tactile representation, the important information communicated by an image was identified, and a procedure for producing informative text alternatives using that information was developed. Studies were conducted to determine the effectiveness of the procedure in identifying important information about an image.
Study 1 determined the information identified about an image when the procedure was not available. It also suggested reasons why people would opt not to provide a text alternative for an image. Study 2 determined the information that people would identify when they were given the procedure and a set of questions to help identify information about an image. Study 3 determined the information people identified when they were required to consider all of the different types of information that may be important in an image. The results from these three studies were compared to determine the effectiveness of the procedure in identifying important information about the image. Study 4 presented the information identified in the previous three studies to sighted and visually impaired users to evaluate the importance of such information. This study determined the quality of the information identified in the first three studies and the ability of the procedure to identify important information for a wide set of images.
The results of these studies showed that more important information was identified with the procedure than without it. Additional guidance was also identified to further help people create informative and useful text alternatives. The studies also showed that the procedure could be applied by different user groups to a wide range of images. The procedure was submitted to the International Organization for Standardization to become a technical specification, which will be available to people around the world.