Trademark image retrieval by local features
The challenge of abstract trademark image retrieval as a test of machine vision algorithms has attracted considerable research interest in the past decade. Current
operational trademark retrieval systems involve manual annotation of the images
(the current "gold standard"). Accordingly, current systems require a substantial
amount of time and labour to access, and are therefore expensive to operate. This
thesis focuses on the development of algorithms that mimic aspects of human
visual perception in order to retrieve similar abstract trademark images
automatically. Images in a significant category of trademarks are typically highly
stylised, comprising a collection of distinctive graphical elements that often
include geometric shapes. Therefore, in order to compare the similarity of such
images the principal aim of this research has been to develop a method for solving
the partial matching and shape perception problem.
There are few useful techniques for partial shape matching in the context of
trademark retrieval, because those existing techniques tend not to support multicomponent
retrieval. When this work was initiated most trademark image
retrieval systems represented images by means of global features, which are not
suited to solving the partial matching problem. Instead, the author has
investigated the use of local image features as a means to finding similarities
between trademark images that only partially match in terms of their sub-components.
During the course of this work, it has been established that the
Harris and Chabat detectors could potentially perform sufficiently well to serve as
the basis for local feature extraction in trademark image retrieval. Early findings
in this investigation indicated that the well established SIFT (Scale Invariant
Feature Transform) local features, based on the Harris detector, could potentially
serve as an adequate underlying local representation for matching trademark
images.
There are few researchers who have used mechanisms based on human
perception for trademark image retrieval, implying that the shape representations
utilised in the past to solve this problem do not necessarily reflect the shapes
contained in these images, as characterised by human perception. In response, a
practical approach to trademark image retrieval by perceptual grouping has been
developed based on defining meta-features that are calculated from the spatial
configurations of SIFT local image features. This new technique measures certain
visual properties of the appearance of images containing multiple graphical
elements and supports perceptual grouping by exploiting the non-accidental
properties of their configuration.
Our validation experiments indicated that we were indeed able to capture
and quantify the differences in the global arrangement of sub-components evident
when comparing stylised images in terms of their visual appearance properties.
Such visual appearance properties, measured using 17 of the proposed meta-features,
include relative sub-component proximity, similarity, rotation and
symmetry. Similar work on meta-features, based on the above Gestalt proximity,
similarity, and simplicity groupings of local features, had not been reported in the
current computer vision literature at the time of undertaking this work.
We decided to adopt relevance feedback to allow the visual appearance
properties of relevant and non-relevant images returned in response to a query to
be determined by example. Since limited training data is available when
constructing a relevance classifier by means of user supplied relevance feedback,
the intrinsically non-parametric machine learning algorithm ID3 (Iterative
Dichotomiser 3) was selected to construct decision trees by means of dynamic
rule induction. We believe the above approach to capturing high-level visual
concepts, encoded by means of meta-features specified by example through
relevance feedback and decision tree classification, to support flexible trademark
image retrieval and to be wholly novel.
The retrieval performance of the above system was compared with two other
state-of-the-art trademark image retrieval systems: Artisan developed by Eakins
(Eakins et al., 1998) and a system developed by Jiang (Jiang et al., 2006). Using
relevance feedback, our system achieves higher average normalised precision
than either of the systems developed by Eakins or Jiang. However, while our
trademark image query and database set is based on an image dataset used by
Eakins, we employed different numbers of images. It was not possible to access
the same query set and image database used in the evaluation of Jiang's trademark
image retrieval system. Despite these differences in evaluation
methodology, our approach would appear to have the potential to improve
retrieval effectiveness.
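The decision-tree step described in this abstract can be illustrated with a minimal sketch of how ID3 chooses the attribute to split on. It assumes the meta-feature values have been discretised into categories; the feature names (`symmetry`, `proximity`) and the small feedback set are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """ID3 splits on the attribute with the highest information gain."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


# Hypothetical relevance feedback: discretised meta-features per returned image.
feedback_rows = [
    {"symmetry": "high", "proximity": "close"},
    {"symmetry": "high", "proximity": "far"},
    {"symmetry": "low", "proximity": "close"},
    {"symmetry": "low", "proximity": "far"},
]
feedback_labels = ["relevant", "relevant", "non-relevant", "non-relevant"]

print(best_attribute(feedback_rows, feedback_labels))
```

In this toy set the `symmetry` attribute separates the two classes perfectly, so ID3 would place it at the root; a full implementation would recurse on each partition until the labels are pure or no attributes remain.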
Combining Disparate Information for Machine Learning.
This thesis considers information fusion for four different types of machine learning problems: anomaly detection, information retrieval, collaborative filtering and structure learning for time series, and focuses on a common theme: the benefit of combining disparate information, resulting in improved algorithm performance.
In this dissertation, several new algorithms and applications to real-world datasets are presented. In Chapter II, a novel approach called Pareto Depth Analysis (PDA) is proposed for combining different dissimilarity metrics for anomaly detection. PDA is applied to video-based anomaly detection of pedestrian trajectories. Following a similar idea, in Chapter III we propose to use a similar Pareto Front method for a multiple-query information retrieval problem when different queries represent different semantic concepts. Pareto Front information retrieval is applied to multiple query image retrieval. In Chapter IV, we extend a recently proposed collaborative retrieval approach to incorporate complementary social network information, an approach we call Social Collaborative Retrieval (SCR). SCR is applied to a music recommendation system that combines both user history and friendship network information to improve recall and weighted recall performance. In Chapter V, we propose a framework that combines time series data at different time scales and offsets for more accurate estimation of multiple precision matrices. We propose a general fused graphical lasso approach to jointly estimate these precision matrices. The framework is applied to modeling financial time series data.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/108878/1/coolmark_1.pd
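The Pareto-front idea used in Chapters II and III can be illustrated with a short sketch. It assumes each candidate is described by a tuple of dissimilarities (one per metric or query), where smaller is better, and it peels off successive non-dominated fronts. The function names and sample values are hypothetical, and the PDA algorithm in the thesis involves further steps that are not shown here.

```python
def dominates(a, b):
    """True if a is no worse than b in every criterion and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_depths(points):
    """Return, for each point, the index of the Pareto front it lies on (1 = first)."""
    depths = [0] * len(points)
    remaining = set(range(len(points)))
    depth = 0
    while remaining:
        depth += 1
        # A point is on the current front if no other remaining point dominates it.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            depths[i] = depth
        remaining -= front
    return depths


# Hypothetical dissimilarities under two different metrics (smaller is better).
dissimilarities = [(0.1, 0.9), (0.9, 0.1), (0.5, 0.5), (0.6, 0.6), (1.0, 1.0)]
print(pareto_depths(dissimilarities))
```

The first three points are mutually non-dominated and form the first front, so none of the metrics has to be weighted against the others to rank them ahead of the remaining two.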
Interactive video retrieval using implicit user feedback.
PhD. In recent years, the rapid development of digital technologies and the low
cost of recording media have led to a great increase in the availability of
multimedia content worldwide. This availability creates a demand for the
development of advanced search engines. Traditionally, manual annotation of
video was one of the usual practices to support retrieval. However, the vast
amounts of multimedia content make such practices very expensive in terms of
human effort. At the same time, the availability of low cost wearable sensors
delivers a plethora of user-machine interaction data. Therefore, there is an
important challenge of exploiting implicit user feedback (such as user navigation
patterns and eye movements) during interactive multimedia retrieval sessions
with a view to improving video search engines. In this thesis, we focus on
automatically annotating video content by exploiting aggregated implicit
feedback of past users expressed as click-through data and gaze movements.
Towards this goal, we have conducted interactive video retrieval experiments in
order to collect click-through and eye movement data in environments that were
not strictly controlled. First, we generate semantic relations between the multimedia
items by proposing a graph representation of aggregated past interaction data and
exploit them to generate recommendations, as well as to improve content-based
search. Then, we investigate the role of user gaze movements in interactive video
retrieval and propose a methodology for inferring user interest by employing
support vector machines and gaze movement-based features. Finally, we propose
an automatic video annotation framework, which combines query clustering into
topics by constructing gaze movement-driven random forests and temporally
enhanced dominant sets, as well as video shot classification for predicting the
relevance of viewed items with respect to a topic. The results show that
exploiting heterogeneous implicit feedback from past users is of added value for
future users of interactive video retrieval systems.
Instance-Based Relevance Feedback for Image Retrieval
High retrieval precision in content-based image retrieval can be
attained by adopting relevance feedback mechanisms. These
mechanisms require that the user judges the quality of the results of
the query by marking all the retrieved images as being either
relevant or not. Then, the search engine exploits this information to
adapt the search to better meet the user's needs. At present, the vast
majority of proposed relevance feedback mechanisms are
formulated in terms of a search model that has to be optimized. Such
an optimization involves the modification of some search
parameters so that the nearest neighbors of the query vector contain
the largest number of relevant images. In this paper, a different
approach to relevance feedback is proposed. After the user
provides the first feedback, subsequent retrievals are not based on k-NN search, but on the computation of a relevance score for each
image of the database. This score is computed as a function of two
distances, namely the distance from the nearest non-relevant image
and the distance from the nearest relevant one. Images are then
ranked according to this score and the top k images are displayed.
Reported results on three image data sets show that the proposed
mechanism outperforms other state-of-the-art relevance feedback
mechanisms.
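The scoring step described in this abstract can be sketched as follows. The abstract states only that the score is a function of the distance to the nearest relevant image and the distance to the nearest non-relevant image; the particular form used here, d_non / (d_rel + d_non), the Euclidean distance, and all names and sample vectors are assumptions made for illustration.

```python
import math


def nearest_distance(vector, references):
    """Euclidean distance from a vector to the closest reference vector."""
    return min(math.dist(vector, ref) for ref in references)


def relevance_score(vector, relevant, non_relevant):
    """Assumed score in [0, 1]: high when close to a relevant image and far
    from every non-relevant one."""
    d_rel = nearest_distance(vector, relevant)
    d_non = nearest_distance(vector, non_relevant)
    if d_rel + d_non == 0:
        return 0.5  # identical to both a relevant and a non-relevant image
    return d_non / (d_rel + d_non)


def rank_images(database, relevant, non_relevant, k):
    """Rank every database image by its score and return the names of the top k."""
    scored = sorted(
        database.items(),
        key=lambda item: relevance_score(item[1], relevant, non_relevant),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Hypothetical two-dimensional feature vectors and user feedback.
database = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (4.0, 0.0)}
relevant = [(0.0, 0.0)]
non_relevant = [(4.0, 0.0)]

print(rank_images(database, relevant, non_relevant, k=2))
```

Because every image in the database receives a score, the ranking does not depend on moving a query point or re-weighting a distance metric, which is the contrast with model-optimising feedback that the abstract draws.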