Component-based Attention for Large-scale Trademark Retrieval
The demand for large-scale trademark retrieval (TR) systems has significantly
increased to combat the rise in international trademark infringement.
Unfortunately, the ranking accuracy of current approaches using either
hand-crafted or pre-trained deep convolution neural network (DCNN) features is
inadequate for large-scale deployments. We show in this paper that the ranking
accuracy of TR systems can be significantly improved by incorporating hard and
soft attention mechanisms, which direct attention to critical information such
as figurative elements and reduce attention given to distracting and
uninformative elements such as text and background. Our proposed approach
achieves state-of-the-art results on a challenging large-scale trademark
dataset.
Comment: Fix typos related to authors' information
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines.
From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.
Multimedia information technology and the annotation of video
The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, overload of data will cause lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning
Learning Test-time Data Augmentation for Image Retrieval with Reinforcement Learning
Off-the-shelf convolutional neural network features achieve outstanding
results in many image retrieval tasks. However, their invariance is pre-defined
by the network architecture and training data. Existing image retrieval
approaches require fine-tuning or modification of the pre-trained networks to
adapt to the variations in the target data. In contrast, our method enhances
the invariance of off-the-shelf features by aggregating features extracted from
images augmented with learned test-time augmentations. The optimal ensemble of
test-time augmentations is learned automatically through reinforcement
learning. Our training is time- and resource-efficient, and learns a diverse
set of test-time augmentations. Experimental results on trademark retrieval (METU
trademark dataset) and landmark retrieval (Oxford5k and Paris6k scene datasets)
tasks show the learned ensemble of transformations is effective and
transferable. We also achieve state-of-the-art MAP@100 results on the METU
trademark dataset
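The aggregation step described in this abstract can be illustrated with a minimal sketch. It assumes a fixed, hand-picked set of augmentations rather than the ensemble the authors learn with reinforcement learning, and `extract_features` is a hypothetical stand-in for an off-the-shelf CNN descriptor, not the network used in the paper.

```python
import numpy as np


def extract_features(image):
    """Hypothetical stand-in for an off-the-shelf CNN descriptor:
    concatenated column-wise and row-wise intensity means."""
    return np.concatenate([image.mean(axis=0), image.mean(axis=1)])


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    return vector / (np.linalg.norm(vector) + eps)


# Assumed, fixed augmentation set (the paper learns its ensemble instead).
AUGMENTATIONS = [
    lambda img: img,                 # identity
    np.fliplr,                       # horizontal flip
    lambda img: np.rot90(img, k=1),  # 90-degree rotation
]


def aggregate_tta_features(image, augmentations=AUGMENTATIONS):
    """Average the normalised descriptors of all augmented views,
    then re-normalise the result."""
    feats = [l2_normalize(extract_features(aug(image))) for aug in augmentations]
    return l2_normalize(np.mean(feats, axis=0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.random((8, 8))
    print(aggregate_tta_features(query).shape)
```

Averaging over a set of views that includes both an image and its mirror makes the aggregated descriptor insensitive to mirroring, which is the kind of added invariance the abstract refers to.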
MTRNet: A Generic Scene Text Eraser
Text removal algorithms have been proposed for uni-lingual scripts with
regular shapes and layouts. However, to the best of our knowledge, a generic
text removal method which is able to remove all or user-specified text regions
regardless of font, script, language or shape is not available. Developing such
a generic text eraser for real scenes is a challenging task, since it inherits
all the challenges of multi-lingual and curved text detection and inpainting.
To fill this gap, we propose a mask-based text removal network (MTRNet). MTRNet
is a conditional adversarial generative network (cGAN) with an auxiliary mask.
The introduced auxiliary mask not only makes the cGAN a generic text eraser,
but also enables stable training and early convergence on a challenging
large-scale synthetic dataset, initially proposed for text detection in real
scenes. What's more, MTRNet achieves state-of-the-art results on several
real-world datasets including ICDAR 2013, ICDAR 2017 MLT, and CTW1500, without
being explicitly trained on this data, outperforming previous state-of-the-art
methods trained directly on these datasets.
Comment: Presented at ICDAR2019 Conference
Trademark image retrieval by local features
The challenge of abstract trademark image retrieval as a test of machine vision algorithms has attracted considerable research interest in the past decade. Current
operational trademark retrieval systems involve manual annotation of the images
(the current “gold standard”). Accordingly, current systems require a substantial
amount of time and labour to access, and are therefore expensive to operate. This
thesis focuses on the development of algorithms that mimic aspects of human
visual perception in order to retrieve similar abstract trademark images
automatically. A significant category of trademark images are typically highly
stylised, comprising a collection of distinctive graphical elements that often
include geometric shapes. Therefore, in order to compare the similarity of such
images the principal aim of this research has been to develop a method for solving
the partial matching and shape perception problem.
There are few useful techniques for partial shape matching in the context of
trademark retrieval, because those existing techniques tend not to support multicomponent
retrieval. When this work was initiated most trademark image
retrieval systems represented images by means of global features, which are not
suited to solving the partial matching problem. Instead, the author has
investigated the use of local image features as a means to finding similarities
between trademark images that only partially match in terms of their subcomponents.
During the course of this work, it has been established that the
Harris and Chabat detectors could potentially perform sufficiently well to serve as
the basis for local feature extraction in trademark image retrieval. Early findings
in this investigation indicated that the well established SIFT (Scale Invariant
Feature Transform) local features, based on the Harris detector, could potentially
serve as an adequate underlying local representation for matching trademark
images.
There are few researchers who have used mechanisms based on human
perception for trademark image retrieval, implying that the shape representations
utilised in the past to solve this problem do not necessarily reflect the shapes
contained in these images, as characterised by human perception. In response, a
practical approach to trademark image retrieval by perceptual grouping has been
developed based on defining meta-features that are calculated from the spatial
configurations of SIFT local image features. This new technique measures certain
visual properties of the appearance of images containing multiple graphical
elements and supports perceptual grouping by exploiting the non-accidental
properties of their configuration.
Our validation experiments indicated that we were indeed able to capture
and quantify the differences in the global arrangement of sub-components evident
when comparing stylised images in terms of their visual appearance properties.
Such visual appearance properties, measured using 17 of the proposed meta-features,
include relative sub-component proximity, similarity, rotation and
symmetry. Similar work on meta-features, based on the above Gestalt proximity,
similarity, and simplicity groupings of local features, had not been reported in the
current computer vision literature at the time of undertaking this work.
We decided to adopt relevance feedback to allow the visual appearance
properties of relevant and non-relevant images returned in response to a query to
be determined by example. Since limited training data is available when
constructing a relevance classifier by means of user supplied relevance feedback,
the intrinsically non-parametric machine learning algorithm ID3 (Iterative
Dichotomiser 3) was selected to construct decision trees by means of dynamic
rule induction. We believe that the above approach to capturing high-level visual
concepts, encoded by means of meta-features specified by example through
relevance feedback and decision tree classification, supports flexible trademark
image retrieval and is wholly novel.
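As an illustration of how ID3 chooses the attribute to split on, the sketch below computes entropy and information gain over a handful of relevance-feedback examples. The attribute names (`symmetry`, `proximity`), their discretised values, and the labels are hypothetical and are not taken from the thesis; it is a minimal sketch of the selection criterion, not the author's implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in label entropy obtained by splitting on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    total = len(labels)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


if __name__ == "__main__":
    # Hypothetical feedback: discretised meta-feature values per returned image.
    rows = [
        {"symmetry": "high", "proximity": "near"},
        {"symmetry": "high", "proximity": "far"},
        {"symmetry": "low", "proximity": "near"},
        {"symmetry": "low", "proximity": "far"},
    ]
    labels = ["relevant", "relevant", "non-relevant", "non-relevant"]
    print(best_attribute(rows, labels, ["symmetry", "proximity"]))
```

In this toy data `symmetry` separates the two classes completely while `proximity` carries no information, so the former is selected as the root split.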
The retrieval performance of the above system was compared with two other
state-of-the-art trademark image retrieval systems: Artisan developed by Eakins
(Eakins et al., 1998) and a system developed by Jiang (Jiang et al., 2006). Using
relevance feedback, our system achieves higher average normalised precision
than either of the systems developed by Eakins or Jiang. However, while our
trademark image query and database set is based on an image dataset used by
Eakins, we employed different numbers of images. It was not possible to access
the same query set and image database used in the evaluation of Jiang's trademark
image retrieval system. Despite these differences in evaluation
methodology, our approach would appear to have the potential to improve
retrieval effectiveness
Strengthening China's technological capability
China is increasing its outlay on research and development and seeking to build an innovation system that will deliver quick results not just in absorbing technology but also in pushing the technological envelope. China's spending on R&D rose from 1.1 percent of GDP in 2000 to 1.3 percent of GDP in 2005. On a purchasing power parity basis, China's research outlay was among the world's highest, far greater than that of Brazil, India, or Mexico. Chinese firms are active in the fields of biotechnology, pharmaceuticals, alternative energy sources, and nanotechnology. This surge in spending has been paralleled by a sharp increase in patent applications in China, with the bulk of the patents registered in the areas of electronics, information technology, and telecoms. However, of the almost 50,000 patents granted in China, nearly two-thirds were to nonresidents. This paper considers two questions that are especially important for China. First, how might China go about accelerating technology development? Second, what measures could most cost-effectively deliver the desired outcomes? It concludes that although the level of financing for R&D is certainly important, technological advance is closely keyed to absorptive capacity, which is a function of the volume and quality of talent and the depth as well as the heterogeneity of research experience. It is also a function of how companies maximize the commercial benefits of research and development, and the coordination of research with production and marketing.
Keywords: Technology Industry, Tertiary Education, E-Business, ICT Policy and Strategies, Agricultural Knowledge & Information Systems