12 research outputs found
Love Thy Neighbors: Image Annotation by Exploiting Image Metadata
Some images that are difficult to recognize on their own may become more
clear in the context of a neighborhood of related images with similar
social-network metadata. We build on this intuition to improve multilabel image
annotation. Our model uses image metadata nonparametrically to generate
neighborhoods of related images using Jaccard similarities, then uses a deep
neural network to blend visual information from the image and its neighbors.
Prior work typically models image metadata parametrically, in contrast, our
nonparametric treatment allows our model to perform well even when the
vocabulary of metadata changes between training and testing. We perform
comprehensive experiments on the NUS-WIDE dataset, where we show that our model
outperforms state-of-the-art methods for multilabel image annotation even when
our model is forced to generalize to new types of metadata.Comment: Accepted to ICCV 201
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize on what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and difference, and recognize their merits and limitations. For a
head-to-head comparison between the state-of-the-art, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.Comment: to appear in ACM Computing Survey
Acceleration of ListNet for ranking using reconfigurable architecture
Document ranking is used to order query results by relevance with ranking models. ListNet is a
well-known ranking approach for constructing and training learning-to-rank models. Compared with traditional learning approaches, ListNet delivers better accuracy, but is computationally too expensive to learn models with large data sets due to the large number of permutations and documents involved in computing the gradients. Currently, the long training time limits the practicality of ListNet in ranking applications such as breaking news search and stock prediction, and this situation is getting worse with the increase in data-set size. In order to tackle the challenge of long training time, this thesis optimises the ListNet algorithm, and designs hardware accelerators for learning the ListNet algorithm using Field Programmable Gate Arrays (FPGAs), making the algorithm more practical for real-world application.
The contributions of this thesis include: 1) A novel computation method of the ListNet algorithm for ranking. The proposed computation method exposes more fine-grained parallelism for FPGA implementation. 2) A weighted sampling method that takes into account the ranking positions, along with an effective quantisation method based on FPGA devices. The proposed design achieves a 4.42x improvement over GPU implementation speed, while still guaranteeing the accuracy. 3) A full reconfigurable architecture for the ListNet training using multiple bitstream kernels. The proposed method achieves a higher model accuracy than pure fixed point training, and a better throughput than pure floating point training. This thesis has resulted in the acceleration of the ListNet algorithm for ranking using FPGAs by applying the above techniques. Significant improvements in speed have been achieved in this work against CPU and GPU implementations.Open Acces
Image Understanding by Socializing the Semantic Gap
Several technological developments like the Internet, mobile devices and Social Networks have spurred the sharing of images in unprecedented volumes, making tagging and commenting a common habit. Despite the recent progress in image analysis, the problem of Semantic Gap still hinders machines in fully understand the rich semantic of a shared photo. In this book, we tackle this problem by exploiting social network contributions. A comprehensive treatise of three linked problems on image annotation is presented, with a novel experimental protocol used to test eleven state-of-the-art methods. Three novel approaches to annotate, under stand the sentiment and predict the popularity of an image are presented. We conclude with the many challenges and opportunities ahead for the multimedia community
A Max-Margin Riffled Independence Model for Image Tag Ranking
We propose Max-Margin Riffled Independence Model (MMRIM), a new method for image tag ranking modeling the structured preferences among tags. The goal is to predict a ranked tag list for a given image, where tags are ordered by their importance or relevance to the image content. Our model integrates the max-margin formalism with riffled independence factorizations proposed in [10], which naturally allows for structured learning and efficient ranking. Experimental results on the SUN Attribute and LabelMe datasets demonstrate the superior performance of the proposed model compared with baseline tag ranking methods. We also apply the predicted rank list of tags to several higher-level computer vision applications in image understanding and retrieval, and demonstrate that MMRIM significantly improves the accuracy of these applications. 1
Bowdoin Orient v.127, no.1-11 (1996-1996)
https://digitalcommons.bowdoin.edu/bowdoinorient-1990s/1008/thumbnail.jp
Bowdoin Orient v.115, no.1-27 (1985-1986)
https://digitalcommons.bowdoin.edu/bowdoinorient-1980s/1006/thumbnail.jp
Bowdoin Orient v.125, no.1-25 (1994-1995)
https://digitalcommons.bowdoin.edu/bowdoinorient-1990s/1006/thumbnail.jp