Describing Images by Semantic Modeling using Attributes and Tags
This dissertation addresses the problem of describing images using visual attributes and textual tags, a fundamental task that narrows the semantic gap between the visual reasoning of humans and machines. Automatic image annotation assigns relevant textual tags to images. In this dissertation, we propose a query-specific formulation based on Weighted Multi-view Non-negative Matrix Factorization to perform automatic image annotation. Our proposed technique seamlessly adapts to changes in the training data, naturally solves the problem of feature fusion, and handles the challenge of rare tags. Unlike tags, attributes are category-agnostic, hence their combination models an exponential number of semantic labels. Motivated by the fact that most attributes describe local properties, we propose exploiting localization cues, through semantic parsing of the human face and body, to improve person-related attribute prediction. We also demonstrate that image-level attribute labels can be effectively used as weak supervision for the task of semantic segmentation. Next, we analyze selfie images by utilizing tags and attributes. We collect the first large-scale selfie dataset and annotate it with different attributes covering characteristics such as gender, age, race, facial gestures, and hairstyle. We then study the popularity and sentiments of the selfies given an estimated appearance of various semantic concepts. In brief, we automatically infer what makes a good selfie. Despite the extensive usage of Batch Normalization, the deep learning literature falls short in understanding its characteristics and behavior. We conclude this dissertation by providing a fresh view, in light of information geometry and Fisher kernels, on why batch normalization works.
We propose Mixture Normalization, which disentangles modes of variation in the underlying distribution of the layer outputs, and confirm that it effectively accelerates training of different batch-normalized architectures, including Inception-V3, Densely Connected Networks, and Deep Convolutional Generative Adversarial Networks, while achieving better generalization error.
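For orientation, the standard batch normalization transform that the abstract refers to can be sketched in a few lines. This is plain batch normalization only, not the proposed Mixture Normalization, whose details the abstract does not give. The function name `batch_norm` and the scalar `gamma`, `beta`, and `eps` parameters are illustrative assumptions; in practice the scale and shift are usually learned per feature.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature column of a mini-batch to zero mean, unit variance.

    `batch` is a list of equally long rows (one row per example).
    `gamma` and `beta` are scalar scale/shift values here for simplicity
    (an assumption of this sketch).
    """
    n = len(batch)
    dims = len(batch[0])
    # Per-feature statistics computed over the batch dimension.
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)
    ]
    # Standardize, then apply the scale and shift.
    return [
        [
            gamma * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta
            for j in range(dims)
        ]
        for row in batch
    ]
```

Each column is standardized with a single mean and variance estimated from the batch, which is the one-mode assumption that a mixture-based normalization would relax.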
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and differences, and recognize their merits and limitations. For a
head-to-head comparison of state-of-the-art methods, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.
Comment: to appear in ACM Computing Surveys
Love Thy Neighbors: Image Annotation by Exploiting Image Metadata
Some images that are difficult to recognize on their own may become more
clear in the context of a neighborhood of related images with similar
social-network metadata. We build on this intuition to improve multilabel image
annotation. Our model uses image metadata nonparametrically to generate
neighborhoods of related images using Jaccard similarities, then uses a deep
neural network to blend visual information from the image and its neighbors.
Prior work typically models image metadata parametrically; in contrast, our
nonparametric treatment allows our model to perform well even when the
vocabulary of metadata changes between training and testing. We perform
comprehensive experiments on the NUS-WIDE dataset, where we show that our model
outperforms state-of-the-art methods for multilabel image annotation even when
our model is forced to generalize to new types of metadata.
Comment: Accepted to ICCV 201
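The neighborhood-generation step described in this abstract can be illustrated with a minimal sketch. It assumes that each image's metadata is a set of tag strings and that neighbors are simply the top-k images by Jaccard similarity. The names `jaccard` and `nearest_neighbors`, the parameter `k`, and the toy corpus in the usage note are hypothetical, and the neural network that blends visual information is not shown.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of tags."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty tag sets: define similarity as 0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


def nearest_neighbors(query_tags, corpus, k=2):
    """Return the ids of the k corpus images whose tags best match the query.

    `corpus` maps an image id to its set of metadata tags. Because the
    comparison is nonparametric, no fixed tag vocabulary is needed.
    """
    scored = [
        (jaccard(query_tags, tags), image_id) for image_id, tags in corpus.items()
    ]
    # Highest similarity first; ties broken by image id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for _, image_id in scored[:k]]
```

For example, with a corpus `{"img_a": {"beach", "sunset", "sea"}, "img_b": {"beach", "sea"}, "img_c": {"city", "night"}}` and query tags `{"beach", "sea", "sand"}`, the call `nearest_neighbors(query, corpus, k=2)` ranks `img_b` ahead of `img_a`. Since only set overlap is used, tags unseen during training still contribute to the similarity.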
Toward Real-Time Image Annotation Using Marginalized Coupled Dictionary Learning
In most image retrieval systems, images include various high-level semantics,
called tags or annotations. Virtually all the state-of-the-art image annotation
methods that handle imbalanced labeling are search-based techniques which are
time-consuming. In this paper, a novel coupled dictionary learning approach is
proposed to learn a limited number of visual prototypes and their corresponding
semantics simultaneously. This approach leads to a real-time image annotation
procedure. Another contribution of this paper is that it utilizes a marginalized
loss function instead of the squared loss function, which is inappropriate for
image annotation with imbalanced labels. We have employed a marginalized loss
function in our method to leverage a simple and effective method of prototype
updating. Meanwhile, we have introduced regularization on semantic
prototypes to preserve the sparse and imbalanced nature of labels in learned
semantic prototypes. Finally, comprehensive experimental results on various
datasets demonstrate the efficiency of the proposed method for image annotation
tasks in terms of accuracy and time. The reference implementation is publicly
available on https://github.com/hamid-amiri/MCDL-Image-Annotation.
Comment: @article{roostaiyan2022toward,
  title={Toward real-time image annotation using marginalized coupled dictionary learning},
  author={Roostaiyan, Seyed Mahdi and Hosseini, Mohammad Mehdi and Kashani, Mahya Mohammadi and Amiri, S Hamid},
  journal={Journal of Real-Time Image Processing},
  volume={19}, number={3}, pages={623--638}, year={2022},
  publisher={Springer}
}