704 research outputs found
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize on what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and difference, and recognize their merits and limitations. For a
head-to-head comparison between the state-of-the-art, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.Comment: to appear in ACM Computing Survey
Visual Understanding via Multi-Feature Shared Learning with Global Consistency
Image/video data is usually represented with multiple visual features. Fusion
of multi-source information for establishing the attributes has been widely
recognized. Multi-feature visual recognition has recently received much
attention in multimedia applications. This paper studies visual understanding
via a newly proposed l_2-norm based multi-feature shared learning framework,
which can simultaneously learn a global label matrix and multiple
sub-classifiers with the labeled multi-feature data. Additionally, a group
graph manifold regularizer composed of the Laplacian and Hessian graph is
proposed for better preserving the manifold structure of each feature, such
that the label prediction power is much improved through the semi-supervised
learning with global label consistency. For convenience, we call the proposed
approach Global-Label-Consistent Classifier (GLCC). The merits of the proposed
method include: 1) the manifold structure information of each feature is
exploited in learning, resulting in a more faithful classification owing to the
global label consistency; 2) a group graph manifold regularizer based on the
Laplacian and Hessian regularization is constructed; 3) an efficient
alternative optimization method is introduced as a fast solver owing to the
convex sub-problems. Experiments on several benchmark visual datasets for
multimedia understanding, such as the 17-category Oxford Flower dataset, the
challenging 101-category Caltech dataset, the YouTube & Consumer Videos dataset
and the large-scale NUS-WIDE dataset, demonstrate that the proposed approach
compares favorably with the state-of-the-art algorithms. An extensive
experiment on the deep convolutional activation features also show the
effectiveness of the proposed approach. The code is available on
http://www.escience.cn/people/lei/index.htmlComment: 13 pages,6 figures, this paper is accepted for publication in IEEE
Transactions on Multimedi
Recommended from our members
DATA-DRIVEN APPROACH TO IMAGE CLASSIFICATION
Image classification has been a core topic in the computer vision community. Its recent success with convolutional neural network (CNN) algorithm has led to various real world applications such as large scale management of photos/videos on cloud/social-media, image based search for online retailers, self-driving cars, building robots and healthcare. Image classification can be broadly categorized into binary, multi-class and multi-label classification problems. Binary classification involves assigning one of the two class labels to an instance. In multi-class classification problem, an instance should be categorized into one of more than two classes. Multi-label classification is a generalized version of the multi-class classification problem where each image is assigned multiple labels as opposed to a single label.
In this work, we first present various methods that take advantage of deep representations (fully connected layer of pre-trained CNN on the ImageNet dataset) and yield better performance on multi-label classification when compared to methods that use over a dozen conventional visual features. Following the success of deep representations, we intend to build a generic end-to-end deep learning framework to address all three problem categories of image classification. However, there are still no well established guidelines (in terms of choosing the number of layers to go deeper, the number of kernels and the size, the type of regularizer, the choice of non-linear function, etc.) to build an efficient deep neural network and often network architecture design is specific to a problem/dataset. Hence, we present some initial efforts in building a computational framework called Deep Decision Network (DDN) which is completely data-driven. DDN is a tree-like structured built stage-wise. During the learning phase, starting from the root network node, DDN automatically builds a network that splits the data into disjoint clusters of classes which would be handled by the subsequent expert networks. This results in a tree-like structured network driven by the data. The proposed approach provides an insight into the data by identifying the group of classes that are hard to classify and require more attention when compared to others. This feature is crucial for people trying to solve the problem with little or no domain knowledge, especially for applications in medical domain. Initially, we evaluate DDN on a binary classification problem and later extend it to more challenging multi-class and multi-label classification problems. The extension of DDN to multi-class and multi-label involves some changes but they still operate under the same underlying principle. In all the three cases, the proposed approach is tested for its recognition performance and scalability on publicly available datasets providing comparison to other methods
- …