47,606 research outputs found
Image classification by visual bag-of-words refinement and reduction
This paper presents a new framework for visual bag-of-words (BOW) refinement
and reduction to overcome the drawbacks associated with the visual BOW model
which has been widely used for image classification. Although very influential
in the literature, the traditional visual BOW model has two distinct drawbacks.
Firstly, for efficiency purposes, the visual vocabulary is commonly constructed
by directly clustering the low-level visual feature vectors extracted from
local keypoints, without considering the high-level semantics of images. That
is, the visual BOW model still suffers from the semantic gap, and thus may lead
to significant performance degradation in more challenging tasks (e.g. social
image classification). Secondly, typically thousands of visual words are
generated to obtain better performance on a relatively large image dataset. Due
to such large vocabulary size, the subsequent image classification may take
sheer amount of time. To overcome the first drawback, we develop a graph-based
method for visual BOW refinement by exploiting the tags (easy to access
although noisy) of social images. More notably, for efficient image
classification, we further reduce the refined visual BOW model to a much
smaller size through semantic spectral clustering. Extensive experimental
results show the promising performance of the proposed framework for visual BOW
refinement and reduction
Scene Graph Generation with External Knowledge and Image Reconstruction
Scene graph generation has received growing attention with the advancements
in image understanding tasks such as object detection, attributes and
relationship prediction,~\etc. However, existing datasets are biased in terms
of object and relationship labels, or often come with noisy and missing
annotations, which makes the development of a reliable scene graph prediction
model very challenging. In this paper, we propose a novel scene graph
generation algorithm with external knowledge and image reconstruction loss to
overcome these dataset issues. In particular, we extract commonsense knowledge
from the external knowledge base to refine object and phrase features for
improving generalizability in scene graph generation. To address the bias of
noisy object annotations, we introduce an auxiliary image reconstruction path
to regularize the scene graph generation network. Extensive experiments show
that our framework can generate better scene graphs, achieving the
state-of-the-art performance on two benchmark datasets: Visual Relationship
Detection and Visual Genome datasets.Comment: 10 pages, 5 figures, Accepted in CVPR 201
A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Semantic segmentation is the pixel-wise labelling of an image. Since the
problem is defined at the pixel level, determining image class labels only is
not acceptable, but localising them at the original image pixel resolution is
necessary. Boosted by the extraordinary ability of convolutional neural
networks (CNN) in creating semantic, high level and hierarchical image
features; excessive numbers of deep learning-based 2D semantic segmentation
approaches have been proposed within the last decade. In this survey, we mainly
focus on the recent scientific developments in semantic segmentation,
specifically on deep learning-based methods using 2D images. We started with an
analysis of the public image sets and leaderboards for 2D semantic
segmantation, with an overview of the techniques employed in performance
evaluation. In examining the evolution of the field, we chronologically
categorised the approaches into three main periods, namely pre-and early deep
learning era, the fully convolutional era, and the post-FCN era. We technically
analysed the solutions put forward in terms of solving the fundamental problems
of the field, such as fine-grained localisation and scale invariance. Before
drawing our conclusions, we present a table of methods from all mentioned eras,
with a brief summary of each approach that explains their contribution to the
field. We conclude the survey by discussing the current challenges of the field
and to what extent they have been solved.Comment: Updated with new studie
Deep Lesion Graphs in the Wild: Relationship Learning and Organization of Significant Radiology Image Findings in a Diverse Large-scale Lesion Database
Radiologists in their daily work routinely find and annotate significant
abnormalities on a large number of radiology images. Such abnormalities, or
lesions, have collected over years and stored in hospitals' picture archiving
and communication systems. However, they are basically unsorted and lack
semantic annotations like type and location. In this paper, we aim to organize
and explore them by learning a deep feature representation for each lesion. A
large-scale and comprehensive dataset, DeepLesion, is introduced for this task.
DeepLesion contains bounding boxes and size measurements of over 32K lesions.
To model their similarity relationship, we leverage multiple supervision
information including types, self-supervised location coordinates and sizes.
They require little manual annotation effort but describe useful attributes of
the lesions. Then, a triplet network is utilized to learn lesion embeddings
with a sequential sampling strategy to depict their hierarchical similarity
structure. Experiments show promising qualitative and quantitative results on
lesion retrieval, clustering, and classification. The learned embeddings can be
further employed to build a lesion graph for various clinically useful
applications. We propose algorithms for intra-patient lesion matching and
missing annotation mining. Experimental results validate their effectiveness.Comment: Accepted by CVPR2018. DeepLesion url adde
Context-Aware Zero-Shot Recognition
We present a novel problem setting in zero-shot learning, zero-shot object
recognition and detection in the context. Contrary to the traditional zero-shot
learning methods, which simply infers unseen categories by transferring
knowledge from the objects belonging to semantically similar seen categories,
we aim to understand the identity of the novel objects in an image surrounded
by the known objects using the inter-object relation prior. Specifically, we
leverage the visual context and the geometric relationships between all pairs
of objects in a single image, and capture the information useful to infer
unseen categories. We integrate our context-aware zero-shot learning framework
into the traditional zero-shot learning techniques seamlessly using a
Conditional Random Field (CRF). The proposed algorithm is evaluated on both
zero-shot region classification and zero-shot detection tasks. The results on
Visual Genome (VG) dataset show that our model significantly boosts performance
with the additional visual context compared to traditional methods
Automatic Leaf Extraction from Outdoor Images
Automatic plant recognition and disease analysis may be streamlined by an
image of a complete, isolated leaf as an initial input. Segmenting leaves from
natural images is a hard problem. Cluttered and complex backgrounds: often
composed of other leaves are commonplace. Furthermore, their appearance is
highly dependent upon illumination and viewing perspective. In order to address
these issues we propose a methodology which exploits the leaves venous systems
in tandem with other low level features. Background and leaf markers are
created using colour, intensity and texture. Two approaches are investigated:
watershed and graph-cut and results compared. Primary-secondary vein detection
and a protrusion-notch removal are applied to refine the extracted leaf. The
efficacy of our approach is demonstrated against existing work.Comment: 13 pages, India-UK Advanced Technology Centre of Excellence in Next
Generation Networks, Systems and Services (IU-ATC), 201
- …