Salient Object Detection with Semantic Priors
Salient object detection has increasingly become a popular topic in cognitive
and computational sciences, including computer vision and artificial
intelligence research. In this paper, we propose integrating \textit{semantic
priors} into the salient object detection process. Our algorithm consists of
three basic steps. Firstly, the explicit saliency map is obtained based on the
semantic segmentation refined by the explicit saliency priors learned from the
data. Next, the implicit saliency map is computed by a trained model that maps
regional features, in which the implicit saliency priors are embedded, to
saliency values. Finally, the explicit saliency map and the implicit saliency map
are adaptively fused to form a pixel-accurate saliency map that uniformly
covers the objects of interest. We further evaluate the proposed framework on
two challenging datasets, ECSSD and HKUIS. Extensive experimental
results demonstrate that our method outperforms other state-of-the-art methods.
Comment: accepted to IJCAI 201
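The abstract does not specify how the explicit and implicit maps are adaptively fused. A minimal sketch, assuming a per-pixel confidence weighting in which each map counts more wherever its value is far from the uninformative 0.5; the function name `fuse_saliency` and this weighting rule are hypothetical, not the authors' method:

```python
def fuse_saliency(explicit, implicit, eps=1e-8):
    """Fuse two saliency maps (lists of rows, values in [0, 1]) pixel by pixel.

    Hypothetical rule: each map's weight at a pixel is its confidence there,
    measured as the distance of its value from the uninformative value 0.5.
    """
    fused = []
    for row_e, row_i in zip(explicit, implicit):
        out_row = []
        for e, i in zip(row_e, row_i):
            w_e = abs(e - 0.5)
            w_i = abs(i - 0.5)
            total = w_e + w_i
            if total < eps:
                # Neither map is confident here; fall back to the plain mean.
                out_row.append(0.5 * (e + i))
            else:
                out_row.append((w_e * e + w_i * i) / total)
        fused.append(out_row)
    return fused
```

For instance, where the explicit map says 1.0 and the implicit map is undecided (0.5), the fused value follows the explicit map entirely.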
Towards Instance Segmentation with Object Priority: Prominent Object Detection and Recognition
This manuscript introduces the problem of prominent object detection and
recognition, inspired by the fact that humans seem to prioritize their perception of
scene elements. The problem deals with finding the most important region of
interest, segmenting the relevant item/object in that area, and assigning it an
object class label. In other words, we are solving the three problems of
saliency modeling, saliency detection, and object recognition under one
umbrella. The motivation behind such a problem formulation is (1) the benefits
to the knowledge representation-based vision pipelines, and (2) the potential
improvements in emulating bio-inspired vision systems by solving these three
problems together. We foresee extending this problem formulation to
fully semantically segmented scenes with instance object priority for
high-level inferences in various applications including assistive vision. Along
with a new problem definition, we also propose a method to achieve such a task.
The proposed model predicts the most important area in the image, segments the
associated objects, and labels them. The proposed problem and method are
evaluated against human fixations, annotated segmentation masks, and object
class categories. We define a chance level for each evaluation criterion, against
which the proposed algorithm is compared. Despite the good performance of the
proposed baseline, the overall evaluations indicate that the problem of
prominent object detection and recognition is a challenging task that is still
worth investigating further.
A Novel Semantics and Feature Preserving Perspective for Content Aware Image Retargeting
There is an increasing requirement for efficient image retargeting techniques
to adapt the content to various forms of digital media. With the rapid growth of
mobile communications and dynamic web page layouts, one often needs to resize
the media content to adapt to the desired display sizes. For various layouts of
web pages and the typically small screens of handheld portable devices, the
important content of the original image is obscured when it is resized by
uniform scaling. Images therefore need to be resized in a content-aware manner
that automatically discards irrelevant information and presents the salient
features more prominently. Several image retargeting techniques have been
proposed with the content awareness of the input image in mind. However, these
techniques fail to prove effective across different kinds of images and desired sizes.
The major problem is that these algorithms cannot process such
images with minimal visual distortion while also retaining the meaning conveyed
by the image. In this dissertation, we present a novel perspective for
content-aware image retargeting that can be implemented in real time. We
introduce a novel method of analysing semantic information within the input
image while also maintaining the important and visually significant features.
We present the various nuances of our algorithm mathematically and logically,
and show that our results are better than those of state-of-the-art techniques.
Comment: 74 Pages, 46 Figures, Master's Thesis
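Seam carving (Avidan and Shamir, 2007) is a classic example of the kind of content-aware retargeting that this abstract contrasts with uniform scaling; it is not the method proposed in the thesis. A minimal pure-Python sketch, assuming a grayscale image stored as a list of rows and a simple gradient-magnitude energy, narrows an image by repeatedly deleting the lowest-energy vertical seam:

```python
def energy(img):
    """Gradient-magnitude energy of a grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    e = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border.
            dx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            dy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            e[y][x] = abs(dx) + abs(dy)
    return e


def find_vertical_seam(e):
    """Return, for each row, the column of the minimum-energy 8-connected seam."""
    h, w = len(e), len(e[0])
    cost = [row[:] for row in e]  # cumulative minimum energy
    back = [[0] * w for _ in range(h)]  # column of the best predecessor
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            best = min(range(lo, hi + 1), key=lambda k: cost[y - 1][k])
            back[y][x] = best
            cost[y][x] += cost[y - 1][best]
    # Start from the cheapest pixel in the bottom row and trace the seam upward.
    x = min(range(w), key=lambda k: cost[h - 1][k])
    seam = [0] * h
    for y in range(h - 1, -1, -1):
        seam[y] = x
        x = back[y][x]
    return seam


def remove_vertical_seam(img, seam):
    """Delete one pixel per row, at the seam's column."""
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]


def carve_width(img, n):
    """Narrow the image by n columns, recomputing the energy after each seam."""
    for _ in range(n):
        img = remove_vertical_seam(img, find_vertical_seam(energy(img)))
    return img
```

Flat, low-energy regions are removed first, so a high-contrast structure such as the edge in rows of `[0, 0, 0, 9]` survives the narrowing. Unlike uniform scaling, this keeps salient edges intact but has no notion of semantics, which is the gap the thesis targets.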
Dynamically Visual Disambiguation of Keyword-based Image Search
Due to the high cost of manual annotation, learning directly from the web has
attracted broad attention. One issue that limits the performance of such methods is the
problem of visual polysemy. To address this issue, we present an adaptive
multi-model framework that resolves polysemy by visual disambiguation. Compared
to existing methods, the primary advantage of our approach is that it
can adapt to the dynamic changes in the search results. Our proposed
framework consists of two major steps: we first discover and dynamically select
the text queries according to the image search results, and then we employ the
proposed saliency-guided deep multi-instance learning network to remove
outliers and learn classification models for visual disambiguation. Extensive
experiments demonstrate the superiority of our proposed approach.
Comment: Accepted by International Joint Conference on Artificial Intelligence
(IJCAI), 201
Saliency-Guided Attention Network for Image-Sentence Matching
This paper studies the task of matching image and sentence, where learning
appropriate representations across the multi-modal data appears to be the main
challenge. Unlike previous approaches that predominantly deploy symmetrical
architectures to represent both modalities, we propose Saliency-guided Attention
Network (SAN) that asymmetrically employs visual and textual attention modules
to learn the fine-grained correlation intertwined between vision and language.
The proposed SAN mainly includes three components: saliency detector,
Saliency-weighted Visual Attention (SVA) module, and Saliency-guided Textual
Attention (STA) module. Concretely, the saliency detector provides the visual
saliency information as the guidance for the two attention modules. SVA is
designed to leverage the advantage of the saliency information to improve
discrimination of visual representations. By fusing the visual information from
SVA and textual information as a multi-modal guidance, STA learns
discriminative textual representations that are highly sensitive to visual
clues. Extensive experiments demonstrate that SAN improves the
state-of-the-art results on the Flickr30K and MSCOCO benchmarks by a
large margin.
Comment: 10 pages, 5 figures
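The abstract does not give the equations of the Saliency-weighted Visual Attention (SVA) module. One simple way to let saliency guide visual attention, assumed here purely for illustration, is to add the log of each region's saliency to its query-region attention score; this is equivalent to multiplying the softmax weights by saliency and renormalizing. The function name and its inputs are hypothetical:

```python
import math


def saliency_weighted_attention(regions, saliency, query, eps=1e-6):
    """Attention-pool region features, biasing the weights toward salient regions.

    regions:  list of feature vectors (one per image region)
    saliency: one value in (0, 1] per region, e.g. mean saliency inside it
    query:    vector compared with each region by dot product
    """
    # score_i = <region_i, query> + log(saliency_i); eps guards against log(0).
    scores = [
        sum(r * q for r, q in zip(region, query)) + math.log(max(s, eps))
        for region, s in zip(regions, saliency)
    ]
    # Numerically stable softmax: subtract the maximum score before exponentiating.
    m = max(scores)
    exps = [math.exp(sc - m) for sc in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of region features gives the attended visual representation.
    dim = len(regions[0])
    pooled = [sum(wt * region[d] for wt, region in zip(weights, regions)) for d in range(dim)]
    return pooled, weights
```

With an all-zero query the weights reduce to the normalized saliency values, so a region with saliency 0.75 receives three times the weight of one with saliency 0.25.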
Visual saliency estimation by integrating features using multiple kernel learning
In the last few decades, significant achievements have been attained in
predicting where humans look in images through different computational models.
However, how to determine the contributions of different visual features to overall
saliency remains an open problem. To overcome this issue, a recent class
of models formulates saliency estimation as a supervised learning problem and
accordingly applies machine learning techniques. In this paper, we also address
this challenging problem and propose to use multiple kernel learning (MKL) to
combine information coming from different feature dimensions and to perform
integration at an intermediate level. In addition, we suggest using responses of a
recently proposed filterbank of object detectors, known as Object-Bank, as
additional high-level semantic features. Here we show that our MKL-based
framework together with the proposed object-specific features provides
state-of-the-art performance compared to SVM- or AdaBoost-based saliency
models.
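MKL typically learns a convex combination of base kernels, one per feature channel: K = Σ_m β_m K_m with β_m ≥ 0 and Σ_m β_m = 1. The sketch below only builds such a combined Gram matrix for fixed, hand-chosen weights; learning the β_m (the actual MKL optimization) is not shown, and the RBF base kernels and the feature-group slices (for example, low-level features versus the Object-Bank responses mentioned above) are assumptions rather than the paper's configuration:

```python
import math


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def combined_kernel(samples, groups, betas, gammas):
    """Gram matrix of K = sum_m beta_m * K_m over per-feature-group RBF kernels.

    samples: list of feature vectors
    groups:  list of (start, end) slices, one per feature channel
    betas:   non-negative kernel weights summing to 1 (fixed here; MKL learns them)
    gammas:  RBF bandwidth parameter for each group
    """
    if any(b < 0 for b in betas) or abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("betas must be non-negative and sum to 1")
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for beta, (lo, hi), gamma in zip(betas, groups, gammas):
        for i in range(n):
            for j in range(n):
                # Each base kernel only sees its own slice of the feature vector.
                K[i][j] += beta * rbf(samples[i][lo:hi], samples[j][lo:hi], gamma)
    return K
```

The result can be handed to any kernel classifier that accepts a precomputed Gram matrix. Because each base kernel is positive semidefinite and the weights are non-negative, the combination is positive semidefinite as well, so the β_m can be read as the relative contribution of each feature channel.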
Modeling Bottom-Up and Top-Down Attention with a Neurodynamic Model of V1
Previous studies suggested that lateral interactions of V1 cells are
responsible, among other visual effects, for bottom-up visual attention
(alternatively named visual salience or saliency). Our objective is to mimic
these connections with a neurodynamic network of firing-rate neurons in order
to predict visual attention. Early visual subcortical processes (i.e. retinal
and thalamic) are functionally simulated. An implementation of the cortical
magnification function is included to define the retinotopical projections
towards V1, processing neuronal activity for each distinct view during scene
observation. Novel computational definitions of top-down inhibition (in terms
of inhibition of return and selection mechanisms) are also proposed to predict
attention in Free-Viewing and Visual Search tasks. Results show that our model
outperforms other biologically inspired models of saliency prediction while
also predicting visual saccade sequences with the same model. We also show how
temporal and spatial characteristics of inhibition of return can improve
prediction of saccades, as well as how distinct search strategies (in terms of
feature-selective or category-specific inhibition) can predict attention in
distinct image contexts.
Comment: 27 pages, 19 figures
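The model itself is a neurodynamic network of firing-rate neurons, which is beyond a short example. The sketch below only illustrates the generic winner-take-all plus inhibition-of-return loop that turns a saliency map into a saccade sequence, in the spirit of classic saliency models rather than this paper's formulation; the Gaussian inhibition profile and the `sigma` and `strength` parameters are assumptions:

```python
import math


def scanpath(saliency, n_fixations, sigma=1.0, strength=1.0):
    """Greedy fixation sequence with multiplicative inhibition of return.

    saliency: 2-D list of non-negative values. strength in [0, 1]: 1.0 fully
    suppresses the fixated pixel, smaller values only attenuate it.
    """
    sal = [row[:] for row in saliency]  # work on a copy
    h, w = len(sal), len(sal[0])
    fixations = []
    for _ in range(n_fixations):
        # Winner-take-all: the currently most salient location is fixated next.
        y, x = max(((r, c) for r in range(h) for c in range(w)),
                   key=lambda p: sal[p[0]][p[1]])
        fixations.append((y, x))
        # Inhibition of return: damp a Gaussian neighborhood of the fixation
        # so that the next saccade is drawn elsewhere.
        for r in range(h):
            for c in range(w):
                d2 = (r - y) ** 2 + (c - x) ** 2
                sal[r][c] *= 1.0 - strength * math.exp(-d2 / (2.0 * sigma ** 2))
    return fixations
```

Widening `sigma` corresponds to a larger spatial extent of inhibition, one of the inhibition-of-return characteristics the abstract reports varying; a temporal decay of the inhibition would require an extra term that this sketch omits.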
Interpreting Adversarial Examples with Attributes
The vulnerability of deep computer vision systems to imperceptible, carefully
crafted noise has raised questions about the robustness of their
decisions. We take a step back and approach this problem from an orthogonal
direction. We propose to enable black-box neural networks to justify their
reasoning both for clean and for adversarial examples by leveraging attributes,
i.e. visually discriminative properties of objects. We rank attributes based on
their class relevance, i.e. how the classification decision changes when the
input is visually slightly perturbed, as well as image relevance, i.e. how well
the attributes can be localized on both clean and perturbed images. We present
comprehensive experiments for attribute prediction, adversarial example
generation, adversarially robust learning, and their qualitative and
quantitative analysis using predicted attributes on three benchmark datasets.
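The abstract does not say which attack generates its adversarial examples. As a generic illustration of "imperceptible and carefully crafted noise", the sketch below applies the fast gradient sign method (FGSM, Goodfellow et al.) to a toy logistic-regression model whose input gradient has a closed form. The model, its weights, and `eps` are assumptions, and this is not the paper's attribute-based analysis:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_logistic(x, y, w, b, eps):
    """One-step fast gradient sign attack on a logistic-regression classifier.

    With p = sigmoid(w.x + b) and binary cross-entropy loss for label y in {0, 1},
    the input gradient is dL/dx = (p - y) * w. FGSM moves every input coordinate
    by eps in the direction of that gradient's sign, which increases the loss.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]
```

For example, with `w = [1.0, -1.0]`, `b = 0.0`, and `eps = 0.3`, the positively classified input `[0.2, 0.0]` moves to approximately `[-0.1, 0.3]`, whose predicted probability drops below 0.5. Each coordinate changes by at most `eps`, which is what keeps such perturbations small.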
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. We also propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
all the papers, through generating ideas, to writing a paper.
Comment: Survey Paper
Object-Part Attention Model for Fine-grained Image Classification
Fine-grained image classification aims to recognize hundreds of subcategories
belonging to the same basic-level category, such as 200 subcategories of
birds. This is highly challenging because of the large variance within a
subcategory and the small variance among different subcategories. Existing methods
generally first locate the objects or parts and then determine which
subcategory the image belongs to. However, they have two main limitations:
(1) they rely on object or part annotations, which are heavily labor-intensive;
(2) they ignore the spatial relationships between the object and its parts as well
as among the parts, both of which are significantly helpful for finding
discriminative parts. Therefore, this paper proposes the object-part attention
model (OPAM) for weakly supervised fine-grained image classification, and the
main novelties are: (1) The object-part attention model integrates two levels of
attention: object-level attention localizes the objects in images, and part-level
attention selects discriminative parts of the object. Both are jointly employed to
learn multi-view and multi-scale features that mutually reinforce each other.
(2) The object-part spatial constraint model combines two spatial constraints:
the object spatial constraint ensures that the selected parts are highly
representative, and the part spatial constraint eliminates redundancy and
enhances the discrimination of the selected parts. Both are jointly employed to
exploit the subtle and local differences for distinguishing the subcategories.
Importantly, neither object nor part annotations are used in our proposed
approach, which avoids the heavy labor cost of labeling. Compared with more
than 10 state-of-the-art methods on 4 widely used datasets, our OPAM approach
achieves the best performance.
Comment: 14 pages, submitted to IEEE Transactions on Image Processing
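The abstract names two spatial constraints without giving their form. A minimal sketch, assuming axis-aligned boxes `(x0, y0, x1, y1)`, a precomputed discriminativeness score per candidate part, and hypothetical thresholds `min_inside` and `max_iou`, shows how an object constraint (parts must lie largely on the object) and a part constraint (selected parts must not overlap much) can drive a greedy part selection. The greedy procedure is an illustrative assumption, not OPAM's actual formulation:

```python
def area(b):
    """Area of box (x0, y0, x1, y1); empty or inverted boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a, b):
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def iou(a, b):
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def select_parts(candidates, scores, object_box, k, max_iou=0.3, min_inside=0.5):
    """Greedily pick up to k high-scoring parts that satisfy both constraints."""
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        box = candidates[i]
        # Object constraint: most of the part must fall inside the object box.
        if area(box) == 0 or intersection(box, object_box) / area(box) < min_inside:
            continue
        # Part constraint: skip parts that largely duplicate an already chosen one.
        if any(iou(box, candidates[j]) > max_iou for j in chosen):
            continue
        chosen.append(i)
        if len(chosen) == k:
            break
    return chosen
```

In this sketch a high-scoring box lying off the object is rejected by the first check, and a near-duplicate of an already selected part is rejected by the second, so the surviving parts are both on the object and mutually complementary.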