Salient Object Detection: A Discriminative Regional Feature Integration Approach
Salient object detection has been attracting a lot of interest, and recently
various heuristic computational models have been designed. In this paper, we
formulate saliency map computation as a regression problem. Our method, which
is based on multi-level image segmentation, utilizes the supervised learning
approach to map the regional feature vector to a saliency score. Saliency
scores across multiple levels are finally fused to produce the saliency map.
The contributions are two-fold. One is that we propose a discriminative
regional feature integration approach for salient object detection. Compared
with existing heuristic models, our proposed method is able to automatically
integrate high-dimensional regional saliency features and choose discriminative
ones. The other is that by investigating standard generic region properties as
well as two widely studied concepts for salient object detection, i.e.,
regional contrast and backgroundness, our approach significantly outperforms
state-of-the-art methods on six benchmark datasets. Meanwhile, we demonstrate
that our method runs as fast as most existing algorithms.
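As a rough illustration of the fusion step mentioned in the abstract above, the following sketch combines per-level saliency maps into a single map with a normalized weighted average. The function name `fuse_saliency_levels`, the uniform default weights, and the toy maps are assumptions made for illustration only; the abstract does not state how the levels are actually fused.

```python
from typing import List, Optional, Sequence

SaliencyMap = List[List[float]]


def fuse_saliency_levels(
    levels: Sequence[SaliencyMap],
    weights: Optional[Sequence[float]] = None,
) -> SaliencyMap:
    """Fuse same-sized saliency maps with a normalized weighted average.

    The weighted average is an assumed fusion rule, not the paper's method.
    """
    if not levels:
        raise ValueError("at least one saliency level is required")
    if weights is None:
        weights = [1.0] * len(levels)  # assumption: all levels count equally
    if len(weights) != len(levels):
        raise ValueError("one weight per level is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    rows, cols = len(levels[0]), len(levels[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for level, weight in zip(levels, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += (weight / total) * level[r][c]
    return fused


# Two hypothetical segmentation levels scored on a 2x2 image.
coarse = [[0.0, 1.0], [0.5, 0.5]]
fine = [[1.0, 1.0], [0.5, 0.0]]
fused_map = fuse_saliency_levels([coarse, fine])
```

With equal weights, each fused pixel is simply the mean of the per-level scores at that location.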
Spatiotemporal Knowledge Distillation for Efficient Estimation of Aerial Video Saliency
The performance of video saliency estimation techniques has achieved
significant advances along with the rapid development of Convolutional Neural
Networks (CNNs). However, devices like cameras and drones may have limited
computational capability and storage space so that the direct deployment of
complex deep saliency models becomes infeasible. To address this problem, this
paper proposes a dynamic saliency estimation approach for aerial videos via
spatiotemporal knowledge distillation. In this approach, five components are
involved, including two teachers, two students and the desired spatiotemporal
model. The knowledge of spatial and temporal saliency is first separately
transferred from the two complex and redundant teachers to their simple and
compact students, and the input scenes are also degraded from high-resolution
to low-resolution to remove the probable data redundancy so as to greatly speed
up the feature extraction process. After that, the desired spatiotemporal model
is further trained by distilling and encoding the spatial and temporal saliency
knowledge of two students into a unified network. In this manner, the
inter-model redundancy can be further removed for the effective estimation of
dynamic saliency on aerial videos. Experimental results show that the proposed
approach outperforms ten state-of-the-art models in estimating visual saliency
on aerial videos, while its speed reaches up to 28,738 FPS on the GPU platform.
Salient Object Detection in the Deep Learning Era: An In-Depth Survey
As an essential problem in computer vision, salient object detection (SOD)
has attracted an increasing amount of research attention over the years. Recent
advances in SOD are predominantly led by deep learning-based solutions (named
deep SOD). To enable in-depth understanding of deep SOD, in this paper, we
provide a comprehensive survey covering various aspects, ranging from algorithm
taxonomy to unsolved issues. In particular, we first review deep SOD algorithms
from different perspectives, including network architecture, level of
supervision, learning paradigm, and object-/instance-level detection. Following
that, we summarize and analyze existing SOD datasets and evaluation metrics.
Then, we benchmark a large group of representative SOD models, and provide
detailed analyses of the comparison results. Moreover, we study the performance
of SOD algorithms under different attribute settings, which has not been
thoroughly explored previously, by constructing a novel SOD dataset with rich
attribute annotations covering various salient object types, challenging
factors, and scene categories. We further analyze, for the first time in the
field, the robustness of SOD models to random input perturbations and
adversarial attacks. We also look into the generalization and difficulty of
existing SOD datasets. Finally, we discuss several open issues of SOD and
outline future research directions.
Comment: Published in IEEE TPAMI. All the saliency prediction maps, our
constructed dataset with annotations, and code for evaluation are publicly
available at https://github.com/wenguanwang/SODsurvey
Hierarchical Deep Co-segmentation of Primary Objects in Aerial Videos
Primary object segmentation plays an important role in understanding videos
generated by unmanned aerial vehicles. In this paper, we propose a large-scale
dataset with 500 aerial videos and manually annotated primary objects. To the
best of our knowledge, it is the largest dataset to date for primary object
segmentation in aerial videos. From this dataset, we find most aerial videos
contain large-scale scenes, small primary objects as well as consistently
varying scales and viewpoints. Inspired by that, we propose a hierarchical deep
co-segmentation approach that repeatedly divides a video into two sub-videos
formed by the odd and even frames, respectively. In this manner, the primary
objects shared by sub-videos can be co-segmented by training two-stream CNNs
and finally refined within the neighborhood reversible flows. Experimental
results show that our approach remarkably outperforms 17 state-of-the-art
methods in segmenting primary objects in various types of aerial videos.
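The repeated odd/even division described in the abstract above can be sketched as follows. This is a minimal illustration of the splitting step only; the names `split_odd_even` and `hierarchical_split`, and the `depth` parameter, are hypothetical, and the co-segmentation networks themselves are not modeled.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def split_odd_even(frames: Sequence[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """Split a video into its odd and even frames (frames counted from 1)."""
    # Frame 1 sits at index 0, so odd frames are indices 0, 2, 4, ...
    return list(frames[0::2]), list(frames[1::2])


def hierarchical_split(frames: Sequence[Frame], depth: int) -> List[List[List[Frame]]]:
    """Return the sub-videos at every level of the hierarchy.

    Level 0 holds the original video; each further level splits every
    sub-video of the previous level into its odd and even frames.
    """
    levels = [[list(frames)]]
    for _ in range(depth):
        next_level = []
        for video in levels[-1]:
            odd, even = split_odd_even(video)
            next_level.extend([odd, even])
        levels.append(next_level)
    return levels


# Frames are stand-in integers here; real frames would be images.
hierarchy = hierarchical_split(list(range(1, 9)), depth=2)
```

Each level doubles the number of sub-videos and halves their length, so neighboring sub-videos at any level show nearly the same content at slightly different times.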
Adaptive Blind Image Watermarking Using Fuzzy Inference System Based on Human Visual Perception
Development of digital content has increased the necessity of copyright
protection by means of watermarking. Imperceptibility and robustness are two
important features of watermarking algorithms. The goal of watermarking methods
is to satisfy the tradeoff between these two contradicting characteristics.
Recently watermarking methods in transform domains have displayed favorable
results. In this paper, we present an adaptive blind watermarking method which
has high transparency in areas that are important to the human visual system. We
propose a fuzzy system for adaptive control of the embedding strength factor.
Features such as saliency, intensity, and edge concentration are used as fuzzy
attributes. Redundant embedding in the discrete cosine transform (DCT) of the
wavelet domain increases the robustness of our method. Experimental results show
the efficiency of the proposed method, which obtains better results than
comparable methods with the same size of watermark logo.
Comment: 11 pages, 11 figures
Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
Curved text, despite being one of the common text orientations
in real-world environments, is almost absent from well-received scene
text datasets such as ICDAR2013 and MSRA-TD500. The main motivation of
Total-Text is to fill this gap and facilitate a new research direction for the
scene text community. On top of the conventional horizontal and multi-oriented
texts, it features curved text. Total-Text is highly diversified in
orientations: more than half of its images have a combination of more than two
orientations. Recently, a new breed of solutions that cast text detection as
a segmentation problem has demonstrated its effectiveness against
multi-oriented text. In order to evaluate its robustness against curved text,
we fine-tuned DeconvNet and benchmarked it on Total-Text. Total-Text with its
annotation is available at https://github.com/cs-chan/Total-Text-Dataset
Comment: Accepted as Oral presentation in ICDAR2017 (Extended version, 13
pages, 17 figures). We introduce a new scene text dataset, namely
Total-Text, which is more comprehensive than the existing scene text datasets
as it consists of 1555 natural images with more than 3 different text
orientations, one of a kind.
ART-UP: A Novel Method for Generating Scanning-robust Aesthetic QR codes
QR codes are usually scanned in different environments, so they must be
robust to variations in illumination, scale, coverage, and camera angles.
Aesthetic QR codes improve the visual quality, but subtle changes in their
appearance may cause scanning failure. In this paper, a new method to generate
scanning-robust aesthetic QR codes is proposed, which is based on a
module-based scanning probability estimation model that can effectively balance
the tradeoff between visual quality and scanning robustness. Our method locally
adjusts the luminance of each module by estimating the probability of
successful sampling. The approach adopts a hierarchical, coarse-to-fine
strategy to enhance the visual quality of aesthetic QR codes,
sequentially generating the following three codes: a binary aesthetic QR code, a
grayscale aesthetic QR code, and the final color aesthetic QR code. Our
approach can also be used to create QR codes with different visual styles by
adjusting some initialization parameters. User surveys and decoding experiments
were used to evaluate our method against state-of-the-art
algorithms; they indicate that the proposed approach has excellent
performance in terms of both visual quality and scanning robustness.
Comment: 15 pages
An Empirical Study towards Understanding How Deep Convolutional Nets Recognize Falls
Detecting unintended falls is essential for ambient intelligence and
healthcare of elderly people living alone. In recent years, deep convolutional
nets have been widely used in human action analysis, based on which a number of
fall detection methods have been proposed. Despite their highly effective
performance, how the convolutional nets recognize falls is
still not clear. In this paper, instead of proposing a novel approach, we
perform a systematic empirical study, attempting to investigate the
underlying fall recognition process. We propose four tasks to investigate,
which involve five types of input modalities, seven net instances and different
training samples. The obtained quantitative and qualitative results reveal the
patterns that the nets tend to learn, and several factors that can heavily
influence the performance on fall recognition. We expect that our conclusions
will help in proposing better deep learning solutions for fall detection
systems.
Comment: published at the sixth International Workshop on Assistive Computer
Vision and Robotics (ACVR), in conjunction with European Conference on
Computer Vision (ECCV), Munich, 201
STAR-RT: Visual attention for real-time video game playing
In this paper we present STAR-RT - the first working prototype of the Selective
Tuning Attention Reference (STAR) model and Cognitive Programs (CPs). The
Selective Tuning (ST) model received substantial support through psychological
and neurophysiological experiments. The STAR framework expands ST and applies
it to practical visual tasks. In order to do so, similarly to many cognitive
architectures, STAR combines the visual hierarchy (based on ST) with the
executive controller, working and short-term memory components and fixation
controller. CPs in turn enable the communication among all these elements for
visual task execution. To test the relevance of the system in a realistic
context, we implemented the necessary components of STAR and designed CPs for
playing two closed-source video games - Canabalt and Robot Unicorn Attack. Since
both games run in a browser window, our algorithm has the same amount of
information and the same amount of time to react to the events on the screen as
a human player would. STAR-RT plays both games in real time using only visual
input and achieves scores comparable to human expert players. It thus provides
an existence proof for the utility of the particular CP structure and
primitives used and the potential for continued experimentation and
verification of their utility in broader scenarios.
Comment: 21 pages, 13 figures
CAPTAIN: Comprehensive Composition Assistance for Photo Taking
Many people are interested in taking astonishing photos and sharing them with
others. Emerging high-tech hardware and software facilitate the ubiquity and
functionality of digital photography. Because composition matters in
photography, researchers have leveraged some common composition techniques to
assess the aesthetic quality of photos computationally. However, composition
techniques developed by professionals are far more diverse than well-documented
techniques can cover. We leverage the vast underexplored innovations in
photography for computational composition assistance. We propose a
comprehensive framework, named CAPTAIN (Composition Assistance for Photo
Taking), containing integrated deep-learned semantic detectors, sub-genre
categorization, artistic pose clustering, personalized aesthetics-based image
retrieval, and style set matching. The framework is backed by a large dataset
crawled from a photo-sharing Website with mostly photography enthusiasts and
professionals. The work proposes a sequence of steps that have not been
explored in the past by researchers. The work addresses personal preferences
for composition by presenting a ranked list of photographs to the user
based on user-specified weights in the similarity measure. The matching
algorithm recognizes the best shot among a sequence of shots with respect to
the user's preferred style set. We have conducted a number of experiments on
the newly proposed components and reported findings. A user study demonstrates
that the work is useful to those taking photos.
Comment: 30 pages, 21 figures, 4 tables, submitted to IJCV (International
Journal of Computer Vision)
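The idea of ranking photographs by user-specified weights in a similarity measure, as mentioned in the CAPTAIN abstract above, might look roughly like the sketch below. Everything here is an assumption for illustration: the function name `rank_photos`, the component names `pose` and `style`, features scaled to [0, 1], and the per-component similarity of one minus the absolute difference. The abstract does not define the actual measure.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


def rank_photos(
    query: Features,
    candidates: Dict[str, Features],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank candidate photos by a weighted sum of per-component similarities."""

    def similarity(features: Features) -> float:
        score = 0.0
        for name, weight in weights.items():
            # Assumed per-component similarity for features in [0, 1].
            diff = abs(query.get(name, 0.0) - features.get(name, 0.0))
            score += weight * (1.0 - diff)
        return score

    scored = [(photo_id, similarity(f)) for photo_id, f in candidates.items()]
    # Highest similarity first, so the best match leads the ranked list.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical query and two candidate photos.
query = {"pose": 1.0, "style": 0.0}
photos = {
    "photo_a": {"pose": 1.0, "style": 1.0},
    "photo_b": {"pose": 0.0, "style": 0.0},
}
pose_first = rank_photos(query, photos, {"pose": 0.8, "style": 0.2})
style_first = rank_photos(query, photos, {"pose": 0.2, "style": 0.8})
```

Changing the weights reorders the list: a user who cares mostly about pose sees `photo_a` first, while one who cares mostly about style sees `photo_b` first.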