31,318 research outputs found
Sharp Attention Network via Adaptive Sampling for Person Re-identification
In this paper, we present novel sharp attention networks by adaptively
sampling feature maps from convolutional neural networks (CNNs) for the person
re-identification (re-ID) problem. Due to the introduction of sampling-based
attention models, the proposed approach can adaptively generate sharper
attention-aware feature masks. This greatly differs from the gating-based
attention mechanism that relies on soft gating functions to select the relevant
features for person re-ID. In contrast, the proposed sampling-based attention
mechanism allows us to effectively trim irrelevant features by enforcing the
resultant feature masks to focus on the most discriminative features. It can
produce sharper attentions that are more assertive in localizing subtle
features relevant to re-identifying people across cameras. For this purpose, a
differentiable Gumbel-Softmax sampler is employed to approximate the Bernoulli
sampling to train the sharp attention networks. Extensive experimental
evaluations demonstrate the superiority of this new sharp attention model for
person re-ID over the other state-of-the-art methods on three challenging
benchmarks including CUHK03, Market-1501, and DukeMTMC-reID.
Comment: accepted by IEEE Transactions on Circuits and Systems for Video
Technology (T-CSVT)
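The Gumbel-Softmax relaxation of Bernoulli sampling mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-class formulation, and the temperature values are assumptions chosen for illustration, and the code uses plain Python rather than a deep learning framework.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample by inverse transform sampling."""
    u = rng.random()
    # Clamp away from 0 and 1 so that both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def stable_sigmoid(z):
    """Sigmoid that avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relaxed_bernoulli(p, temperature, rng):
    """Soft sample in [0, 1] approximating a Bernoulli(p) draw, 0 < p < 1.

    Two-class Gumbel-Softmax: perturb the log-probabilities of "keep" and
    "drop" with Gumbel noise, then apply a temperature-scaled softmax.
    For two classes the softmax reduces to a sigmoid of the difference.
    """
    keep = math.log(p) + sample_gumbel(rng)
    drop = math.log(1.0 - p) + sample_gumbel(rng)
    return stable_sigmoid((keep - drop) / temperature)


def sharp_mask(keep_probs, temperature, rng):
    """Hypothetical helper: one relaxed sample per feature location."""
    return [relaxed_bernoulli(p, temperature, rng) for p in keep_probs]


if __name__ == "__main__":
    rng = random.Random(0)
    # Lower temperatures push the soft samples towards 0 or 1.
    print(sharp_mask([0.9, 0.1, 0.5], temperature=0.1, rng=rng))
```

In a real network the same expression would be written with differentiable tensor operations so that gradients flow back to the keep probabilities; the temperature controls how close the soft mask is to a hard 0/1 mask.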
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With widespread applications of artificial intelligence (AI), the
capabilities of the perception, understanding, decision-making and control for
autonomous systems have improved significantly in the past years. When
accuracy and transferability are considered for autonomous systems,
several AI methods, such as adversarial learning, reinforcement learning (RL)
and meta-learning, perform strongly. Here, we review the
learning-based approaches in autonomous systems from the perspectives of
accuracy and transferability. Accuracy means that a well-trained model shows
good results during the testing phase, in which the testing set shares the same
task or a data distribution with the training set. Transferability means that
when a well-trained model is transferred to other testing domains, the accuracy
is still good. Firstly, we introduce some basic concepts of transfer learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we review accuracy, transferability, or both
to show the advantages of adversarial learning, such as generative
adversarial networks (GANs), in typical computer vision tasks in autonomous
systems, including image style transfer, image superresolution, image
deblurring/dehazing/rain removal, semantic segmentation, depth estimation,
pedestrian detection and person re-identification (re-ID). Then, we further
review the performance of RL and meta-learning in terms of accuracy,
transferability, or both in autonomous systems, covering pedestrian
tracking, robot navigation and robotic manipulation. Finally, we discuss
several challenges and future topics for using adversarial learning, RL and
meta-learning in autonomous systems.
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
This paper presents futuristic challenges discussed in cvpaper.challenge. In
2015 and 2016, we thoroughly studied 1,600+ papers in several
conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Metric Attack and Defense for Person Re-identification
Person re-identification (re-ID) has attracted much attention recently due to
its great importance in video surveillance. In general, distance metrics used
to identify two person images are expected to be robust under various
appearance changes. However, our work observes the extreme vulnerability of
existing distance metrics to adversarial examples, generated by simply adding
human-imperceptible perturbations to person images. Hence, the security danger
is dramatically increased when deploying commercial re-ID systems in video
surveillance.
Although adversarial examples have been extensively applied for
classification analysis, they are rarely studied in metric analysis like person
re-identification. The most likely reason is the natural gap between the
training and testing of re-ID networks, that is, the predictions of a re-ID
network cannot be directly used during testing without an effective metric. In
this work, we bridge the gap by proposing Adversarial Metric Attack, a parallel
methodology to adversarial classification attacks. Comprehensive experiments
clearly reveal the adversarial effects in re-ID systems. Meanwhile, we also
present an early attempt at training a metric-preserving network, thereby
defending the metric against adversarial attacks. Finally, by benchmarking
various adversarial settings, we expect that our work can facilitate the
development of adversarial attack and defense in metric-based applications.
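The idea of perturbing an image so that a distance metric is misled can be sketched on a toy model. Everything in this sketch is an assumption made for illustration and not the paper's Adversarial Metric Attack: the embedding is a hypothetical linear map, the metric is squared Euclidean distance, and the perturbation is a single sign-gradient step of the kind used in classification attacks.

```python
def embed(weights, x):
    """Toy linear embedding f(x) = W x (stand-in for a re-ID network)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def distance_gradient(weights, x, target):
    """Gradient of ||W x - target||^2 with respect to x: 2 W^T (W x - target)."""
    residual = [e - t for e, t in zip(embed(weights, x), target)]
    return [
        2.0 * sum(weights[r][c] * residual[r] for r in range(len(weights)))
        for c in range(len(x))
    ]


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def metric_attack_step(weights, x, target, epsilon):
    """Move every pixel by at most epsilon in the direction that increases
    the distance between f(x) and the target embedding."""
    grad = distance_gradient(weights, x, target)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    W = [[1.0, 0.5, -0.2], [0.3, -1.0, 0.8]]  # hypothetical weights
    probe = [0.2, 0.4, 0.6]                   # hypothetical "image" pixels
    gallery = [0.25, 0.35, 0.6]               # image of the same identity
    target = embed(W, gallery)
    adversarial = metric_attack_step(W, probe, target, epsilon=0.01)
    print(squared_distance(embed(W, probe), target))
    print(squared_distance(embed(W, adversarial), target))
```

Because the toy distance is a convex quadratic in the input, the sign step cannot decrease it here; with a real nonlinear network that guarantee does not hold, and iterative attacks are typically used instead.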
Identification of homophily and preferential recruitment in respondent-driven sampling
Respondent-driven sampling (RDS) is a link-tracing procedure for surveying
hidden or hard-to-reach populations in which subjects recruit other subjects
via their social network. There is significant research interest in detecting
clustering or dependence of epidemiological traits in networks, but researchers
disagree about whether data from RDS studies can reveal it. Two distinct
mechanisms account for dependence in traits of recruiters and recruitees in an
RDS study: homophily, the tendency for individuals to share social ties with
others exhibiting similar characteristics, and preferential recruitment, in
which recruiters do not recruit uniformly at random from their available
alters. The different effects of network homophily and preferential recruitment
in RDS studies have been a source of confusion in methodological research on
RDS, and in empirical studies of the social context of health risk in hidden
populations. In this paper, we give rigorous definitions of homophily and
preferential recruitment and show that neither can be measured precisely in
general RDS studies. We derive nonparametric identification regions for
homophily and preferential recruitment and show that these parameters are not
point identified unless the network takes a degenerate form. The results
indicate that claims of homophily or recruitment bias measured from empirical
RDS studies may not be credible. We apply our identification results to a study
involving both a network census and RDS on a population of injection drug users
in Hartford, CT.
Privacy-Protective-GAN for Face De-identification
Face de-identification has become increasingly important as the image sources
are explosively growing and easily accessible. The advance of new face
recognition techniques also raises people's concerns regarding privacy
leakage. The mainstream pipelines of face de-identification are mostly based on
the k-same framework, which has been criticized for low effectiveness and poor
visual quality. In this paper, we propose a new framework called
Privacy-Protective-GAN (PP-GAN) that adapts GAN with novel verificator and
regulator modules specially designed for the face de-identification problem to
ensure generating de-identified output with retained structure similarity
according to a single input. We evaluate the proposed approach in terms of
privacy protection, utility preservation, and structure similarity. Our
approach not only outperforms existing face de-identification techniques but
also provides a practical framework for adapting GAN with priors of domain
knowledge.
CaseNet: Content-Adaptive Scale Interaction Networks for Scene Parsing
Objects in an image exhibit diverse scales. Adaptive receptive fields are
expected to capture a suitable range of context for accurate pixel-level semantic
prediction when handling objects of diverse sizes. Recently, atrous convolution
with different dilation rates has been used to generate multi-scale
features through several branches, and these features are fused for
prediction. However, there is a lack of explicit interaction among the branches
to adaptively make full use of the contexts. In this paper, we propose a
Content-Adaptive Scale Interaction Network (CaseNet) to exploit the multi-scale
features for scene parsing. We build the CaseNet based on the classic Atrous
Spatial Pyramid Pooling (ASPP) module, followed by the proposed contextual
scale interaction (CSI) module, and the scale adaptation (SA) module.
Specifically, first, for each spatial position, we enable context interaction
among different scales through scale-aware non-local operations across the
scales, i.e., the CSI module, which facilitates the generation of flexible mixed
receptive fields, instead of a traditional flat one. Second, the scale
adaptation module (SA) explicitly and softly selects the suitable scale for
each spatial position and each channel. Ablation studies demonstrate the
effectiveness of the proposed modules. We achieve state-of-the-art performance
on three scene parsing benchmarks: Cityscapes, ADE20K and LIP.
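The multi-branch atrous convolution that the abstract above builds on can be illustrated in one dimension. This is a minimal sketch under stated assumptions, not the CaseNet or ASPP implementation: the function names, the zero padding, and the element-wise sum used to fuse the branches are all choices made for illustration.

```python
def atrous_conv1d(signal, kernel, dilation):
    """Same-size 1D atrous (dilated) convolution with zero padding.

    Kernel taps are spaced `dilation` samples apart, so a larger dilation
    widens the receptive field without adding weights. The kernel is
    applied in correlation form and is assumed to have odd length.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation
            if 0 <= j < len(signal):  # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def fuse_branches(signal, kernel, dilations):
    """Run one branch per dilation rate and fuse them by element-wise sum
    (the sum is an assumed fusion rule for this sketch)."""
    branches = [atrous_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    for rate in (1, 3):
        print(rate, receptive_field(3, rate), atrous_conv1d(impulse, [1, 1, 1], rate))
    print(fuse_branches(impulse, [1, 1, 1], (1, 3)))
```

Feeding in an impulse makes the effect of the dilation rate visible: each branch responds at positions spaced by its own rate, which is the multi-scale context that the branches contribute before fusion.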
Bridging the Gap Between Computational Photography and Visual Recognition
What is the current state-of-the-art for image restoration and enhancement
applied to degraded images acquired under less than ideal circumstances? Can
the application of such algorithms as a pre-processing step improve image
interpretability for manual analysis or automatic visual recognition to
classify scene content? While there have been important advances in the area of
computational photography to restore or enhance the visual quality of an image,
the capabilities of such techniques have not always translated in a useful way
to visual recognition tasks. Consequently, there is a pressing need for the
development of algorithms that are designed for the joint problem of improving
visual appearance and recognition, which will be an enabling factor for the
deployment of visual recognition tools in many real-world scenarios. To address
this, we introduce the UG^2 dataset as a large-scale benchmark composed of
video imagery captured under challenging conditions, and two enhancement tasks
designed to test algorithmic impact on visual quality and automatic object
recognition. Furthermore, we propose a set of metrics to evaluate the joint
improvement of such tasks as well as individual algorithmic advances, including
a novel psychophysics-based evaluation regime for human assessment and a
realistic set of quantitative measures for object recognition performance. We
introduce six new algorithms for image restoration or enhancement, which were
created as part of the IARPA sponsored UG^2 Challenge workshop held at CVPR
2018. Under the proposed evaluation regime, we present an in-depth analysis of
these algorithms and a host of deep learning-based and classic baseline
approaches. From the observed results, it is evident that we are in the early
days of building a bridge between computational photography and visual
recognition, leaving many opportunities for innovation in this area.
Comment: CVPR Prize Challenge: http://www.ug2challenge.or
Face Recognition in Low Quality Images: A Survey
Low-resolution face recognition (LRFR) has received increasing attention over
the past few years. Its applications lie widely in the real-world environment
when high-resolution or high-quality images are hard to capture. One of the
biggest demands for LRFR technologies is video surveillance. As the number
of surveillance cameras in the city increases, the videos that are captured will
need to be processed automatically. However, those videos or images are usually
captured with large standoffs, arbitrary illumination conditions, and diverse
angles of view. Faces in these images are generally small in size. Several
studies that addressed this problem employed techniques like super-resolution,
deblurring, or learning a relationship between different resolution domains. In
this paper, we provide a comprehensive review of approaches to low-resolution
face recognition in the past five years. First, a general problem definition is
given. Then, a systematic analysis of the works on this topic is presented
by category. In addition to describing the methods, we also focus on datasets
and experiment settings. We further address the related works on unconstrained
low-resolution face recognition and compare them with results that use
synthetic low-resolution data. Finally, we summarize the general limitations
and speculate on priorities for future efforts.
Comment: There are some mistakes in this paper that may mislead the reader,
and we won't have a new version in the short term. We will resubmit once it
is corrected.
Recent Advances and Challenges in Ubiquitous Sensing
Ubiquitous sensing is tightly coupled with activity recognition. This survey
reviews recent advances in ubiquitous sensing and looks ahead to promising
future directions. In particular, ubiquitous sensing crosses new barriers
giving us new ways to interact with the environment or to inspect our psyche.
Through sensing paradigms that parasitically utilise stimuli from the noise of
environmental, third-party pre-installed systems, sensing leaves the boundaries
of the personal domain. Compared to previous environmental sensing approaches,
these new systems mitigate high installation and placement cost by providing
robustness towards process noise. On the other hand, sensing focuses inward and
attempts to capture mental activities such as cognitive load, fatigue or
emotion through advances in, for instance, eye-gaze sensing systems or
interpretation of body gesture or pose. This survey summarises these
developments and discusses current research questions and promising future
directions.
Comment: Submitted to PIEE