A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection
How do we learn an object detector that is invariant to occlusions and
deformations? Our current solution is to use a data-driven strategy -- collect
large-scale datasets which have object instances under different conditions.
The hope is that the final classifier can use these examples to learn
invariances. But is it really possible to see all the occlusions in a dataset?
We argue that like categories, occlusions and object deformations also follow a
long-tail. Some occlusions and deformations are so rare that they hardly
happen; yet we want to learn a model invariant to such occurrences. In this
paper, we propose an alternative solution. We propose to learn an adversarial
network that generates examples with occlusions and deformations. The goal of
the adversary is to generate examples that are difficult for the object
detector to classify. In our framework, both the original detector and the
adversary are learned jointly. Our experimental results indicate a 2.3% mAP
boost on VOC07 and a 2.6% mAP boost on the VOC2012 object detection challenge
compared to the Fast-RCNN pipeline. We also release the code for this paper.
Comment: CVPR 2017 Camera Ready
Robustness of 3D Deep Learning in an Adversarial Setting
Understanding the spatial arrangement and nature of real-world objects is of
paramount importance to many complex engineering tasks, including autonomous
navigation. Deep learning has revolutionized state-of-the-art performance for
tasks in 3D environments; however, relatively little is known about the
robustness of these approaches in an adversarial setting. The lack of
comprehensive analysis makes it difficult to justify deployment of 3D deep
learning models in real-world, safety-critical applications. In this work, we
develop an algorithm for analysis of pointwise robustness of neural networks
that operate on 3D data. We show that current approaches presented for
understanding the resilience of state-of-the-art models vastly overestimate
their robustness. We then use our algorithm to evaluate an array of
state-of-the-art models in order to demonstrate their vulnerability to
occlusion attacks. We show that, in the worst case, these networks can be
reduced to 0% classification accuracy after the occlusion of at most 6.5% of
the occupied input space.
Comment: 10 pages, 8 figures, 1 table
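To make the occlusion-attack setting concrete, here is a hedged sketch of a greedy pointwise attack: repeatedly delete the input point whose removal most lowers confidence in the true label, under a fixed occlusion budget. The `classify` callback, the exhaustive per-point search, and the flat point representation are illustrative assumptions, not the paper's robustness-analysis algorithm:

```python
def greedy_occlusion_attack(points, classify, true_label, max_frac=0.065):
    """Greedy occlusion attack sketch on a 3D input.

    points:     list of input points (any hashable per-point encoding).
    classify:   callback returning (label, confidence_in_true_label).
    Deletes up to max_frac of the points, each step removing the point
    whose deletion hurts confidence most. Returns the attacked input
    and the fraction removed (None if the label never flipped).
    Cost is O(n^2) classifier calls; real analyses are smarter.
    """
    pts = list(points)
    budget = int(max_frac * len(points))
    removed = 0
    while removed < budget:
        label, conf = classify(pts)
        if label != true_label:
            return pts, removed / len(points)     # attack succeeded
        best_i, best_conf = None, conf
        for i in range(len(pts)):                 # try every deletion
            _, c = classify(pts[:i] + pts[i + 1:])
            if c < best_conf:
                best_i, best_conf = i, c
        if best_i is None:
            break                                 # no deletion helps
        del pts[best_i]
        removed += 1
    label, _ = classify(pts)
    return pts, (removed / len(points) if label != true_label else None)
```

The abstract's headline number corresponds to `max_frac=0.065`: at most 6.5% of the occupied input space removed.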
Practical Deep Dispersed Watermarking with Synchronization and Fusion
Deep learning based blind watermarking works have gradually emerged and
achieved impressive performance. However, previous deep watermarking studies
mainly focus on fixed low-resolution images while paying less attention to
arbitrary resolution images, especially widespread high-resolution images
nowadays. Moreover, most works usually demonstrate robustness against typical
non-geometric attacks (e.g., JPEG compression) but ignore common
geometric attacks (e.g., rotation) and more challenging combined
attacks. To overcome these limitations, we propose a practical deep
Dispersed Watermarking with Synchronization and
Fusion, called DWSF. Specifically, given an
arbitrary-resolution cover image, we adopt a dispersed embedding scheme which
sparsely and randomly selects several fixed small-size cover blocks to embed a
consistent watermark message by a well-trained encoder. In the extraction
stage, we first design a watermark synchronization module to locate and rectify
the encoded blocks in the noised watermarked image. We then utilize a decoder
to obtain messages embedded in these blocks, and propose a message fusion
strategy based on similarity to make full use of the consistency among
messages, thus determining a reliable message. Extensive experiments conducted
on different datasets convincingly demonstrate the effectiveness of our
proposed DWSF. Compared with state-of-the-art approaches, our blind
watermarking achieves better performance: it improves the bit accuracy by
5.28% and 5.93% on average against single and combined attacks, respectively,
and shows a smaller file-size increment and better visual quality. Our code is
available at https://github.com/bytedance/DWSF.
Comment: Accepted by ACM MM 2023
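Two pieces of the pipeline described above lend themselves to a short sketch: seeded selection of dispersed fixed-size blocks in an arbitrary-resolution cover, and similarity-weighted fusion of the per-block decoded messages. Everything below is a simplified assumption (block-aligned corners, bit-vector messages, agreement-based weights), not the released DWSF code:

```python
import random

def select_blocks(width, height, block=128, n_blocks=4, seed=0):
    """Pseudo-random dispersed block selection (hypothetical stand-in
    for the embedding-site picker): choose distinct block-aligned
    top-left corners from an arbitrary-resolution image."""
    rng = random.Random(seed)
    cols, rows = width // block, height // block
    corners = [(c * block, r * block) for r in range(rows) for c in range(cols)]
    return rng.sample(corners, min(n_blocks, len(corners)))

def fuse_messages(decoded):
    """Similarity-based fusion sketch: weight each decoded bit vector by
    its mean agreement with the others, then take a weighted majority
    vote per bit, exploiting the consistency of the embedded message."""
    n, length = len(decoded), len(decoded[0])
    def agree(a, b):
        return sum(x == y for x, y in zip(a, b)) / length
    weights = [sum(agree(m, o) for o in decoded if o is not m) / max(n - 1, 1)
               for m in decoded]
    fused = []
    for j in range(length):
        ones = sum(w for m, w in zip(decoded, weights) if m[j] == 1)
        fused.append(1 if ones * 2 > sum(weights) else 0)
    return fused
```

Because every block carries the same message, an outlier block (badly rectified or heavily attacked) gets a low agreement weight and contributes little to the vote.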
Person Re-Identification by Deep Joint Learning of Multi-Loss Classification
Existing person re-identification (re-id) methods rely mostly on either
localised or global feature representation alone. This ignores their joint
benefit and mutual complementary effects. In this work, we show the advantages
of jointly learning local and global features in a Convolutional Neural Network
(CNN) by aiming to discover correlated local and global features in different
contexts. Specifically, we formulate a method for joint learning of local and
global feature selection losses designed to optimise person re-id when using
only generic matching metrics such as the L2 distance. We design a novel CNN
architecture for Jointly Learning Multi-Loss (JLML) classification, in which
local and global discriminative features are optimised concurrently subject to
the same re-id label information. Extensive comparative evaluations demonstrate the
advantages of this new JLML model for person re-id over a wide range of
state-of-the-art re-id methods on five benchmarks (VIPeR, GRID, CUHK01, CUHK03,
Market-1501).
Comment: Accepted by IJCAI 2017
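The "generic matching metric" point in the abstract reduces, at test time, to plain L2 comparison of concatenated local and global descriptors, with no learned metric. The sketch below assumes toy two-dimensional branch features and hypothetical person ids; it only illustrates the matching step, not the JLML training losses:

```python
import math

def l2(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def reid_rank(query, gallery):
    """Rank gallery identities by L2 distance to the query, where each
    descriptor is the concatenation of local and global branch features.
    query:   {"local": [...], "global": [...]}
    gallery: person_id -> descriptor dict of the same shape.
    """
    def concat(d):
        return d["local"] + d["global"]
    q = concat(query)
    return sorted(gallery, key=lambda pid: l2(q, concat(gallery[pid])))
```

The point of the paper is that features trained under the joint multi-loss remain discriminative even under this unlearned metric.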
3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization
Although deep-learning based methods for monocular pedestrian detection have
made great progress, they are still vulnerable to heavy occlusions. Using
multi-view information fusion is a potential solution but has limited
applications, due to the lack of annotated training samples in existing
multi-view datasets, which increases the risk of overfitting. To address this
problem, a data augmentation method is proposed that randomly generates 3D
cylinder occlusions on the ground plane, sized to the average
pedestrian and projected to multiple views, to relieve the impact of
overfitting during training. Moreover, the feature map of each view is
projected to multiple parallel planes at different heights, by using
homographies, which allows the CNNs to fully utilize the features across the
height of each pedestrian to infer the locations of pedestrians on the ground
plane. The proposed 3DROM method greatly improves performance in comparison
with state-of-the-art deep-learning based methods for multi-view
pedestrian detection.
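The two geometric ingredients of the augmentation (random pedestrian-sized cylinders on the ground plane, and homography-based projection between the ground plane and each view) can be sketched as follows. The radius/height defaults and the row-major homography layout are illustrative assumptions, not the paper's exact values:

```python
import random

def random_cylinders(n, x_range, y_range, radius=0.3, height=1.7, seed=0):
    """Sample n occluder cylinders on the ground plane with roughly
    average-pedestrian dimensions (illustrative metres, not the
    paper's exact parameters). Returns (x, y, radius, height) tuples."""
    rng = random.Random(seed)
    return [(rng.uniform(*x_range), rng.uniform(*y_range), radius, height)
            for _ in range(n)]

def project(H, x, y):
    """Apply a 3x3 ground-plane-to-image homography (row-major nested
    list) to a ground-plane point, with perspective division."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w
```

The same `project` applies to the multi-layer step: one homography per height plane maps each view's feature map onto parallel planes above the ground. Note that scaling H by a constant leaves the projection unchanged, since homographies are defined up to scale.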
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetic
under the power of deep learning, then turning the clock back 20 years we
would witness the wisdom of the cold-weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible publication