308 research outputs found

    A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

    Full text link
    How do we learn an object detector that is invariant to occlusions and deformations? Our current solution is to use a data-driven strategy -- collect large-scale datasets which have object instances under different conditions. The hope is that the final classifier can use these examples to learn invariances. But is it really possible to see all the occlusions in a dataset? We argue that like categories, occlusions and object deformations also follow a long-tail. Some occlusions and deformations are so rare that they hardly happen; yet we want to learn a model invariant to such occurrences. In this paper, we propose an alternative solution. We propose to learn an adversarial network that generates examples with occlusions and deformations. The goal of the adversary is to generate examples that are difficult for the object detector to classify. In our framework both the original detector and adversary are learned in a joint manner. Our experimental results indicate a 2.3% mAP boost on VOC07 and a 2.6% mAP boost on VOC2012 object detection challenge compared to the Fast-RCNN pipeline. We also release the code for this paper.Comment: CVPR 2017 Camera Read

    Robustness of 3D Deep Learning in an Adversarial Setting

    Full text link
    Understanding the spatial arrangement and nature of real-world objects is of paramount importance to many complex engineering tasks, including autonomous navigation. Deep learning has revolutionized state-of-the-art performance for tasks in 3D environments; however, relatively little is known about the robustness of these approaches in an adversarial setting. The lack of comprehensive analysis makes it difficult to justify deployment of 3D deep learning models in real-world, safety-critical applications. In this work, we develop an algorithm for analysis of pointwise robustness of neural networks that operate on 3D data. We show that current approaches presented for understanding the resilience of state-of-the-art models vastly overestimate their robustness. We then use our algorithm to evaluate an array of state-of-the-art models in order to demonstrate their vulnerability to occlusion attacks. We show that, in the worst case, these networks can be reduced to 0% classification accuracy after the occlusion of at most 6.5% of the occupied input space.Comment: 10 pages, 8 figures, 1 tabl

    Practical Deep Dispersed Watermarking with Synchronization and Fusion

    Full text link
    Deep learning based blind watermarking works have gradually emerged and achieved impressive performance. However, previous deep watermarking studies mainly focus on fixed low-resolution images while paying less attention to arbitrary resolution images, especially widespread high-resolution images nowadays. Moreover, most works usually demonstrate robustness against typical non-geometric attacks (\textit{e.g.}, JPEG compression) but ignore common geometric attacks (\textit{e.g.}, Rotate) and more challenging combined attacks. To overcome the above limitations, we propose a practical deep \textbf{D}ispersed \textbf{W}atermarking with \textbf{S}ynchronization and \textbf{F}usion, called \textbf{\proposed}. Specifically, given an arbitrary-resolution cover image, we adopt a dispersed embedding scheme which sparsely and randomly selects several fixed small-size cover blocks to embed a consistent watermark message by a well-trained encoder. In the extraction stage, we first design a watermark synchronization module to locate and rectify the encoded blocks in the noised watermarked image. We then utilize a decoder to obtain messages embedded in these blocks, and propose a message fusion strategy based on similarity to make full use of the consistency among messages, thus determining a reliable message. Extensive experiments conducted on different datasets convincingly demonstrate the effectiveness of our proposed {\proposed}. Compared with state-of-the-art approaches, our blind watermarking can achieve better performance: averagely improve the bit accuracy by 5.28\% and 5.93\% against single and combined attacks, respectively, and show less file size increment and better visual quality. Our code is available at https://github.com/bytedance/DWSF.Comment: Accpeted by ACM MM 202

    Person Re-Identification by Deep Joint Learning of Multi-Loss Classification

    Full text link
    Existing person re-identification (re-id) methods rely mostly on either localised or global feature representation alone. This ignores their joint benefit and mutual complementary effects. In this work, we show the advantages of jointly learning local and global features in a Convolutional Neural Network (CNN) by aiming to discover correlated local and global features in different context. Specifically, we formulate a method for joint learning of local and global feature selection losses designed to optimise person re-id when using only generic matching metrics such as the L2 distance. We design a novel CNN architecture for Jointly Learning Multi-Loss (JLML) of local and global discriminative feature optimisation subject concurrently to the same re-id labelled information. Extensive comparative evaluations demonstrate the advantages of this new JLML model for person re-id over a wide range of state-of-the-art re-id methods on five benchmarks (VIPeR, GRID, CUHK01, CUHK03, Market-1501).Comment: Accepted by IJCAI 201

    3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization

    Get PDF
    Although deep-learning based methods for monocular pedestrian detection have made great progress, they are still vulnerable to heavy occlusions. Using multi-view information fusion is a potential solution but has limited applications, due to the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed to randomly generate 3D cylinder occlusions, on the ground plane, which are of the average size of pedestrians and projected to multiple views, to relieve the impact of overfitting in the training. Moreover, the feature map of each view is projected to multiple parallel planes at different heights, by using homographies, which allows the CNNs to fully utilize the features across the height of each pedestrian to infer the locations of pedestrians on the ground plane. The proposed 3DROM method has a greatly improved performance in comparison with the state-of-the-art deep-learning based methods for multi-view pedestrian detection

    Object Detection in 20 Years: A Survey

    Full text link
    Object detection, as of one the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century's time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed up techniques, and the recent state of the art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc, and makes an in-deep analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible publicatio
    • …
    corecore