A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is
proposed for fast multi-scale object detection. The MS-CNN consists of a
proposal sub-network and a detection sub-network. In the proposal sub-network,
detection is performed at multiple output layers, so that receptive fields
match objects of different scales. These complementary scale-specific detectors
are combined to produce a strong multi-scale object detector. The unified
network is learned end-to-end, by optimizing a multi-task loss. Feature
upsampling by deconvolution is also explored, as an alternative to input
upsampling, to reduce the memory and computation costs. State-of-the-art object
detection performance, at up to 15 fps, is reported on datasets containing a
substantial number of small objects, such as KITTI and Caltech.
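The multi-scale design described above can be pictured as detection branches attached to several backbone stages, so that each branch's receptive field matches a different object scale. The following is a minimal PyTorch sketch of that idea, not the authors' MS-CNN code; the channel counts, anchor count and class count are assumptions.

```python
# Minimal sketch (not the authors' code): detection branches attached to
# several backbone stages, so each branch's receptive field matches a
# different object scale. Channel sizes and counts are assumptions.
import torch
import torch.nn as nn

class MultiScaleHeads(nn.Module):
    def __init__(self, in_channels=(128, 256, 512), num_anchors=3, num_classes=2):
        super().__init__()
        out = num_anchors * (num_classes + 4)   # class scores + box offsets per anchor
        self.heads = nn.ModuleList(
            nn.Conv2d(c, out, kernel_size=3, padding=1) for c in in_channels
        )

    def forward(self, feature_maps):
        # One prediction map per backbone stage; shallow stages handle small
        # objects, deep stages handle large ones.
        return [head(f) for head, f in zip(self.heads, feature_maps)]

# Example with dummy multi-scale features
feats = [torch.randn(1, 128, 32, 32),
         torch.randn(1, 256, 16, 16),
         torch.randn(1, 512, 8, 8)]
preds = MultiScaleHeads()(feats)
print([p.shape for p in preds])
```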
VANETs Meet Autonomous Vehicles: A Multimodal 3D Environment Learning Approach
In this paper, we design a multimodal framework for object detection,
recognition and mapping based on the fusion of stereo camera frames, point
cloud Velodyne Lidar scans, and Vehicle-to-Vehicle (V2V) Basic Safety Messages
(BSMs) exchanged using Dedicated Short Range Communication (DSRC). We merge the
key features of rich texture descriptions of objects from 2D images, depth and
distance between objects provided by 3D point cloud and awareness of hidden
vehicles from BSMs' 3D information. We establish joint pixel-to-point-cloud and
pixel-to-V2V correspondences for objects in frames from the KITTI Vision
Benchmark Suite, using a semi-supervised manifold alignment approach to achieve
camera-Lidar and camera-V2V mapping of recognized objects that share the same
underlying manifold.
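The pixel-to-point-cloud pairing mentioned above rests on projecting Velodyne points into the camera frame. The sketch below illustrates only that geometric step, not the paper's semi-supervised manifold alignment; the calibration matrix names (P2, R0_rect, Tr_velo_to_cam) follow KITTI conventions but are assumptions here.

```python
# Simplified sketch: project LiDAR points into the camera image to obtain
# pixel-to-point correspondences. Matrix names follow KITTI conventions
# (Tr_velo_to_cam: 3x4, R0_rect: 3x3, P2: 3x4); values are placeholders.
import numpy as np

def project_lidar_to_image(points_xyz, Tr_velo_to_cam, R0_rect, P2):
    """points_xyz: (N, 3) LiDAR points; returns (N, 2) pixel coords and depths."""
    n = points_xyz.shape[0]
    pts_h = np.hstack([points_xyz, np.ones((n, 1))])     # homogeneous (N, 4)
    cam = R0_rect @ (Tr_velo_to_cam @ pts_h.T)           # (3, N) rectified camera frame
    cam_h = np.vstack([cam, np.ones((1, n))])            # (4, N)
    img = P2 @ cam_h                                      # (3, N)
    depth = img[2]
    uv = img[:2] / depth                                  # perspective divide
    # Only points with positive depth fall in front of the camera.
    return uv.T, depth
```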
Accurate Face Detection for High Performance
Face detection has witnessed significant progress due to the advances of deep
convolutional neural networks (CNNs). Its central issue in recent years is how
to improve the detection performance of tiny faces. To this end, many recent
works propose some specific strategies, redesign the architecture and introduce
new loss functions for tiny object detection. In this report, we start from the
popular one-stage RetinaNet approach and apply some recent tricks to obtain a
high performance face detector. Specifically, we apply the Intersection over
Union (IoU) loss function for regression, employ the two-step classification
and regression for detection, revisit the data augmentation based on
data-anchor-sampling for training, utilize the max-out operation for
classification and use the multi-scale testing strategy for inference. As a
consequence, the proposed face detection method achieves state-of-the-art
performance on the WIDER FACE dataset, the most popular and challenging face
detection benchmark.
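As a rough illustration of the IoU regression loss mentioned above, the following PyTorch sketch computes 1 - IoU for axis-aligned boxes in (x1, y1, x2, y2) format; it is a generic formulation, not necessarily the exact variant used in the report.

```python
# Generic IoU-based regression loss for axis-aligned boxes (x1, y1, x2, y2).
import torch

def iou_loss(pred, target, eps=1e-7):
    # Intersection rectangle
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    # Union = sum of areas - intersection
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)
    return (1.0 - iou).mean()   # -log(IoU) is another common form
```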
Improved Selective Refinement Network for Face Detection
As a long-standing problem in computer vision, face detection has attracted
much attention in recent decades for its practical applications. With the
availability of the WIDER FACE benchmark dataset, much progress has been made
by various algorithms in recent years. Among them,
the Selective Refinement Network (SRN) face detector introduces the two-step
classification and regression operations selectively into an anchor-based face
detector to reduce false positives and improve location accuracy
simultaneously. Moreover, it designs a receptive field enhancement block to
provide more diverse receptive fields. In this report, to further improve the
performance of SRN, we exploit some existing techniques via extensive
experiments, including new data augmentation strategy, improved backbone
network, MS COCO pretraining, decoupled classification module, segmentation
branch and Squeeze-and-Excitation block. Some of these techniques bring
performance improvements, while a few do not adapt well to our baseline.
As a consequence, we present an improved SRN face detector by combining these
useful techniques together and obtain the best performance on the widely used
WIDER FACE face detection benchmark.
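One of the components evaluated above, the Squeeze-and-Excitation block, can be sketched as follows; the reduction ratio is an illustrative assumption, and this is the generic block rather than the report's exact implementation.

```python
# Generic Squeeze-and-Excitation block: per-channel gates from global context.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Squeeze: global average pool to a per-channel descriptor
        w = x.mean(dim=(2, 3))
        # Excite: per-channel gates in (0, 1), then rescale the feature map
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w
```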
3D Bounding Box Estimation for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results
3D object detection is one of the most important tasks in the 3D visual
perception system of autonomous vehicles. In this paper, we propose a novel
two-stage 3D object detection method that aims to obtain the optimal object
location in 3D space by regressing two additional 3D object properties with a
deep convolutional neural network and combining them with cascaded geometric
constraints between the 2D and 3D boxes. First, we modify the existing 3D
property regression network by adding two additional components: viewpoint
classification and the center projection of the 3D bounding box's bottom face.
Second, we use the predicted center projection together with a similar-triangle
constraint to acquire an initial 3D bounding box through a closed-form
solution. Then, the location predicted in the previous step is used as the
initial value for the over-determined equations constructed from the 2D and 3D
box fitting constraint, with the configuration determined by the classified
viewpoint. Finally, we use the physical-world information recovered from the 3D
detections to filter out false detections and false alarms among the 2D
detections. Comparisons with state-of-the-art methods on the KITTI dataset show
that, although conceptually simple, our method outperforms more complex and
computationally expensive methods, not only improving the overall precision of
the 3D detections but also increasing the precision of orientation estimation.
Furthermore, our method can deal with truncated objects to some extent and
remove false alarms and false detections in both the 2D and 3D detections.
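The closed-form initialization from the predicted center projection and the similar-triangle constraint might be illustrated roughly as below, using generic pinhole-camera notation (f, cx, cy); this is a simplified reading of the description, not the paper's code.

```python
# Illustrative sketch of a similar-triangle initialization: recover an
# approximate depth from the regressed 3D height and the 2D box height, then
# back-project the predicted center projection. Symbols are generic
# pinhole-camera notation, not the paper's implementation.
import numpy as np

def initial_3d_center(u, v, h2d_pixels, H3d_meters, f, cx, cy):
    """u, v: predicted image projection of the 3D box's bottom-face center."""
    Z = f * H3d_meters / h2d_pixels   # similar triangles: h2d ~ f * H3d / Z
    X = (u - cx) * Z / f              # back-project the pixel to camera coordinates
    Y = (v - cy) * Z / f
    return np.array([X, Y, Z])
```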
Audio-only Bird Species Automated Identification Method with Limited Training Data Based on Multi-Channel Deep Convolutional Neural Networks
Based on transfer learning, we design a bird species identification model
that uses the VGG-16 model (pretrained on ImageNet) for feature extraction,
followed by a classifier consisting of two fully-connected hidden layers and a
Softmax layer. We compare the performance of the proposed model with that of
the original VGG-16 model. The results show that the former has higher
training efficiency but a lower mean average precision (MAP). To improve the
MAP of the proposed model, we investigate result fusion to form a
multi-channel identification model, whose best MAP reaches 0.9998. The number
of model parameters is 13110, only 0.0082% of that of the VGG-16 model. The
required sample size is also decreased.
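A minimal sketch of the described transfer-learning setup follows, assuming a frozen ImageNet-pretrained VGG-16 backbone from torchvision; the hidden-layer widths and class count are illustrative assumptions, not the paper's exact head.

```python
# Sketch: frozen VGG-16 feature extractor + small fully-connected classifier.
# Hidden widths and num_classes are assumptions for illustration only.
import torch.nn as nn
from torchvision import models

num_classes = 10                            # placeholder class count
backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
for p in backbone.parameters():
    p.requires_grad = False                 # keep ImageNet features fixed

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(512, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, num_classes),             # softmax is applied inside the loss
)
model = nn.Sequential(backbone, head)
```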
Object Detection in Specific Traffic Scenes using YOLOv2
The object detection framework plays a crucial role in autonomous driving. In
this paper, we introduce the real-time object detection framework You Only
Look Once (YOLOv1) and the improvements introduced in YOLOv2. We further
explore the capability of YOLOv2 by applying its pre-trained model to object
detection tasks in some specific traffic scenes. The four artificially
designed traffic scenes include single-car, single-person, frontperson-rearcar
and frontcar-rearperson.
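A hedged sketch of running a pre-trained YOLOv2 model on one such scene with OpenCV's DNN module is shown below; the .cfg/.weights/.jpg file names are placeholders and the 0.5 confidence threshold is an arbitrary choice for illustration.

```python
# Sketch: pre-trained YOLOv2 inference via OpenCV's DNN module.
# File names are placeholders; threshold is illustrative.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov2.cfg", "yolov2.weights")
img = cv2.imread("traffic_scene.jpg")
blob = cv2.dnn.blobFromImage(img, scalefactor=1 / 255.0, size=(416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
out = net.forward()  # one row per box: [x, y, w, h, objectness, class scores...]
detections = [row for row in out if row[5:].max() > 0.5]  # simple confidence filter
print(f"kept {len(detections)} candidate boxes")
```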
Improving Object Detection from Scratch via Gated Feature Reuse
In this paper, we present a simple and parameter-efficient drop-in module for
one-stage object detectors like SSD when learning from scratch (i.e., without
pre-trained models). We call our module GFR (Gated Feature Reuse), which
exhibits two main advantages. First, we introduce a novel gate-controlled
prediction strategy enabled by Squeeze-and-Excitation to adaptively enhance or
attenuate supervision at different scales based on the input object size. As a
result, our model is more effective in detecting diverse sizes of objects.
Second, we propose a feature-pyramids structure to squeeze rich spatial and
semantic features into a single prediction layer, which strengthens feature
representation and reduces the number of parameters to learn. We apply the
proposed structure on DSOD and SSD detection frameworks, and evaluate the
performance on PASCAL VOC 2007, 2012 and COCO datasets. With fewer model
parameters, GFR-DSOD outperforms the baseline DSOD by 1.4%, 1.1%, 1.7% and
0.6%, respectively. GFR-SSD also outperforms the original SSD and SSD with
dense prediction by 3.6% and 2.8% on the VOC 2007 dataset. The work was
accepted at BMVC 2019; code is available at https://github.com/szq0214/GFR-DSOD.
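The gate-controlled prediction and the squeezing of multi-scale features into a single prediction layer might be sketched together as below; this follows the description in the abstract, not the released GFR code, and the channel sizes are assumptions.

```python
# Rough sketch: per-scale features are gated by a learned per-channel scale
# computed from global pooling, then resized and squeezed into one fused map
# that feeds a single prediction layer. Channel sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusion(nn.Module):
    def __init__(self, in_channels=(128, 256, 512), out_channels=256):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.gates = nn.ModuleList(
            nn.Sequential(nn.Linear(out_channels, out_channels), nn.Sigmoid())
            for _ in in_channels
        )

    def forward(self, feats, target_size=(32, 32)):
        fused = 0
        for f, proj, gate in zip(feats, self.proj, self.gates):
            f = proj(f)
            # Global context decides how strongly this scale contributes,
            # enhancing or attenuating it depending on the input.
            g = gate(f.mean(dim=(2, 3))).unsqueeze(-1).unsqueeze(-1)
            fused = fused + F.interpolate(f * g, size=target_size, mode="nearest")
        return fused
```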
Domain Adaptation from Synthesis to Reality in Single-model Detector for Video Smoke Detection
This paper proposes a method for video smoke detection using synthetic smoke
samples. The virtual data can automatically offer precise and rich annotated
samples. However, the learning of smoke representations is hurt by the
appearance gap between real and synthetic smoke samples. Existing research
mainly works on adaptation to samples extracted from the original annotated
samples. These methods treat object detection and domain adaptation as two
independent parts. To train a strong detector with rich synthetic samples, we
construct the adaptation on the detection layer of state-of-the-art
single-model detectors (SSD and MS-CNN). The training procedure is end-to-end,
combining classification, localization and adaptation in the learning. The
performance of the proposed model surpasses the original
baseline in our experiments. Meanwhile, our results show that the detectors
based on the adversarial adaptation are superior to the detectors based on the
discrepancy adaptation. Code will be made publicly available on
http://smoke.ustc.edu.cn. Moreover, the domain adaptation for a two-stage
detector is described in Appendix A.
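Adversarial feature adaptation on a detection layer is commonly realized with a gradient-reversal domain classifier; the sketch below shows that generic pattern, not the paper's exact architecture, and the feature channel count and layer widths are assumptions.

```python
# Sketch: adversarial adaptation on detection features. A gradient-reversal
# layer feeds the features into a domain classifier, so the detector learns
# features the classifier cannot separate into real vs. synthetic smoke.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None   # flip the gradient sign

class DomainClassifier(nn.Module):
    def __init__(self, channels=256, lamb=1.0):
        super().__init__()
        self.lamb = lamb
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 1),                  # logit: real (1) vs. synthetic (0)
        )

    def forward(self, detection_features):
        return self.net(GradReverse.apply(detection_features, self.lamb))
```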
CrowdHuman: A Benchmark for Detecting Human in a Crowd
Human detection has witnessed impressive progress in recent years. However,
the occlusion issue of detecting human in highly crowded environments is far
from solved. To make matters worse, crowd scenarios are still under-represented
in current human detection benchmarks. In this paper, we introduce a new
dataset, called CrowdHuman, to better evaluate detectors in crowd scenarios.
The CrowdHuman dataset is large, richly annotated and highly diverse.
There are a total of human instances from the train and validation
subsets, and persons per image, with various kinds of occlusions in the
dataset. Each human instance is annotated with a head bounding-box, human
visible-region bounding-box and human full-body bounding-box. Baseline
performance of state-of-the-art detection frameworks on CrowdHuman is
presented. Cross-dataset generalization results of the CrowdHuman dataset
demonstrate state-of-the-art performance on previous datasets, including
Caltech-USA, CityPersons, and Brainwash, without bells and whistles. We hope our
dataset will serve as a solid baseline and help promote future research in
human detection tasks.
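The three-level annotation scheme described above can be represented by a simple data structure like the hypothetical one below; the field names are illustrative only, not the dataset's released annotation format.

```python
# Hypothetical representation of a CrowdHuman-style annotation: each human
# instance carries a head box, a visible-region box, and a full-body box.
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x, y, width, height)

@dataclass
class HumanInstance:
    head_box: Box
    visible_box: Box
    full_body_box: Box

@dataclass
class CrowdImage:
    image_id: str
    instances: List[HumanInstance]
```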