
    Object Detection in 20 Years: A Survey

    Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development over the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning the clock back 20 years we would witness the wisdom of the cold-weapon era. This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). It covers a number of topics, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of detection systems, speed-up techniques, and recent state-of-the-art detection methods. The paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years. (Comment: This work has been submitted to the IEEE TPAMI for possible publication.)

    Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

    State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; in the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on the PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In the ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available. (Comment: Extended tech report.)
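As a concrete illustration of "anchors at each position," the reference boxes an RPN scores can be enumerated in a few lines. This is a sketch using the commonly cited Faster R-CNN defaults (16-pixel stride, three scales, three aspect ratios); the exact width/height parameterization varies by implementation:

```python
import itertools

def rpn_anchors(fmap_h, fmap_w, stride=16,
                scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Sketch of RPN anchor generation: at every feature-map position the
    network scores k = len(scales) * len(ratios) reference boxes (anchors),
    predicting an objectness score and 4 box offsets for each."""
    anchors = []
    for y, x in itertools.product(range(fmap_h), range(fmap_w)):
        # Center of this feature-map cell in input-image pixel coordinates.
        cx, cy = x * stride + stride / 2, y * stride + stride / 2
        for s, r in itertools.product(scales, ratios):
            # Box of area ~s*s with width/height ratio r, centered on the cell.
            w, h = s * r ** 0.5, s / r ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

# A 38x50 conv feature map (roughly VGG-16 conv5 for a ~600x800 input)
# yields 38 * 50 * 9 = 17100 anchors.
print(len(rpn_anchors(38, 50)))  # 17100
```

Because the two sibling prediction layers are 1x1 convolutions over shared features, scoring all of these anchors costs a single forward pass, which is what makes the proposals "nearly cost-free."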

    SSD: Single Shot MultiBox Detector

    We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and the subsequent pixel or feature resampling stage and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single-stage methods, SSD has much better accuracy, even with a smaller input image size. For 300×300 input, SSD achieves 72.1% mAP on the VOC2007 test set at 58 FPS on an Nvidia Titan X, and for 500×500 input, SSD achieves 75.1% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Code is available at https://github.com/weiliu89/caffe/tree/ssd . (Comment: ECCV 2016.)
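The "default boxes over different aspect ratios and scales per feature map location" can be sketched as follows; the scale and ratio values here are illustrative, not SSD's exact configuration:

```python
import itertools, math

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Sketch of SSD-style default box generation for one square feature map.
    Returns (cx, cy, w, h) tuples in normalized [0, 1] image coordinates:
    each cell gets one box per aspect ratio, all sharing the same scale."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        # Box centers sit at the middle of each feature-map cell.
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            # Width/height chosen so the box area stays ~scale^2 as ar varies.
            boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

# An 8x8 map with 3 aspect ratios produces 8 * 8 * 3 = 192 default boxes;
# SSD repeats this over several maps of different resolutions and scales.
print(len(default_boxes(fmap_size=8, scale=0.2)))  # 192
```

For each such box the network then outputs per-class confidences plus four offsets, so detection reduces to one dense prediction pass with no separate proposal stage.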

    LR-CNN: Local-aware Region CNN for Vehicle Detection in Aerial Imagery

    State-of-the-art object detection approaches such as Fast/Faster R-CNN, SSD, or YOLO have difficulties detecting dense, small targets with arbitrary orientation in large aerial images. The main reason is that using interpolation to align RoI features can result in a lack of accuracy or even a loss of location information. We present the Local-aware Region Convolutional Neural Network (LR-CNN), a novel two-stage approach for vehicle detection in aerial imagery. We enhance translation invariance to detect dense vehicles and address the boundary quantization issue amongst dense vehicles by aggregating the high-precision RoIs' features. Moreover, we resample high-level semantic pooled features, making them regain location information from the features of a shallower convolutional block. This strengthens the local feature invariance for the resampled features and enables detecting vehicles in arbitrary orientations. The local feature invariance enhances the learning ability of the focal loss function, and the focal loss further helps to focus on the hard examples. Taken together, our method better addresses the challenges of aerial imagery. We evaluate our approach on several challenging datasets (VEDAI, DOTA), demonstrating a significant improvement over state-of-the-art methods. We also demonstrate the good generalization ability of our approach on the DLR 3K dataset. (Comment: 8 pages.)
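The focal loss referenced above (Lin et al.'s formulation) down-weights easy examples so hard ones dominate training. A minimal single-prediction sketch, using the commonly cited defaults rather than LR-CNN's exact settings:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction: p is the predicted probability
    of the positive class, y is the {0, 1} label. The (1 - p_t)^gamma factor
    shrinks the loss of confidently correct examples toward zero, so training
    concentrates on hard, misclassified ones. gamma/alpha here are the widely
    used defaults, not necessarily LR-CNN's settings."""
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)   # confidently correct: tiny loss
hard = focal_loss(0.10, 1)   # badly misclassified: dominates
print(hard / easy > 100)     # True: hard examples are weighted far more
```

With gamma = 0 this reduces to alpha-weighted cross-entropy; increasing gamma sharpens the focus on hard examples.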

    A New Indonesian Traffic Obstacle Dataset and Performance Evaluation of YOLOv4 for ADAS

    Intelligent transport systems (ITS) are a promising area of study. One ITS application is advanced driver assistance systems (ADAS), which involve the problem of obstacle detection in traffic. This study evaluated the YOLOv4 model, a state-of-the-art CNN-based one-stage detector, for recognizing traffic obstacles. A new dataset is proposed containing traffic obstacles on Indonesian roads, enabling ADAS to detect obstacles that are unique to Indonesia, such as pedicabs, street vendors, and bus shelters, and are not included in existing datasets. This study established a traffic obstacle dataset containing eleven object classes: cars, buses, trucks, bicycles, motorcycles, pedestrians, pedicabs, trees, bus shelters, traffic signs, and street vendors, with 26,016 labeled instances in 7,789 images. A performance analysis of traffic obstacle detection on Indonesian roads using this dataset was conducted with the YOLOv4 method.
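Labels for YOLO-family detectors such as YOLOv4 are conventionally stored in the Darknet text format, one normalized box per line. A sketch of parsing one such line, with the class list taken from the abstract (the exact class ordering used in the dataset is an assumption):

```python
# Class list from the abstract; the index order here is hypothetical.
CLASSES = ["car", "bus", "truck", "bicycle", "motorcycle", "pedestrian",
           "pedicab", "tree", "bus_shelter", "traffic_sign", "street_vendor"]

def parse_yolo_label(line):
    """Parse one Darknet-format label line: a class index followed by the
    box center (cx, cy) and size (w, h), all normalized to [0, 1] relative
    to the image dimensions."""
    cls, cx, cy, w, h = line.split()
    return CLASSES[int(cls)], (float(cx), float(cy), float(w), float(h))

name, box = parse_yolo_label("6 0.50 0.40 0.10 0.20")
print(name, box)  # pedicab (0.5, 0.4, 0.1, 0.2)
```

One such `.txt` file per image, holding one line per labeled instance, is the standard layout expected by Darknet-based YOLOv4 training pipelines.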

    A DEEP LEARNING APPROACH FOR AIRPORT RUNWAY IDENTIFICATION FROM SATELLITE IMAGERY

    The United States lacks a comprehensive national database of private Prior Permission Required (PPR) airports. The primary reason such a database does not exist is that there are no federal regulatory obligations for these facilities to have their information re-evaluated or updated by the Federal Aviation Administration (FAA) or the local state Department of Transportation (DOT) once the data has been entered into the system. The often outdated and incorrect information about landing sites presents a serious risk factor in aviation safety. In this thesis, we present a machine learning approach for detecting airport landing sites from Google Earth satellite imagery. The approach presented in this thesis plays a crucial role in confirming the FAA's current database and improving aviation safety in the United States. Specifically, we designed, implemented, and evaluated object detection and segmentation techniques for identifying and segmenting the regions of interest in image data. The thoroughly annotated in-house dataset includes 400 satellite images with a total of 700 instances of runways. The images, acquired via the Google Maps static API, are 3000×3000 pixels in size. The models were trained using two distinct backbones on a Mask R-CNN architecture, ResNet-101 and ResNeXt-101, and obtained the highest average precision (AP@0.75) with ResNet-101, at 92%, with recall at 89%. We finally hosted the model on the Streamlit front-end platform, allowing users to enter any location to check and confirm the presence of a runway.
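The reported AP@0.75 counts a predicted runway as correct only when its intersection-over-union (IoU) with a ground-truth box is at least 0.75. A minimal IoU sketch with hypothetical boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

gt = (100, 100, 300, 200)     # hypothetical ground-truth runway box (pixels)
pred = (110, 100, 300, 200)   # slightly shifted prediction
print(iou(pred, gt) >= 0.75)  # True: counts as a hit at the 0.75 threshold
```

The 0.75 threshold is notably stricter than the common 0.5 cutoff, so the reported 92% precision implies tightly localized runway boxes.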