DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection
Although the YOLOv2 approach is extremely fast at object detection, its backbone network has limited feature-extraction ability and fails to make full use of multi-scale local region features, which restricts further improvement in detection accuracy. This paper therefore proposes DC-SPP-YOLO (Dense Connection and Spatial Pyramid Pooling Based YOLO) to improve the object detection accuracy of YOLOv2. Specifically, dense connections between convolution layers are employed in the YOLOv2 backbone to strengthen feature extraction and alleviate the vanishing-gradient problem. Moreover, an improved spatial pyramid pooling is introduced to pool and concatenate multi-scale local region features, so that the network can learn object features more comprehensively. The DC-SPP-YOLO model is built and trained with a new loss function composed of mean squared error and cross-entropy terms. Experiments demonstrate that the mAP (mean Average Precision) of DC-SPP-YOLO on the PASCAL VOC and UA-DETRAC datasets is higher than that of YOLOv2: by strengthening feature extraction and exploiting multi-scale local region features, DC-SPP-YOLO achieves superior object detection accuracy.
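A minimal sketch of the spatial-pyramid-pooling idea the abstract describes, assuming PyTorch; the kernel sizes (5, 9, 13) and the concatenation layout are illustrative assumptions, not details taken from the paper:

import torch
import torch.nn as nn

class SPPBlock(nn.Module):
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        # Stride-1 max pooling with half-kernel padding keeps the spatial
        # size, so pooled maps can be concatenated with the input channel-wise.
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
             for k in kernel_sizes]
        )

    def forward(self, x):
        # Concatenate the input with its multi-scale pooled versions.
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

features = torch.randn(1, 512, 13, 13)
print(SPPBlock()(features).shape)  # torch.Size([1, 2048, 13, 13])

Because every branch is stride-1 pooling, the block injects multi-scale local context without changing the resolution of the feature map.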
Feature-Fused SSD: Fast Detection for Small Objects
Small object detection is a challenging task in computer vision because small objects have limited resolution and carry little information. Most existing methods address this problem by sacrificing speed for accuracy. In this paper, we aim to detect small objects at high speed, taking the Single Shot MultiBox Detector (SSD), the best object detector with respect to the accuracy-vs-speed trade-off, as the base architecture. We propose a multi-level feature fusion method that introduces contextual information into SSD in order to improve accuracy on small objects. Concretely, we design two feature fusion modules, a concatenation module and an element-sum module, which differ in how they add contextual information. Experimental results show that the two fusion modules raise mAP on PASCAL VOC2007 over the SSD baseline by 1.6 and 1.7 points respectively, with improvements of 2-3 points on some small-object categories. Their testing speeds are 43 and 40 FPS respectively, exceeding the state-of-the-art Deconvolutional Single Shot Detector (DSSD) by 29.4 and 26.4 FPS. Code is available at https://github.com/wnzhyee/Feature-Fused-SSD. Keywords: small object detection, feature fusion, real-time, single shot multi-box detector
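As a rough illustration of the two fusion styles the abstract contrasts, here is a hedged PyTorch sketch; the deconvolution upsampling, channel counts, and the 1x1 mixing convolution are assumptions for illustration, not the paper's exact modules:

import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    # Concatenate a shallow map with an upsampled deeper (contextual) map.
    def __init__(self, shallow_ch, deep_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(deep_ch, deep_ch, kernel_size=2, stride=2)
        self.mix = nn.Conv2d(shallow_ch + deep_ch, out_ch, kernel_size=1)

    def forward(self, shallow, deep):
        return self.mix(torch.cat([shallow, self.up(deep)], dim=1))

class SumFusion(nn.Module):
    # Element-wise sum after projecting the deeper map to match channels.
    def __init__(self, shallow_ch, deep_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(deep_ch, shallow_ch, kernel_size=2, stride=2)

    def forward(self, shallow, deep):
        return shallow + self.up(deep)

shallow = torch.randn(1, 512, 38, 38)
deep = torch.randn(1, 1024, 19, 19)
print(ConcatFusion(512, 1024, 512)(shallow, deep).shape)  # (1, 512, 38, 38)
print(SumFusion(512, 1024)(shallow, deep).shape)          # (1, 512, 38, 38)

The concatenation variant keeps both feature sources intact at the cost of a wider intermediate map, while the element-sum variant is cheaper but forces both sources into the same channel space.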
Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network
The Single Shot MultiBox Detector (SSD) is one of the fastest algorithms in the current object detection field; it uses a fully convolutional neural network to detect objects of all scales in an image. The Deconvolutional Single Shot Detector (DSSD) introduces more context information by adding a deconvolution module to SSD, raising the mean Average Precision (mAP) on PASCAL VOC2007 from SSD's 77.5% to 78.6%. However, while DSSD gains 1.1% mAP over SSD, its speed drops from 46 to 11.8 frames per second (FPS). In this paper, we propose a single-stage end-to-end image detection model called ESSD to overcome this dilemma. Our solution is to efficiently extend better contextual information to the shallow layers of the best single-stage detectors (e.g., SSD). Experimental results show that our model reaches 79.4% mAP, higher than DSSD and SSD by 0.8 and 1.9 points respectively, while its testing speed of 25 FPS on a Titan X GPU is more than double that of the original DSSD.
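To make the idea concrete, here is a hedged sketch of extending a shallow SSD layer with deeper context, assuming PyTorch and SSD300-style shapes; the module composition (deconvolution, batch norm, sum, ReLU) is an assumption rather than ESSD's published design:

import torch
import torch.nn as nn

class ShallowExtension(nn.Module):
    def __init__(self, shallow_ch, deep_ch):
        super().__init__()
        # A single cheap deconvolution keeps the FPS overhead small,
        # which matters since the goal is to avoid DSSD's slowdown.
        self.up = nn.ConvTranspose2d(deep_ch, shallow_ch, kernel_size=2, stride=2)
        self.bn = nn.BatchNorm2d(shallow_ch)

    def forward(self, shallow, deep):
        # Enrich the shallow map with upsampled deeper context via a sum.
        return torch.relu(shallow + self.bn(self.up(deep)))

# SSD300-style sources: conv4_3 (38x38, 512 ch) and conv7 (19x19, 1024 ch).
conv4_3 = torch.randn(1, 512, 38, 38)
conv7 = torch.randn(1, 1024, 19, 19)
print(ShallowExtension(512, 1024)(conv4_3, conv7).shape)  # (1, 512, 38, 38)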
StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection
One-stage object detectors such as SSD or YOLO have already shown promising accuracy with a small memory footprint and fast speed. However, it is widely recognized that one-stage detectors have difficulty detecting small objects, even though they are competitive with two-stage methods on large objects. In this paper, we investigate how to alleviate this problem starting from the SSD framework. Due to its pyramidal design, the lower layer responsible for small objects lacks strong semantics (e.g., contextual information). We address this problem by introducing a feature combining module that spreads strong semantics in a top-down manner. Our final model, the StairNet detector, effectively unifies multi-scale representations and semantic distribution. Experiments on the PASCAL VOC 2007 and PASCAL VOC 2012 datasets demonstrate that StairNet significantly mitigates SSD's weakness on small objects and outperforms other state-of-the-art one-stage detectors.
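A minimal sketch of top-down semantic aggregation in the spirit described here, assuming PyTorch; the 1x1 lateral convolutions, common channel width, and nearest-neighbor upsampling are illustrative assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownAggregation(nn.Module):
    def __init__(self, channels_per_level, out_ch=256):
        super().__init__()
        # 1x1 convs unify every pyramid level to a common channel width.
        self.laterals = nn.ModuleList(
            [nn.Conv2d(c, out_ch, kernel_size=1) for c in channels_per_level]
        )

    def forward(self, pyramid):
        # pyramid: feature maps ordered shallow (large) -> deep (small).
        feats = [lat(x) for lat, x in zip(self.laterals, pyramid)]
        for i in range(len(feats) - 2, -1, -1):
            # Spread deeper semantics downward into each shallower map.
            feats[i] = feats[i] + F.interpolate(
                feats[i + 1], size=feats[i].shape[-2:], mode="nearest"
            )
        return feats

levels = [torch.randn(1, c, s, s) for c, s in [(512, 38), (1024, 19), (512, 10)]]
outs = TopDownAggregation([512, 1024, 512])(levels)
print([tuple(o.shape) for o in outs])  # 256 channels at sizes 38, 19, 10

The shallowest level, where small objects are detected, thus receives semantics accumulated from every deeper level.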
Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather
The fusion of multimodal sensor streams, such as camera, lidar, and radar
measurements, plays a critical role in object detection for autonomous
vehicles, which base their decision making on these inputs. While existing
methods exploit redundant information in good environmental conditions, they
fail in adverse weather where the sensory streams can be asymmetrically
distorted. These rare "edge-case" scenarios are not represented in available
datasets, and existing fusion architectures are not designed to handle them. To
address this challenge we present a novel multimodal dataset acquired in over
10,000 km of driving in northern Europe. Although this dataset is the first
large multimodal dataset in adverse weather, with 100k labels for lidar,
camera, radar, and gated NIR sensors, it does not facilitate training as
extreme weather is rare. To this end, we present a deep fusion network for
robust fusion without a large corpus of labeled training data covering all
asymmetric distortions. Departing from proposal-level fusion, we propose a
single-shot model that adaptively fuses features, driven by measurement
entropy. We validate the proposed method, trained on clean data, on our
extensive validation dataset. Code and data are available at https://github.com/princeton-computational-imaging/SeeingThroughFog.
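As a loose illustration of entropy-driven fusion, the sketch below weights each sensor's features by the entropy of its raw measurements, assuming PyTorch; the histogram-based entropy and softmax gating are simplified assumptions, not the published network:

import torch

def measurement_entropy(x, bins=16):
    # Shannon entropy of the intensity histogram; a fogged or blacked-out
    # stream has a peaked histogram, hence low entropy and low weight.
    hist = torch.histc(x, bins=bins, min=0.0, max=1.0)
    p = hist / hist.sum().clamp(min=1.0)
    p = p[p > 0]
    return -(p * p.log()).sum()

def entropy_weighted_fusion(feature_maps, raw_measurements):
    # feature_maps: per-sensor feature tensors of identical shape;
    # raw_measurements: the corresponding normalized sensor images.
    weights = torch.stack([measurement_entropy(m) for m in raw_measurements])
    weights = torch.softmax(weights, dim=0)
    return sum(w * f for w, f in zip(weights, feature_maps))

camera = torch.rand(1, 3, 64, 64)        # clean, high-entropy stream
lidar = torch.rand(1, 3, 64, 64) * 0.05  # degraded, low-entropy stream
feats = [torch.randn(1, 256, 8, 8), torch.randn(1, 256, 8, 8)]
print(entropy_weighted_fusion(feats, [camera, lidar]).shape)  # (1, 256, 8, 8)

Gating on measurement entropy rather than on learned weather labels is what lets such a model, trained on clean data, degrade gracefully on asymmetric distortions it never saw.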