DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection
Although the YOLOv2 approach is extremely fast at object detection, its backbone
network is weak at feature extraction and fails to make full use of
multi-scale local region features, which limits object
detection accuracy. Therefore, this paper proposes DC-SPP-YOLO (Dense
Connection and Spatial Pyramid Pooling Based YOLO), an approach for improving
the object detection accuracy of YOLOv2. Specifically, the dense connection of
convolution layers is employed in the backbone network of YOLOv2 to strengthen
the feature extraction and alleviate the vanishing-gradient problem. Moreover,
an improved spatial pyramid pooling is introduced to pool and concatenate the
multi-scale local region features, so that the network can learn the object
features more comprehensively. The DC-SPP-YOLO model is established and trained
based on a new loss function composed of mean square error and cross entropy,
and the object detection is realized. Experiments demonstrate that the mAP
(mean Average Precision) of the proposed DC-SPP-YOLO on the PASCAL VOC and
UA-DETRAC datasets is higher than that of YOLOv2; by strengthening feature
extraction and using multi-scale local region features, DC-SPP-YOLO achieves
higher object detection accuracy than YOLOv2.
Comment: 23 pages, 9 figures, 9 tables
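The "pool and concatenate" step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `max_pool_same` and `spp_concat`, the kernel sizes (5, 9, 13), the stride-1 pooling, and the choice to keep the unpooled input in the output are all assumptions made for illustration, and feature maps are plain nested lists rather than tensors.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with an odd k x k window.

    The window is clipped at the borders, so the output has the same
    height and width as the input (assumed behaviour, not from the paper).
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                fmap[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp_concat(channels, kernel_sizes=(5, 9, 13)):
    """Concatenate the input channels with their pooled copies at each scale.

    `channels` is a list of 2-D feature maps; the kernel sizes are
    hypothetical defaults chosen for this sketch.
    """
    out = list(channels)
    for k in kernel_sizes:
        out.extend(max_pool_same(ch, k) for ch in channels)
    return out
```

Because every pooled map keeps the spatial size of the input, the maps can be stacked along the channel axis, which is what lets later layers see local regions at several scales at once.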
StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection
One-stage object detectors such as SSD or YOLO have already shown promising
accuracy with a small memory footprint and fast speed. However, it is widely
recognized that one-stage detectors have difficulty in detecting small objects
while they are competitive with two-stage methods on large objects. In this
paper, we investigate how to alleviate this problem starting from the SSD
framework. Due to their pyramidal design, the lower layer that is responsible
for small objects lacks strong semantics (e.g., contextual information). We
address this problem by introducing a feature combining module that spreads out
the strong semantics in a top-down manner. Our final model StairNet detector
unifies the multi-scale representations and semantic distribution effectively.
Experiments on PASCAL VOC 2007 and PASCAL VOC 2012 datasets demonstrate that
StairNet significantly mitigates this weakness of SSD and outperforms the other
state-of-the-art one-stage detectors.
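As a rough illustration of spreading semantics "in a top-down manner", the sketch below upsamples a coarse, semantically strong feature map and merges it into a finer one. This is not the StairNet module itself: nearest-neighbour upsampling and element-wise addition are assumptions standing in for whatever upsampling and merge rule the paper uses, and `upsample2x` and `top_down_merge` are hypothetical names.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (assumed choice)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def top_down_merge(fine, coarse):
    """Add the upsampled coarse map to the fine map, element by element."""
    up = upsample2x(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine map must be exactly twice the size of the coarse map")
    return [[f + u for f, u in zip(f_row, u_row)] for f_row, u_row in zip(fine, up)]
```

Applying such a merge repeatedly from the top of the pyramid downward would give each lower layer access to context from every layer above it.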
Automatic Image-Based Plant Disease Severity Estimation Using Deep Learning
Automatic and accurate estimation of disease severity is essential for food security, disease management, and yield loss prediction. Deep learning, the latest breakthrough in computer vision, is promising for fine-grained disease severity classification, as the method avoids labor-intensive feature engineering and threshold-based segmentation. Using the apple black rot images in the PlantVillage dataset, which are further annotated by botanists with four severity stages as ground truth, a series of deep convolutional neural networks are trained to diagnose the severity of the disease. The performances of shallow networks trained from scratch and deep models fine-tuned by transfer learning are evaluated systematically in this paper. The best model is the deep VGG16 model trained with transfer learning, which yields an overall accuracy of 90.4% on the hold-out test set. The proposed deep learning model may have great potential in disease control for modern agriculture.
Object Counting with Deep Learning
This thesis explores various empirical aspects of deep learning, specifically convolutional-network-based models, for efficient object counting. First, we train moderately large convolutional networks from scratch on comparatively small datasets containing a few hundred samples, using conventional image-processing-based data augmentation. Then, we extend this approach to unconstrained, outdoor images using more advanced architectural concepts. Additionally, we propose an efficient, randomized data augmentation strategy based on sub-regional pixel distribution for low-resolution images.
Next, the effectiveness of depth-to-space shuffling of feature elements for efficient segmentation is investigated for simpler problems like binary segmentation -- often required in the counting framework. This depth-to-space operation violates the basic assumption of encoder-decoder type of segmentation architectures. Consequently, it helps to train the encoder model as a sparsely connected graph. Nonetheless, we have found comparable accuracy to that of the standard encoder-decoder architectures with our depth-to-space models.
After that, the subtleties regarding the lack of localization information in the conventional scalar count loss for one-look models are illustrated. At this point, without using additional annotations, a possible solution is proposed based on the regulation of a network-generated heatmap in the form of a weak, subsidiary loss. The models trained with this auxiliary loss alongside the conventional loss perform much better than their baseline counterparts, both qualitatively and quantitatively. Lastly, the intricacies of tiled prediction for high-resolution images are studied in detail, and a simple and effective trick of eliminating the normalization factor in an existing computational block is demonstrated. All of the approaches employed here are thoroughly benchmarked across multiple heterogeneous datasets for object counting against previous state-of-the-art approaches.
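The depth-to-space operation mentioned in the abstract above moves values from the channel axis into the spatial axes. The sketch below shows one common convention, not necessarily the one used in the thesis: the channel ordering follows the widely used pixel-shuffle layout, which is an assumption here, and `depth_to_space` is a hypothetical name operating on nested lists of shape (C·r·r, H, W).

```python
def depth_to_space(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Assumed layout: input channel c*r*r + i*r + j supplies the output
    pixel at row offset i and column offset j inside each r x r cell.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for col in range(w):
                        out[c][y * r + i][col * r + j] = src[y][col]
    return out
```

No values are computed or learned in this step; it only reorders existing activations, which is why it can produce a higher-resolution map without the decoder layers of a standard encoder-decoder network.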