9,946 research outputs found
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
We present an integrated framework for using Convolutional Networks for
classification, localization and detection. We show how a multiscale and
sliding window approach can be efficiently implemented within a ConvNet. We
also introduce a novel deep learning approach to localization by learning to
predict object boundaries. Bounding boxes are then accumulated rather than
suppressed in order to increase detection confidence. We show that different
tasks can be learned simultaneously using a single shared network. This
integrated framework is the winner of the localization task of the ImageNet
Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very
competitive results for the detection and classifications tasks. In
post-competition work, we establish a new state of the art for the detection
task. Finally, we release a feature extractor from our best model called
OverFeat
Real-Time Grasp Detection Using Convolutional Neural Networks
We present an accurate, real-time approach to robotic grasp detection based
on convolutional neural networks. Our network performs single-stage regression
to graspable bounding boxes without using standard sliding window or region
proposal techniques. The model outperforms state-of-the-art approaches by 14
percentage points and runs at 13 frames per second on a GPU. Our network can
simultaneously perform classification so that in a single step it recognizes
the object and finds a good grasp rectangle. A modification to this model
predicts multiple grasps per object by using a locally constrained prediction
mechanism. The locally constrained model performs significantly better,
especially on objects that can be grasped in a variety of ways.Comment: Accepted to ICRA 201
IOD-CNN: Integrating Object Detection Networks for Event Recognition
Many previous methods have showed the importance of considering semantically
relevant objects for performing event recognition, yet none of the methods have
exploited the power of deep convolutional neural networks to directly integrate
relevant object information into a unified network. We present a novel unified
deep CNN architecture which integrates architecturally different, yet
semantically-related object detection networks to enhance the performance of
the event recognition task. Our architecture allows the sharing of the
convolutional layers and a fully connected layer which effectively integrates
event recognition, rigid object detection and non-rigid object detection.Comment: submitted to IEEE International Conference on Image Processing 201
Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection
In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology
and framework for efficient and effective real-time malware detection,
leveraging the best of conventional machine learning (ML) and deep learning
(DL) algorithms. In PROPEDEUTICA, all software processes in the system start
execution subjected to a conventional ML detector for fast classification. If a
piece of software receives a borderline classification, it is subjected to
further analysis via more performance expensive and more accurate DL methods,
via our newly proposed DL algorithm DEEPMALWARE. Further, we introduce delays
to the execution of software subjected to deep learning analysis as a way to
"buy time" for DL analysis and to rate-limit the impact of possible malware in
the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and
877 commonly used benign software samples from various categories for the
Windows OS. Our results show that the false positive rate for conventional ML
methods can reach 20%, and for modern DL methods it is usually below 6%.
However, the classification time for DL can be 100X longer than conventional ML
methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional
ML method) to 90.25%, and reduced the detection time by 54.86%. Further, the
percentage of software subjected to DL analysis was approximately 40% on
average. Further, the application of delays in software subjected to ML reduced
the detection time by approximately 10%. Finally, we found and discussed a
discrepancy between the detection accuracy offline (analysis after all traces
are collected) and on-the-fly (analysis in tandem with trace collection). Our
insights show that conventional ML and modern DL-based malware detectors in
isolation cannot meet the needs of efficient and effective malware detection:
high accuracy, low false positive rate, and short classification time.Comment: 17 pages, 7 figure
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
Existing deep convolutional neural networks (CNNs) require a fixed-size
(e.g., 224x224) input image. This requirement is "artificial" and may reduce
the recognition accuracy for the images or sub-images of an arbitrary
size/scale. In this work, we equip the networks with another pooling strategy,
"spatial pyramid pooling", to eliminate the above requirement. The new network
structure, called SPP-net, can generate a fixed-length representation
regardless of image size/scale. Pyramid pooling is also robust to object
deformations. With these advantages, SPP-net should in general improve all
CNN-based image classification methods. On the ImageNet 2012 dataset, we
demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures
despite their different designs. On the Pascal VOC 2007 and Caltech101
datasets, SPP-net achieves state-of-the-art classification results using a
single full-image representation and no fine-tuning.
The power of SPP-net is also significant in object detection. Using SPP-net,
we compute the feature maps from the entire image only once, and then pool
features in arbitrary regions (sub-images) to generate fixed-length
representations for training the detectors. This method avoids repeatedly
computing the convolutional features. In processing test images, our method is
24-102x faster than the R-CNN method, while achieving better or comparable
accuracy on Pascal VOC 2007.
In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our
methods rank #2 in object detection and #3 in image classification among all 38
teams. This manuscript also introduces the improvement made for this
competition.Comment: This manuscript is the accepted version for IEEE Transactions on
Pattern Analysis and Machine Intelligence (TPAMI) 2015. See Changelo
- …