Cloud Chaser: Real Time Deep Learning Computer Vision on Low Computing Power Devices
Internet of Things (IoT) devices, mobile phones, and robotic systems are often
denied the power of deep learning algorithms due to their limited computing
power. However, to provide time-critical services such as emergency response,
home assistance, and surveillance, these devices often need real-time analysis
of their camera data. This paper offers a viable approach to
integrate high-performance deep learning-based computer vision algorithms with
low-resource and low-power devices by leveraging the computing power of the
cloud. By offloading the computation work to the cloud, no dedicated hardware
is needed to enable deep neural networks on existing low computing power
devices. A Raspberry Pi based robot, Cloud Chaser, is built to demonstrate the
power of using cloud computing to perform real-time vision tasks. Furthermore,
to reduce latency and improve real-time performance, compression algorithms are
proposed and evaluated for streaming real-time video frames to the cloud.
Comment: Accepted to The 11th International Conference on Machine Vision (ICMV 2018). Project site: https://zhengyiluo.github.io/projects/cloudchaser
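The abstract's latency-reduction idea (compress frames on the device, decompress in the cloud before inference) can be sketched generically. The paper evaluates specific compression schemes for video; the lossless zlib round trip below is a stand-in assumption, not the paper's algorithm:

```python
import zlib


def compress_frame(frame_bytes: bytes, level: int = 6) -> bytes:
    """Device side: shrink a raw frame payload before streaming it to the cloud."""
    return zlib.compress(frame_bytes, level)


def decompress_frame(payload: bytes) -> bytes:
    """Cloud side: recover the raw frame for the deep-learning model."""
    return zlib.decompress(payload)


# A synthetic 64x64 grayscale frame with large uniform regions,
# mimicking the redundancy typical of camera frames.
frame = bytes([x // 8 for x in range(64)] * 64)
packet = compress_frame(frame)
assert decompress_frame(packet) == frame
```

In a real deployment the compressed packet would travel over a socket to the inference server; lossy codecs (e.g. JPEG) trade some fidelity for far smaller payloads.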
Feature-Fused SSD: Fast Detection for Small Objects
Small object detection is a challenging task in computer vision due to the
limited resolution and information of small objects. To address this problem,
most existing methods sacrifice speed for improvements in accuracy. In this
paper, we aim to detect small objects at high speed, using the Single Shot
MultiBox Detector (SSD), the detector with the best accuracy-vs-speed
trade-off, as our base architecture. We propose a multi-level feature fusion
method that introduces contextual information into SSD in order to improve
accuracy on small objects. For the fusion operation, we design two feature
fusion modules, a concatenation module and an element-sum module, which differ
in how they add contextual information. Experimental results show that the two
fusion modules achieve higher mAP on PASCAL VOC 2007 than the baseline SSD by
1.6 and 1.7 points respectively, with 2-3 point improvements on some
small-object categories. Their testing speeds are 43 and 40 FPS respectively,
exceeding the state-of-the-art Deconvolutional Single Shot Detector (DSSD) by
29.4 and 26.4 FPS. Code is available at
https://github.com/wnzhyee/Feature-Fused-SSD.
Keywords: small object detection, feature fusion, real-time, single shot multi-box detector
Comment: Artificial Intelligence; 8 pages, 8 figures
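The two fusion styles can be illustrated on 1-D lists standing in for flattened feature maps. This is a toy sketch only: the real modules operate on multi-channel convolutional features and include learned deconvolution/normalization layers, which are omitted here:

```python
def element_sum(a: list[float], b: list[float]) -> list[float]:
    """Element-sum fusion: the two feature maps must have the same shape;
    contextual features are added value-by-value."""
    assert len(a) == len(b), "element-sum requires matching shapes"
    return [x + y for x, y in zip(a, b)]


def concatenate(a: list[float], b: list[float]) -> list[float]:
    """Concatenation fusion: the contextual map is stacked alongside the
    original map (here, along a single flattened axis)."""
    return a + b


# Toy feature maps: a shallow high-resolution map and a deeper contextual map.
shallow = [0.2, 0.5, 0.1]
context = [0.4, 0.1, 0.3]
fused_sum = element_sum(shallow, context)      # same length as inputs
fused_cat = concatenate(shallow, context)      # doubled length
```

Concatenation preserves both sources but grows the channel dimension (so the following conv layer must absorb it); element-sum keeps the dimension fixed but forces the two maps into a shared representation.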
Contextual Action Recognition with R*CNN
There are multiple cues in an image which reveal what action a person is
performing. For example, a jogger has a pose that is characteristic for
jogging, but the scene (e.g. road, trail) and the presence of other joggers can
be an additional source of information. In this work, we exploit the simple
observation that actions are accompanied by contextual cues to build a strong
action recognition system. We adapt RCNN to use more than one region for
classification while still maintaining the ability to localize the action. We
call our system R*CNN. The action-specific models and the feature maps are
trained jointly, allowing for action specific representations to emerge. R*CNN
achieves 90.2% mean AP on the PASCAL VOC Action dataset, outperforming all
other approaches in the field by a significant margin. Finally, we show that
R*CNN is not limited to action recognition: it can also be used to tackle
fine-grained tasks such as attribute classification. We validate this claim by
reporting state-of-the-art performance on the Berkeley Attributes of People
dataset.
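A minimal sketch of the scoring idea the abstract describes: the action score combines the primary (person) region with the most informative secondary (context) region. The actual model scores regions with jointly trained CNN features; the max-over-regions selection below is the structural idea only:

```python
def rstar_score(primary_score: float, secondary_scores: list[float]) -> float:
    """R*CNN-style action score: primary region plus the best-scoring
    secondary (contextual) region, e.g. the scene or nearby people."""
    return primary_score + max(secondary_scores)


# Toy scores for the action "jogging": the person pose is moderately
# indicative, and one contextual region (a trail) is strongly indicative.
person_region = 2
context_regions = [1, 3, 0]   # e.g. sky, trail, parked car
total = rstar_score(person_region, context_regions)
```

Because the max is taken per action, each action class is free to latch onto a different kind of contextual evidence, which is what lets action-specific representations emerge during joint training.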
Non-local Neural Networks
Both convolutional and recurrent operations are building blocks that process
one local neighborhood at a time. In this paper, we present non-local
operations as a generic family of building blocks for capturing long-range
dependencies. Inspired by the classical non-local means method in computer
vision, our non-local operation computes the response at a position as a
weighted sum of the features at all positions. This building block can be
plugged into many computer vision architectures. On the task of video
classification, even without any bells and whistles, our non-local models can
compete with or outperform the current competition winners on both the Kinetics
and Charades datasets. In static image recognition, our non-local models
improve object
detection/segmentation and pose estimation on the COCO suite of tasks. Code is
available at https://github.com/facebookresearch/video-nonlocal-net.
Comment: CVPR 2018
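The non-local operation computes the response at each position as a weighted sum over all positions. With the embedded-Gaussian choice of pairwise function, the weights are a softmax over dot products; in this sketch the learned embeddings are replaced by the identity for brevity, so it is the attention structure only, not the trained block:

```python
import math


def nonlocal_op(x: list[float]) -> list[float]:
    """Non-local operation on a 1-D feature sequence:
    y_i = sum_j softmax_j(x_i * x_j) * x_j
    (identity embeddings stand in for the learned theta/phi/g projections)."""
    y = []
    for xi in x:
        # Pairwise affinities between position i and every position j.
        logits = [xi * xj for xj in x]
        # Numerically stable softmax normalization over all positions.
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        # Response at i: weighted sum of features at all positions.
        y.append(sum(w / total * xj for w, xj in zip(weights, x)))
    return y


features = [2.0, 2.0, 2.0]
response = nonlocal_op(features)   # uniform input -> uniform weights
```

Unlike a convolution, every output position attends to every input position in a single step, which is what gives the block its long-range dependency capture; in the paper the same computation runs over space-time positions of a video clip.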