Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
Object detection is considered one of the most challenging problems in the
field of computer vision, as it involves the combination of object
classification and object localization within a scene. Recently, deep neural
networks (DNNs) have been demonstrated to achieve superior object detection
performance compared to other approaches, with YOLOv2 (an improved You Only
Look Once model) being one of the state-of-the-art in DNN-based object
detection methods in terms of both speed and accuracy. Although YOLOv2 can
achieve real-time performance on a powerful GPU, it still remains very
challenging for leveraging this approach for real-time object detection in
video on embedded computing devices with limited computational power and
limited memory. In this paper, we propose a new framework called Fast YOLO, a
fast You Only Look Once framework which accelerates YOLOv2 to be able to
perform object detection in video on embedded devices in a real-time manner.
First, we leverage the evolutionary deep intelligence framework to evolve the
YOLOv2 network architecture and produce an optimized architecture (referred to
as O-YOLOv2 here) that has 2.8X fewer parameters with just a ~2% IOU drop. To
further reduce power consumption on embedded devices while maintaining
performance, a motion-adaptive inference method is introduced into the proposed
Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2
based on temporal motion characteristics. Experimental results show that the
proposed Fast YOLO framework can reduce the number of deep inferences by an
average of 38.13% and achieve an average speedup of ~3.3X for object detection
in video compared to the original YOLOv2, allowing Fast YOLO to run at an
average of ~18 FPS on an Nvidia Jetson TX1 embedded system.
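The motion-adaptive inference idea above can be sketched in a few lines: run the expensive detector only when inter-frame motion exceeds a threshold, and otherwise reuse the last detections. This is a minimal illustration, not the paper's implementation; the function and parameter names (`motion_adaptive_inference`, `deep_infer`, `motion_threshold`) are hypothetical, and mean absolute frame difference is one simple stand-in for the temporal motion characteristics the abstract mentions.

```python
import numpy as np

def motion_adaptive_inference(frames, deep_infer, motion_threshold=0.05):
    """Run the (hypothetical) deep detector only on frames whose motion,
    measured here as normalized mean absolute frame difference, exceeds
    a threshold; otherwise reuse the detections from the last deep pass."""
    last_frame = None
    last_result = None
    results = []
    deep_calls = 0  # how many expensive inferences were actually run
    for frame in frames:
        if last_frame is None:
            motion = float("inf")  # always run on the first frame
        else:
            diff = np.abs(frame.astype(float) - last_frame.astype(float))
            motion = float(np.mean(diff)) / 255.0  # in [0, 1]
        if motion > motion_threshold:
            last_result = deep_infer(frame)  # expensive O-YOLOv2-style pass
            deep_calls += 1
        results.append(last_result)  # static frames reuse cached detections
        last_frame = frame
    return results, deep_calls
```

On a mostly static video, `deep_calls` stays well below the frame count, which is the source of the power and speed savings the abstract reports.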
Deep Learning in the Automotive Industry: Applications and Tools
Deep Learning refers to a set of machine learning techniques that utilize
neural networks with many hidden layers for tasks such as image
classification, speech recognition, and language understanding. Deep learning
has been proven to be very effective in these domains and is pervasively used
by many Internet services. In this paper, we describe different automotive use
cases for deep learning, in particular in the domain of computer vision. We
survey the current state of the art in libraries, tools, and infrastructures
(e.g., GPUs and clouds) for implementing, training, and deploying deep neural
networks. We particularly focus on convolutional neural networks and computer
vision use cases, such as the visual inspection process in manufacturing plants
and the analysis of social media data. To train neural networks, curated and
labeled datasets are essential. In particular, both the availability and scope
of such datasets are typically very limited. A main contribution of this paper
is the creation of an automotive dataset that allows us to learn and
automatically recognize different vehicle properties. We describe an end-to-end
deep learning application utilizing a mobile app for data collection and
process support, and an Amazon-based cloud backend for storage and training.
For training we evaluate the use of cloud and on-premises infrastructures
(including multiple GPUs) in conjunction with different neural network
architectures and frameworks. We assess both the training times as well as the
accuracy of the classifier. Finally, we demonstrate the effectiveness of the
trained classifier in a real-world setting during the manufacturing process.
Comment: 10 page
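The evaluation described above, assessing both training time and classifier accuracy, can be sketched schematically. This is a deliberately framework-free stand-in: a majority-class baseline plays the role of the trained model, since the paper's actual CNN architectures and cloud setup are not reproduced here; the name `train_and_evaluate` and the data layout are assumptions.

```python
import time
from collections import Counter

def train_and_evaluate(train_set, test_set):
    """Schematic pipeline step: 'train' a majority-class baseline on labeled
    (features, label) pairs and report training time and test accuracy.
    A real setup would swap in a CNN and a deep learning framework."""
    start = time.perf_counter()
    labels = [label for _, label in train_set]
    predicted = Counter(labels).most_common(1)[0][0]  # majority class
    train_time = time.perf_counter() - start  # analogous to measured training time
    correct = sum(1 for _, label in test_set if label == predicted)
    accuracy = correct / len(test_set)
    return predicted, train_time, accuracy
```

Comparing such (time, accuracy) pairs across architectures and infrastructures is the shape of the assessment the abstract describes.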
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments
Deep neural networks (DNNs) have become core computation components within
low latency Function as a Service (FaaS) prediction pipelines: including image
recognition, object detection, natural language processing, speech synthesis,
and personalized recommendation pipelines. Cloud computing, as the de-facto
backbone of modern computing infrastructure for both enterprise and consumer
applications, has to be able to handle user-defined pipelines of diverse DNN
inference workloads while maintaining isolation and latency guarantees, and
minimizing resource waste. The current solution for guaranteeing isolation
within FaaS is suboptimal -- suffering from "cold start" latency. A major cause
of such inefficiency is the need to move large amounts of model data within and
across servers. We propose TrIMS as a novel solution to address these issues.
Our proposed solution consists of a persistent model store across the GPU, CPU,
local storage, and cloud storage hierarchy, an efficient resource management
layer that provides isolation, and a succinct set of application APIs and
container technologies for easy and transparent integration with FaaS, Deep
Learning (DL) frameworks, and user code. We demonstrate our solution by
interfacing TrIMS with the Apache MXNet framework and demonstrate up to 24x
speedup in latency for image classification models and up to 210x speedup for
large models. We achieve up to 8x system throughput improvement.
Comment: In Proceedings CLOUD 201
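The core of the persistent model store can be illustrated with a small sketch: load each model's data once per server and share the cached copy across invocations, so subsequent FaaS requests skip the "cold start" reload. This is a simplified assumption-laden illustration, not the TrIMS implementation; the class name `ModelStore` and `load_fn` hook are hypothetical, and the real system spans a GPU/CPU/storage hierarchy with isolation guarantees that this sketch omits.

```python
import threading

class ModelStore:
    """Minimal sketch of a TrIMS-style model store: model data is loaded once
    and shared across requests instead of being reloaded per invocation.
    `load_fn` stands in for the expensive copy from disk/cloud into memory."""

    def __init__(self, load_fn):
        self._load_fn = load_fn
        self._cache = {}
        self._lock = threading.Lock()
        self.loads = 0  # counts actual (expensive) loads

    def get(self, model_id):
        # Double-use of the lock keeps concurrent FaaS handlers from
        # loading the same model twice.
        with self._lock:
            if model_id not in self._cache:
                self._cache[model_id] = self._load_fn(model_id)
                self.loads += 1
            return self._cache[model_id]
```

After the first request warms the cache, every later `get` for the same model returns the shared copy, which is where the latency speedups reported above come from.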