34,180 research outputs found
FFT-Based Deep Learning Deployment in Embedded Systems
Deep learning has delivered its powerfulness in many application domains,
especially in image and speech recognition. As the backbone of deep learning,
deep neural networks (DNNs) consist of multiple layers of various types with
hundreds to thousands of neurons. Embedded platforms are now becoming essential
for deep learning deployment due to their portability, versatility, and energy
efficiency. The large model size of DNNs, while providing excellent accuracy,
also burdens the embedded platforms with intensive computation and storage.
Researchers have investigated on reducing DNN model size with negligible
accuracy loss. This work proposes a Fast Fourier Transform (FFT)-based DNN
training and inference model suitable for embedded platforms with reduced
asymptotic complexity of both computation and storage, making our approach
distinguished from existing approaches. We develop the training and inference
algorithms based on FFT as the computing kernel and deploy the FFT-based
inference model on embedded platforms achieving extraordinary processing speed.Comment: Design, Automation, and Test in Europe (DATE) For source code, please
contact Mahdi Nazemi at <[email protected]
Communication Theoretic Data Analytics
Widespread use of the Internet and social networks invokes the generation of
big data, which is proving to be useful in a number of applications. To deal
with explosively growing amounts of data, data analytics has emerged as a
critical technology related to computing, signal processing, and information
networking. In this paper, a formalism is considered in which data is modeled
as a generalized social network and communication theory and information theory
are thereby extended to data analytics. First, the creation of an equalizer to
optimize information transfer between two data variables is considered, and
financial data is used to demonstrate the advantages. Then, an information
coupling approach based on information geometry is applied for dimensionality
reduction, with a pattern recognition example to illustrate the effectiveness.
These initial trials suggest the potential of communication theoretic data
analytics for a wide range of applications.Comment: Published in IEEE Journal on Selected Areas in Communications, Jan.
201
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed
Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
Object detection is considered one of the most challenging problems in this
field of computer vision, as it involves the combination of object
classification and object localization within a scene. Recently, deep neural
networks (DNNs) have been demonstrated to achieve superior object detection
performance compared to other approaches, with YOLOv2 (an improved You Only
Look Once model) being one of the state-of-the-art in DNN-based object
detection methods in terms of both speed and accuracy. Although YOLOv2 can
achieve real-time performance on a powerful GPU, it still remains very
challenging for leveraging this approach for real-time object detection in
video on embedded computing devices with limited computational power and
limited memory. In this paper, we propose a new framework called Fast YOLO, a
fast You Only Look Once framework which accelerates YOLOv2 to be able to
perform object detection in video on embedded devices in a real-time manner.
First, we leverage the evolutionary deep intelligence framework to evolve the
YOLOv2 network architecture and produce an optimized architecture (referred to
as O-YOLOv2 here) that has 2.8X fewer parameters with just a ~2% IOU drop. To
further reduce power consumption on embedded devices while maintaining
performance, a motion-adaptive inference method is introduced into the proposed
Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2
based on temporal motion characteristics. Experimental results show that the
proposed Fast YOLO framework can reduce the number of deep inferences by an
average of 38.13%, and an average speedup of ~3.3X for objection detection in
video compared to the original YOLOv2, leading Fast YOLO to run an average of
~18FPS on a Nvidia Jetson TX1 embedded system
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Energy-Efficient Inference Accelerator for Memory-Augmented Neural Networks on an FPGA
Memory-augmented neural networks (MANNs) are designed for question-answering
tasks. It is difficult to run a MANN effectively on accelerators designed for
other neural networks (NNs), in particular on mobile devices, because MANNs
require recurrent data paths and various types of operations related to
external memory access. We implement an accelerator for MANNs on a
field-programmable gate array (FPGA) based on a data flow architecture.
Inference times are also reduced by inference thresholding, which is a
data-based maximum inner-product search specialized for natural language tasks.
Measurements on the bAbI data show that the energy efficiency of the
accelerator (FLOPS/kJ) was higher than that of an NVIDIA TITAN V GPU by a
factor of about 125, increasing to 140 with inference thresholdingComment: Accepted to DATE 201
- …