Optimum Selection of DNN Model and Framework for Edge Inference
This paper describes a methodology to select the optimum combination of deep neural network and software framework for visual inference on embedded systems. As a first step, benchmarking is required. In particular, we have benchmarked six popular network models running on four deep learning frameworks implemented on a low-cost embedded platform. Three key performance metrics have been measured and compared with the resulting 24 combinations: accuracy, throughput, and power consumption. Then, application-level specifications come into play. We propose a figure of merit enabling the evaluation of each network/framework pair in terms of relative importance of the aforementioned metrics for a targeted application. We prove through numerical analysis and meaningful graphical representations that only a reduced subset of the combinations must actually be considered for real deployment. Our approach can be extended to other networks, frameworks, and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of embedded deep learning technology. Ministerio de Economía y Competitividad (TEC2015-66878-C3-1-R), Junta de Andalucía (TIC 2338-2013), European Union Horizon 2020 (Grant 765866
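The selection step described in this abstract can be illustrated with a small sketch. The abstract does not give the form of the figure of merit, so the weighted-product form below is an assumption, as are the candidate names and all numbers. The sketch first discards combinations that are dominated on all three metrics, which is one plausible reading of why only a reduced subset needs to be considered, and then ranks the survivors for a given set of application weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical "network/framework" label
    accuracy: float    # higher is better
    throughput: float  # inferences per second, higher is better
    power: float       # watts, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every metric and better on one."""
    no_worse = (a.accuracy >= b.accuracy and a.throughput >= b.throughput
                and a.power <= b.power)
    better = (a.accuracy > b.accuracy or a.throughput > b.throughput
              or a.power < b.power)
    return no_worse and better


def pareto_front(cands):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def figure_of_merit(c: Candidate, w_acc: float, w_thr: float, w_pow: float) -> float:
    # Assumed form (not taken from the paper): a weighted product in which
    # larger exponents express greater importance of a metric.
    return (c.accuracy ** w_acc) * (c.throughput ** w_thr) / (c.power ** w_pow)


# Made-up measurements, for illustration only.
candidates = [
    Candidate("netA/fw1", 0.70, 10.0, 5.0),
    Candidate("netA/fw2", 0.70, 8.0, 6.0),   # dominated by netA/fw1
    Candidate("netB/fw1", 0.78, 4.0, 5.5),
    Candidate("netC/fw1", 0.60, 20.0, 4.0),
]

front = pareto_front(candidates)
balanced = max(front, key=lambda c: figure_of_merit(c, 1.0, 1.0, 1.0))
accuracy_first = max(front, key=lambda c: figure_of_merit(c, 10.0, 0.1, 0.1))
print([c.name for c in front], balanced.name, accuracy_first.name)
```

Because this assumed figure of merit increases with accuracy and throughput and decreases with power, a dominated combination can never win for any positive weights, so ranking only the non-dominated subset loses nothing.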
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have a substantial potential in terms of supporting
a broad range of complex compelling applications both in military and civilian
fields, where the users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have achieved great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims to assist readers in clarifying the
motivation and methodology of the various ML algorithms, so as to invoke them
for hitherto unexplored services as well as scenarios of future wireless
networks. Comment: 46 pages, 22 fig
A Bipartite Graph Neural Network Approach for Scalable Beamforming Optimization
Deep learning (DL) techniques have been intensively studied for the
optimization of multi-user multiple-input single-output (MU-MISO) downlink
systems owing to the capability of handling nonconvex formulations. However,
the fixed computation structure of existing deep neural networks (DNNs) lacks
flexibility with respect to the system size, i.e., the number of antennas or
users. This paper develops a bipartite graph neural network (BGNN) framework, a
scalable DL solution designed for multi-antenna beamforming optimization. The
MU-MISO system is first characterized by a bipartite graph where two disjoint
vertex sets, one consisting of transmit antennas and the other of users, are
connected via pairwise edges. These vertex interconnection states are modeled
by channel fading coefficients. Thus, a generic beamforming optimization
process is interpreted as a computation task over a weighted bipartite graph.
This approach partitions the beamforming optimization procedure into multiple
suboperations dedicated to individual antenna vertices and user vertices.
Separated vertex operations lead to scalable beamforming calculations that are
invariant to the system size. The vertex operations are realized by a group of
DNN modules that collectively form the BGNN architecture. Identical DNNs are
reused at all antennas and users so that the resultant learning structure
becomes flexible to the network size. Component DNNs of the BGNN are trained
jointly over numerous MU-MISO configurations with randomly varying network
sizes. As a result, the trained BGNN can be universally applied to arbitrary
MU-MISO systems. Numerical results validate the advantages of the BGNN
framework over conventional methods. Comment: accepted for publication in IEEE Transactions on Wireless
Communication
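The size-invariance idea in this abstract can be sketched in a few lines. This is a toy illustration, not the BGNN itself: the two update functions stand in for the shared DNN modules, their coefficients are arbitrary, edge weights are real numbers rather than complex channel coefficients, and all names are hypothetical. The point is only that reusing the same vertex operation at every antenna and every user lets one piece of code run on any system size.

```python
import math


def user_update(state, messages):
    """Toy stand-in for the DNN module shared by all user vertices."""
    return math.tanh(0.5 * state + sum(messages) / len(messages))


def antenna_update(state, messages):
    """Toy stand-in for the DNN module shared by all antenna vertices."""
    return math.tanh(0.3 * state + sum(messages) / len(messages))


def bipartite_pass(h, rounds=2):
    """Run message passing on a weighted bipartite graph.

    h[k][m] is the (real-valued, assumed) edge weight between user k and
    antenna m. Returns the final user states and antenna states.
    """
    num_users, num_antennas = len(h), len(h[0])
    users = [0.0] * num_users
    antennas = [0.0] * num_antennas
    for _ in range(rounds):
        # Every user applies the same operation to messages from all antennas.
        users = [
            user_update(users[k],
                        [h[k][m] * (1.0 + antennas[m]) for m in range(num_antennas)])
            for k in range(num_users)
        ]
        # Every antenna applies the same operation to messages from all users.
        antennas = [
            antenna_update(antennas[m],
                           [h[k][m] * users[k] for k in range(num_users)])
            for m in range(num_antennas)
        ]
    return users, antennas


# The same functions handle a 2-user/3-antenna and a 4-user/5-antenna system.
small = bipartite_pass([[0.5, 1.0, 0.2], [0.3, 0.4, 0.9]])
large = bipartite_pass([[0.1 * (k + m + 1) for m in range(5)] for k in range(4)])
print(len(small[0]), len(small[1]), len(large[0]), len(large[1]))
```

Mean aggregation is used here so that the result does not depend on the order or number of neighbouring vertices; the aggregation used in the paper is not stated in the abstract.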
BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms
As deep neural networks (DNNs) are being applied to a wide range of edge
intelligent applications, it is critical for edge inference platforms to deliver
both high throughput and low latency at the same time. Such edge platforms with
multiple DNN models pose new challenges for scheduler designs. First, each
request may have different service level objectives (SLOs) to improve quality
of service (QoS). Second, the edge platforms should be able to efficiently
schedule multiple heterogeneous DNN models so that system utilization can be
improved. To meet these two goals, this paper proposes BCEdge, a novel
learning-based scheduling framework that combines adaptive batching with concurrent
execution of DNN inference services on edge platforms. We define a utility
function to evaluate the trade-off between throughput and latency. The
scheduler in BCEdge leverages maximum entropy-based deep reinforcement learning
(DRL) to maximize utility by automatically co-optimizing 1) the batch size and 2) the number of
concurrent models. Our prototype implemented on different edge
platforms shows that the proposed BCEdge enhances utility by up to 37.6% on
average, compared to state-of-the-art solutions, while satisfying SLOs
Efficient deep neural network inference for embedded systems: A mixture of experts approach
Deep neural networks (DNNs) have become one of the dominant machine learning approaches in recent years for many application domains. Unfortunately, DNNs are not well suited to addressing the challenges of embedded systems, where on-device inference on battery-powered, resource-constrained devices is often infeasible due to prohibitively long inferencing time and resource requirements. Furthermore, offloading computation into the cloud is often infeasible due to a lack of connectivity, high latency, or privacy concerns. While compression algorithms often succeed in reducing inferencing times, they come at the cost of reduced accuracy. The key insight here is that multiple DNNs, of varying runtimes and prediction capabilities, are capable of correctly making a prediction on the same input. By choosing the fastest capable DNN for each input, the average runtime can be reduced. Furthermore, the fastest capable DNN changes depending on the evaluation criterion. This thesis presents a new, alternative approach to enable efficient execution of DNN inference on embedded devices; the aim is to reduce average DNN inferencing times without a loss in accuracy. Central to the approach is a Model Selector, which dynamically determines which DNN to use for a given input, by considering the desired evaluation metric and inference time. It employs statistical machine learning to develop a low-cost predictive model to quickly select a DNN to use for a given input and the optimisation constraint. First, the approach is shown to work effectively with off-the-shelf pre-trained DNNs. The approach is then extended by combining typical DNN pruning techniques with statistical machine learning in order to create a set of specialised DNNs designed specifically for use with a Model Selector. Two typical DNN application domains are used during evaluation: image classification and machine translation.
Evaluation is reported on an NVIDIA Jetson TX2 embedded deep learning platform, and a range of influential DNN models including convolutional and recurrent neural networks are considered. In the first instance, utilising off-the-shelf pre-trained DNNs, a 44.45% reduction in inference time with a 7.52% improvement in accuracy, over the most-capable single DNN model, is achieved for image classification. For machine translation, inference time is reduced by 25.37% over the most-capable model with little impact on the quality of the translation. Further evaluation utilising specialised DNNs did not yield an accurate premodel and produced poor results; however, analysis of a perfect premodel shows the potential for faster inference times, and reduced resource requirements over utilising off-the-shelf DNNs
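The "fastest capable DNN" idea behind the Model Selector can be sketched as follows. Everything concrete here is hypothetical: the model names, the runtimes, and the `is_capable` predicates, which stand in for the predictions of the thesis's low-cost premodel (whose features and learning algorithm are not given in the abstract). Each input is reduced to a single made-up "difficulty" score purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    runtime_ms: float
    # Hypothetical stand-in for the premodel's prediction of whether this
    # DNN would handle the given input correctly.
    is_capable: Callable[[float], bool]


def select_model(models: Sequence[Model], x: float, fallback: Model) -> Model:
    """Pick the fastest model predicted to be capable; else use the fallback."""
    capable = [m for m in models if m.is_capable(x)]
    return min(capable, key=lambda m: m.runtime_ms) if capable else fallback


# Made-up models; x is an illustrative "difficulty" score in [0, 1].
small = Model("small", 10.0, lambda x: x < 0.3)
medium = Model("medium", 40.0, lambda x: x < 0.7)
large = Model("large", 120.0, lambda x: True)
models = [small, medium, large]

inputs = [0.1, 0.2, 0.5, 0.9]
chosen = [select_model(models, x, fallback=large) for x in inputs]
average_ms = sum(m.runtime_ms for m in chosen) / len(chosen)
print([m.name for m in chosen], average_ms)
```

In this toy setting the average runtime drops from 120 ms (always running the largest model) to 45 ms, which shows the mechanism by which per-input selection reduces average inference time. It says nothing about the gains reported in the thesis, which depend on how accurate the real premodel is.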
Deep Learning Designs for Physical Layer Communications
Wireless communication systems and their underlying technologies have undergone unprecedented advances over the last two decades to assuage the ever-increasing demands for various applications and emerging technologies. However, the traditional signal processing schemes and algorithms for wireless communications cannot handle the upsurging complexity associated with fifth-generation (5G) and beyond communication systems due to network expansion, new emerging technologies, high data rate, and the ever-increasing demands for low latency. This thesis extends the traditional downlink transmission schemes to deep learning-based precoding and detection techniques that are hardware-efficient and of lower complexity than the current state-of-the-art. The thesis focuses on: precoding/beamforming in massive multiple-input multiple-output (MIMO), signal detection and lightweight neural network (NN) architectures for precoder and decoder designs. We introduce a learning-based precoder design via constructive interference (CI) that performs the precoding on a symbol-by-symbol basis. Instead of conventionally training a NN without considering the specifics of the optimisation objective, we unfold a power minimisation symbol level precoding (SLP) formulation based on the interior-point-method (IPM) proximal ‘log’ barrier function. Furthermore, we propose a concept of NN compression, where the weights are quantised to lower numerical precision formats based on binary and ternary quantisations. We further introduce a stochastic quantisation technique, where parts of the NN weight matrix are quantised while the remainder is not. Finally, we propose a systematic complexity scaling of deep neural network (DNN) based MIMO detectors. The model uses a fraction of the DNN inputs by scaling their values through weights that follow monotonically non-increasing functions. Furthermore, we investigate performance-complexity trade-offs via regularisation constraints on the layer weights such that, at inference, parts of network layers can be removed with minimal impact on the detection accuracy. Simulation results show that our proposed learning-based techniques offer better complexity-vs-BER (bit-error-rate) and complexity-vs-transmit power performances compared to the state-of-the-art MIMO detection and precoding techniques
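Binary and ternary weight quantisation, as mentioned in this abstract, can be sketched generically. The abstract does not state which scaling or threshold rules the thesis uses, so the choices below are assumptions: the binary scale is the mean absolute weight, and the ternary threshold is 0.7 times the mean absolute weight, a heuristic commonly used for ternary weight networks. The weight values are made up.

```python
def binarize(weights):
    """Map each weight to +scale or -scale, with scale = mean |w| (assumed rule)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    return [scale if w >= 0 else -scale for w in weights]


def ternarize(weights, threshold_ratio=0.7):
    """Map each weight to -scale, 0, or +scale.

    Weights with |w| at or below threshold_ratio * mean |w| are zeroed; the
    rest share one scale, the mean magnitude of the surviving weights.
    The 0.7 ratio is an assumed heuristic, not taken from the thesis.
    """
    delta = threshold_ratio * sum(abs(w) for w in weights) / len(weights)
    kept = [abs(w) for w in weights if abs(w) > delta]
    scale = sum(kept) / len(kept) if kept else 0.0
    return [0.0 if abs(w) <= delta else (scale if w > 0 else -scale)
            for w in weights]


# Made-up weights, for illustration only.
weights = [0.9, -0.8, 0.05, -0.1, 0.6]
print(binarize(weights))
print(ternarize(weights))
```

After quantisation each weight needs only one bit (binary) or two bits (ternary) plus a single shared scale per tensor, which is the source of the memory and compute savings. The stochastic variant described in the abstract, which quantises only part of the weight matrix, is not shown here.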