Edge Intelligence: The Confluence of Edge Computing and Artificial Intelligence
Along with rapid developments in communication technologies and the surge in
the use of mobile devices, a new computation paradigm, Edge Computing, is
rising in popularity. Meanwhile, Artificial Intelligence (AI) applications
are thriving with the breakthroughs in deep learning and the many improvements
in hardware architectures. Billions of data bytes, generated at the network
edge, put massive demands on data processing and structural optimization. Thus,
there exists a strong demand to integrate Edge Computing and AI, which gives
birth to Edge Intelligence. In this paper, we divide Edge Intelligence into AI
for edge (Intelligence-enabled Edge Computing) and AI on edge (Artificial
Intelligence on Edge). The former focuses on providing better solutions
to key problems in Edge Computing with the help of popular and effective AI
technologies while the latter studies how to carry out the entire process of
building AI models, i.e., model training and inference, on the edge. This paper
provides insights into this new inter-disciplinary field from a broader
perspective. It discusses the core concepts and the research road-map, which
should provide the necessary background for potential future research
initiatives in Edge Intelligence. Comment: 13 pages, 3 figures
Machine Learning for Heterogeneous Ultra-Dense Networks with Graphical Representations
Heterogeneous ultra-dense network (H-UDN) is envisioned as a promising
solution to sustain the explosive mobile traffic demand through network
densification. By placing access points, processors, and storage units as close
as possible to mobile users, H-UDNs bring forth a number of advantages,
including high spectral efficiency, high energy efficiency, and low latency.
Nonetheless, the high density and diversity of network entities in H-UDNs
introduce formidable design challenges in collaborative signal processing and
resource management. This article illustrates the great potential of machine
learning techniques in solving these challenges. In particular, we show how to
utilize graphical representations of H-UDNs to design efficient machine
learning algorithms.
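To make the graphical-representation idea concrete, here is a minimal sketch (not from the article itself): an H-UDN modeled as a bipartite graph of access points and users, with one neighbor-averaging step of the kind that underlies many graph-based learning algorithms. The node names and load values are invented for illustration.

```python
# Toy sketch: an H-UDN as a graph, plus one round of neighbor averaging,
# the basic message-passing step behind many graph-based ML algorithms.
# Node names and traffic loads are invented numbers.

from collections import defaultdict

# Adjacency list: access points connected to the users they serve.
edges = [("AP1", "u1"), ("AP1", "u2"), ("AP2", "u2"), ("AP2", "u3")]
graph = defaultdict(list)
for a, b in edges:
    graph[a].append(b)
    graph[b].append(a)

# Per-node feature, e.g. current traffic load (arbitrary numbers).
load = {"AP1": 0.8, "AP2": 0.4, "u1": 0.2, "u2": 0.9, "u3": 0.1}

# One message-passing step: each node averages its neighbors' loads,
# producing a locally smoothed estimate a learning algorithm could consume.
smoothed = {
    node: sum(load[n] for n in nbrs) / len(nbrs)
    for node, nbrs in graph.items()
}
print(smoothed["AP1"])  # average of the loads of u1 and u2
```

A real design would replace the fixed averaging with learned aggregation weights, but the graph structure it operates on is the same.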
Application of Machine Learning in Wireless Networks: Key Techniques and Open Issues
As a key technique for enabling artificial intelligence, machine learning
(ML) is capable of solving complex problems without explicit programming.
Motivated by its successful applications to many practical tasks like image
recognition, both industry and the research community have advocated the
applications of ML in wireless communication. This paper comprehensively
surveys the recent advances of the applications of ML in wireless
communication, which are classified as: resource management in the MAC layer,
networking and mobility management in the network layer, and localization in
the application layer. The applications in resource management further include
power control, spectrum management, backhaul management, cache management,
beamformer design and computation resource management, while ML based
networking focuses on the applications in clustering, base station switching
control, user association and routing. Moreover, the literature on each aspect
is organized according to the adopted ML techniques. In addition, several
conditions for applying ML to wireless communication are identified to help
readers decide whether to use ML and which kind of ML techniques to use, and
traditional approaches are also summarized and their performance is compared
with that of ML-based approaches, clarifying why the surveyed works adopt ML.
Given the extensiveness of the research
area, challenges and unresolved issues are presented to facilitate future
studies, including ML-based network slicing, infrastructure updates to support
ML-based paradigms, open data sets and platforms for researchers, and
theoretical guidance for ML implementation. Comment: 34 pages, 8 figures
TensorFlow: A system for large-scale machine learning
TensorFlow is a machine learning system that operates at large scale and in
heterogeneous environments. TensorFlow uses dataflow graphs to represent
computation, shared state, and the operations that mutate that state. It maps
the nodes of a dataflow graph across many machines in a cluster, and within a
machine across multiple computational devices, including multicore CPUs,
general-purpose GPUs, and custom designed ASICs known as Tensor Processing
Units (TPUs). This architecture gives flexibility to the application developer:
whereas in previous "parameter server" designs the management of shared state
is built into the system, TensorFlow enables developers to experiment with
novel optimizations and training algorithms. TensorFlow supports a variety of
applications, with particularly strong support for training and inference on
deep neural networks. Several Google services use TensorFlow in production; we
have released it as an open-source project, and it has become widely used for
machine learning research. In this paper, we describe the TensorFlow dataflow
model in contrast to existing systems, and demonstrate the compelling
performance that TensorFlow achieves for several real-world applications. Comment: 18 pages, 9 figures; v2 has a spelling correction in the metadata
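The core abstraction above is the dataflow graph with mutable shared state. The following is a plain-Python toy (deliberately not TensorFlow's actual API) illustrating that idea: operation nodes, edges carrying values, and a variable node whose state an assign operation mutates. All names and values are invented.

```python
# Minimal toy dataflow graph in plain Python (an illustration of the idea,
# not TensorFlow's API): nodes are operations, edges carry values, and a
# Var node holds mutable shared state updated by an assign operation.

class Var:
    """A stateful node, analogous to a variable in a dataflow system."""
    def __init__(self, value):
        self.value = value

def run(graph, feeds):
    """Execute ops in topological order; each op reads its inputs'
    outputs and produces one output, possibly mutating a Var."""
    env = dict(feeds)
    for name, (fn, inputs) in graph:
        env[name] = fn(*(env[i] for i in inputs))
    return env

w = Var(2.0)
graph = [
    ("read_w", (lambda: w.value, ())),
    ("mul",    (lambda x, wv: x * wv, ("x", "read_w"))),
    # Assign writes the product back into the shared state.
    ("assign", (lambda p: setattr(w, "value", p) or w.value, ("mul",))),
]
env = run(graph, {"x": 3.0})
print(env["mul"], w.value)  # 6.0 6.0
```

Separating the graph description from its execution is what lets a system like TensorFlow map nodes across devices and machines.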
Device Placement Optimization with Reinforcement Learning
The past few years have witnessed a growth in size and computational
requirements for training and inference with neural networks. Currently, a
common approach to address these requirements is to use a heterogeneous
distributed environment with a mixture of hardware devices such as CPUs and
GPUs. Importantly, the decision of placing parts of the neural models on
devices is often made by human experts based on simple heuristics and
intuitions. In this paper, we propose a method which learns to optimize device
placement for TensorFlow computational graphs. Key to our method is the use of
a sequence-to-sequence model to predict which subsets of operations in a
TensorFlow graph should run on which of the available devices. The execution
time of the predicted placements is then used as the reward signal to optimize
the parameters of the sequence-to-sequence model. Our main result is that on
Inception-V3 for ImageNet classification, and on RNN LSTM, for language
modeling and neural machine translation, our model finds non-trivial device
placements that outperform hand-crafted heuristics and traditional algorithmic
methods. Comment: To appear at ICML 2017
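The reward loop the abstract describes can be sketched in miniature. Below, a brute-force search plays the role of the paper's sequence-to-sequence policy (a deliberate simplification so the loop fits in a few lines); op costs, device slowdown factors, and the transfer penalty are invented numbers.

```python
# Highly simplified sketch of reward-driven device placement. The paper
# trains a seq2seq policy with execution time as the reward; here an
# exhaustive search stands in for the learned policy. All costs are
# invented.

import itertools

ops = {"conv": 4.0, "lstm": 3.0, "softmax": 1.0}  # compute cost per op
devices = {"gpu0": 1.0, "cpu0": 3.0}              # per-device slowdown factor
edges = [("conv", "lstm"), ("lstm", "softmax")]   # data dependencies

def exec_time(placement):
    """Simulated runtime: per-device compute plus a cross-device penalty."""
    compute = sum(ops[o] * devices[d] for o, d in placement.items())
    transfer = sum(0.5 for a, b in edges if placement[a] != placement[b])
    return compute + transfer

# Lower execution time = higher reward; pick the best placement found.
best = min(
    (dict(zip(ops, combo))
     for combo in itertools.product(devices, repeat=len(ops))),
    key=exec_time,
)
print(best, exec_time(best))
```

In the paper, the measured execution time of each sampled placement updates the policy's parameters instead of being enumerated exhaustively, which is what makes the approach scale to real graphs.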
Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools
Deep Learning (DL) has had an immense success in the recent past, leading to
state-of-the-art results in various domains such as image recognition and
natural language processing. One of the reasons for this success is the
increasing size of DL models and the proliferation of vast amounts of training
data being available. To keep on improving the performance of DL, increasing
the scalability of DL systems is necessary. In this survey, we perform a broad
and thorough investigation on challenges, techniques and tools for scalable DL
on distributed infrastructures. This incorporates infrastructures for DL,
methods for parallel DL training, multi-tenant resource scheduling and the
management of training and model data. Further, we analyze and compare 11
current open-source DL frameworks and tools and investigate which of the
techniques are commonly implemented in practice. Finally, we highlight future
research trends in DL systems that deserve further research. Comment: accepted at ACM Computing Surveys, to appear
Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing
With the breakthroughs in deep learning, the recent years have witnessed a
booming of artificial intelligence (AI) applications and services, spanning
from personal assistant to recommendation systems to video/audio surveillance.
More recently, with the proliferation of mobile computing and
Internet-of-Things (IoT), billions of mobile and IoT devices are connected to
the Internet, generating zillions of bytes of data at the network edge. Driven by
this trend, there is an urgent need to push the AI frontiers to the network
edge so as to fully unleash the potential of the edge big data. To meet this
demand, edge computing, an emerging paradigm that pushes computing tasks and
services from the network core to the network edge, has been widely recognized
as a promising solution. The resulting new interdisciplinary field, edge AI or
edge intelligence, is beginning to receive tremendous interest. However,
research on edge intelligence is still in its infancy, and a dedicated
venue for exchanging the recent advances of edge intelligence is highly desired
by both the computer system and artificial intelligence communities. To this
end, we conduct a comprehensive survey of the recent research efforts on edge
intelligence. Specifically, we first review the background and motivation for
artificial intelligence running at the network edge. We then provide an
overview of the overarching architectures, frameworks and emerging key
technologies for deep learning model training and inference at the network
edge. Finally, we discuss future research opportunities on edge intelligence.
We believe that this survey will attract escalating attention, stimulate
fruitful discussions and inspire further research ideas on edge intelligence.
Comment: Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang,
"Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge
Computing," Proceedings of the IEEE
Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training
In data-parallel synchronous training of deep neural networks, different
devices (replicas) run the same program with different partitions of the
training batch, but weight update computation is repeated on all replicas,
because the weights do not have a batch dimension to partition. This can be a
bottleneck for performance and scalability in typical language models with
large weights, and in models with small per-replica batch sizes, which are
typical in large-scale training. This paper presents an approach to automatically shard
the weight update computation across replicas with efficient communication
primitives and data formatting, using static analysis and transformations on
the training computation graph. We show this technique achieves substantial
speedups on typical image and language models on Cloud TPUs, requiring no
change to model code. This technique helps close the gap between traditionally
expensive (ADAM) and cheap (SGD) optimizers, as they will only take a small
part of training step time and have similar peak memory usage. It helped us to
achieve state-of-the-art training performance in Google's MLPerf 0.6
submission. Comment: 12 pages, 23 figures, 1 table
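The core trick above can be sketched in miniature: instead of every replica applying the optimizer to the full weight vector, each replica updates only its own shard, and the shards are then all-gathered. The weights, gradients, and plain-SGD rule below are illustrative stand-ins (the paper targets optimizers like ADAM, where the savings are larger).

```python
# Toy sketch of cross-replica sharding of the weight update: each replica
# applies the optimizer to one shard of the weights, then an all-gather
# reassembles the full vector. Numbers and the SGD rule are illustrative.

weights = [1.0, 2.0, 3.0, 4.0]
grads = [0.1, 0.2, 0.3, 0.4]   # already all-reduced gradients
lr = 0.5
n_replicas = 2
shard = len(weights) // n_replicas

# Each replica updates only its own shard (replica r owns shard r),
# so the update computation is no longer repeated on every replica.
updated_shards = [
    [w - lr * g
     for w, g in zip(weights[r * shard:(r + 1) * shard],
                     grads[r * shard:(r + 1) * shard])]
    for r in range(n_replicas)
]

# All-gather: concatenate shards so every replica sees the full weights.
weights = [w for s in updated_shards for w in s]
print(weights)  # approximately [0.95, 1.9, 2.85, 3.8]
```

The saving is that optimizer state (e.g. ADAM's moment estimates) also only needs to live on the shard's owner, which is why the technique narrows the cost gap between expensive and cheap optimizers.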
Profile-guided memory optimization for deep neural networks
Recent years have seen deep neural networks (DNNs) becoming wider and deeper
to achieve better performance in many applications of AI. Such DNNs however
require huge amounts of memory to store weights and intermediate results (e.g.,
activations, feature maps, etc.) in propagation. This requirement makes it
difficult to run the DNNs on devices with limited, hard-to-extend memory,
degrades the running time performance, and restricts the design of network
models. We address this challenge by developing a novel profile-guided memory
optimization to efficiently and quickly allocate memory blocks during the
propagation in DNNs. The optimization utilizes a simple and fast heuristic
algorithm based on the two-dimensional rectangle packing problem. Experimenting
with well-known neural network models, we confirm that our method not only
reduces memory consumption but also accelerates training and inference by up
to a factor of four, thanks to the rapidity of the memory allocation and the
ability to use larger mini-batch sizes. Comment: 7 pages, 9 figures
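The rectangle-packing view of the problem can be sketched as follows: each tensor from a profiling run is a rectangle (lifetime interval by size in bytes), and a greedy heuristic places larger tensors first at the lowest offset that does not overlap any tensor alive at the same time. This is a simplified stand-in for the paper's heuristic; the tensor sizes and lifetimes are invented.

```python
# Toy sketch of profile-guided allocation as 2D rectangle packing:
# x-axis = execution step (tensor lifetime), y-axis = memory offset.
# Greedy: place larger tensors first at the lowest non-overlapping offset.

# (name, size_bytes, first_step, last_step) from a hypothetical profile.
tensors = [("act1", 40, 0, 2), ("act2", 30, 1, 3), ("act3", 40, 3, 4)]

placed = []  # (offset, size, start, end)

def lifetimes_overlap(a0, a1, b0, b1):
    return a0 <= b1 and b0 <= a1

for name, size, start, end in sorted(tensors, key=lambda t: -t[1]):
    offset = 0
    # Bump past any already-placed tensor that is alive at the same time
    # and would collide with this tensor's address range.
    for o, s, ps, pe in sorted(placed):
        if (lifetimes_overlap(start, end, ps, pe)
                and offset < o + s and o < offset + size):
            offset = o + s
    placed.append((offset, size, start, end))

peak = max(o + s for o, s, _, _ in placed)
print(peak)  # peak memory with reuse, vs 110 if every tensor got its own block
```

Here act1 and act3 have disjoint lifetimes and share the same offset, so the peak is 70 bytes instead of 110: the reuse is exactly what lets larger mini-batches fit in the same memory.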
LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations
Reinforcement learning approaches have long appealed to the data management
community due to their ability to learn to control dynamic behavior from raw
system performance. Recent successes in combining deep neural networks with
reinforcement learning have sparked significant new interest in this domain.
However, practical solutions remain elusive due to large training data
requirements, algorithmic instability, and lack of standard tools. In this
work, we introduce LIFT, an end-to-end software stack for applying deep
reinforcement learning to data management tasks. While prior work has
frequently explored applications in simulations, LIFT centers on utilizing
human expertise to learn from demonstrations, thus lowering online training
times. We further introduce TensorForce, a TensorFlow library for applied deep
reinforcement learning exposing a unified declarative interface to common RL
algorithms, thus providing a backend to LIFT. We demonstrate the utility of
LIFT in two case studies in database compound indexing and resource management
in stream processing. Results show LIFT controllers initialized from
demonstrations can outperform human baselines and heuristics across latency
metrics and space usage by up to 70%.
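The learning-from-demonstrations idea can be sketched in a few lines: a policy is initialized from logged expert (state, action) pairs before any online training, so the controller starts near the human baseline instead of from scratch. The tabular policy, states, and demonstrations below are invented; LIFT itself works with deep RL policies.

```python
# Toy sketch of initializing a controller from demonstrations: count the
# expert's actions per observed state and start with the majority action.
# States, actions, and the demonstration log are invented.

from collections import Counter, defaultdict

# Logged demonstrations: (observed state, action the expert took).
demos = [("high_load", "add_index"), ("high_load", "add_index"),
         ("low_load", "no_op"), ("high_load", "no_op")]

counts = defaultdict(Counter)
for state, action in demos:
    counts[state][action] += 1

def policy(state):
    """Pick the action the expert chose most often in this state."""
    return counts[state].most_common(1)[0][0]

print(policy("high_load"))  # the majority expert action for that state
```

Online reinforcement learning would then refine this warm-started policy, which is how demonstration data lowers online training time.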