Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study
While cluster computing frameworks are continuously evolving to provide
real-time data analysis capabilities, Apache Spark has managed to be at the
forefront of big data analytics by providing a unified framework for both batch
and stream data processing. However, recent studies on micro-architectural
characterization of in-memory data analytics are limited to only batch
processing workloads. We compare micro-architectural performance of batch
processing and stream processing workloads in Apache Spark using hardware
performance counters on a dual socket server. In our evaluation experiments, we
have found that batch processing and stream processing workloads have similar
micro-architectural characteristics and are bounded by the latency of frequent
data access to DRAM. We have also found that simultaneous multi-threading is
effective in hiding these data access latencies. Furthermore, we observe that
(i) data locality on NUMA nodes can improve performance by 10% on average,
(ii) disabling next-line L1-D prefetchers can reduce execution time by up to
14%, and (iii) multiple small executors can provide up to 36% speedup over a
single large executor.
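The multiple-small-executors result above can be illustrated with a small helper that splits one node's resources evenly across executors. This is an illustrative sketch, not the paper's methodology: the node dimensions, the one-core/1 GB overhead reservation, and the function itself are assumptions.

```python
def size_executors(node_cores, node_mem_gb, n_executors):
    """Split a node's cores and memory evenly across n executors,
    reserving one core and 1 GB for OS/daemon overhead (a common heuristic)."""
    usable_cores = node_cores - 1
    usable_mem = node_mem_gb - 1
    return {
        "num_executors": n_executors,
        "executor_cores": usable_cores // n_executors,
        "executor_memory_gb": usable_mem // n_executors,
    }

# e.g. a hypothetical 24-core, 96 GB node split into 4 small executors
cfg = size_executors(24, 96, 4)
print(cfg)  # {'num_executors': 4, 'executor_cores': 5, 'executor_memory_gb': 23}
```

In Spark these per-executor figures would map onto the standard settings such as spark.executor.instances, spark.executor.cores, and spark.executor.memory.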
Distributed Deep Q-Learning
We propose a distributed deep learning model to successfully learn control
policies directly from high-dimensional sensory input using reinforcement
learning. The model is based on the deep Q-network, a convolutional neural
network trained with a variant of Q-learning. Its input is raw pixels and its
output is a value function estimating future rewards from taking an action
given a system state. To distribute the deep Q-network training, we adapt the
DistBelief software framework to the context of efficiently training
reinforcement learning agents. As a result, the method is completely
asynchronous and scales well with the number of machines. We demonstrate that
the deep Q-network agent, receiving only the pixels and the game score as
inputs, was able to achieve reasonable success on a simple game with minimal
parameter tuning.
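The value function the abstract describes, estimating future rewards for an action given a state, follows the standard Q-learning target. A minimal sketch of that update rule, not the authors' DistBelief-based distributed implementation:

```python
def q_learning_target(reward, next_q_values, gamma=0.99, done=False):
    """Bellman target for Q-learning: r + gamma * max_a' Q(s', a'),
    truncated at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

def td_update(q_values, action, target, lr=0.1):
    """Move Q(s, a) toward the target by the learning rate."""
    q_values = list(q_values)
    q_values[action] += lr * (target - q_values[action])
    return q_values

target = q_learning_target(1.0, [0.5, 2.0], gamma=0.9)  # 1.0 + 0.9 * 2.0 = 2.8
updated = td_update([0.0, 0.0], 1, target, lr=0.5)
```

In the deep variant the tabular Q-values are replaced by a convolutional network, and in the distributed setting many such updates are applied asynchronously to shared parameters.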
An Energy-Efficient Resource Management System for a Mobile Ad Hoc Cloud
Recently, mobile ad hoc clouds have emerged as a promising technology for
mobile cyber-physical system applications, such as mobile intelligent video
surveillance and smart homes. Resource management plays a key role in
maximizing resource utilization and application performance in mobile ad hoc
clouds. Unlike resource management in traditional distributed computing
systems, such as clouds, resource management in a mobile ad hoc cloud poses
numerous challenges owing to node mobility, limited battery power, high
latency, and the dynamic network environment. The real-time requirements
associated with mobile cyber-physical system applications make the problem even
more challenging. Existing resource management systems for mobile ad
hoc clouds are not designed to support mobile cyber-physical system
applications and energy-efficient communication between application tasks. In
this paper, we propose a new energy-efficient resource management system for
mobile ad hoc clouds. The proposed system consists of two layers: a network
layer and a middleware layer. The network layer provides ad hoc network and
communication services to the middleware layer and shares the collected
information in order to allow efficient and robust resource management
decisions. It uses (1) a transmission power control mechanism to improve energy
efficiency and network capacity, (2) link lifetimes to reduce communication and
energy consumption costs, and (3) link quality to estimate data transfer times.
The middleware layer is responsible for the discovery, monitoring, migration,
and allocation of resources. It receives application tasks from users and
allocates tasks to nodes on the basis of network and node-level information.
Recent Advances in Cloud Radio Access Networks: System Architectures, Key Techniques, and Open Issues
As a promising paradigm to reduce both capital and operating expenditures,
the cloud radio access network (C-RAN) has been shown to provide high spectral
efficiency and energy efficiency. Motivated by its significant theoretical
performance gains and potential advantages, C-RANs have been advocated by both
the industry and research community. This paper comprehensively surveys the
recent advances of C-RANs, including system architectures, key techniques, and
open issues. The system architectures with different functional splits and the
corresponding characteristics are comprehensively summarized and discussed. The
state-of-the-art key techniques in C-RANs are classified as: the fronthaul
compression, large-scale collaborative processing, and channel estimation in
the physical layer; and the radio resource allocation and optimization in the
upper layer. Additionally, given the extensiveness of the research area, open
issues and challenges are presented to spur future investigations, in which the
involvement of edge cache, big data mining, social-aware device-to-device,
cognitive radio, software defined network, and physical layer security for
C-RANs are discussed, and the progress of testbed development and trial tests
is introduced as well.
Context-Based Concurrent Experience Sharing in Multiagent Systems
One of the key challenges for multi-agent learning is scalability. In this
paper, we introduce a technique for speeding up multi-agent learning by
exploiting concurrent and incremental experience sharing. This solution
adaptively identifies opportunities to transfer experiences between agents and
allows for the rapid acquisition of appropriate policies in large-scale,
stochastic, homogeneous multi-agent systems. We introduce an online,
distributed, supervisor-directed transfer technique for constructing high-level
characterizations of an agent's dynamic learning environment, called
contexts, which are used to identify groups of agents operating under
approximately similar dynamics within a short temporal window. A set of
supervisory agents computes contextual information for groups of subordinate
agents, thereby identifying candidates for experience sharing. Our method uses
a tiered architecture to propagate, with low communication overhead, state,
action, and reward data amongst the members of each dynamically-identified
information-sharing group. We applied this method to a large-scale distributed
task allocation problem with hundreds of information-sharing agents operating
in an unknown, non-stationary environment. We demonstrate that our approach
results in significant performance gains, that it is robust to noise-corrupted
or suboptimal context features, and that communication costs scale linearly
with the supervisor-to-subordinate ratio.
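The grouping step the abstract describes, identifying agents operating under approximately similar dynamics, can be sketched as clustering context vectors by distance. The greedy grouping and the distance threshold here are illustrative assumptions, not the paper's exact supervisor-directed method:

```python
def group_by_context(contexts, eps=0.5):
    """Greedily group agent ids whose context vectors lie within
    Euclidean distance eps of a group's representative vector."""
    groups = []  # list of (representative_vector, [agent_ids])
    for agent_id, ctx in contexts.items():
        for rep, members in groups:
            dist = sum((a - b) ** 2 for a, b in zip(ctx, rep)) ** 0.5
            if dist <= eps:
                members.append(agent_id)  # similar dynamics: share experience
                break
        else:
            groups.append((ctx, [agent_id]))  # start a new sharing group
    return [members for _, members in groups]

contexts = {"a1": (0.0, 0.0), "a2": (0.1, 0.2), "a3": (5.0, 5.0)}
print(group_by_context(contexts))  # [['a1', 'a2'], ['a3']]
```

Each resulting group would then exchange state, action, and reward data through the supervisory tier described above.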
Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning
As the models and the datasets to train deep learning (DL) models scale,
system architects are faced with new challenges, one of which is the memory
capacity bottleneck, where the limited physical memory inside the accelerator
device constrains the algorithms that can be studied. We propose a
memory-centric deep learning system that can transparently expand the memory
capacity available to the accelerators while also providing fast inter-device
communication for parallel training. Our proposal aggregates a pool of memory
modules locally within the device-side interconnect, which are decoupled from
the host interface and function as a vehicle for transparent memory capacity
expansion. Compared to conventional systems, our proposal achieves an average
2.8x speedup on eight DL applications and increases the system-wide memory
capacity to tens of TBs.
Comment: Published as a conference paper at the 51st IEEE/ACM International
Symposium on Microarchitecture (MICRO-51)
Distributed Learning of Decentralized Control Policies for Articulated Mobile Robots
State-of-the-art distributed algorithms for reinforcement learning rely on
multiple independent agents, which simultaneously learn in parallel
environments while asynchronously updating a common, shared policy. Moreover,
decentralized control architectures (e.g., CPGs) can coordinate spatially
distributed portions of an articulated robot to achieve system-level
objectives. In this work, we investigate the relationship between distributed
learning and decentralized control by learning decentralized control policies
for the locomotion of articulated robots in challenging environments. To this
end, we present an approach that leverages the structure of the asynchronous
advantage actor-critic (A3C) algorithm to provide a natural means of learning
decentralized control policies on a single articulated robot. Our primary
contribution shows that individual agents in the A3C algorithm can be defined by
independently controlled portions of the robot's body, thus enabling
distributed learning on a single robot for efficient hardware implementation.
We present results of closed-loop locomotion in unstructured terrains on a
snake and a hexapod robot, using decentralized controllers learned offline and
online respectively.
Preprint of the paper submitted to the IEEE Transactions on Robotics (T-RO)
journal in October 2018, and accepted for publication as a regular paper in May
2019.
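The actor-critic machinery that A3C builds on uses n-step discounted returns to form advantages for the policy gradient. A minimal sketch of that computation (the paper's decentralized per-limb agent assignment is not shown, and these helper names are assumptions):

```python
def discounted_returns(rewards, gamma=0.99, bootstrap=0.0):
    """n-step returns computed backwards: R_t = r_t + gamma * R_{t+1},
    seeded with a bootstrap value estimate for the final state."""
    returns = []
    running = bootstrap
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

def advantages(rewards, values, gamma=0.99, bootstrap=0.0):
    """Advantage A_t = R_t - V(s_t), the signal the A3C actor gradient uses."""
    returns = discounted_returns(rewards, gamma, bootstrap)
    return [r - v for r, v in zip(returns, values)]

adv = advantages([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=0.5)
print(adv)  # [0.75, 0.0, 0.5]
```

In the decentralized setting described above, each independently controlled portion of the robot would run this update against the shared policy asynchronously.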
Adaptive Event Dispatching in Serverless Computing Infrastructures
Serverless computing is an emerging cloud service model. It is currently
gaining momentum as the next step in the evolution of hosted computing from
capacitated machine virtualisation and microservices towards utility computing.
The term "serverless" has become a synonym for the entirely
resource-transparent deployment model of cloud-based event-driven distributed
applications. This work investigates how adaptive event dispatching can improve
serverless platform resource efficiency and contributes a novel approach that
allows for better scaling and fitting of the platform's resource consumption to
actual demand.
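One simple form of adaptive dispatching is routing each event to the currently least-loaded worker; the abstract does not specify the platform's actual policy, so the following is a hypothetical sketch:

```python
class LeastLoadedDispatcher:
    """Route each incoming event to the worker with the fewest
    in-flight events, re-evaluated on every dispatch."""

    def __init__(self, workers):
        self.loads = {w: 0 for w in workers}  # worker -> in-flight count

    def dispatch(self, event):
        worker = min(self.loads, key=self.loads.get)  # least-loaded wins
        self.loads[worker] += 1
        return worker

    def complete(self, worker):
        self.loads[worker] -= 1  # event finished, capacity freed

d = LeastLoadedDispatcher(["w1", "w2"])
print([d.dispatch(e) for e in range(3)])  # ['w1', 'w2', 'w1']
```

An adaptive variant would additionally resize the worker pool, or change the dispatch weighting, as observed demand drifts from provisioned capacity.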
Management and Orchestration of Network Slices in 5G, Fog, Edge and Clouds
Network slicing allows network operators to build multiple isolated virtual
networks on a shared physical network to accommodate a wide variety of services
and applications. With network slicing, service providers can provide a
cost-efficient solution towards meeting diverse performance requirements of
deployed applications and services. Despite these benefits, end-to-end
orchestration and management of network slices is a challenging and complicated
task. In this chapter, we intend to survey all the relevant aspects of network
slicing, with the focus on networking technologies such as Software-defined
networking (SDN) and Network Function Virtualization (NFV) in 5G, Fog/Edge and
Cloud Computing platforms. To build the required background, this chapter
begins with a brief overview of 5G, Fog/Edge and Cloud computing, and their
interplay. Then we cover the 5G vision for network slicing and extend it to the
Fog and Cloud computing through surveying the state-of-the-art slicing
approaches in these platforms. We conclude the chapter by discussing future
directions, and analyzing gaps and trends towards the realization of network
slicing.
Comment: Chapter in Fog and Edge Computing: Principles and Paradigms, Wiley
Press, New York, USA
Reproducible Workflow on a Public Cloud for Computational Fluid Dynamics
In a new effort to make our research transparent and reproducible by others,
we developed a workflow to run and share computational studies on the public
cloud Microsoft Azure. It uses Docker containers to create an image of the
application software stack. We also adopt several tools that facilitate
creating and managing virtual machines on compute nodes and submitting jobs to
these nodes. The configuration files for these tools are part of an expanded
"reproducibility package" that includes workflow definitions for cloud
computing, in addition to input files and instructions. This facilitates
re-creating the cloud environment to re-run the computations under the same
conditions. Although cloud providers have improved their offerings, many
researchers using high-performance computing (HPC) are still skeptical about
cloud computing. Thus, we ran benchmarks for tightly coupled applications to
confirm that the latest HPC nodes of Microsoft Azure are indeed a viable
alternative to traditional on-site HPC clusters. We also show that cloud
offerings are now adequate to complete computational fluid dynamics studies
with in-house research software that uses parallel computing with GPUs.
Finally, we share with the community what we have learned from nearly two years
of using Azure cloud to enhance transparency and reproducibility in our
computational simulations.