    Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study

    While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has remained at the forefront of big data analytics by being a unified framework for both batch and stream data processing. However, recent studies on the micro-architectural characterization of in-memory data analytics are limited to batch processing workloads. We compare the micro-architectural performance of batch processing and stream processing workloads in Apache Spark using hardware performance counters on a dual-socket server. In our evaluation experiments, we found that batch processing and stream processing workloads have similar micro-architectural characteristics and are bound by the latency of frequent data accesses to DRAM. We also found that simultaneous multi-threading is effective in hiding these data-access latencies. We have further observed that (i) data locality on NUMA nodes can improve performance by 10% on average, (ii) disabling next-line L1-D prefetchers can reduce execution time by up to 14%, and (iii) multiple small executors can provide up to 36% speedup over a single large executor.

    Distributed Deep Q-Learning

    We propose a distributed deep learning model that learns control policies directly from high-dimensional sensory input using reinforcement learning. The model is based on the deep Q-network, a convolutional neural network trained with a variant of Q-learning. Its input is raw pixels and its output is a value function estimating the future rewards of taking an action in a given system state. To distribute deep Q-network training, we adapt the DistBelief software framework to efficiently train reinforcement learning agents. As a result, the method is completely asynchronous and scales well with the number of machines. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to achieve reasonable success on a simple game with minimal parameter tuning. Comment: Updated figure of the distributed deep learning architecture; updated content throughout the paper, including fixing minor grammatical issues and highlighting differences from prior work. arXiv admin note: text overlap with arXiv:1312.5602 by other authors
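
The abstract says only that the deep Q-network is trained with a variant of Q-learning. As a minimal sketch of the standard one-step Q-learning target that such training regresses toward, assuming a discount factor `gamma` and a per-transition terminal flag (the function and parameter names are hypothetical, not taken from the paper or from DistBelief):

```python
def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step Q-learning targets: y = r + gamma * max_a' Q(s', a').

    rewards:       reward observed for each transition
    next_q_values: for each transition, the Q-values of every action in the next state
    dones:         True if the next state is terminal (no bootstrapping from it)
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        # Terminal transitions have no future value to bootstrap from.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


# Two transitions: the first bootstraps from the best next action, the second is terminal.
print(q_learning_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5))
```

In an actual DQN the `next_q_values` would come from the network itself; plain lists are used here so the target arithmetic stays visible.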

    An Energy-Efficient Resource Management System for a Mobile Ad Hoc Cloud

    Recently, mobile ad hoc clouds have emerged as a promising technology for mobile cyber-physical system applications, such as mobile intelligent video surveillance and smart homes. Resource management plays a key role in maximizing resource utilization and application performance in mobile ad hoc clouds. Unlike resource management in traditional distributed computing systems, such as clouds, resource management in a mobile ad hoc cloud poses numerous challenges owing to node mobility, limited battery power, high latency, and the dynamic network environment. The real-time requirements associated with mobile cyber-physical system applications make the problem even more challenging. Existing resource management systems for mobile ad hoc clouds are not designed to support mobile cyber-physical system applications or energy-efficient communication between application tasks. In this paper, we propose a new energy-efficient resource management system for mobile ad hoc clouds. The proposed system consists of two layers: a network layer and a middleware layer. The network layer provides ad hoc network and communication services to the middleware layer and shares the information it collects so that efficient and robust resource management decisions can be made. It uses (1) a transmission power control mechanism to improve energy efficiency and network capacity, (2) link lifetimes to reduce communication and energy consumption costs, and (3) link quality to estimate data transfer times. The middleware layer is responsible for the discovery, monitoring, migration, and allocation of resources. It receives application tasks from users and allocates tasks to nodes on the basis of network and node-level information. Comment: 19 pages
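
The abstract names link quality (to estimate data transfer times), link lifetimes, and transmission power as inputs to allocation, but gives no formula. The following is a hypothetical sketch of how those three signals could be combined when placing a single task; the field names, the delivery-ratio throughput model, and the energy-minimizing selection rule are all assumptions, not the paper's algorithm:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    node_id: str
    bandwidth_bps: float    # nominal link rate to this node (assumed known)
    delivery_ratio: float   # link quality in (0, 1]: fraction of packets delivered
    link_lifetime_s: float  # predicted time before mobility breaks the link
    tx_power_w: float       # transmit power chosen by power control to reach the node


def estimated_transfer_time(c, data_bits):
    # Lossy links need retransmissions, so scale effective throughput by delivery ratio.
    return data_bits / (c.bandwidth_bps * c.delivery_ratio)


def allocate_task(candidates, data_bits):
    """Hypothetical rule: among links expected to outlive the transfer,
    pick the node that minimizes transmission energy (power * time)."""
    best, best_energy = None, float("inf")
    for c in candidates:
        t = estimated_transfer_time(c, data_bits)
        if t > c.link_lifetime_s:
            continue  # the link would likely break mid-transfer
        energy = c.tx_power_w * t
        if energy < best_energy:
            best, best_energy = c, energy
    return best.node_id if best is not None else None
```

Filtering on link lifetime before comparing energy reflects the idea of avoiding transfers that would have to be restarted; a real system would also weigh node-level load and battery state.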

    Recent Advances in Cloud Radio Access Networks: System Architectures, Key Techniques, and Open Issues

    As a promising paradigm for reducing both capital and operating expenditures, the cloud radio access network (C-RAN) has been shown to provide high spectral efficiency and energy efficiency. Owing to its significant theoretical performance gains and potential advantages, C-RAN has been advocated by both industry and the research community. This paper comprehensively surveys recent advances in C-RANs, including system architectures, key techniques, and open issues. System architectures with different functional splits and their corresponding characteristics are summarized and discussed. State-of-the-art key techniques in C-RANs are classified into fronthaul compression, large-scale collaborative processing, and channel estimation in the physical layer, and radio resource allocation and optimization in the upper layer. Additionally, given the breadth of the research area, open issues and challenges are presented to spur future investigation: the roles of edge caching, big data mining, social-aware device-to-device communication, cognitive radio, software-defined networking, and physical-layer security in C-RANs are discussed, and the progress of testbed development and trial tests is introduced. Comment: 27 pages, 11 figures

    Context-Based Concurrent Experience Sharing in Multiagent Systems

    One of the key challenges for multi-agent learning is scalability. In this paper, we introduce a technique for speeding up multi-agent learning by exploiting concurrent and incremental experience sharing. This solution adaptively identifies opportunities to transfer experiences between agents and allows for the rapid acquisition of appropriate policies in large-scale, stochastic, homogeneous multi-agent systems. We introduce an online, distributed, supervisor-directed transfer technique for constructing high-level characterizations of an agent's dynamic learning environment, called contexts, which are used to identify groups of agents operating under approximately similar dynamics within a short temporal window. A set of supervisory agents computes contextual information for groups of subordinate agents, thereby identifying candidates for experience sharing. Our method uses a tiered architecture to propagate state, action, and reward data with low communication overhead amongst the members of each dynamically identified information-sharing group. We applied this method to a large-scale distributed task allocation problem with hundreds of information-sharing agents operating in an unknown, non-stationary environment. We demonstrate that our approach results in significant performance gains, that it is robust to noise-corrupted or suboptimal context features, and that communication costs scale linearly with the supervisor-to-subordinate ratio.
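
A rough sketch of the context idea, under loud assumptions: each agent's context is reduced to a single scalar feature averaged over a short window, agents whose features fall into the same bin form a sharing group, and each agent receives its peers' (state, action, reward) tuples. The binning rule, the scalar feature, and all names here are hypothetical; the paper's supervisors build richer context characterizations.

```python
from collections import defaultdict


def window_context(history, window):
    """Hypothetical context feature: mean of an agent's last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def group_by_context(contexts, bin_width):
    """Group agents whose context features fall into the same bin of width `bin_width`."""
    groups = defaultdict(list)
    for agent_id, value in contexts.items():
        groups[int(value // bin_width)].append(agent_id)
    return dict(groups)


def share_experiences(groups, experiences):
    """Give each agent the pooled (state, action, reward) tuples of its group peers."""
    shared = {}
    for members in groups.values():
        for agent in members:
            shared[agent] = [exp for peer in members if peer != agent
                             for exp in experiences[peer]]
    return shared


contexts = {"a": 0.12, "b": 0.18, "c": 0.91}
groups = group_by_context(contexts, bin_width=0.25)  # a and b share a bin; c is alone
```

Agents "a" and "b" end up exchanging their tuples while "c" receives nothing, which mirrors sharing only among agents that appear to face similar dynamics.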

    Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning

    As deep learning (DL) models and the datasets used to train them scale, system architects face new challenges, one of which is the memory capacity bottleneck: the limited physical memory inside the accelerator device constrains the algorithms that can be studied. We propose a memory-centric deep learning system that can transparently expand the memory capacity available to the accelerators while also providing fast inter-device communication for parallel training. Our proposal aggregates a pool of memory modules locally within the device-side interconnect; these modules are decoupled from the host interface and serve as a vehicle for transparent memory capacity expansion. Compared to conventional systems, our proposal achieves an average 2.8x speedup on eight DL applications and increases the system-wide memory capacity to tens of TBs. Comment: Published as a conference paper at the 51st IEEE/ACM International Symposium on Microarchitecture (MICRO-51), 201

    Distributed Learning of Decentralized Control Policies for Articulated Mobile Robots

    State-of-the-art distributed algorithms for reinforcement learning rely on multiple independent agents, which simultaneously learn in parallel environments while asynchronously updating a common, shared policy. Moreover, decentralized control architectures (e.g., CPGs) can coordinate spatially distributed portions of an articulated robot to achieve system-level objectives. In this work, we investigate the relationship between distributed learning and decentralized control by learning decentralized control policies for the locomotion of articulated robots in challenging environments. To this end, we present an approach that leverages the structure of the asynchronous advantage actor-critic (A3C) algorithm to provide a natural means of learning decentralized control policies on a single articulated robot. Our primary contribution shows that individual agents in the A3C algorithm can be defined by independently controlled portions of the robot's body, thus enabling distributed learning on a single robot for efficient hardware implementation. We present results of closed-loop locomotion in unstructured terrains on a snake and a hexapod robot, using decentralized controllers learned offline and online, respectively. Preprint of the paper submitted to the IEEE Transactions on Robotics (T-RO) journal in October 2018, and accepted for publication as a regular paper in May 2019. Comment: © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
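
In A3C, each agent (here, an independently controlled portion of the robot's body) computes n-step returns and advantages from its own rollout before asynchronously pushing updates to the shared policy. A minimal sketch of that standard return/advantage computation, with hypothetical names and plain lists standing in for the critic's value estimates; it is not the authors' implementation:

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """n-step returns R_t and advantages A_t = R_t - V(s_t), as used in A3C.

    rewards:         r_t for each step of one agent's rollout segment
    values:          critic estimates V(s_t) for the same steps
    bootstrap_value: V(s_{t+n}) for the state after the segment (0.0 if terminal)
    """
    returns = []
    R = bootstrap_value
    # Walk backwards so each return accumulates the discounted future rewards.
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    advantages = [R_t - v for R_t, v in zip(returns, values)]
    return returns, advantages


returns, advantages = n_step_advantages([0.0, 1.0], [1.0, 1.0], bootstrap_value=4.0, gamma=0.5)
```

The advantages weight the actor's policy-gradient term and the returns serve as regression targets for the critic; in the decentralized setting each body segment would run this on its own rollout.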

    Adaptive Event Dispatching in Serverless Computing Infrastructures

    Serverless computing is an emerging Cloud service model. It is currently gaining momentum as the next step in the evolution of hosted computing, from capacitated machine virtualisation and microservices towards utility computing. The term "serverless" has become a synonym for the entirely resource-transparent deployment model of cloud-based, event-driven distributed applications. This work investigates how adaptive event dispatching can improve the resource efficiency of serverless platforms and contributes a novel approach that allows for better scaling and fitting of the platform's resource consumption to actual demand.
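
One simple way to picture "fitting the platform's resource consumption to actual demand" is a dispatcher that tracks the event arrival rate and sizes its pool of function instances accordingly. This is a hypothetical illustration of demand-driven scaling using an exponentially weighted moving average (EWMA); the class, its parameters, and the sizing rule are assumptions, not the approach contributed by the paper:

```python
import math


class AdaptiveDispatcher:
    """Hypothetical sketch: size the pool of function instances to a smoothed event rate."""

    def __init__(self, per_instance_rate, alpha=0.5, min_instances=0):
        self.per_instance_rate = per_instance_rate  # events/s one instance can handle (assumed)
        self.alpha = alpha                          # EWMA smoothing factor in (0, 1]
        self.min_instances = min_instances          # floor, e.g. to keep warm instances
        self.rate_estimate = 0.0

    def observe(self, events_in_interval, interval_s):
        """Fold one measurement interval into the rate estimate and return the target pool size."""
        rate = events_in_interval / interval_s
        self.rate_estimate = self.alpha * rate + (1 - self.alpha) * self.rate_estimate
        return self.target_instances()

    def target_instances(self):
        needed = math.ceil(self.rate_estimate / self.per_instance_rate)
        return max(self.min_instances, needed)


d = AdaptiveDispatcher(per_instance_rate=10.0, alpha=0.5)
sizes = [d.observe(40, 1.0), d.observe(40, 1.0), d.observe(0, 1.0)]
```

The smoothing factor trades responsiveness for stability: a larger `alpha` reacts faster to bursts but also scales down more abruptly when traffic drops.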

    Management and Orchestration of Network Slices in 5G, Fog, Edge and Clouds

    Network slicing allows network operators to build multiple isolated virtual networks on a shared physical network to accommodate a wide variety of services and applications. With network slicing, service providers can offer a cost-efficient way to meet the diverse performance requirements of deployed applications and services. Despite these benefits, end-to-end orchestration and management of network slices is a challenging and complicated task. In this chapter, we survey the relevant aspects of network slicing, focusing on networking technologies such as Software-Defined Networking (SDN) and Network Function Virtualization (NFV) in 5G, Fog/Edge, and Cloud Computing platforms. To build the required background, the chapter begins with a brief overview of 5G, Fog/Edge, and Cloud computing and their interplay. We then cover the 5G vision for network slicing and extend it to Fog and Cloud computing by surveying state-of-the-art slicing approaches on these platforms. We conclude the chapter by discussing future directions and analyzing gaps and trends toward realizing network slicing. Comment: 31 pages, 4 figures, Fog and Edge Computing: Principles and Paradigms, Wiley Press, New York, USA, 201

    Reproducible Workflow on a Public Cloud for Computational Fluid Dynamics

    In a new effort to make our research transparent and reproducible by others, we developed a workflow to run and share computational studies on the public cloud Microsoft Azure. It uses Docker containers to create an image of the application software stack. We also adopt several tools that facilitate creating and managing virtual machines on compute nodes and submitting jobs to these nodes. The configuration files for these tools are part of an expanded "reproducibility package" that includes workflow definitions for cloud computing, in addition to input files and instructions. This makes it possible to re-create the cloud environment and re-run the computations under the same conditions. Although cloud providers have improved their offerings, many researchers using high-performance computing (HPC) remain skeptical about cloud computing. We therefore ran benchmarks of tightly coupled applications to confirm that the latest HPC nodes of Microsoft Azure are indeed a viable alternative to traditional on-site HPC clusters. We also show that cloud offerings are now adequate to complete computational fluid dynamics studies with in-house research software that uses parallel computing with GPUs. Finally, we share with the community what we have learned from nearly two years of using the Azure cloud to enhance transparency and reproducibility in our computational simulations. Comment: 11 pages, 8 figures, 5 tables