5,804 research outputs found

    Distributed Training Large-Scale Deep Architectures

    Full text link
    Scale of data and scale of computation infrastructures together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this paper, we focus on employing the system approach to speed up large-scale training. Via lessons learned from our routine benchmarking effort, we first identify bottlenecks and overheads that hinter data parallelism. We then devise guidelines that help practitioners to configure an effective system and fine-tune parameters to achieve desired speedup. Specifically, we develop a procedure for setting minibatch size and choosing computation algorithms. We also derive lemmas for determining the quantity of key components such as the number of GPUs and parameter servers. Experiments and examples show that these guidelines help effectively speed up large-scale deep learning training

    Chapter One – An Overview of Architecture-Level Power- and Energy-Efficient Design Techniques

    Get PDF
    Power dissipation and energy consumption became the primary design constraint for almost all computer systems in the last 15 years. Both computer architects and circuit designers intent to reduce power and energy (without a performance degradation) at all design levels, as it is currently the main obstacle to continue with further scaling according to Moore's law. The aim of this survey is to provide a comprehensive overview of power- and energy-efficient “state-of-the-art” techniques. We classify techniques by component where they apply to, which is the most natural way from a designer point of view. We further divide the techniques by the component of power/energy they optimize (static or dynamic), covering in that way complete low-power design flow at the architectural level. At the end, we conclude that only a holistic approach that assumes optimizations at all design levels can lead to significant savings.Peer ReviewedPostprint (published version

    High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

    Get PDF
    In this dissertation new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In these framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF) along with considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design

    InterCloud: Utility-Oriented Federation of Cloud Computing Environments for Scaling of Application Services

    Full text link
    Cloud computing providers have setup several data centers at different geographical locations over the Internet in order to optimally serve needs of their customers around the world. However, existing systems do not support mechanisms and policies for dynamically coordinating load distribution among different Cloud-based data centers in order to determine optimal location for hosting application services to achieve reasonable QoS levels. Further, the Cloud computing providers are unable to predict geographic distribution of users consuming their services, hence the load coordination must happen automatically, and distribution of services must change in response to changes in the load. To counter this problem, we advocate creation of federated Cloud computing environment (InterCloud) that facilitates just-in-time, opportunistic, and scalable provisioning of application services, consistently achieving QoS targets under variable workload, resource and network conditions. The overall goal is to create a computing environment that supports dynamic expansion or contraction of capabilities (VMs, services, storage, and database) for handling sudden variations in service demands. This paper presents vision, challenges, and architectural elements of InterCloud for utility-oriented federation of Cloud computing environments. The proposed InterCloud environment supports scaling of applications across multiple vendor clouds. We have validated our approach by conducting a set of rigorous performance evaluation study using the CloudSim toolkit. The results demonstrate that federated Cloud computing model has immense potential as it offers significant performance gains as regards to response time and cost saving under dynamic workload scenarios.Comment: 20 pages, 4 figures, 3 tables, conference pape

    Run-time Variability with Roles

    Get PDF
    Adaptability is an intrinsic property of software systems that require adaptation to cope with dynamically changing environments. Achieving adaptability is challenging. Variability is a key solution as it enables a software system to change its behavior which corresponds to a specific need. The abstraction of variability is to manage variants, which are dynamic parts to be composed to the base system. Run-time variability realizes these variant compositions dynamically at run time to enable adaptation. Adaptation, relying on variants specified at build time, is called anticipated adaptation, which allows the system behavior to change with respect to a set of predefined execution environments. This implies the inability to solve practical problems in which the execution environment is not completely fixed and often unknown until run time. Enabling unanticipated adaptation, which allows variants to be dynamically added at run time, alleviates this inability, but it holds several implications yielding system instability such as inconsistency and run-time failures. Adaptation should be performed only when a system reaches a consistent state to avoid inconsistency. Inconsistency is an effect of adaptation happening when the system changes the state and behavior while a series of methods is still invoking. A software bug is another source of system instability. It often appears in a variant composition and is brought to the system during adaptation. The problem is even more critical for unanticipated adaptation as the system has no prior knowledge of the new variants. This dissertation aims to achieve anticipated and unanticipated adaptation. In achieving adaptation, the issues of inconsistency and software failures, which may happen as a consequence of run-time adaptation, are evidently addressed as well. Roles encapsulate dynamic behavior used to adapt players representing the base system, which is the rationale to select roles as the software system's variants. Based on the role concept, this dissertation presents three mechanisms to comprehensively address adaptation. First, a dynamic instance binding mechanism is proposed to loosely bind players and roles. Dynamic binding of roles enables anticipated and unanticipated adaptation. Second, an object-level tranquility mechanism is proposed to avoid inconsistency by allowing a player object to adapt only when its consistent state is reached. Last, a rollback recovery mechanism is proposed as a proactive mechanism to embrace and handle failures resulting from a defective composition of variants. A checkpoint of a system configuration is created before adaptation. If a specialized bug sensor detects a failure, the system rolls back to the most recent checkpoint. These mechanisms are integrated into a role-based runtime, called LyRT. LyRT was validated with three case studies to demonstrate the practical feasibility. This validation showed that LyRT is more advanced than the existing variability approaches with respect to adaptation due to its consistency control and failure handling. Besides, several benchmarks were set up to quantify the overhead of LyRT concerning the execution time of adaptation. The results revealed that the overhead introduced to achieve anticipated and unanticipated adaptation to be small enough for practical use in adaptive software systems. Thus, LyRT is suitable for adaptive software systems that frequently require the adaptation of large sets of objects
    • …
    corecore