26 research outputs found

    Energy-efficient resource management for federated edge learning with CPU-GPU heterogeneous computing

    Edge machine learning involves the deployment of learning algorithms at the network edge to leverage massive distributed data and computation resources to train artificial intelligence (AI) models. Among others, the framework of federated edge learning (FEEL) is popular for its data-privacy preservation. FEEL coordinates global model training at an edge server and local model training at devices that are connected by wireless links. This work contributes to the energy-efficient implementation of FEEL in wireless networks by designing joint computation-and-communication resource management (C²RM). The design targets the state-of-the-art heterogeneous mobile architecture where parallel computing using both CPU and GPU, called heterogeneous computing, can significantly improve both the performance and energy efficiency. To minimize the sum energy consumption of devices, we propose a novel C²RM framework featuring multi-dimensional control including bandwidth allocation, CPU-GPU workload partitioning and speed scaling at each device, and C² time division for each link. The key component of the framework is a set of equilibriums in energy rates with respect to different control variables that are proved to exist among devices or between processing units at each device. The results are applied to designing efficient algorithms for computing the optimal C²RM policies faster than the standard optimization tools. Based on the equilibriums, we further design energy-efficient schemes for device scheduling and greedy spectrum sharing that scavenges “spectrum holes” resulting from heterogeneous C² time divisions among devices. Using a real dataset, experiments are conducted to demonstrate the effectiveness of C²RM on improving the energy efficiency of a FEEL system.

    Efficient runtime management for enabling sustainable performance in real-world mobile applications

    Mobile devices have become integral parts of our society. They handle our diverse computing needs from simple daily tasks (e.g., text messaging, e-mail) to complex graphics and media processing under a limited battery budget. Mobile system-on-chip (SoC) designs have become increasingly sophisticated to handle performance needs of diverse workloads and to improve user experience. Unfortunately, power and thermal constraints have also emerged as major concerns. Increased power densities and temperatures substantially impair user experience due to frequent throttling as well as diminishing device reliability and battery life. Addressing these concerns becomes increasingly challenging due to increased complexities at both hardware (e.g., heterogeneous CPUs, accelerators) and software (e.g., vast number of applications, multi-threading). Enabling sustained user experience in the face of these challenges requires (1) practical runtime management solutions that can reason about the performance needs of users and applications while optimizing power and temperature; (2) tools for analyzing real-world mobile application behavior and performance. This thesis aims at improving sustained user experience under thermal limitations by incorporating insights from real-world mobile applications into runtime management. This thesis first proposes thermally-efficient and Quality-of-Service (QoS) aware runtime management techniques to enable sustained performance. Our work leverages inherent QoS tolerance of users in real-world applications and introduces QoS-temperature tradeoff as a viable control knob to improve user experience under thermal constraints. We present a runtime control framework, QScale, which manages CPU power and scheduling decisions to optimize temperature while strictly adhering to given QoS targets. We also design a framework, Maestro, which provides autonomous and application-aware management of QoS-temperature tradeoffs.
Maestro uses our thermally-efficient QoS control framework, QScale, as its foundation. This thesis also presents tools to facilitate studies of real-world mobile applications. We design a practical record and replay system, RandR, to generate repeatable executions of mobile applications. RandR provides this capability by automatically reproducing non-deterministic input sources in mobile applications such as user inputs and network events. Finally, we focus on the non-deterministic executions in Android malware which seek to evade analysis environments. We propose the Proteus system to identify the instruction-level inputs that reveal analysis environments.

    Coordinated management of the processor and memory for optimizing energy efficiency

    Energy efficiency is a key design goal for future computing systems. With diverse components interacting with each other on the System-on-Chip (SoC), dynamically managing performance, energy and temperature is a challenge in 2D architectures and more so in a 3D stacked environment. Temperature has emerged as the parameter of primary concern. Heuristics-based schemes have been employed so far to address these issues. Looking ahead, complex multiphysics interactions between performance, energy and temperature reveal the limitations of such approaches. Therefore, in this thesis, a comprehensive characterization of existing methods is first carried out to identify the causes of their inefficiency. Managing different components in an independent and isolated fashion using heuristics is seen to be the primary drawback. Following this, techniques based on feedback control theory are developed to optimize the energy efficiency of the processor and memory in a coordinated fashion. They are evaluated on a real physical system and a cycle-level simulator, demonstrating significant improvements over prior schemes. The two main messages of this thesis are: (i) coordination between multiple components is paramount for next-generation computing systems and (ii) temperature ought to be treated as a resource like compute or memory cycles. Ph.D.

    Investigation into runtime workload classification and management for energy-efficient many-core systems

    PhD Thesis. Recent advances in semiconductor technology have facilitated placing many cores on a single chip. This has led to increases in system architecture complexity with diverse application workloads, with single or multiple applications running concurrently. Determining the most energy-efficient system configuration, i.e. the number of parallel threads, their core allocations and operating frequencies, tailored for each kind of workload and application concurrency scenario is extremely challenging because of the multifaceted relationships between these configuration knobs. Modelling and classifying the workloads can greatly simplify the runtime formulation of these relationships, delivering on energy efficiency, which is the key aim of this thesis. This thesis is focused on the development of new models for classifying single- and multi-application workloads in relation to how these workloads depend on the aforementioned system configurations. Underpinning these models, we implement and practically validate low-cost runtime methodologies for energy-efficient many-core processors. This thesis makes four major contributions. Firstly, a comprehensive study is presented that profiles the power consumption and performance characteristics of a multi-threaded many-core system workload, associating power consumption and performance with multiple concurrent applications. These applications are exercised on a heterogeneous platform generating varying system workloads, viz. CPU-intensive or memory-intensive or a combination of both. Fundamental to this study is an investigation of the tradeoffs between inter-application concurrency with performance and power consumption under different system configurations. The second is a novel model-based runtime optimization approach with the aim of achieving maximized power-normalized performance considering dynamic variations of workload and application scenarios.
Using real experimental measurements on a heterogeneous platform with a number of PARSEC benchmark applications, we study power-normalized performance (in terms of IPS/Watt) underpinned with analytical power and performance models, derived through multivariate linear regression (MLR). Using these models we show that CPU-intensive applications behave differently in IPS/Watt compared to memory-intensive applications in both sequential and concurrent application scenarios. Furthermore, this approach demonstrates that it is possible to continuously adapt system configuration through a per-application runtime optimization algorithm, which can improve the IPS/Watt compared to the existing approach. Runtime overheads are at least three cycles for each frequency to determine the control action. To reduce overheads and complexity, a novel model-free runtime optimization approach with the aim of maximizing power-normalized performance considering dynamic workload variations has been proposed. This approach is the third contribution. This approach is based on workload classification. This classification is supported by analysis of data collected from a comprehensive study investigating the tradeoffs between inter-application concurrency with performance and power under different system configurations. Extensive experiments have been carried out on heterogeneous and homogeneous platforms with synthetic and standard benchmark applications to develop the control policies and validate our approach. These experiments show that workload classification into CPU-intensive and memory-intensive types provides the foundation for scalable energy minimization with low complexity. The fourth contribution combines workload classification with model-based multivariate linear regression. The first approach has been used to reduce the problem complexity, and the second approach has been used for optimization in a reduced decision space using linear regression.
This approach further improves IPS/Watt significantly compared to existing approaches. This thesis presents a new runtime governor framework which interfaces runtime management algorithms with system monitors and actuators. This tool is not tied down to the specific control algorithms presented in this thesis and therefore has much wider applications. Iraqi Ministry of Higher Education and Scientific Research and Mustansiriyah Universit
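
The power-normalized performance metric named in the abstract above (IPS/Watt) can be illustrated with a minimal Python sketch. It is not the thesis's governor or its regression models: it only computes IPS/Watt for a few candidate configurations and picks the highest. The `Measurement` type, the function names, and all sample numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Measurement:
    """One observed operating point (hypothetical structure, not from the thesis)."""
    threads: int      # number of parallel threads
    freq_mhz: int     # operating frequency in MHz
    ips: float        # measured instructions per second
    watts: float      # measured power draw in watts


def ips_per_watt(m: Measurement) -> float:
    """Power-normalized performance: instructions per second per watt."""
    if m.watts <= 0:
        raise ValueError("power must be positive")
    return m.ips / m.watts


def best_configuration(measurements: Iterable[Measurement]) -> Measurement:
    """Return the operating point with the highest IPS/Watt."""
    return max(measurements, key=ips_per_watt)


# Made-up sample points, for illustration only.
SAMPLES = [
    Measurement(threads=1, freq_mhz=1400, ips=1.0e9, watts=1.0),
    Measurement(threads=2, freq_mhz=1400, ips=1.8e9, watts=1.5),
    Measurement(threads=4, freq_mhz=2000, ips=3.0e9, watts=4.0),
]

if __name__ == "__main__":
    best = best_configuration(SAMPLES)
    print(best.threads, best.freq_mhz, ips_per_watt(best))
```

With these invented numbers the two-thread point wins, because doubling the threads again raises power faster than throughput. An exhaustive search like this is only a stand-in; the abstract describes regression models and workload classification as ways to avoid measuring every configuration at runtime.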

    Scientific Advances in STEM: From Professor to Students

    This book collects the publications of the special Topic "Scientific Advances in STEM: From Professor to Students". The aim is to contribute to the advancement of the Science and Engineering fields and their impact on the industrial sector, which requires a multidisciplinary approach. The university generates and transmits knowledge to serve society. Social demands continuously evolve, mainly because of cultural, scientific, and technological development. Researchers must relate the subjects they investigate to their application in local industry and community organizations, frequently from a multidisciplinary point of view, to enhance progress in a wide variety of fields (aeronautics, automotive, biomedical, electrical and renewable energy, communications, environmental, electronic components, etc.). Most investigations in the fields of science and engineering require the work of multidisciplinary teams, representing a stockpile of research projects in different stages (final year projects, master’s or doctoral studies). In this context, this Topic offers a framework for integrating interdisciplinary research, drawing together experimental and theoretical contributions in a wide variety of fields.

    NASA Tech Briefs, September 2012

    Topics covered include: Beat-to-Beat Blood Pressure Monitor; Measurement Techniques for Clock Jitter; Lightweight, Miniature Inertial Measurement System; Optical Density Analysis of X-Rays Utilizing Calibration Tooling to Estimate Thickness of Parts; Fuel Cell/Electrochemical Cell Voltage Monitor; Anomaly Detection Techniques with Real Test Data from a Spinning Turbine Engine-Like Rotor; Measuring Air Leaks into the Vacuum Space of Large Liquid Hydrogen Tanks; Antenna Calibration and Measurement Equipment; Glass Solder Approach for Robust, Low-Loss, Fiber-to-Waveguide Coupling; Lightweight Metal Matrix Composite Segmented for Manufacturing High-Precision Mirrors; Plasma Treatment to Remove Carbon from Indium UV Filters; Telerobotics Workstation (TRWS) for Deep Space Habitats; Single-Pole Double-Throw MMIC Switches for a Microwave Radiometer; On Shaft Data Acquisition System (OSDAS); ASIC Readout Circuit Architecture for Large Geiger Photodiode Arrays; Flexible Architecture for FPGAs in Embedded Systems; Polyurea-Based Aerogel Monoliths and Composites; Resin-Impregnated Carbon Ablator: A New Ablative Material for Hyperbolic Entry Speeds; Self-Cleaning Particulate Prefilter Media; Modular, Rapid Propellant Loading System/Cryogenic Testbed; Compact, Low-Force, Low-Noise Linear Actuator; Loop Heat Pipe with Thermal Control Valve as a Variable Thermal Link; Process for Measuring Over-Center Distances; Hands-Free Transcranial Color Doppler Probe; Improving Balance Function Using Low Levels of Electrical Stimulation of the Balance Organs; Developing Physiologic Models for Emergency Medical Procedures Under Microgravity; PMA-Linked Fluorescence for Rapid Detection of Viable Bacterial Endospores; Portable Intravenous Fluid Production Device for Ground Use; Adaptation of a Filter Assembly to Assess Microbial Bioburden of Pressurant Within a Propulsion System; Multiplexed Force and Deflection Sensing Shell Membranes for Robotic Manipulators; Whispering Gallery Mode 
Optomechanical Resonator; Vision-Aided Autonomous Landing and Ingress of Micro Aerial Vehicles; Self-Sealing Wet Chemistry Cell for Field Analysis; General MACOS Interface for Modeling and Analysis for Controlled Optical Systems; Mars Technology Rover with Arm-Mounted Percussive Coring Tool, Microimager, and Sample-Handling Encapsulation Containerization Subsystem; Fault-Tolerant, Real-Time, Multi-Core Computer System; Water Detection Based on Object Reflections; SATPLOT for Analysis of SECCHI Heliospheric Imager Data; Plug-in Plan Tool v3.0.3.1; Frequency Correction for MIRO Chirp Transformation Spectroscopy Spectrum; Nonlinear Estimation Approach to Real-Time Georegistration from Aerial Images; Optimal Force Control of Vibro-Impact Systems for Autonomous Drilling Applications; Low-Cost Telemetry System for Small/Micro Satellites; Operator Interface and Control Software for the Reconfigurable Surface System Tri-ATHLETE; and Algorithms for Determining Physical Responses of Structures Under Load

    2017 Abstracts Student Research Conference
