132 research outputs found

    Efficient mobile computing

    Smart handheld devices such as phones, tablets and watches are rapidly becoming more common. From a computer architect's point of view, processor design for such systems is a complex problem. Because Li-ion battery manufacturing is strictly constrained by physical and technological limits, the new generation of mobile processors must be more energy efficient to sustain an acceptable battery life. On the other hand, the thread-level parallelism (TLP) of current mobile applications is measured to be mostly less than 2, which implies that mobile processors must deliver high performance on demand to provide an acceptable quality of experience (QoE). As chip manufacturing technology has shrunk, system-on-chip (SoC) design has become the preferred integration approach for such applications. For example, Apple's A10, the iPhone 7 SoC, has more than 3 billion transistors, including 4 big.LITTLE cores, 6 GPU cores and caches. Due to the power-density and heat-dissipation constraints at this level of integration, providing high performance on demand efficiently is a complex control problem. In the A10 architecture, for instance, assigning the right cores to running threads at the right time is challenging. State-of-the-art systems use control loops to manage architectural parameters in different ways. In mobile devices, controllers are mainly heuristics-based for simplicity. Considering the power-density and heat-dissipation issues in such systems, we propose an OS architecture and interface that provide an environment for improving the functionality of controllers in mobile computer systems.

    Planificación consciente de la contención y gestión de recursos en arquitecturas multicore emergentes

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 14-12-2021. Chip multicore processors (CMPs) currently constitute the architecture of choice for most general-purpose computing systems, and they will likely continue to be dominant in the near future. Advances in technology have made it possible to pack an increasing number of cores and bigger caches on the same chip. Nevertheless, contention on shared resources on CMPs, present since the advent of these architectures, still poses a big challenge. Cores in a CMP typically share a last-level cache (LLC) and other memory-related resources with the remaining cores, such as a DRAM controller and an interconnection network. As a result, co-running applications may compete intensively with each other for these shared resources, leading to substantial and uneven performance degradation...

    Effective memory management for mobile environments

    Smartphones, tablets, and other mobile devices exhibit vastly different constraints compared to regular or classic computing environments like desktops, laptops, or servers. Mobile devices run dozens of so-called “apps” hosted by independent virtual machines (VMs). All these VMs run concurrently, and each VM deploys purely local heuristics to organize resources like memory, performance, and power. Such a design causes conflicts across all layers of the software stack, calling for the evaluation of VMs and of optimization techniques specific to mobile frameworks. In this dissertation, we study the design of managed runtime systems for mobile platforms. More specifically, we deepen the understanding of interactions between garbage collection (GC) and system layers. We develop tools to monitor the memory behavior of Android-based apps and to characterize GC performance, leading to the development of new techniques for memory management that address energy constraints, time performance, and responsiveness. We implement a GC-aware frequency scaling governor for Android devices. We also explore the tradeoffs of power and performance in vivo for a range of realistic GC variants, with established benchmarks and real applications running on Android virtual machines. We control for variation due to dynamic voltage and frequency scaling (DVFS) and just-in-time (JIT) compilation, and across established dimensions of heap memory size and concurrency. Finally, we provision GC as a global service that collects statistics from all running VMs and then makes an informed decision that optimizes across all of them (and not just locally), and across all layers of the stack. Our evaluation illustrates the power of such a central coordination service and garbage collection mechanism in improving memory utilization, throughput, and adaptability to user activities.
In fact, our techniques aim at a sweet spot, where total on-chip energy is reduced (20–30%) with minimal impact on throughput and responsiveness (5–10%). The simplicity and efficacy of our approach reach well beyond the usual optimization techniques.

    RUNTIME METHODS TO IMPROVE ENERGY EFFICIENCY IN SUPERCOMPUTING APPLICATIONS

    Energy efficiency in supercomputing is critical to limit operating costs and carbon footprints. While the energy efficiency of future supercomputing centers needs to improve at all levels, the energy consumed by the processing units is a large fraction of the total energy consumed by High Performance Computing (HPC) systems. HPC applications use a parallel programming paradigm like the Message Passing Interface (MPI) to coordinate computation and communication among thousands of processors. With dynamically changing factors in both hardware and software affecting the energy usage of processors, there is a need for power monitoring and regulation at runtime to achieve energy savings. This dissertation highlights an adaptive runtime framework that provides processors with core-specific power control, dynamically adapting to workload characteristics to reduce power with little or no performance impact. Two opportunities to improve the energy efficiency of processors running MPI applications are identified: computational workload imbalance and waiting on memory. Performance monitoring and power regulation are performed by the framework transparently within the MPI runtime system, eliminating the need for code changes to MPI applications. The effect of enforcing power limits (capping) on processors is also investigated. Experiments on 32 nodes (1024 cores) show that in the presence of workload imbalance, the runtime reduces Central Processing Unit (CPU) frequency on cores not on the critical path, thereby reducing power and hence energy usage without deteriorating performance. Using this runtime, six MPI mini-applications and a full MPI application show an overall 20% decrease in energy use with less than 1% increase in execution time. In addition, lowering the frequency on non-critical cores reduces run-to-run performance variation and improves performance.
For the full application, an average speedup of 11% is seen, while the power is lowered by about 31% for an energy savings of up to 42%. Another experiment on 16 nodes (256 cores) that are power capped also shows performance improvement along with power reduction. Thus, energy optimization can also be a performance optimization. For applications that are limited by memory access times, the memory metrics identified make it possible to lower power by up to 32% without adversely impacting performance. Doctor of Philosophy

    Polymorphic computing abstraction for heterogeneous architectures

    The integration of multiple computing paradigms onto a system on chip (SoC) has pushed the boundaries of design space exploration for hardware architectures and the computing system software stack. The heterogeneity of computing styles in SoCs has created a new class of architectures referred to as Heterogeneous Architectures. Novel applications developed to exploit the different computing styles are user centric for embedded SoCs. Software and hardware designers face several challenges in harnessing the full potential of heterogeneous architectures. Applications have to execute on more than one compute style to increase overall SoC resource utilization. The implication of such an abstraction is that application threads need to be polymorphic. The operating system layer is thus faced with the problem of scheduling polymorphic threads. Resource allocation is also an important problem to be dealt with by the OS. Morphism evolution of application threads is constrained by the availability of heterogeneous computing resources. Traditional design optimization goals such as computational power and lower energy per computation are inadequate to satisfy user-centric application resource needs. Resource allocation decisions at the application layer need to permeate to the architectural layer to avoid conflicting demands, which may affect the energy-delay characteristics of application threads. We propose the Polymorphic computing abstraction as a unified computing model for heterogeneous architectures to address the above issues. A simulation environment for polymorphic applications is developed and evaluated under various scheduling strategies to determine the effectiveness of the polymorphism abstraction on resource allocation. A user satisfaction model is also developed to complement polymorphism and is used to optimize resource utilization at the application and network layers of embedded systems.

    Thermal and QoS-Aware Embedded Systems

    While embedded systems such as smartphones and smart cars become essential parts of our lives, they face urgent thermal challenges. Extreme thermal conditions (i.e., both high and low temperatures) degrade system reliability, even risking safety; devices in cold environments unexpectedly go offline, whereas extremely high device temperatures can cause device failures or battery explosions. These thermal limits are becoming close to the norm because of ever-increasing chip power densities and application complexities. Embedded systems in the wild, however, lack adaptive and effective solutions to overcome such thermal challenges. An adaptive thermal management solution must cope with various runtime thermal scenarios under a changing ambient temperature. An effective solution requires an understanding of the dynamic thermal behaviors of the underlying hardware and application workloads to ensure thermal and application quality-of-service (QoS) requirements. This thesis proposes a suite of adaptive and effective thermal management solutions to address different aspects of the real-world thermal challenges faced by modern embedded systems. First, we present BPM, a battery-aware power management framework for mobile devices that addresses unexpected device shutoffs in cold environments. We develop BPM as a background service that characterizes and controls real-time battery behaviors to maintain operable conditions even in cold environments. We then propose eTEC, building on the thermoelectric cooling solution, which adaptively controls cooling and computational power to keep mobile devices from overheating. For real-time embedded systems such as cars, we present RT-TRM, a thermal-aware resource management framework that monitors changing ambient temperatures and allocates system resources to individual tasks.
Next, we target in-vehicle vision systems running on CPU–GPU system-on-chips and develop CPU–GPU co-scheduling to tackle thermal imbalance across CPUs caused by GPU heat. We evaluate all of these solutions using representative mobile/automotive platforms and workloads, demonstrating their effectiveness in meeting thermal and QoS requirements. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/153350/1/ymoonlee_1.pd

    A differentiated proposal of three dimension i/o performance characterization model focusing on storage environments

    The I/O bottleneck remains a central issue in high-performance environments. Cloud computing, high-performance computing (HPC) and big data environments share many underlying difficulties in delivering data at the rate requested by high-performance applications. This increases the possibility of bottlenecks arising throughout the application feeding process at the low-level hardware devices located in the storage system layer. In recent years, many researchers have proposed solutions to improve the I/O architecture using different approaches. Some take advantage of hardware devices, while others focus on a sophisticated software approach. However, due to the complexity of dealing with high-performance environments, creating solutions to improve I/O performance in both software and hardware is challenging and gives researchers many opportunities. Classifying these improvements along different dimensions allows researchers to understand how they have been built over the years and how they progress. In addition, it allows future efforts to be directed to research topics that have developed at a lower rate, balancing the general development process. This research presents a three-dimension characterization model for classifying research works on I/O performance improvements for large-scale storage computing facilities. This classification model can also be used as a guideline framework to summarize research, providing an overview of the current scenario. We also used the proposed model to perform a systematic literature mapping that covered ten years of research on I/O performance improvements in storage environments. This study classified hundreds of distinct research works, identifying which hardware, software, and storage systems received the most attention over the years, which proposal elements were the most researched, and where these elements were evaluated.
In order to justify the importance of this model and the development of solutions that target I/O performance improvements, we evaluated a subset of these improvements using a real and complete experimentation environment, Grid5000. Analysis over different scenarios using a synthetic I/O benchmark demonstrates how the throughput and latency parameters behave when performing different I/O operations using distinct storage technologies and approaches.

    Investigation into runtime workload classification and management for energy-efficient many-core systems

    PhD Thesis. Recent advances in semiconductor technology have facilitated placing many cores on a single chip. This has led to increases in system architecture complexity with diverse application workloads, with single or multiple applications running concurrently. Determining the most energy-efficient system configuration, i.e. the number of parallel threads, their core allocations and operating frequencies, tailored for each kind of workload and application concurrency scenario is extremely challenging because of the multifaceted relationships between these configuration knobs. Modelling and classifying the workloads can greatly simplify the runtime formulation of these relationships, delivering on energy efficiency, which is the key aim of this thesis. This thesis focuses on the development of new models for classifying single- and multi-application workloads in relation to how these workloads depend on the aforementioned system configurations. Underpinning these models, we implement and practically validate low-cost runtime methodologies for energy-efficient many-core processors. This thesis makes four major contributions. Firstly, a comprehensive study is presented that profiles the power consumption and performance characteristics of a multi-threaded many-core system workload, associating power consumption and performance with multiple concurrent applications. These applications are exercised on a heterogeneous platform generating varying system workloads, viz. CPU-intensive or memory-intensive or a combination of both. Fundamental to this study is an investigation of the tradeoffs between inter-application concurrency and performance and power consumption under different system configurations. The second is a novel model-based runtime optimization approach that aims to maximize power-normalized performance considering dynamic variations of workload and application scenarios.
Using real experimental measurements on a heterogeneous platform with a number of PARSEC benchmark applications, we study power-normalized performance (in terms of IPS/Watt) underpinned by analytical power and performance models derived through multivariate linear regression (MLR). Using these models, we show that CPU-intensive applications behave differently in IPS/Watt compared to memory-intensive applications in both sequential and concurrent application scenarios. Furthermore, this approach demonstrates that it is possible to continuously adapt the system configuration through a per-application runtime optimization algorithm, which can improve the IPS/Watt compared to the existing approach. Runtime overheads are at least three cycles for each frequency to determine the control action. To reduce overheads and complexity, a novel model-free runtime optimization approach with the aim of maximizing power-normalized performance considering dynamic workload variations has been proposed. This approach is the third contribution. It is based on workload classification, which is supported by analysis of data collected from a comprehensive study investigating the tradeoffs between inter-application concurrency and performance and power under different system configurations. Extensive experiments have been carried out on heterogeneous and homogeneous platforms with synthetic and standard benchmark applications to develop the control policies and validate our approach. These experiments show that workload classification into CPU-intensive and memory-intensive types provides the foundation for scalable energy minimization with low complexity. The fourth contribution combines workload classification with model-based multivariate linear regression. The first approach has been used to reduce the problem complexity, and the second approach has been used for optimization in a reduced decision space using linear regression.
This approach further improves IPS/Watt significantly compared to existing approaches. This thesis presents a new runtime governor framework which interfaces runtime management algorithms with system monitors and actuators. This tool is not tied down to the specific control algorithms presented in this thesis and therefore has much wider applications. Iraqi Ministry of Higher Education and Scientific Research and Mustansiriyah University

    Power Consumption Analysis, Measurement, Management, and Issues: A State-of-the-Art Review of Smartphone Battery and Energy Usage

    The advancement and popularity of smartphones have made them essential, all-purpose devices. But a lack of advancement in battery technology has held back their full potential. Because energy is scarce on a smartphone, its optimal use and efficient management are crucial. A fair understanding of a smartphone's energy consumption factors is therefore necessary for both users and device manufacturers, along with other stakeholders in the smartphone ecosystem. It is important to assess how much of the device's energy is consumed by which components and under what circumstances. This paper provides a generalized but detailed analysis of the causes (internal and external) of a smartphone's power consumption and also suggests measures to minimize the consumption for each factor. The main contribution of this paper is four comprehensive literature reviews on: 1) assessment and estimation of smartphone power consumption (including power consumption analysis and modelling); 2) power consumption management for smartphones (including energy-saving methods and techniques); 3) the state of the art in research and commercial developments of smartphone batteries (including alternative power sources); and 4) mitigating the hazardous issues of smartphone batteries (with a detailed explanation of the issues). The research works are further subcategorized based on different research and solution approaches. A good number of recent empirical research works are considered for this comprehensive review, and each of them is succinctly analysed and discussed.