
    MPSoCBench: a framework for evaluating tools and methodologies for multiprocessor systems-on-chip

    Advisor: Rodolfo Jardim de Azevedo. Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação. Abstract: Recent design methodologies and tools aim at enhancing design productivity by providing a software development platform before the final Multiprocessor System-on-Chip (MPSoC) architecture details are defined. However, simulation can only be performed efficiently when using a modeling and simulation engine that supports system behavior description at a high abstraction level. The lack of MPSoC virtual platform prototyping that integrates both scalable hardware and software for creating and evaluating new methodologies and tools motivated us to develop MPSoCBench, a scalable set of MPSoCs including four different ISAs (PowerPC, MIPS, SPARC, and ARM) organized in platforms with 1, 2, 4, 8, 16, 32, and 64 cores, cross-compilers, IPs, interconnections, 17 parallel versions of software from well-known benchmarks, and power consumption estimation for the main components (processors, routers, main memory, and caches). An important demand in MPSoC designs is addressing energy consumption constraints as early as possible. Because processor performance comes with a high power cost, there is increasing interest in exploring the trade-off between power and performance, taking into account the target application domain. Dynamic Voltage and Frequency Scaling techniques adaptively scale the voltage and frequency levels of the CPU, allowing it to reach just enough performance to process the system workload while meeting throughput constraints and thereby reducing energy consumption. To explore this wide design space for energy efficiency and performance, both for hardware and software components, we added features to MPSoCBench for exploring dynamic voltage and frequency scaling (DVFS) and evaluated three mechanisms based on dynamic energy estimation and CPU usage rate. Doctorate in Computer Science.
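
    The thesis's DVFS mechanisms are driven by dynamic energy estimation and CPU usage rate. As a rough illustration only, the sketch below shows a usage-rate-driven governor over an assumed frequency/voltage table with a simple dynamic-energy estimate; the operating points, thresholds, and energy model are invented for the example and are not MPSoCBench's actual mechanisms.

```python
# Illustrative sketch of a CPU-usage-driven DVFS governor, similar in spirit
# to the mechanisms evaluated in MPSoCBench (not the actual implementation).
# The operating points, thresholds, and energy model are assumed values.

OPERATING_POINTS = [          # (frequency in MHz, voltage in V), assumed table
    (250, 0.80),
    (500, 0.90),
    (750, 1.00),
    (1000, 1.10),
]

UP_THRESHOLD = 0.85           # raise frequency above this usage rate
DOWN_THRESHOLD = 0.40         # lower frequency below this usage rate


def next_operating_point(level: int, usage_rate: float) -> int:
    """Pick the next DVFS level from the current level and CPU usage rate."""
    if usage_rate > UP_THRESHOLD and level < len(OPERATING_POINTS) - 1:
        return level + 1
    if usage_rate < DOWN_THRESHOLD and level > 0:
        return level - 1
    return level


def dynamic_energy(level: int, active_cycles: int, c_eff: float = 1e-9) -> float:
    """Simple dynamic-energy estimate E = C_eff * V^2 * cycles (assumed model)."""
    _, voltage = OPERATING_POINTS[level]
    return c_eff * voltage ** 2 * active_cycles


if __name__ == "__main__":
    level = 3                                  # start at the highest point
    for usage in [0.95, 0.90, 0.30, 0.20, 0.60]:
        level = next_operating_point(level, usage)
        freq, volt = OPERATING_POINTS[level]
        print(f"usage={usage:.2f} -> {freq} MHz @ {volt:.2f} V, "
              f"E_dyn~{dynamic_energy(level, 1_000_000):.3e} J")
```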

    Energy and throughput aware fuzzy logic based reconfiguration for MPSoCs

    Multicore architectures offer an amount of parallelism that is often underutilized; as a result, these underutilized resources become a liability instead of an advantage. Inefficient resource sharing on the chip can have a negative impact on application performance and may result in greater energy consumption. A large body of research now focuses on reconfigurable multicore architectures that support algorithms for finding optimal solutions with an improved balance of energy and throughput. An ideal system would optimize such reconfigurable systems so that the optimum resources are allocated to a particular workload and all other underutilized resources remain inactive for greater energy savings. This paper presents a fuzzy logic based reconfiguration engine that optimizes a multicore architecture according to workload requirements for an optimum balance between power and performance. The proposed fuzzy logic reconfiguration engine is designed around a 16-core SCMP architecture comprising reconfigurable cache memories, power-gated cores, and adaptive on-chip network routers that minimize leakage energy effects for inactive components. A coarse-grained architecture was selected because it can be reconfigured faster, making it feasible for runtime adaptation schemes. The presented architecture is analyzed over a set of OpenMP based parallel benchmarks, and the results show significant energy savings in all cases.
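
    As a rough illustration of the idea, the sketch below encodes two fuzzy rules (high CPU load wants more cores; low load or memory-bound behavior wants fewer, so the rest can be power gated) and defuzzifies them into a core count. The membership functions, rules, and metrics are assumptions for the example, not the paper's actual rule base.

```python
# Minimal sketch of a fuzzy-logic reconfiguration decision, in the spirit of the
# engine described above (not its actual rule base). Membership functions,
# rules, and the 16-core target are illustrative assumptions.

def membership_low(x: float) -> float:
    """Degree to which a normalized metric (0..1) is 'low'."""
    return max(0.0, min(1.0, (0.5 - x) / 0.5))


def membership_high(x: float) -> float:
    """Degree to which a normalized metric (0..1) is 'high'."""
    return max(0.0, min(1.0, (x - 0.5) / 0.5))


def recommend_active_cores(cpu_load: float, mem_intensity: float,
                           total_cores: int = 16) -> int:
    """Fuzzy rules: high load -> more cores; low load or memory-bound -> fewer,
    so unused cores can be power gated."""
    want_more = membership_high(cpu_load)
    want_fewer = max(membership_low(cpu_load), membership_high(mem_intensity))
    # Defuzzify with a weighted average between 1 core and all cores.
    weight = want_more / (want_more + want_fewer + 1e-9)
    return max(1, round(1 + weight * (total_cores - 1)))


if __name__ == "__main__":
    for load, mem in [(0.9, 0.2), (0.3, 0.1), (0.8, 0.9)]:
        print(f"load={load}, mem={mem} -> "
              f"{recommend_active_cores(load, mem)} active cores")
```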

    Walter: Wide I/O Scaling of Number of Memory Controllers Versus Frequency and Voltage

    Computational application demands push the scaling of the number of cores, which in turn further increases the demand for memory bandwidth. The use of larger rank widths and/or scaling the number of memory controllers (MCs) is a straightforward way to increase memory bandwidth. Connecting wide ranks and MCs via low-capacitance Through Silicon Vias (TSVs) favors high-bandwidth 3D-stacking systems (e.g., Wide I/O). Given that voltage and frequency scaling (VFS) lowers power utilization but lower clock frequencies reduce bandwidth, this article proposes Walter, a Wide I/O technique that trades off scaling of the number of memory controllers (MCs) against clock frequency and voltage (VFS) to mitigate low bandwidth and improve energy-per-bit usage. Our findings show that Walter’s Wide I/O architectural benefits of using a larger number of MCs coupled with wider ranks, when combined with VFS, are promising: compared to the baseline, for a 75% frequency/voltage reduction, MC scalability improved memory bandwidth by 2.4x and reduced energy-per-bit by 20% (most benchmarks, for up to 16 MCs). Walter’s architectural replacement of ranks set at specification frequencies with ones set at lower frequencies allows temperature reduction, thus likely allowing further rank stacking.
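
    A back-of-the-envelope sketch of the underlying trade-off follows: aggregate bandwidth grows with the number of MCs but shrinks with frequency, while dynamic power scales roughly with f·V². The proportionality model and resulting numbers are simplifications; only the "16 MCs at a 75% frequency/voltage reduction" scenario is taken from the abstract, and the reported 2.4x bandwidth and 20% energy-per-bit figures include effects this toy model ignores.

```python
# Back-of-the-envelope sketch of the bandwidth vs. frequency/voltage trade-off
# that Walter exploits. The first-order models below (bandwidth proportional to
# MCs * frequency, dynamic power proportional to MCs * f * V^2) are assumptions
# for illustration, not the paper's evaluation methodology.

def relative_bandwidth(num_mcs: int, freq_scale: float, base_mcs: int = 1) -> float:
    """Aggregate bandwidth relative to the baseline configuration."""
    return (num_mcs / base_mcs) * freq_scale


def relative_dynamic_power(num_mcs: int, freq_scale: float, volt_scale: float,
                           base_mcs: int = 1) -> float:
    """Dynamic power relative to baseline, P ~ MCs * f * V^2."""
    return (num_mcs / base_mcs) * freq_scale * volt_scale ** 2


if __name__ == "__main__":
    # Baseline: 1 MC at nominal frequency/voltage.
    # Walter-style point: 16 MCs at 25% of nominal frequency and voltage.
    bw = relative_bandwidth(16, 0.25)
    power = relative_dynamic_power(16, 0.25, 0.25)
    energy_per_bit = power / bw          # energy-per-bit relative to baseline
    print(f"bandwidth: {bw:.1f}x  power: {power:.2f}x  "
          f"energy-per-bit: {energy_per_bit:.2f}x")
```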

    3rd Many-core Applications Research Community (MARC) Symposium. (KIT Scientific Reports ; 7598)

    This manuscript includes recent scientific work regarding the Intel Single Chip Cloud computer and describes novel approaches for programming and run-time organization.

    Energy Demand Response for High-Performance Computing Systems

    The growing computational demand of scientific applications has greatly motivated the development of large-scale high-performance computing (HPC) systems in the past decade. To accommodate the increasing demand of applications, HPC systems have been going through dramatic architectural changes (e.g., the introduction of many-core and multi-core systems and the rapid growth of complex interconnection networks for efficient communication between thousands of nodes), as well as a significant increase in size (e.g., modern supercomputers consist of hundreds of thousands of nodes). With such changes in architecture and size, the energy consumption of these systems has increased significantly. With the advent of exascale supercomputers in the next few years, power consumption of HPC systems will surely increase; some systems may even consume hundreds of megawatts of electricity. Demand response programs are designed to help energy service providers stabilize the power system by reducing the energy consumption of participating systems during periods of high power demand or temporary shortage in power supply. This dissertation focuses on developing energy-efficient demand-response models and algorithms to enable HPC systems' demand response participation. In the first part, we present interconnection network models for performance prediction of large-scale HPC applications. They are based on interconnection topologies widely used in HPC systems: dragonfly, torus, and fat-tree. Our interconnect models are fully integrated with an implementation of the message-passing interface (MPI) that can mimic most of its functions with packet-level accuracy. Extensive experiments show that our integrated models provide good accuracy for predicting network behavior while allowing for good parallel scaling performance. In the second part, we present an energy-efficient demand-response model to reduce HPC systems' energy consumption during demand response periods. We propose HPC job scheduling and resource provisioning schemes to enable HPC systems' emergency demand response participation. In the final part, we propose an economic demand-response model that allows both the HPC operator and HPC users to jointly reduce the HPC system's energy cost. Our proposed model allows the participation of HPC systems in economic demand-response programs through a contract-based rewarding scheme that can incentivize HPC users to participate in demand response.
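
    As a rough illustration of emergency demand response in a scheduler (not the dissertation's algorithms), the sketch below greedily admits queued jobs until an estimated power cap is reached and defers the rest until the event ends; the job structure, per-node power estimates, and cap are assumed values.

```python
# Minimal sketch of an emergency-demand-response admission policy for an HPC
# scheduler: during a demand-response event, keep estimated total power under a
# cap by deferring jobs that would exceed it. The Job structure, power numbers,
# and greedy policy are illustrative assumptions.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    nodes: int
    watts_per_node: float     # estimated power draw per node


def admit_under_power_cap(queue: List[Job], power_cap_watts: float
                          ) -> Tuple[List[Job], List[Job]]:
    """Greedily admit queued jobs until the estimated power reaches the cap;
    remaining jobs are deferred until the demand-response event ends."""
    admitted, deferred = [], []
    used = 0.0
    for job in queue:
        demand = job.nodes * job.watts_per_node
        if used + demand <= power_cap_watts:
            admitted.append(job)
            used += demand
        else:
            deferred.append(job)
    return admitted, deferred


if __name__ == "__main__":
    queue = [Job("cfd", 512, 300.0), Job("md", 256, 250.0), Job("viz", 64, 200.0)]
    run, wait = admit_under_power_cap(queue, power_cap_watts=200_000)
    print("run now:", [j.name for j in run], "| deferred:", [j.name for j in wait])
```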

    Learning Energy-Aware Transaction Scheduling in Database Systems

    Servers are typically sized to accommodate peak loads, but in practice they remain under-utilized much of the time. During periods of low load, there is an opportunity to save power by quickly adjusting processor performance to match the load. Many systems do this by using Dynamic Voltage and Frequency Scaling (DVFS) to adjust the processor’s execution frequency. In transactional database systems, workload-aware approaches running in the DBMS have proved able to manage DVFS more effectively than the underlying operating system, as they have more information about the workload and more control over it. In this thesis, we ask whether databases can learn to manage DVFS effectively by observing the effects of DVFS on their workload. We present an approach that uses reinforcement learning (RL) to learn in-DBMS frequency governors. Our results show that governors learned using our technique are competitive with state-of-the-art methods and are able to adapt to a variety of workload conditions. We also show that our method has an added advantage: it allows flexibility in tuning frequency governance to balance the power-performance trade-off. Finally, we discuss the challenges associated with using RL in this setting due to the overheads of using a learned frequency governor.
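
    For illustration only, the sketch below shows a tabular Q-learning governor that maps a discretized load to a frequency level, with a reward that trades off throughput against a power penalty via a tunable weight; the states, actions, simulated environment, and reward are assumptions, not the thesis's algorithm.

```python
# Minimal sketch of a tabular Q-learning frequency governor of the kind the
# thesis describes learning inside the DBMS (not its actual algorithm or
# reward). States, actions, the toy environment, and the reward are assumptions.

import random

FREQS = [1.0, 2.0, 3.0]            # GHz, the governor's actions (assumed levels)
LOADS = ["low", "medium", "high"]  # discretized observed load (assumed states)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
POWER_WEIGHT = 0.5                 # knob for the power-performance trade-off

q = {(s, a): 0.0 for s in LOADS for a in range(len(FREQS))}


def reward(load: str, freq: float) -> float:
    """Toy reward: throughput benefit of frequency minus a power penalty."""
    demand = {"low": 1.0, "medium": 2.0, "high": 3.0}[load]
    throughput = min(freq, demand)          # frequency beyond demand is wasted
    power = freq ** 2                       # assumed convex power cost
    return throughput - POWER_WEIGHT * power / FREQS[-1] ** 2


def choose_action(state: str) -> int:
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPSILON:
        return random.randrange(len(FREQS))
    return max(range(len(FREQS)), key=lambda a: q[(state, a)])


for _ in range(5000):                       # train on randomly arriving loads
    state = random.choice(LOADS)
    action = choose_action(state)
    r = reward(state, FREQS[action])
    next_state = random.choice(LOADS)
    best_next = max(q[(next_state, a)] for a in range(len(FREQS)))
    q[(state, action)] += ALPHA * (r + GAMMA * best_next - q[(state, action)])

for s in LOADS:
    best = max(range(len(FREQS)), key=lambda a: q[(s, a)])
    print(f"load={s}: learned frequency {FREQS[best]} GHz")
```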

    Architecting Efficient Data Centers.

    Data center power consumption has become a key constraint in continuing to scale Internet services. As our society’s reliance on “the Cloud” continues to grow, companies require an ever-increasing amount of computational capacity to support their customers. Massive warehouse-scale data centers have emerged, requiring 30MW or more of total power capacity. Over the lifetime of a typical high-scale data center, power-related costs make up 50% of the total cost of ownership (TCO). Furthermore, the aggregate effect of data center power consumption across the country cannot be ignored. In total, data center energy usage has reached approximately 2% of aggregate consumption in the United States and continues to grow. This thesis addresses the need to increase computational efficiency to address this growing problem. It proposes a new class of power management techniques: coordinated full-system idle low-power modes to increase the energy proportionality of modern servers. First, we introduce the PowerNap server architecture, a coordinated full-system idle low-power mode which transitions in and out of an ultra-low-power nap state to save power during brief idle periods. While effective for uniprocessor systems, PowerNap relies on full-system idleness, and we show that such idleness disappears as the number of cores per processor continues to increase. We expose this problem in a case study of Google Web search, in which we demonstrate that coordinated full-system active power modes are necessary to reach energy proportionality and that PowerNap is ineffective because of a lack of idleness. To recover full-system idleness, we introduce DreamWeaver, architectural support for deep sleep. DreamWeaver allows a server to exchange latency for full-system idleness, allowing PowerNap-enabled servers to be effective, and provides a better latency-power savings tradeoff than existing approaches. Finally, this thesis investigates workloads which achieve efficiency through methodical cluster provisioning techniques. Using the popular memcached workload, this thesis provides examples of provisioning clusters for cost-efficiency given latency, throughput, and data set size targets. Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91499/1/meisner_1.pd
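
    As a rough illustration of why a full-system nap state improves energy proportionality, the sketch below compares average power with a conventional idle state versus an ultra-low-power nap state across utilization levels; all power values are assumed for the example and are not measurements from the thesis.

```python
# Back-of-the-envelope sketch of the energy-proportionality argument behind a
# full-system nap state, in the spirit of PowerNap. All power values and the
# utilization model are assumed for illustration.

def average_power(utilization: float, active_w: float, idle_w: float) -> float:
    """Average power if the server draws active_w while busy and idle_w otherwise
    (ignores transition overheads between the two states)."""
    return utilization * active_w + (1.0 - utilization) * idle_w


if __name__ == "__main__":
    ACTIVE_W = 300.0              # assumed peak/active power
    IDLE_CONVENTIONAL_W = 180.0   # assumed idle power without a nap state
    IDLE_NAP_W = 15.0             # assumed ultra-low-power nap state

    for util in [0.1, 0.3, 0.5]:
        baseline = average_power(util, ACTIVE_W, IDLE_CONVENTIONAL_W)
        with_nap = average_power(util, ACTIVE_W, IDLE_NAP_W)
        print(f"utilization {util:.0%}: {baseline:.0f} W -> {with_nap:.0f} W "
              f"({1 - with_nap / baseline:.0%} saved)")
```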