531 research outputs found
Exploiting heterogeneity in Chip-Multiprocessor Design
In the past decade, semiconductor manufacturers are persistent in building faster and smaller transistors in order to boost the processor performance as projected by Moore’s Law. Recently, as we enter the deep submicron regime, continuing the same processor development pace becomes an increasingly difficult issue due to constraints on power, temperature, and the scalability of transistors. To overcome these challenges, researchers propose several innovations at both architecture and device levels that are able to partially solve the problems. These diversities in processor architecture and manufacturing materials provide solutions to continuing Moore’s Law by effectively exploiting the heterogeneity, however, they also introduce a set of unprecedented challenges that have been rarely addressed in prior works. In this dissertation, we present a series of in-depth studies to comprehensively investigate the design and optimization of future multi-core and many-core platforms through exploiting heteroge-neities. First, we explore a large design space of heterogeneous chip multiprocessors by exploiting the architectural- and device-level heterogeneities, aiming to identify the optimal design patterns leading to attractive energy- and cost-efficiencies in the pre-silicon stage. After this high-level study, we pay specific attention to the architectural asymmetry, aiming at developing a heterogeneity-aware task scheduler to optimize the energy-efficiency on a given single-ISA heterogeneous multi-processor. An advanced statistical tool is employed to facilitate the algorithm development. In the third study, we shift our concentration to the device-level heterogeneity and propose to effectively leverage the advantages provided by different materials to solve the increasingly important reliability issue for future processors
A survey of system level power management schemes in the dark-silicon era for many-core architectures
Power consumption in Complementary Metal Oxide Semiconductor (CMOS) technology has escalated to a point that only a fractional part of many-core chips can be powered-on at a time. Fortunately, this fraction can be increased at the expense of performance through the dark-silicon solution. However, with many-core integration set to be heading towards its thousands, power consumption and temperature increases per time, meaning the number of active nodes must be reduced drastically. Therefore, optimized techniques are demanded for continuous advancement in technology. Existing efforts try to overcome this challenge by activating nodes from different parts of the chip at the expense of communication latency. Other efforts on the other hand employ run-time power management techniques to manage the power performance of the cores trading-off performance for power. We found out that, for a significant amount of power to saved and high temperature to be avoided, focus should be on reducing the power consumption of all the on-chip components. Especially, the memory hierarchy and the interconnect. Power consumption can be minimized by, reducing the size of high leakage power dissipating elements, turning-off idle resources and integrating power saving materials
Energy-Efficient and Reliable Computing in Dark Silicon Era
Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency is decreasing in each technology generation. Moore’s law and Dennard scaling had been backed and coupled appropriately for five decades to bring commensurate exponential performance via single core and later muti-core design. However, recalculating Dennard scaling for recent small technology sizes shows that current ongoing multi-core growth is demanding exponential thermal design power to achieve linear performance increase. This process hits a power wall where raises the amount of dark or dim silicon on future multi/many-core chips more and more. Furthermore, from another perspective, by increasing the number of transistors on the area of a single chip and susceptibility to internal defects alongside aging phenomena, which also is exacerbated by high chip thermal density, monitoring and managing the chip reliability before and after its activation is becoming a necessity. The proposed approaches and experimental investigations in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in dark silicon era, where later these two tracks will combine together. In the first track, the main goal is to increase the level of returns in terms of main important features in chip design, such as performance and throughput, while maximum power limit is honored. In fact, we show that by managing the power while having dark silicon, all the traditional benefits that could be achieved by proceeding in Moore’s law can be also achieved in the dark silicon era, however, with a lower amount. Via the track of reliability awareness in dark silicon era, we show that dark silicon can be considered as an opportunity to be exploited for different instances of benefits, namely life-time increase and online testing. We discuss how dark silicon can be exploited to guarantee the system lifetime to be above a certain target value and, furthermore, how dark silicon can be exploited to apply low cost non-intrusive online testing on the cores. After the demonstration of power and reliability awareness while having dark silicon, two approaches will be discussed as the case study where the power and reliability awareness are combined together. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management. While the second approach provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and target reliability
Adaptive Knobs for Resource Efficient Computing
Performance demands of emerging domains such as artificial intelligence, machine learning and vision, Internet-of-things etc., continue to grow. Meeting such requirements on modern multi/many core systems with higher power densities, fixed power and energy budgets, and thermal constraints exacerbates the run-time management challenge. This leaves an open problem on extracting the required performance within the power and energy limits, while also ensuring thermal safety. Existing architectural solutions including asymmetric and heterogeneous cores and custom acceleration improve performance-per-watt in specific design time and static scenarios. However, satisfying applications’ performance requirements under dynamic and unknown workload scenarios subject to varying system dynamics of power, temperature and energy requires intelligent run-time management.
Adaptive strategies are necessary for maximizing resource efficiency, considering i) diverse requirements and characteristics of concurrent applications, ii) dynamic workload variation, iii) core-level heterogeneity and iv) power, thermal and energy constraints. This dissertation proposes such adaptive techniques for efficient run-time resource management to maximize performance within fixed budgets under unknown and dynamic workload scenarios. Resource management strategies proposed in this dissertation comprehensively consider application and workload characteristics and variable effect of power actuation on performance for pro-active and appropriate allocation decisions. Specific contributions include i) run-time mapping approach to improve power budgets for higher throughput, ii) thermal aware performance boosting for efficient utilization of power budget and higher performance, iii) approximation as a run-time knob exploiting accuracy performance trade-offs for maximizing performance under power caps at minimal loss of accuracy and iv) co-ordinated approximation for heterogeneous systems
through joint actuation of dynamic approximation and power knobs for performance guarantees with minimal power consumption.
The approaches presented in this dissertation focus on adapting existing mapping techniques, performance boosting strategies, software and dynamic approximations to meet the performance requirements, simultaneously considering system constraints. The proposed strategies are compared against relevant state-of-the-art run-time management frameworks to qualitatively evaluate their efficacy
An Ageing-Aware and Temperature Mapping Algorithm For Multi-Level Cache Nodes
Increase in chip inactivity in the future threatens the performance of many-core systems and therefore, efficient techniques are required for continuous scaling of transistors. As of a result of this challenge, future proposed many-core system designs must consider the possibility of a 50% functioning chip per time as well maintaining performance. Fortunately, this 50% inactivity can be increased by managing the temperature of active nodes and the placement of the dark nodes to leverage a balance working chip whilst considering the lifetime of nodes. However, allocating dark nodes inefficiently can increase the temperature of the chip and increase the waiting time of applications. Consequently, due to stochastic application characteristics, a dynamic rescheduling technique is more desirable compared to fixed design mapping. In this paper, we propose an Ageing Before Temperature Electromigration-Aware, Negative Bias Temperature Instability (NBTI) & Time-dependent Dielectric Breakdown (TDDB) Neighbour Allocation (ABENA 2.0), a dynamic rescheduling management system which considers the ageing and temperature before mapping applications. ABENA also considers the location of active and dark nodes and migrate task based on the characteristics of the nodes. Our proposed algorithm employ Dynamic Voltage Frequency Scaling (DVFS) to reduce the Voltage and Frequency (VF) of the nodes. Results show that, our proposed methods improve the ageing of nodes compared to a conventional round-robin management system by 10% in temperature, and 10% agein
Optimization-based power and thermal management for dark silicon aware 3D chip multiprocessors using heterogeneous cache hierarchy
Management of a problem recently known as “dark silicon” is a new challenge in multicore designs. Prior innovative studies have addressed the dark silicon problem in the fields of power-efficient core design. However, addressing dark silicon challenges in uncore component designs such as cache hierarchy, on-chip interconnect etc. that consume significant portion of the on-chip power consumption is largely unexplored. In this paper, for the first time, we propose an integrated approach which considers the impact of power consumption of core and uncore components simultaneously to improve multi/many-core performance in the dark silicon era. The proposed approach dynamically (1) predicts the changing program behavior on each core; (2) re-determines frequency/voltage, cache capacity and technology in each level of the cache hierarchy based on the program's scalability in order to satisfy the power and temperature constraints. In the proposed architecture, for future chip-multiprocessors (CMPs), we exploit emerging technologies such as non-volatile memories (NVMs) and 3D techniques to combat dark silicon. Also, for the first time, we propose a detailed power model which is useful for future dark silicon CMPs power modeling. Experimental results on SPEC 2000/2006 benchmarks show that the proposed method improves throughput by about 54.3% and energy-delay product by about 61% on average, respectively, in comparison with the conventional CMP architecture with homogenous cache system. (A preliminary short version of this work was presented in the 18th Euromicro Conference on Digital System Design (DSD), 2015.) © 2017 Elsevier B.V
Model-Based Design, Analysis, and Implementations for Power and Energy-Efficient Computing Systems
Modern computing systems are becoming increasingly complex. On one end of
the spectrum, personal computers now commonly support multiple processing
cores, and, on the other end, Internet services routinely employ thousands of
servers in distributed locations to provide the desired service to its users. In
such complex systems, concerns about energy usage and power consumption
are increasingly important. Moreover, growing awareness of environmental
issues has added to the overall complexity by introducing new variables to the
problem. In this regard, the ability to abstractly focus on the relevant details
allows model-based design to help significantly in the analysis and solution of
such problems.
In this dissertation, we explore and analyze model-based design for energy
and power considerations in computing systems. Although the presented techniques
are more generally applicable, we focus their application on large-scale
Internet services operating in U.S. electricity markets. Internet services are becoming
increasingly popular in the ICT ecosystem of today. The physical infrastructure
to support such services is commonly based on a group of cooperative
data centers (DCs) operating in tandem. These DCs are geographically
distributed to provide security and timing guarantees for their customers. To
provide services to millions of customers, DCs employ hundreds of thousands
of servers. These servers consume a large amount of energy that is traditionally
produced by burning coal and employing other environmentally hazardous
methods, such as nuclear and gas power generation plants. This large energy
consumption results in significant and fast-growing financial and environmental
costs. Consequently, for protection of local and global environments, governing
bodies around the globe have begun to introduce legislation to encourage
energy consumers, especially corporate entities, to increase the share of
renewable energy (green energy) in their total energy consumption. However,
in U.S. electricity markets, green energy is usually more expensive than energy
generated from traditional sources like coal or petroleum.
We model the overall problem in three sub-areas and explore different approaches
aimed at reducing the environmental foot print and operating costs
of multi-site Internet services, while honoring the Quality of Service (QoS) constraints
as contracted in service level agreements (SLAs).
Firstly, we model the load distribution among member DCs of a multi-site Internet
service. The use of green energy is optimized considering different factors
such as (a) geographically and temporally variable electricity prices, (b)
the multitude of available energy sources to choose from at each DC, (c) the necessity
to support more than one SLA, and, (d) the requirements to offer more
than one service at each DC. Various approaches are presented for solving this
problem and extensive simulations using Google’s setup in North America are
used to evaluate the presented approaches.
Secondly, we explore the area of shaving the peaks in the energy demand of
large electricity consumers, such as DCs by using a battery-based energy storage
system. Electrical demand of DCs is typically peaky based on the usage
cycle of their customers. Resultant peaks in the electrical demand require development
and maintenance of a costlier energy delivery mechanism, and are
often met using expensive gas or diesel generators which often have a higher
environmental impact. To shave the peak power demand, a battery can be used
which is charged during low load and is discharged during the peak loads.
Since the batteries are costly, we present a scheme to estimate the size of battery
required for any variable electrical load. The electrical load is modeled using
the concept of arrival curves from Network Calculus. Our analysis mechanism
can help determine the appropriate battery size for a given load arrival curve
to reduce the peak.
Thirdly, we present techniques to employ intra-DC scheduling to regulate the
peak power usage of each DC. The model we develop is equally applicable to
an individual server with multi-/many-core chips as well as a complete DC
with an intermix of homogeneous and heterogeneous servers. We evaluate
these approaches on single-core and multi-core chip processors and present the
results.
Overall, our work demonstrates the value of model-based design for intelligent
load distribution across DCs, storage integration, and per DC optimizations
for efficient energy management to reduce operating costs and environmental
footprint for multi-site Internet services
Digital neural circuits : from ions to networks
PhD ThesisThe biological neural computational mechanism is always fascinating to human beings since it shows several state-of-the-art characteristics: strong fault tolerance, high power efficiency and self-learning capability. These behaviours lead the developing trend of designing the next-generation digital computation platform. Thus investigating and understanding how the neurons talk with each other is the key to replicating these calculation features. In this work I emphasize using tailor-designed digital circuits for exactly implementing bio-realistic neural network behaviours, which can be considered a novel approach to cognitive neural computation. The first advance is that biological real-time computing performances allow the presented circuits to be readily adapted for real-time closed-loop in vitro or in vivo experiments, and the second one is a transistor-based circuit that can be directly translated into an impalpable chip for high-level neurologic disorder rehabilitations. In terms of the methodology, first I focus on designing a heterogeneous or multiple-layer-based architecture for reproducing the finest neuron activities both in voltage-and calcium-dependent ion channels. In particular, a digital optoelectronic neuron is developed as a case study. Second, I focus on designing a network-on-chip architecture for implementing a very large-scale neural network (e.g. more than 100,000) with human cognitive functions (e.g. timing control mechanism). Finally, I present a reliable hybrid bio-silicon closed-loop system for central pattern generator prosthetics, which can be considered as a framework for digital neural circuit-based neuro-prosthesis implications. At the end, I present the general digital neural circuit design principles and the long-term social impacts of the presented work
Recommended from our members
Scalable Emulation of Heterogeneous Systems
The breakdown of Dennard's transistor scaling has driven computing systems toward application-specific accelerators, which can provide orders-of-magnitude improvements in performance and energy efficiency over general-purpose processors.
To enable the radical departures from conventional approaches that heterogeneous systems entail, research infrastructure must be able to model processors, memory and accelerators, as well as system-level changes---such as operating system or instruction set architecture (ISA) innovations---that might be needed to realize the accelerators' potential. Unfortunately, existing simulation tools that can support such system-level research are limited by the lack of fast, scalable machine emulators to drive execution.
To fill this need, in this dissertation we first present a novel machine emulator design based on dynamic binary translation that makes the following improvements over the state of the art: it scales on multicore hosts while remaining memory efficient, correctly handles cross-ISA differences in atomic instruction semantics, leverages the host floating point (FP) unit to speed up FP emulation without sacrificing correctness, and can be efficiently instrumented to---among other possible uses---drive the execution of a full-system, cross-ISA simulator with support for accelerators.
We then demonstrate the utility of machine emulation for studying heterogeneous systems by leveraging it to make two additional contributions. First, we quantify the trade-offs in different coupling models for on-chip accelerators. Second, we present a technique to reuse the private memories of on-chip accelerators when they are otherwise inactive to expand the system's last-level cache, thereby reducing the opportunity cost of the accelerators' integration
- …