Search CORE

14,725 research outputs found

Aggressive and reliable high-performance architectures - techniques for thermal control, energy efficiency, and performance augmentation

Author: Ramesh Prem Kumar
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2011
Field of study

As more and more transistors fit in a single chip, consumers of the electronics industry continue to expect decline in cost-per-function. Advancements in process technology offer steady improvements in system performance. The improvements manifest themselves as shrinking area, faster circuits and improved battery life. However, this migration toward sub-micro/nano-meter technologies presents a new set of challenges as the system becomes extremely sensitive to any voltage, temperature or process variations. One approach to immunize the system from the adverse effects of these variations is to add sufficient safety margins to the operating clock frequency of the system. Clearly, this approach is overly conservative because these worst case scenarios rarely occur. But, process technology in nanoscale era has already hit the power and frequency walls. Regardless of any of these challenges, the present processors not only need to run faster, but also cooler and use lesser energy. At a juncture where there is no further improvement in clock frequency is possible, data dependent latching through Timing Speculation (TS) provides a silver lining. Timing speculation is a widely known method for realizing better-than-worst-case systems. TS is aggressive in nature, where the mechanism is to dynamically tune the system frequency beyond the worst-case limits obtained from application characteristics to enhance the performance of system-on-chips (SoCs). However, such aggressive tuning has adverse consequences that need to be overcome. Power dissipation, on-chip temperature and reliability are key issues that cannot be ignored. A carefully designed power management technique combined with a reliable, controlled, aggressive clocking not only attempts to constrain power dissipation within a limit, but also improves performance whenever possible. In this dissertation, we present a novel power level switching mechanism by redefining the existing voltage-frequency pairs. We introduce an aggressive yet reliable framework for energy efficient thermal control. We were able to achieve up to 40% speed-up compared to a base scheme without overclocking. We compare our method against different schemes. We observe that up to 75% Energy-Delay squared product (ED2) savings relative to base architecture is possible. We showcase the loss of efficiency in present chip multiprocessor systems due to excess power supplied, and propose Utilization-aware Task Scheduling (UTS) - a power management scheme that increases energy efficiency of chip multiprocessors. Our experiments demonstrate that UTS along with aggressive timing speculation squeezes out maximum performance from the system without loss of efficiency, and breaching power & thermal constraints. From our evaluation we infer that UTS improves performance by up to 12% due to aggressive power level switching and over 50% in ED2 savings compared to traditional power management techniques. Aggressive clocking systems having TS as their central theme operate at a clock frequency range beyond specified safe limits, exploiting the data dependence on circuit critical paths. However, the margin for performance enhancement is restricted due to extreme difference between short paths and critical paths. In this thesis, we show that increasing the lengths of short paths of the circuit increases the margin of TS, leading to performance improvement in aggressively designed systems. We develop Min-arc algorithm to efficiently add delay buffers to selected short paths while keeping down the area penalty. We show that by using our algorithm, it is possible to increase the circuit contamination delay by up to 30% without affecting the propagation delay, with moderate area overhead. We also explore the possibility of increasing short path delays further by relaxing the constraint on propagation delay, and achieve even higher performance. Overall, we bring out the inter-relationship between power, temperature and reliability of aggressively clocked systems. Our main objective is to achieve maximal performance benefits and improved energy efficiency within thermal constraints by effectively combining dynamic frequency scaling, dynamic voltage scaling and reliable overclocking. We provide solutions to improve the existing power management in chip multiprocessors to dynamically maximize system utilization and satisfy the power constraints within safe thermal limits

Digital Repository @ Iowa State University (ISU)

From blind certainty to informed uncertainty

Author: Kurt Keutzer
Michael Orshansky
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004
Field of study

Crossref

RAKSHA:Reliable and Aggressive frameworK for System design using High-integrity Approaches

Author: Avirneni Naga Durga Prasad
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2012
Field of study

Advances in the fabrication technology have been a major driving force in the unprecedented increase in computing capabilities over the last several decades. Despite huge reductions in the switching energy of the transistors, two major issues have emerged with decreasing fabrication technology scales. They are: 1) increased impact of process, voltage, and temperature (PVT) variation on transistor performance, and 2) increased susceptibility of transistors to soft errors induced by high energy particles. In presence of PVT variation, as transistor sizes continue to decrease, the design margins used to guarantee correct operation in the presence of worst-case scenarios have been increasing. Systems run at a clock frequency, which is determined by accounting the worst-case timing paths, operating conditions, and process variations. Timing speculation based reliable and aggressive clocking advocates going beyond worst-case limits to achieve best performance while not avoiding, but detecting and correcting a modest number of timing errors. Such design methodology exploits the fact that timing critical paths are rarely exercised in a design, and typical execution happens much faster than the timing requirements dictated by worst-case scenarios. Better-than-worst-case design methodology is advocated by several recent research pursuits, which propose to exploit in-built fault tolerance mechanisms to enhance computer system performance. Recent works have also shown that the performance lose due to over provisioning base on worst-case design margins is upward of 20\% in terms operating frequency and upward of 50\% in terms of power efficiency. The threat of soft error induced system failure in computing systems has become more prominent as we adopt ultra-deep submicron process technologies. With respect to soft error susceptibility, decreasing transistor geometries lower the energy threshold needed by high-energy particles to induce errors. As this trend continues, the need for fault tolerance mechanisms to counteract this effect has moved from a nice to have, to be a requirement in current and future systems. In this dissertation, RAKSHA (meaning to protect and save in Sanskrit), we take a multidimensional look at the challenges of system design built with scaled-technologies using high integrity techniques. In RAKSHA, to mitigate soft errors, we propose lightweight high-integrity mechanisms as basic system building blocks which allow system to offer performance levels comparable to a non-fault tolerant system. In addition, we also propose to effectively exploit and use the availability of fault tolerant mechanisms to allow and tolerate data-dependent failures, thus setting systems to operate at typical case circuit delays and enhance system performance. We also propose the use of novel high-integrity cells for increasing system energy efficiency and also potentially increasing system security by combating power-analysis-based side channel attacks. Such an approach allows balancing of performance, power, and security with no further overhead over the resources needed to incorporate fault tolerance. Using our framework, instead of designing circuits to meet worst-case requirements, circuits can be designed to meet typical-case requirements. In RAKSHA, we propose two efficient soft error mitigation schemes, namely Soft Error Mitigation (SEM) and Soft and Timing Error Mitigation (STEM), using the approach of multiple clocking of data for protecting combinational logic blocks from soft errors. Our first technique, SEM, based on distributed and temporal voting of three registers, unloads the soft error detection overhead from the critical path of the systems. SEM is also capable of ignoring false errors and recovers from soft errors using in-situ fast recovery avoiding recomputation. Our second technique, STEM, while tolerating soft errors, adds timing error detection capability to guarantee reliable execution in aggressively clocked designs that enhance system performance by operating beyond worst-case clock frequency. We also present a specialized low overhead clock phase management scheme that ably supports our proposed techniques. Timing annotated gate level simulations, using 45nm libraries, of a pipelined adder-multiplier and DLX processor show that both our techniques achieve near 100% fault coverage. For DLX processor, even under severe fault injection campaigns, SEM achieves an average performance improvement of 26.58% over a conventional triple modular redundancy voter based soft error mitigation scheme, while STEM outperforms SEM by 27.42%. We refer to systems built with SEM and STEM cells as reliable and aggressive systems. Energy consumption minimization in computing systems has attracted a great deal of attention and has also become critical due to battery life considerations and environmental concerns. To address this problem, many task scheduling algorithms are developed using dynamic voltage and frequency scaling (DVFS). Majority of these algorithms involve two passes: schedule generation and slack reclamation. Using this approach, linear combination of frequencies has been proposed to achieve near optimal energy for systems operating with discrete and traditional voltage frequency pairs. In RAKSHA, we propose a new slack reclamation algorithm, aggressive dynamic and voltage scaling (ADVFS), using reliable and aggressive systems. ADVFS exploits the enhanced voltage frequency spectrum offered by reliable and aggressive designs for improving energy efficiency. Formal proofs are provided to show that optimal energy for reliable and aggressive designs is either achieved by using single frequency or by linear combination of frequencies. ADVFS has been evaluated using random task graphs and our results show 18% reduction in energy when compared with continuous DVFS and over more than 33% when compared with scheme using linear combination of traditional voltage frequency pairs. Recent events have indicated that attackers are banking on side-channel attacks, such as differential power analysis (DPA) and correlation power analysis (CPA), to exploit information leaks from physical devices. Random dynamic voltage frequency scaling (RDVFS) has been proposed to prevent such attacks and has very little area, power, and performance overheads. But due to the one-to-one mapping present between voltage and frequency of DVFS voltage-frequency pairs, RDVFS cannot prevent power attacks. In RAKSHA, we propose a novel countermeasure that uses reliable and aggressive designs to break this one-to-one mapping. Our experiments show that our technique significantly reduces the correlation for the actual key and also reduces the risk of power attacks by increasing the probability for incorrect keys to exhibit maximum correlation. Moreover, our scheme also enables systems to operate beyond the worst-case estimates to offer improved power and performance benefits. For the experiments conducted on AES S-box implemented using 45nm CMOS technology, our approach has increased performance by 22% over the worst-case estimates. Also, it has decreased the correlation for the correct key by an order and has increased the probability by almost 3.5X times for wrong keys when compared with the original key to exhibit maximum correlation. Overall, RAKSHA offers a new way to balance the intricate interplay between various design constraints for the systems designed using small scaled-technologies

Digital Repository @ Iowa State University (ISU)

Resource Management Algorithms for Computing Hardware Design and Operations: From Circuits to Systems

Author: He Hao
Publication venue
Publication date: 18/01/2019
Field of study

The complexity of computation hardware has increased at an unprecedented rate for the last few decades. On the computer chip level, we have entered the era of multi/many-core processors made of billions of transistors. With transistor budget of this scale, many functions are integrated into a single chip. As such, chips today consist of many heterogeneous cores with intensive interaction among these cores. On the circuit level, with the end of Dennard scaling, continuously shrinking process technology has imposed a grand challenge on power density. The variation of circuit further exacerbated the problem by consuming a substantial time margin. On the system level, the rise of Warehouse Scale Computers and Data Centers have put resource management into new perspective. The ability of dynamically provision computation resource in these gigantic systems is crucial to their performance. In this thesis, three different resource management algorithms are discussed. The first algorithm assigns adaptivity resource to circuit blocks with a constraint on the overhead. The adaptivity improves resilience of the circuit to variation in a cost-effective way. The second algorithm manages the link bandwidth resource in application specific Networks-on-Chip. Quality-of-Service is guaranteed for time-critical traffic in the algorithm with an emphasis on power. The third algorithm manages the computation resource of the data center with precaution on the ill states of the system. Q-learning is employed to meet the dynamic nature of the system and Linear Temporal Logic is leveraged as a tool to describe temporal constraints. All three algorithms are evaluated by various experiments. The experimental results are compared to several previous work and show the advantage of our methods

Texas A&M Repository

Transition Faults and Transition Path Delay Faults: Test Generation, Path Selection, and Built-In Generation of Functional Broadside Tests

Author: Yao Bo
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2013
Field of study

As the clock frequency and complexity of digital integrated circuits increase rapidly, delay testing is indispensable to guarantee the correct timing behavior of the circuits. In this dissertation, we describe methods developed for three aspects of delay testing in scan-based circuits: test generation, path selection and built-in test generation. We first describe a deterministic broadside test generation procedure for a path delay fault model named the transition path delay fault model, which captures both large and small delay defects. Under this fault model, a path delay fault is detected only if all the individual transition faults along the path are detected by the same test. To reduce the complexity of test generation, sub-procedures with low complexity are applied before a complete branch-and-bound procedure. Next, we describe a method based on static timing analysis to select critical paths for test generation. Logic conditions that are necessary for detecting a path delay fault are considered to refine the accuracy of static timing analysis, using input necessary assignments. Input necessary assignments are input values that must be assigned to detect a fault. The method calculates more accurate path delays, selects paths that are critical during test application, and identifies undetectable path delay faults. These two methods are applicable to off-line test generation. For large circuits with high complexity and frequency, built-in test generation is a cost-effective method for delay testing. For a circuit that is embedded in a larger design, we developed a method for built-in generation of functional broadside tests to avoid excessive power dissipation during test application and the overtesting of delay faults, taking the functional constraints on the primary input sequences of the circuit into consideration. Functional broadside tests are scan-based two-pattern tests for delay faults that create functional operation conditions during test application. To avoid the potential fault coverage loss due to the exclusive use of functional broadside tests, we also developed an optional DFT method based on state holding to improve fault coverage. High delay fault coverage can be achieved by the developed method for benchmark circuits using simple hardware

Purdue E-Pubs

Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

Author: Alsheikh Mohammad Abu
Hoang Dinh Thai
Lin Shaowei
Niyato Dusit
Tan Hwee-Pink
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/01/2015
Field of study

Wireless sensor networks (WSNs) consist of autonomous and resource-limited devices. The devices cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. For long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formulation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes are to make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool to develop adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs

arXiv.org e-Print Archive

University of Canberra Research Repository