420 research outputs found

    Modeling and optimization of high-performance many-core systems for energy-efficient and reliable computing

    Thesis (Ph.D.)--Boston University
    Many-core systems, ranging from small-scale many-core processors to large-scale high performance computing (HPC) data centers, have become the main trend in computing system design owing to their potential to deliver higher throughput per watt. However, power densities and temperatures increase with the growth in performance capacity, bringing major challenges in energy efficiency, cooling costs, and reliability. These challenges require a joint assessment of performance, power, and temperature tradeoffs as well as the design of runtime optimization techniques that monitor and manage the interplay among them. This thesis proposes novel modeling and runtime management techniques that evaluate and optimize the performance, energy, and reliability of many-core systems. We first address the energy and thermal challenges in 3D-stacked many-core processors. 3D processors with stacked DRAM have the potential to dramatically improve performance owing to lower memory access latency and higher bandwidth. However, the performance increase may cause 3D systems to exceed their power budgets or create thermal hot spots. In order to provide an accurate analysis and enable the design of efficient management policies, this thesis introduces a simulation framework to jointly analyze performance, power, and temperature for 3D systems. We then propose a runtime optimization policy that maximizes system performance by characterizing application behavior and predicting the operating points that satisfy the power and thermal constraints. Our policy reduces the energy-delay product (EDP) by up to 61.9% compared to existing strategies. Performance, cooling energy, and reliability are also critical aspects of HPC data centers. In addition to causing reliability degradation, high temperatures increase the required cooling energy. Communication cost, on the other hand, has a significant impact on system performance in HPC data centers. This thesis proposes a topology-aware technique that maximizes system reliability by selecting between workload clustering and balancing. Our policy improves system reliability by up to 123.3% compared to existing temperature balancing approaches. We also introduce a job allocation methodology to simultaneously optimize the communication cost and the cooling energy in a data center. Our policy reduces the cooling cost by 40% compared to cooling-aware and performance-aware policies, while achieving performance comparable to the performance-aware policy.
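
    As a rough illustration of the kind of runtime operating-point selection described above, the sketch below picks the voltage/frequency point with the lowest energy-delay product (EDP) among candidates that satisfy power and thermal caps. The OperatingPoint fields, candidate values, and the simple cycles/frequency delay model are hypothetical illustrations, not the thesis's actual policy or models.

```python
# Illustrative sketch only: pick the operating point minimizing EDP under
# power and thermal caps. All numbers and the linear timing model are
# hypothetical, not taken from the thesis.
from dataclasses import dataclass

@dataclass
class OperatingPoint:
    freq_ghz: float   # core frequency at this voltage/frequency setting
    power_w: float    # predicted chip power at this setting
    temp_c: float     # predicted steady-state temperature at this setting

def edp(p: OperatingPoint, work_cycles: float) -> float:
    """Energy-delay product: energy * delay, with delay = cycles / frequency."""
    delay_s = work_cycles / (p.freq_ghz * 1e9)
    return (p.power_w * delay_s) * delay_s

def pick_operating_point(points, work_cycles, power_cap_w, temp_cap_c):
    """Return the feasible point with the lowest EDP, or None if none fits."""
    feasible = [p for p in points if p.power_w <= power_cap_w and p.temp_c <= temp_cap_c]
    return min(feasible, key=lambda p: edp(p, work_cycles), default=None)

if __name__ == "__main__":
    candidates = [
        OperatingPoint(1.0, 40.0, 65.0),
        OperatingPoint(1.5, 55.0, 72.0),
        OperatingPoint(2.0, 80.0, 85.0),  # violates the 80 C cap below
    ]
    print(pick_operating_point(candidates, work_cycles=2e9,
                               power_cap_w=70.0, temp_cap_c=80.0))
```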

    Thermal Aware Design Automation of the Electronic Control System for Autonomous Vehicles

    Autonomous vehicle (AV) technology, owing to its tremendous social and economic benefits, is set to transform the entire world in the coming decades. However, significant technical challenges still need to be overcome before AVs can be deployed safely, reliably, and at scale. Temperature plays a key role in the safety and reliability of an AV, not only because a vehicle is subjected to extreme operating temperatures but also because the increasing computation demands more powerful IC chips, which can lead to higher operating temperatures and large thermal gradients. In particular, as the underpinning technology for AVs, artificial intelligence (AI) requires substantially more computation and memory resources, which have grown exponentially in recent years and further exacerbate the thermal problems. High operating temperatures and large thermal gradients can reduce performance, degrade reliability, and even cause an IC to fail catastrophically. We believe that thermal issues must be addressed early in the design phase of the AV's electronic control system (ECS). To this end, we first study how to map vehicle applications to an ECS with a heterogeneous architecture so as to satisfy peak temperature constraints and optimize latency and system-level reliability. We present a mathematical programming model to bound the peak temperature of the ECS. We also develop a genetic-algorithm-based approach to bound the peak temperature under varying execution time scenarios and optimize the system-level reliability of the ECS. We present several computationally efficient techniques for system-level mean-time-to-failure (MTTF) computation, which show speed-ups of several orders of magnitude over the state-of-the-art method. Second, we focus on the thermal impacts of AI techniques. Specifically, we study how thermally induced memory bit flips can affect the prediction accuracy of a deep neural network (DNN). We develop a neuron-level analytical sensitivity estimation framework to quantify this impact and study its effectiveness on popular DNN architectures. Third, we study the problem of incorporating thermal impacts into the mapping of DNN neuron parameters to memory banks to improve prediction accuracy. Based on our sensitivity metric, we develop a bin-packing-based approach to map DNN neuron parameters to memory banks with different temperature profiles. We also study the problem of identifying the optimal temperature profiles for memory systems that minimize these thermal impacts. We show that thermal-aware mapping of DNN neuron parameters onto memory banks can significantly improve prediction accuracy in high-temperature ranges, compared to a thermal-ignorant mapping, for state-of-the-art DNNs.
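
    To make the bin-packing idea concrete, here is a minimal greedy sketch that places the most bit-flip-sensitive DNN parameters into the coolest memory banks, on the assumption that lower bank temperature implies a lower expected flip rate. The sensitivity scores, bank temperatures, and uniform bank capacity are invented for illustration; the dissertation's actual sensitivity metric and mapping algorithm are more involved.

```python
# Illustrative sketch only: greedy, bin-packing-style assignment of DNN
# parameters to memory banks, most sensitive parameters to coolest banks.
# All inputs below are hypothetical.

def map_params_to_banks(sensitivities, bank_temps, bank_capacity):
    """
    sensitivities: per-parameter sensitivity scores (higher = more critical)
    bank_temps:    per-bank temperatures in Celsius
    bank_capacity: number of parameters each bank can hold
    Returns a list where entry i is the bank index assigned to parameter i.
    """
    order = sorted(range(len(sensitivities)), key=lambda i: -sensitivities[i])  # most sensitive first
    banks = sorted(range(len(bank_temps)), key=lambda b: bank_temps[b])         # coolest bank first
    assignment = [None] * len(sensitivities)
    fill = {b: 0 for b in banks}
    for i in order:
        for b in banks:                      # first-fit into the coolest non-full bank
            if fill[b] < bank_capacity:
                assignment[i] = b
                fill[b] += 1
                break
    return assignment

if __name__ == "__main__":
    sens = [0.9, 0.1, 0.5, 0.7]    # per-parameter sensitivity
    temps = [55.0, 80.0]           # two banks: one cool, one hot
    print(map_params_to_banks(sens, temps, bank_capacity=2))
    # -> [0, 1, 1, 0]: the most sensitive parameters (0 and 3) land in the cooler bank 0.
```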

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    The increasing demand for processing more applications and related data on computing platforms has resulted in reliance on multi-/many-core chips, as they facilitate parallel processing. However, these platforms also need to be energy-efficient and reliable, and to perform computations securely in the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers, in terms of state-of-the-art contributions and upcoming trends.

    Massive Data-Centric Parallelism in the Chiplet Era

    Traditionally, massively parallel applications are executed on distributed systems, where computing nodes are distant enough that the parallelization schemes must minimize communication and synchronization to achieve scalability. Mapping communication-intensive workloads to distributed systems requires complicated problem partitioning and dataset pre-processing. With the current AI-driven trend of having thousands of interconnected processors per chip, there is an opportunity to rethink these communication-bottlenecked workloads. This bottleneck often arises from data structure traversals, which cause irregular memory accesses and poor cache locality. Recent works have introduced task-based parallelization schemes to accelerate graph traversal and other sparse workloads. Data structure traversals are split into tasks and pipelined across processing units (PUs). Dalorex demonstrated the highest scalability (up to thousands of PUs on a single chip) by having the entire dataset on-chip, scattered across PUs, and executing each task at the PU where the data is local. However, it also raised questions of how to scale to larger datasets when all the memory is on chip, and at what cost. To address these challenges, we propose a scalable architecture composed of a grid of Data-Centric Reconfigurable Array (DCRA) chiplets. Package-time reconfiguration enables creating chip products that optimize for different target metrics, such as time-to-solution, energy, or cost, while software reconfigurations avoid network saturation when scaling to millions of PUs across many chip packages. We evaluate six applications and four datasets, with several configurations and memory technologies, to provide a detailed analysis of the performance, power, and cost of data-local execution at scale. Our parallelization of Breadth-First Search with RMAT-26 across a million PUs reaches 3323 GTEPS.
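
    A minimal sketch of the task-based, data-local execution model described above: each PU owns a slice of the vertex array, and visiting an edge enqueues a task at the PU that owns the destination vertex instead of fetching remote data. The block partitioning and round-robin scheduling are simplifications for illustration, not the Dalorex or DCRA design.

```python
# Illustrative sketch only: toy owner-computes BFS where edge visits become
# tasks sent to the PU that owns the destination vertex.
from collections import deque

def bfs_data_local(adj, num_pus, source):
    """adj: adjacency list; vertices are block-partitioned across num_pus.
    Returns the BFS level of every vertex."""
    n = len(adj)
    block = (n + num_pus - 1) // num_pus
    owner = lambda v: v // block                 # which PU holds vertex v
    level = [-1] * n
    queues = [deque() for _ in range(num_pus)]   # one task queue per PU
    queues[owner(source)].append((source, 0))
    while any(queues):
        for pu in range(num_pus):                # round-robin over PUs
            if not queues[pu]:
                continue
            v, d = queues[pu].popleft()
            if level[v] != -1:
                continue                         # vertex already visited locally
            level[v] = d
            for w in adj[v]:                     # push follow-up tasks to owners of w
                queues[owner(w)].append((w, d + 1))
    return level

if __name__ == "__main__":
    graph = [[1, 2], [0, 3], [0, 3], [1, 2]]     # small 4-vertex example
    print(bfs_data_local(graph, num_pus=2, source=0))  # -> [0, 1, 1, 2]
```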

    Network-on-Chip

    Addresses the Challenges Associated with System-on-Chip Integration
    Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency and explores Network-on-Chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Using examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3D NoC design. The text comprises 12 chapters and covers:
    - The evolution of NoC from SoC, and its research and developmental challenges
    - NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces
    - The router design strategies followed in NoCs
    - The evaluation mechanisms for NoC architectures
    - The application mapping strategies followed in NoCs
    - Low-power design techniques specifically followed in NoCs
    - The signal integrity and reliability issues of NoC
    - The details of NoC testing strategies reported so far
    - The problem of synthesizing application-specific NoCs
    - Reconfigurable NoC design issues
    - The direction of future research and development in the field of NoC
    Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, researchers, and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems.
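
    As a small, concrete example of the routing mechanisms covered in the book's NoC protocol material, the sketch below implements dimension-ordered (XY) routing on a 2D mesh, a common deterministic NoC routing scheme. It is an independent illustration, not code from the book.

```python
# Illustrative sketch only: dimension-ordered (XY) routing on a 2D-mesh NoC.
# Router coordinates and mesh size are hypothetical.

def xy_route(src, dst):
    """Return the list of (x, y) router hops from src to dst on a 2D mesh,
    traveling fully along X first, then along Y (deadlock-free on a mesh)."""
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                  # route in the X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                  # then route in the Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

if __name__ == "__main__":
    # Route a flit from router (0, 0) to router (2, 3) on a 4x4 mesh.
    print(xy_route((0, 0), (2, 3)))
    # -> [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
```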

    Design-for-Test and Test Optimization Techniques for TSV-based 3D Stacked ICs

    As integrated circuits (ICs) continue to scale to smaller dimensions, long interconnects have become the dominant contributor to circuit delay and a significant component of power consumption. In order to reduce the length of these interconnects, 3D integration and 3D stacked ICs (3D SICs) are active areas of research in both academia and industry. 3D SICs not only have the potential to reduce average interconnect length and alleviate many of the problems caused by long global interconnects, but they can offer greater design flexibility over 2D ICs, significant reductions in power consumption and footprint in an era of mobile applications, increased on-chip data bandwidth through delay reduction, and improved heterogeneous integration. Compared to 2D ICs, the manufacture and test of 3D SICs is significantly more complex. Through-silicon vias (TSVs), which constitute the dense vertical interconnects in a die stack, are a source of additional and unique defects not seen before in ICs. At the same time, testing these TSVs, especially before die stacking, is recognized as a major challenge. The testing of a 3D stack is constrained by limited test access, test pin availability, power, and thermal constraints. Therefore, efficient and optimized test architectures are needed to ensure that pre-bond, partial, and complete stack testing are not prohibitively expensive. Methods of testing TSVs prior to bonding continue to be a difficult problem due to test access and testability issues. Although some built-in self-test (BIST) techniques have been proposed, these techniques have numerous drawbacks that render them impractical. In this dissertation, a low-cost test architecture is introduced to enable pre-bond TSV test through TSV probing. This has the benefit of not needing large analog test components on the die, which is a significant drawback of many BIST architectures. Coupled with an optimization method described in this dissertation to create parallel test groups for TSVs, test time for pre-bond TSV tests can be significantly reduced. The pre-bond probing methodology is expanded upon to allow for pre-bond scan test as well, bringing both pre-bond TSV test and structural known-good-die (KGD) test under a single test paradigm. The addition of boundary registers on functional TSV paths required for pre-bond probing results in an increase in delay on inter-die functional paths. This cost of test architecture insertion can be a significant drawback, especially considering that one benefit of 3D integration is that critical paths can be partitioned between dies to reduce their delay. This dissertation derives a retiming flow that is used to recover the additional delay added to TSV paths by test cell insertion. Reducing the cost of test for 3D SICs is crucial considering that more tests are necessary during 3D-SIC manufacturing. To reduce test cost, the test architecture and test scheduling for the stack must be optimized to reduce test time across all necessary test insertions. This dissertation examines three paradigms for 3D integration - hard dies, firm dies, and soft dies - that give varying degrees of control over 2D test architectures on each die while optimizing the 3D test architecture. Integer linear programming (ILP) models are developed to provide an optimal 3D test architecture and test schedule for the dies in the 3D stack considering any or all post-bond test insertions. Results show that the ILP models outperform other optimization methods across a range of 3D benchmark circuits. In summary, this dissertation targets testing and design-for-test (DFT) of 3D SICs. The proposed techniques enable pre-bond TSV and structural test while maintaining a relatively low test cost. Future work will continue to enable testing of 3D SICs to move industry closer to realizing the true potential of 3D integration.
    Dissertation
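
    To give a flavor of the ILP-based optimization, the sketch below is a deliberately simplified model written with the open-source PuLP modeler (a tooling assumption; the dissertation does not prescribe a solver or this exact formulation). Each die is assigned one candidate test-access-mechanism (TAM) width, the widths of dies tested in parallel must fit a shared wire budget, and the objective minimizes the slowest die's test time. The test times, widths, and budget are made up for illustration.

```python
# Illustrative sketch only: a toy ILP for 3D-SIC test-architecture selection,
# modeled with PuLP. Each die gets one candidate TAM width; dies are tested in
# parallel on disjoint wires, so widths must fit a shared budget and the
# objective is the slowest die's test time (makespan).
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD

# test_time[die][width] = test time (cycles) if that die gets that TAM width.
# All numbers are hypothetical.
test_time = {
    "die0": {8: 500, 16: 300, 32: 200},
    "die1": {8: 800, 16: 450, 32: 260},
}
width_budget = 40  # total test-access wires available at the stack level

prob = LpProblem("stack_test_schedule", LpMinimize)
x = {(d, w): LpVariable(f"x_{d}_{w}", cat=LpBinary)   # x[d, w] = 1 if die d uses width w
     for d in test_time for w in test_time[d]}
makespan = LpVariable("makespan", lowBound=0)

prob += makespan                                       # minimize the longest die test
for d in test_time:
    prob += lpSum(x[d, w] for w in test_time[d]) == 1                       # one width per die
    prob += lpSum(test_time[d][w] * x[d, w] for w in test_time[d]) <= makespan
prob += lpSum(w * x[d, w] for (d, w) in x) <= width_budget                   # shared wire budget

prob.solve(PULP_CBC_CMD(msg=False))
print({d: w for (d, w) in x if x[d, w].value() == 1}, makespan.value())
# With these made-up numbers, each die gets a 16-bit TAM and the makespan is 450.
```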

    Thermal aware design techniques for multiprocessor architectures in three dimensions

    Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática; defended on 28-11-2013.