
3 research outputs found

HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement

Author: Benini Luca
Besta Maciej
Cavalcante Matheus
Fischer Tim
Hoefler Torsten
Iff Patrick
Publication date: 08/10/2023

2.5D integration is an important technique to tackle the growing cost of manufacturing chips in advanced technology nodes. This poses the challenge of providing high-performance inter-chiplet interconnects (ICIs). As the number of chiplets grows to tens or hundreds, it becomes infeasible to hand-optimize their arrangement in a way that maximizes the ICI performance. In this paper, we propose HexaMesh, an arrangement of chiplets that outperforms a grid arrangement both in theory (network diameter reduced by 42%; bisection bandwidth improved by 130%) and in practice (latency reduced by 19%; throughput improved by 34%). HexaMesh enables large-scale chiplet designs with high-performance ICIs.
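The "network diameter" metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the paper: chiplets are graph nodes, a grid arrangement gives each chiplet up to four neighbors, and a hexagonal arrangement gives each chiplet up to six neighbors (modeled here with axial hexagonal coordinates). The function names are hypothetical, and the sketch does not attempt to reproduce the paper's reported percentages.

```python
from collections import deque


def diameter(nodes, neighbors):
    """Longest shortest-path hop count over all node pairs (BFS from every node)."""
    best = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in neighbors(cur):
                if nxt in nodes and nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        best = max(best, max(dist.values()))
    return best


def grid_nodes(n):
    """n x n grid of chiplets (assumed model)."""
    return {(x, y) for x in range(n) for y in range(n)}


def grid_neighbors(p):
    x, y = p
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def hex_nodes(r):
    """Hexagon-shaped region of radius r in axial coordinates (assumed model)."""
    return {(q, s) for q in range(-r, r + 1) for s in range(-r, r + 1)
            if abs(q + s) <= r}


def hex_neighbors(p):
    q, s = p
    return [(q + 1, s), (q - 1, s), (q, s + 1), (q, s - 1),
            (q + 1, s - 1), (q - 1, s + 1)]


if __name__ == "__main__":
    grid = grid_nodes(10)   # 100 chiplets
    hexa = hex_nodes(5)     # 91 chiplets, so not an exact size match
    print(len(grid), diameter(grid, grid_neighbors))
    print(len(hexa), diameter(hexa, hex_neighbors))
```

Because the two example regions contain different numbers of chiplets, the printed diameters are only a rough indication of why a six-neighbor arrangement shortens worst-case paths.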

arXiv.org e-Print Archive

Massive Data-Centric Parallelism in the Chiplet Era

Author: Martonosi Margaret
Orenes-Vera Marcelo
Tureci Esin
Wentzlaff David
Publication date: 18/04/2023

Traditionally, massively parallel applications are executed on distributed systems, where computing nodes are distant enough that the parallelization schemes must minimize communication and synchronization to achieve scalability. Mapping communication-intensive workloads to distributed systems requires complicated problem partitioning and dataset pre-processing. With the current AI-driven trend of having thousands of interconnected processors per chip, there is an opportunity to re-think these communication-bottlenecked workloads. This bottleneck often arises from data structure traversals, which cause irregular memory accesses and poor cache locality. Recent works have introduced task-based parallelization schemes to accelerate graph traversal and other sparse workloads. Data structure traversals are split into tasks and pipelined across processing units (PUs). Dalorex demonstrated the highest scalability (up to thousands of PUs on a single chip) by having the entire dataset on-chip, scattered across PUs, and executing the tasks at the PU where the data is local. However, it also raised questions about how to scale to larger datasets when all the memory is on chip, and at what cost. To address these challenges, we propose a scalable architecture composed of a grid of Data-Centric Reconfigurable Array (DCRA) chiplets. Package-time reconfiguration enables creating chip products that optimize for different target metrics, such as time-to-solution, energy, or cost, while software reconfigurations avoid network saturation when scaling to millions of PUs across many chip packages. We evaluate six applications and four datasets, with several configurations and memory technologies, to provide a detailed analysis of the performance, power, and cost of data-local execution at scale. Our parallelization of Breadth-First Search with RMAT-26 across a million PUs reaches 3323 GTEPS.
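The idea of splitting a traversal into tasks that execute at the PU where the data is local can be sketched as a small sequential simulation. This is an illustrative model only, not the Dalorex or DCRA implementation: the modulo ownership rule, the per-PU queues, and the function name `data_local_bfs` are all assumptions made for the example.

```python
from collections import deque


def data_local_bfs(adjacency, root, num_pus):
    """Simulate a task-based BFS where each task runs at the PU owning its vertex.

    Returns (distances, remote_messages). Ownership by `v % num_pus` is an
    assumed placement rule chosen for simplicity.
    """
    def owner(v):
        return v % num_pus

    # Each PU stores only the adjacency lists and distances of its own vertices.
    local_adj = [dict() for _ in range(num_pus)]
    for v, nbrs in adjacency.items():
        local_adj[owner(v)][v] = nbrs
    local_dist = [dict() for _ in range(num_pus)]

    # One task queue per PU; a task is (vertex, candidate distance).
    queues = [deque() for _ in range(num_pus)]
    queues[owner(root)].append((root, 0))
    remote_messages = 0

    while any(queues):
        for pu in range(num_pus):
            while queues[pu]:
                v, d = queues[pu].popleft()
                # Tasks arrive out of level order, so keep only improvements.
                if v in local_dist[pu] and local_dist[pu][v] <= d:
                    continue
                local_dist[pu][v] = d
                for w in local_adj[pu].get(v, []):
                    if owner(w) != pu:
                        remote_messages += 1  # task crosses to another PU
                    queues[owner(w)].append((w, d + 1))

    distances = {}
    for table in local_dist:
        distances.update(table)
    return distances, remote_messages


if __name__ == "__main__":
    graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0]}
    print(data_local_bfs(graph, root=0, num_pus=2))
```

Counting `remote_messages` gives a crude view of the network traffic such a scheme generates; a real design would also have to model queue capacity and network topology, which this sketch ignores.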

arXiv.org e-Print Archive

ToSHI - Towards Secure Heterogeneous Integration: Security Risks, Threat Assessment, and Assurance

Author: Amit Mazumder Shuo
Azim Uddin
Fahim Rahman
Farimah Farahmandi
Mark Tehranipoor
Md Latifur Rahman
Md Saad Ul Haque
Md Sami Ul Islam Sami
Navid Asadizanjani
Nidish Vashistha
Paul Calzada
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 01/08/2022

The semiconductor industry is entering a new age in which device scaling and cost reduction will no longer follow the decades-long pattern. Packing more transistors on a monolithic IC at each node becomes more difficult and expensive. Companies in the semiconductor industry are increasingly seeking technological solutions to close the gap and enhance cost-performance while providing more functionality through integration. Putting all of the operations on a single chip (known as a system on a chip, or SoC) presents several issues, including increased prices and greater design complexity. Heterogeneous integration (HI), which uses advanced packaging technology to merge components that might be designed and manufactured independently using the best process technology, is an attractive alternative. However, although the industry is motivated to move towards HI, many design and security challenges must be addressed. This paper presents a three-tier security approach for secure heterogeneous integration by investigating supply chain security risks, threats, and vulnerabilities at the chiplet, interposer, and system-in-package levels. Furthermore, various possible trust validation methods and attack mitigations are proposed for every level of heterogeneous integration. Finally, we share our vision as a roadmap toward developing security solutions for secure heterogeneous integration.

Cryptology ePrint Archive