Dadu-RBD: Robot Rigid Body Dynamics Accelerator with Multifunctional Pipelines
Rigid body dynamics is a key technology in robotics. Trajectory optimization and model predictive control algorithms typically involve a large number of rigid body dynamics computations, and processing them on CPUs is time-consuming, which limits the real-time performance of robots. To address this performance bottleneck, we propose a multifunctional robot rigid body dynamics accelerator named RBDCore. By analyzing the functions commonly used in robot dynamics calculations, we characterize their reuse relationships and optimize them for hardware. Based on this analysis, RBDCore fully reuses common hardware modules when processing different computing tasks, and by dynamically switching the dataflow path it accelerates various dynamics functions without reconfiguring the hardware. We design Structure-Adaptive Pipelines for RBDCore, which greatly improve the accelerator's throughput and can be tailored to robots with different structures and parameters. Compared with state-of-the-art CPU and GPU dynamics libraries and an FPGA accelerator, RBDCore delivers significantly higher performance.
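As a rough software analogy of the reuse idea (not the RBDCore hardware or its real dynamics math), the sketch below expresses two dynamics functions as different orderings of the same shared stages and selects between them by switching the dataflow path; all stage names and the toy arithmetic are invented for illustration.

```python
# Minimal software analogy of multifunctional pipelines: several dynamics
# "functions" reuse the same shared stages, and a dataflow switch selects
# which sequence runs. Stage names and the toy math are hypothetical.

def forward_sweep(state):
    # Kinematic sweep reused by both inverse and forward dynamics.
    state["v"] = [0.5 * qd for qd in state["qd"]]
    return state

def backward_sweep(state):
    # Force/inertia sweep, also shared between the two functions.
    state["tau"] = [2.0 * v for v in state["v"]]
    return state

def accel_solve(state):
    # Extra stage needed only by forward dynamics.
    state["qdd"] = [tau / 1.5 for tau in state["tau"]]
    return state

PIPELINES = {
    "inverse_dynamics": [forward_sweep, backward_sweep],
    "forward_dynamics": [forward_sweep, backward_sweep, accel_solve],
}

def run(function_name, state):
    # Switching the dataflow path reuses the same stage "modules".
    for stage in PIPELINES[function_name]:
        state = stage(state)
    return state

print(run("forward_dynamics", {"qd": [0.1, 0.2, 0.3]}))
```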
PIMSIM-NN: An ISA-based Simulation Framework for Processing-in-Memory Accelerators
Processing-in-memory (PIM) has shown extraordinary potential for accelerating neural networks. To evaluate the performance of PIM accelerators, we present an ISA-based simulation framework comprising a dedicated ISA targeting neural networks running on PIM architectures, a compiler, and a cycle-accurate, configurable simulator. Compared with prior works, this work decouples software algorithms from hardware architectures through the proposed ISA, providing a more convenient way to evaluate the effectiveness of software/hardware optimizations. The simulator adopts an event-driven simulation approach and offers better support for hardware parallelism. The framework is open-sourced at https://github.com/wangxy-2000/pimsim-nn.
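To illustrate the event-driven simulation style the abstract refers to (a generic sketch, not PIMSIM-NN's actual classes or ISA), a minimal simulator can advance time from one scheduled event to the next so that independent hardware units overlap naturally:

```python
# Generic event-driven simulation loop; class and core names are illustrative.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    time: int
    action: callable = field(compare=False)

class EventDrivenSim:
    def __init__(self):
        self.queue = []
        self.now = 0

    def schedule(self, delay, action):
        heapq.heappush(self.queue, Event(self.now + delay, action))

    def run(self):
        # Advance time only to the next scheduled event, so independent
        # units (cores, crossbars, buses) can progress in parallel.
        while self.queue:
            ev = heapq.heappop(self.queue)
            self.now = ev.time
            ev.action()

sim = EventDrivenSim()
sim.schedule(5, lambda: print("core0 MVM done @", sim.now))
sim.schedule(3, lambda: print("core1 load done @", sim.now))
sim.run()
```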
PIMSYN: Synthesizing Processing-in-memory CNN Accelerators
Processing-in-memory (PIM) architectures have been regarded as a promising solution for CNN acceleration. Existing PIM accelerator designs rely heavily on expert experience and incur significant manual design overhead, and such manual design cannot effectively explore and optimize architecture implementations. In this work, we develop PIMSYN, an automatic framework for synthesizing PIM-based CNN accelerators, which greatly facilitates architecture design and helps generate energy-efficient accelerators. PIMSYN automatically transforms CNN applications into execution workflows and hardware constructions of PIM accelerators. To systematically optimize the architecture, we embed an architectural exploration flow into the synthesis framework, providing a more comprehensive design space. Experiments demonstrate that PIMSYN improves power efficiency several-fold compared with existing works. PIMSYN can be obtained from https://github.com/lixixi-jook/PIMSYN-NN.
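As a hedged illustration of what an architectural exploration flow can look like (the parameters, power budget, and cost model below are invented, not PIMSYN's), a synthesis framework can sweep candidate PIM configurations and keep the most power-efficient one that meets a constraint:

```python
# Toy design-space exploration over hypothetical PIM architecture parameters.
from itertools import product

POWER_BUDGET = 150.0                     # hypothetical chip power cap

def power(xbar_size, xbars_per_macro, macros):
    # Toy power model standing in for a real performance/power analysis.
    return 0.5 * xbars_per_macro * macros + 0.01 * xbar_size * macros

def throughput(xbar_size, xbars_per_macro, macros):
    return xbar_size * xbars_per_macro * macros

candidates = [c for c in product([128, 256, 512],   # crossbar size
                                 [4, 8, 16],        # crossbars per macro
                                 [8, 16, 32])       # macros on chip
              if power(*c) <= POWER_BUDGET]

# Keep the most power-efficient architecture that meets the budget.
best = max(candidates, key=lambda c: throughput(*c) / power(*c))
print("best architecture (toy model):", best)
```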
ChipGPT: How far are we from natural language hardware design
As large language models (LLMs) such as ChatGPT exhibit unprecedented machine intelligence, they also show great promise in assisting hardware engineers to realize more efficient logic designs through natural language interaction. To estimate the potential of LLM-assisted hardware design, this work demonstrates an automated design environment that uses LLMs to generate hardware logic designs from natural-language specifications. Toward a more accessible and efficient chip development flow, we present a scalable four-stage zero-code logic design framework based on LLMs that requires no retraining or fine-tuning. First, the demo, ChipGPT, generates prompts for the LLM, which then produces initial Verilog programs. Second, an output manager corrects and optimizes these programs before collecting them into the final design space. Finally, ChipGPT searches this space to select the optimal design under the target metrics. The evaluation sheds light on whether LLMs can generate correct and complete hardware logic designs from natural-language descriptions for some specifications. It shows that ChipGPT improves programmability and controllability and exposes a broader design optimization space than prior work and native LLMs alone.
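A schematic sketch of such a four-stage flow is shown below; the LLM call is a stub and every function and metric is hypothetical, so this only mirrors the stage structure described above, not ChipGPT's implementation.

```python
# Hypothetical four-stage flow: prompt -> LLM -> correction -> search.

def generate_prompt(spec: str) -> str:
    # Stage 1: turn the natural-language spec into an LLM prompt.
    return f"Write synthesizable Verilog for: {spec}"

def call_llm(prompt: str) -> list[str]:
    # Stage 2 (stub): in a real flow this would query an LLM for Verilog.
    return ["module adder(...); /* candidate A */ endmodule",
            "module adder(...); /* candidate B */ endmodule"]

def correct_and_collect(candidates: list[str]) -> list[str]:
    # Stage 3: an output manager would lint/repair programs before keeping them.
    return [c for c in candidates if c.startswith("module")]

def search_design_space(designs: list[str]) -> str:
    # Stage 4: pick the best design under a target metric (here a fake proxy).
    return min(designs, key=len)

spec = "a 4-bit ripple-carry adder"
print(search_design_space(correct_and_collect(call_llm(generate_prompt(spec)))))
```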
A New Post-Silicon Debug Approach Based on Suspect Window
Bugs tend to be unavoidable in the design of complex integrated circuits, so it is imperative to identify them as soon as possible through post-silicon debug. The main challenge in post-silicon debug is the observability of internal signals. This paper exploits the fact that it is not necessary to observe error-free states. We introduce the "suspect window" and present a method for determining its boundaries. Based on the suspect window, we propose a debug approach that achieves high observability by reusing the scan chain. Since scan dumps take place only within the suspect window, debug time is greatly reduced. Experimental results demonstrate the effectiveness of the proposed approach.
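A toy illustration of the suspect-window idea (synthetic trace data, not the paper's boundary-determination method): the window is bounded by the last error-free observation and the first observed error, and scan dumps are restricted to that interval.

```python
# Synthetic example only: locate the suspect window from a traced signal.
golden   = [0, 1, 1, 2, 3, 5, 8, 13]    # expected (error-free) trace values
observed = [0, 1, 1, 2, 3, 7, 8, 13]    # silicon trace containing one error

# First cycle where silicon diverges from the golden model.
first_error = next(i for i, (g, o) in enumerate(zip(golden, observed)) if g != o)
last_good = first_error - 1              # end of the error-free prefix

# Scan-chain dumps only need to cover this window, shortening debug time.
print("suspect window: cycles", (last_good, first_error))
```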
Vertical interconnects squeezing in symmetric 3D mesh Network-on-Chip
Three-dimensional (3D) integration and Network-on-Chip (NoC) have both been proposed to tackle on-chip interconnect scaling problems, and extensive research efforts have been devoted to the design challenges of combining the two. Through-silicon via (TSV) is considered the most promising technology for 3D integration; however, TSV pads distributed across planar layers occupy significant chip area and cause routing congestion. In addition, the yield of 3D integrated circuits decreases dramatically as the number of TSVs increases. For symmetric 3D mesh NoCs, we observe that TSV utilization is quite low and that adjacent routers rarely transmit packets through their vertical channels (i.e., TSVs) at the same time. Based on this observation, we propose a novel TSV squeezing scheme that shares TSVs among neighboring routers in a time-division multiplexed manner, greatly improving TSV utilization. Experimental results show that the proposed method saves significant TSV footprint with negligible performance overhead.
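The sharing idea can be pictured with a toy time-division-multiplexing arbiter for one TSV bundle shared by two adjacent routers; the names and arbitration policy below are illustrative assumptions, not the paper's exact scheme.

```python
# Toy TDM arbiter for one shared vertical (TSV) channel between two routers.

def shared_tsv_schedule(requests_a, requests_b):
    """Grant the shared vertical channel cycle by cycle.

    requests_a / requests_b are per-cycle booleans saying whether router A/B
    wants to send a flit upward in that cycle.
    """
    grants = []
    for cycle, (a, b) in enumerate(zip(requests_a, requests_b)):
        if a and b:
            # Contention is rare (the observation in the abstract); fall back
            # to a fixed even/odd TDM slot assignment when it happens.
            grants.append("A" if cycle % 2 == 0 else "B")
        elif a:
            grants.append("A")
        elif b:
            grants.append("B")
        else:
            grants.append("-")
    return grants

print(shared_tsv_schedule([1, 0, 1, 1], [0, 1, 0, 1]))
```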
Compression/Scan Co-Design for Reducing Test Data Volume, Scan-in Power Dissipation, and Test Application Time
LSI testing is critical to guarantee that chips are fault-free before they are integrated into a system, so as to increase the reliability of the system. Although full scan is a widely adopted design-for-testability technique for LSI design and testing, there is a strong need to reduce the test data Volume, scan-in Power dissipation, and test application Time (VPT) of full-scan testing. Based on an analysis of the characteristics of the variable-to-fixed run-length coding technique and the random access scan architecture, this paper presents a novel design scheme that tackles all VPT issues simultaneously. Experimental results on ISCAS'89 benchmarks show on average 51.2%, 99.5%, 99.3%, and 85.5% reductions in test data volume, average scan-in power dissipation, peak scan-in power dissipation, and test application time, respectively.
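For intuition about the variable-to-fixed run-length coding the scheme builds on (the code width and handling of trailing bits below are simplified assumptions, not the paper's exact encoder), variable-length runs of 0s map onto fixed-width codewords, so scan data dominated by 0s compresses well and also toggles less during scan-in:

```python
# Simplified variable-to-fixed run-length encoder for a scan-in bit stream.
CODE_BITS = 3                        # every codeword is exactly 3 bits wide
MAX_RUN = (1 << CODE_BITS) - 1       # all-ones codeword = "7 zeros, no ending 1"

def vtf_encode(bits: str) -> list[str]:
    codewords, run = [], 0
    for b in bits:
        if b == "1":
            # Codeword k (< MAX_RUN) means "k zeros followed by a 1".
            codewords.append(format(run, f"0{CODE_BITS}b"))
            run = 0
        else:
            run += 1
            if run == MAX_RUN:
                # Longest run: emit the all-ones codeword and keep counting.
                codewords.append(format(MAX_RUN, f"0{CODE_BITS}b"))
                run = 0
    if run:
        # Trailing zeros (a real decoder would also need the stream length).
        codewords.append(format(run, f"0{CODE_BITS}b"))
    return codewords

print(vtf_encode("000000100000000000011"))  # -> ['110', '111', '101', '000']
```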