Dadu-RBD: Robot Rigid Body Dynamics Accelerator with Multifunctional Pipelines
Rigid body dynamics is a key technology in robotics. Trajectory optimization and model predictive control algorithms typically involve a large number of rigid body dynamics computations, and processing them on CPUs is time-consuming, which limits the real-time performance of robots. To address this performance bottleneck, we propose a multifunctional robot rigid body dynamics accelerator named RBDCore. By analyzing the functions commonly used in robot dynamics calculations, we characterize their reuse relationships and optimize them for hardware. Based on this analysis, RBDCore fully reuses common hardware modules when processing different computing tasks, and by dynamically switching the dataflow path it accelerates various dynamics functions without reconfiguring the hardware. We design Structure-Adaptive Pipelines for RBDCore, which greatly improve the accelerator's throughput and can be tailored to robots with different structures and parameters. Compared with state-of-the-art CPU and GPU dynamics libraries and an FPGA accelerator, RBDCore delivers significantly higher performance.
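As a rough software analogy of the reuse idea (not the RBDCore hardware or its real dynamics math), the sketch below expresses two dynamics functions as different orderings of the same shared stages and selects between them by switching the dataflow path; all stage names and the toy arithmetic are invented for illustration.

```python
# Minimal software analogy of multifunctional pipelines: several dynamics
# "functions" reuse the same shared stages, and a dataflow switch selects
# which sequence runs. Stage names and the toy math are hypothetical.

def forward_sweep(state):
    # Kinematic sweep reused by both inverse and forward dynamics.
    state["v"] = [0.5 * qd for qd in state["qd"]]
    return state

def backward_sweep(state):
    # Force/inertia sweep, also shared between the two functions.
    state["tau"] = [2.0 * v for v in state["v"]]
    return state

def accel_solve(state):
    # Extra stage needed only by forward dynamics.
    state["qdd"] = [tau / 1.5 for tau in state["tau"]]
    return state

PIPELINES = {
    "inverse_dynamics": [forward_sweep, backward_sweep],
    "forward_dynamics": [forward_sweep, backward_sweep, accel_solve],
}

def run(function_name, state):
    # Switching the dataflow path reuses the same stage "modules".
    for stage in PIPELINES[function_name]:
        state = stage(state)
    return state

print(run("forward_dynamics", {"qd": [0.1, 0.2, 0.3]}))
```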
PIMSIM-NN: An ISA-based Simulation Framework for Processing-in-Memory Accelerators
Processing-in-memory (PIM) has shown extraordinary potential for accelerating neural networks. To evaluate the performance of PIM accelerators, we present an ISA-based simulation framework comprising a dedicated ISA targeting neural networks running on PIM architectures, a compiler, and a cycle-accurate, configurable simulator. Compared with prior works, this work decouples software algorithms from hardware architectures through the proposed ISA, providing a more convenient way to evaluate the effectiveness of software/hardware optimizations. The simulator adopts an event-driven simulation approach and offers better support for hardware parallelism. The framework is open-sourced at https://github.com/wangxy-2000/pimsim-nn.
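To illustrate the event-driven simulation style the abstract refers to (a generic sketch, not PIMSIM-NN's actual classes or ISA), a minimal simulator can advance time from one scheduled event to the next so that independent hardware units overlap naturally:

```python
# Generic event-driven simulation loop; class and core names are illustrative.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    time: int
    action: callable = field(compare=False)

class EventDrivenSim:
    def __init__(self):
        self.queue = []
        self.now = 0

    def schedule(self, delay, action):
        heapq.heappush(self.queue, Event(self.now + delay, action))

    def run(self):
        # Advance time only to the next scheduled event, so independent
        # units (cores, crossbars, buses) can progress in parallel.
        while self.queue:
            ev = heapq.heappop(self.queue)
            self.now = ev.time
            ev.action()

sim = EventDrivenSim()
sim.schedule(5, lambda: print("core0 MVM done @", sim.now))
sim.schedule(3, lambda: print("core1 load done @", sim.now))
sim.run()
```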
PIMSYN: Synthesizing Processing-in-memory CNN Accelerators
Processing-in-memory (PIM) architectures have been regarded as a promising solution for CNN acceleration. Existing PIM accelerator designs rely heavily on expert experience and incur significant manual design overhead, and such manual design cannot effectively explore and optimize architecture implementations. In this work, we develop PIMSYN, an automatic framework for synthesizing PIM-based CNN accelerators, which greatly facilitates architecture design and helps generate energy-efficient accelerators. PIMSYN automatically transforms CNN applications into execution workflows and hardware constructions of PIM accelerators. To systematically optimize the architecture, we embed an architectural exploration flow into the synthesis framework, providing a more comprehensive design space. Experiments demonstrate that PIMSYN improves power efficiency several-fold compared with existing works. PIMSYN can be obtained from https://github.com/lixixi-jook/PIMSYN-NN.
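As a hedged illustration of what an architectural exploration flow can look like (the parameters, power budget, and cost model below are invented, not PIMSYN's), a synthesis framework can sweep candidate PIM configurations and keep the most power-efficient one that meets a constraint:

```python
# Toy design-space exploration over hypothetical PIM architecture parameters.
from itertools import product

POWER_BUDGET = 150.0                     # hypothetical chip power cap

def power(xbar_size, xbars_per_macro, macros):
    # Toy power model standing in for a real performance/power analysis.
    return 0.5 * xbars_per_macro * macros + 0.01 * xbar_size * macros

def throughput(xbar_size, xbars_per_macro, macros):
    return xbar_size * xbars_per_macro * macros

candidates = [c for c in product([128, 256, 512],   # crossbar size
                                 [4, 8, 16],        # crossbars per macro
                                 [8, 16, 32])       # macros on chip
              if power(*c) <= POWER_BUDGET]

# Keep the most power-efficient architecture that meets the budget.
best = max(candidates, key=lambda c: throughput(*c) / power(*c))
print("best architecture (toy model):", best)
```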
ChipGPT: How far are we from natural language hardware design
As large language models (LLMs) such as ChatGPT exhibit unprecedented machine intelligence, they also show great promise in assisting hardware engineers to realize more efficient logic designs through natural language interaction. To estimate the potential of LLM-assisted hardware design, this work demonstrates an automated design environment that uses LLMs to generate hardware logic designs from natural-language specifications. Toward a more accessible and efficient chip development flow, we present a scalable four-stage zero-code logic design framework based on LLMs that requires no retraining or fine-tuning. First, the demo, ChipGPT, generates prompts for the LLM, which then produces initial Verilog programs. Second, an output manager corrects and optimizes these programs before collecting them into the final design space. Finally, ChipGPT searches this space to select the optimal design under the target metrics. The evaluation sheds light on whether LLMs can generate correct and complete hardware logic designs from natural-language descriptions for some specifications. It shows that ChipGPT improves programmability and controllability and exposes a broader design optimization space than prior work and native LLMs alone.
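A schematic sketch of such a four-stage flow is shown below; the LLM call is a stub and every function and metric is hypothetical, so this only mirrors the stage structure described above, not ChipGPT's implementation.

```python
# Hypothetical four-stage flow: prompt -> LLM -> correction -> search.

def generate_prompt(spec: str) -> str:
    # Stage 1: turn the natural-language spec into an LLM prompt.
    return f"Write synthesizable Verilog for: {spec}"

def call_llm(prompt: str) -> list[str]:
    # Stage 2 (stub): in a real flow this would query an LLM for Verilog.
    return ["module adder(...); /* candidate A */ endmodule",
            "module adder(...); /* candidate B */ endmodule"]

def correct_and_collect(candidates: list[str]) -> list[str]:
    # Stage 3: an output manager would lint/repair programs before keeping them.
    return [c for c in candidates if c.startswith("module")]

def search_design_space(designs: list[str]) -> str:
    # Stage 4: pick the best design under a target metric (here a fake proxy).
    return min(designs, key=len)

spec = "a 4-bit ripple-carry adder"
print(search_design_space(correct_and_collect(call_llm(generate_prompt(spec)))))
```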
A New Post-Silicon Debug Approach Based on Suspect Window
Bugs tend to be unavoidable in the design of complex integrated circuits, so it is imperative to identify them as soon as possible through post-silicon debug. The main challenge in post-silicon debug is the observability of internal signals. This paper exploits the fact that it is not necessary to observe error-free states. We introduce the "suspect window" and present a method for determining its boundaries. Based on the suspect window, we propose a debug approach that achieves high observability by reusing the scan chain. Since scan dumps take place only within the suspect window, debug time is greatly reduced. Experimental results demonstrate the effectiveness of the proposed approach.
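A toy illustration of the suspect-window idea (synthetic trace data, not the paper's boundary-determination method): the window is bounded by the last error-free observation and the first observed error, and scan dumps are restricted to that interval.

```python
# Synthetic example only: locate the suspect window from a traced signal.
golden   = [0, 1, 1, 2, 3, 5, 8, 13]    # expected (error-free) trace values
observed = [0, 1, 1, 2, 3, 7, 8, 13]    # silicon trace containing one error

# First cycle where silicon diverges from the golden model.
first_error = next(i for i, (g, o) in enumerate(zip(golden, observed)) if g != o)
last_good = first_error - 1              # end of the error-free prefix

# Scan-chain dumps only need to cover this window, shortening debug time.
print("suspect window: cycles", (last_good, first_error))
```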
Vertical interconnects squeezing in symmetric 3D mesh Network-on-Chip
Three-dimensional (3D) integration and Network-on-Chip (NoC) have both been proposed to tackle on-chip interconnect scaling problems, and extensive research efforts have been devoted to the design challenges of combining the two. Through-silicon via (TSV) is considered the most promising technology for 3D integration; however, TSV pads distributed across planar layers occupy significant chip area and cause routing congestion. In addition, the yield of 3D integrated circuits decreases dramatically as the number of TSVs increases. For symmetric 3D mesh NoCs, we observe that TSV utilization is quite low and that adjacent routers rarely transmit packets through their vertical channels (i.e., TSVs) at the same time. Based on this observation, we propose a novel TSV squeezing scheme that shares TSVs among neighboring routers in a time-division multiplexed manner, greatly improving TSV utilization. Experimental results show that the proposed method saves significant TSV footprint with negligible performance overhead.
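The sharing idea can be pictured with a toy time-division-multiplexing arbiter for one TSV bundle shared by two adjacent routers; the names and arbitration policy below are illustrative assumptions, not the paper's exact scheme.

```python
# Toy TDM arbiter for one shared vertical (TSV) channel between two routers.

def shared_tsv_schedule(requests_a, requests_b):
    """Grant the shared vertical channel cycle by cycle.

    requests_a / requests_b are per-cycle booleans saying whether router A/B
    wants to send a flit upward in that cycle.
    """
    grants = []
    for cycle, (a, b) in enumerate(zip(requests_a, requests_b)):
        if a and b:
            # Contention is rare (the observation in the abstract); fall back
            # to a fixed even/odd TDM slot assignment when it happens.
            grants.append("A" if cycle % 2 == 0 else "B")
        elif a:
            grants.append("A")
        elif b:
            grants.append("B")
        else:
            grants.append("-")
    return grants

print(shared_tsv_schedule([1, 0, 1, 1], [0, 1, 0, 1]))
```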
Compression/Scan Co-Design for Reducing Test Data Volume, Scan-in Power Dissipation, and Test Application Time
LSI testing is critical to guarantee that chips are fault-free before they are integrated into a system, so as to increase the reliability of the system. Although full scan is a widely adopted design-for-testability technique for LSI design and testing, there is a strong need to reduce the test data Volume, scan-in Power dissipation, and test application Time (VPT) of full-scan testing. Based on an analysis of the characteristics of the variable-to-fixed run-length coding technique and the random access scan architecture, this paper presents a novel design scheme that tackles all VPT issues simultaneously. Experimental results on ISCAS'89 benchmarks show on average 51.2%, 99.5%, 99.3%, and 85.5% reductions in test data volume, average scan-in power dissipation, peak scan-in power dissipation, and test application time, respectively.
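For intuition about the variable-to-fixed run-length coding the scheme builds on (the code width and handling of trailing bits below are simplified assumptions, not the paper's exact encoder), variable-length runs of 0s map onto fixed-width codewords, so scan data dominated by 0s compresses well and also toggles less during scan-in:

```python
# Simplified variable-to-fixed run-length encoder for a scan-in bit stream.
CODE_BITS = 3                        # every codeword is exactly 3 bits wide
MAX_RUN = (1 << CODE_BITS) - 1       # all-ones codeword = "7 zeros, no ending 1"

def vtf_encode(bits: str) -> list[str]:
    codewords, run = [], 0
    for b in bits:
        if b == "1":
            # Codeword k (< MAX_RUN) means "k zeros followed by a 1".
            codewords.append(format(run, f"0{CODE_BITS}b"))
            run = 0
        else:
            run += 1
            if run == MAX_RUN:
                # Longest run: emit the all-ones codeword and keep counting.
                codewords.append(format(MAX_RUN, f"0{CODE_BITS}b"))
                run = 0
    if run:
        # Trailing zeros (a real decoder would also need the stream length).
        codewords.append(format(run, f"0{CODE_BITS}b"))
    return codewords

print(vtf_encode("000000100000000000011"))  # -> ['110', '111', '101', '000']
```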