119 research outputs found

Combining dynamic and static scheduling in high-level synthesis

Author: Cheng Jianyi
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/07/2023

Field Programmable Gate Arrays (FPGAs) are starting to become mainstream devices for custom computing, particularly in data centres. However, using these FPGA devices requires familiarity with digital design at a low abstraction level. In order to enable software engineers without a hardware background to design custom hardware, high-level synthesis (HLS) tools automatically transform a high-level program, for example in C/C++, into a low-level hardware description. A central task in HLS is scheduling: the allocation of operations to clock cycles. The classic approach to scheduling is static, in which each operation is mapped to a clock cycle at compile time, but recent years have seen the emergence of dynamic scheduling, in which an operation’s clock cycle is only determined at run-time. Both approaches have their merits: static scheduling can lead to simpler circuitry and more resource sharing, while dynamic scheduling can lead to faster hardware when the computation has a non-trivial control flow. This thesis proposes a scheduling approach that combines the best of both worlds. My idea is to use existing program analysis techniques in software designs, such as probabilistic analysis and formal verification, to optimize the HLS hardware. First, this thesis proposes a tool named DASS that uses a heuristic-based approach to identify the code regions in the input program that are amenable to static scheduling and synthesises them into statically scheduled components, also known as static islands, leaving the top-level hardware dynamically scheduled. Second, this thesis addresses a problem of this approach: that the analyses of static islands and of their dynamically scheduled surroundings are separate, with each treating the other as a black box. We apply static analysis, including dependence analysis between static islands and their dynamically scheduled surroundings, to optimize the offsets of static islands for high performance. We also apply probabilistic analysis to estimate the performance of the dynamically scheduled part and use this information to optimize the static islands for high area efficiency. Finally, this thesis addresses the problem of conservatism in using sequential control flow designs, which can limit the throughput of the hardware. We show this challenge can be solved by formally proving that certain control flows can be safely parallelised for high performance. This thesis demonstrates how to use automated formal verification to find out-of-order loop pipelining solutions and multi-threading solutions from a sequential program. Open Access
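
The abstract above describes static scheduling as mapping each operation to a clock cycle at compile time. A minimal sketch of that idea is an as-soon-as-possible (ASAP) schedule over an acyclic dependence graph. This is a generic textbook illustration, not the algorithm used by DASS; the function name `asap_schedule`, the operation names, and the latencies are all hypothetical.

```python
def asap_schedule(latency, deps):
    """Return the start cycle of each operation under an ASAP static schedule.

    latency: dict mapping operation name -> number of cycles it takes
    deps:    dict mapping operation name -> list of operations it consumes
    Assumes the dependence graph is acyclic (hypothetical input format).
    """
    start = {}

    def visit(op):
        if op not in start:
            # An operation starts once every producer has finished;
            # operations with no producers start at cycle 0.
            start[op] = max(
                (visit(p) + latency[p] for p in deps.get(op, [])),
                default=0,
            )
        return start[op]

    for op in latency:
        visit(op)
    return start


# Hypothetical four-operation kernel: two loads feed a multiply, then an add.
latency = {"load_a": 2, "load_b": 2, "mul": 3, "add": 1}
deps = {"mul": ["load_a", "load_b"], "add": ["mul", "load_a"]}
schedule = asap_schedule(latency, deps)
```

Every start cycle here is fixed before the hardware runs. A dynamically scheduled circuit would instead decide at run time, for example by letting `add` fire whenever its operands happen to arrive.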

Spiral - Imperial College Digital Repository

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

Author: Cheng Jianyi
Constantinides George A.
Shumailov Ilia
Zhang Cheng
Zhao Yiren
Publication date: 21/10/2023

The inference of large language models (LLMs) requires immense computation and memory resources. To curtail these costs, quantisation has emerged as a promising solution, but existing LLM quantisation mainly focuses on 8-bit. In this work, we explore the statistical and learning properties of the LLM layer and attribute the bottleneck of LLM quantisation to numerical scaling offsets. To address this, we adapt block quantisations for LLMs, a family of methods that share scaling factors across packed numbers. Block quantisations efficiently reduce the numerical scaling offsets solely from an arithmetic perspective, without additional treatments in the computational path. Our nearly-lossless quantised 6-bit LLMs achieve a 19× higher arithmetic density and 5× memory density than the float32 baseline, surpassing the prior art 8-bit quantisation by 2.5× in arithmetic density and 1.2× in memory density, without requiring any data calibration or re-training. We also share our insights into sub-8-bit LLM quantisation, including the mismatch between activation and weight distributions, optimal fine-tuning strategies, and a lower quantisation granularity inherent in the statistical properties of LLMs. The latter two tricks enable nearly-lossless 4-bit LLMs on downstream tasks. Our code is open-sourced. Comment: Accepted by EMNLP202
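
The abstract above characterises block quantisation as sharing scaling factors across packed numbers. A minimal sketch of that idea follows, assuming a block floating-point style scheme in which each block of values shares one power-of-two scale and stores small signed integers. It is an illustration only, not the paper's implementation; the function names, the block size of 4, and the 6-bit width are assumptions chosen for the example.

```python
import math


def block_quantise(values, block_size=4, bits=6):
    """Quantise floats in blocks; each block shares one power-of-two scale.

    Returns a list of (exponent, ints) pairs, one per block.
    Hypothetical scheme for illustration only.
    """
    qmax = 2 ** (bits - 1) - 1  # largest signed magnitude, 31 for 6 bits
    blocks = []
    for begin in range(0, len(values), block_size):
        block = values[begin:begin + block_size]
        peak = max(abs(v) for v in block)
        # Shared exponent: smallest power of two with peak / scale <= qmax.
        exponent = 0 if peak == 0 else math.ceil(math.log2(peak / qmax))
        scale = 2.0 ** exponent
        # Round to the nearest integer and clamp into the signed range.
        ints = [max(-qmax, min(qmax, round(v / scale))) for v in block]
        blocks.append((exponent, ints))
    return blocks


def block_dequantise(blocks):
    """Rebuild approximate floats from (exponent, ints) pairs."""
    return [i * 2.0 ** exponent for exponent, ints in blocks for i in ints]


# Two blocks with very different magnitudes each get their own scale.
data = [0.1, -0.2, 0.05, 0.3, 100.0, -50.0, 25.0, 10.0]
packed = block_quantise(data, block_size=4, bits=6)
restored = block_dequantise(packed)
```

Because the scale is stored once per block rather than once per tensor, a block of small values is not forced onto the coarse grid needed by a block of large values. In this sketch, the rounding error of each element is at most half of its own block's scale.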

arXiv.org e-Print Archive

GSA to HDL: Towards principled generation of dynamically scheduled circuits

Author: Cheng Jianyi
Herklotz Yann
Rajagopal Aditya
Vink Diederik Adriaan
Publication date: 21/08/2023

High-level synthesis (HLS) refers to the automatic translation of a software program written in a high-level language into a hardware design. Modern HLS tools have moved away from the traditional approach of static (compile time) scheduling of operations to generating dynamic circuits that schedule operations at run time. Such circuits trade off area utilisation for increased dynamism and throughput. However, existing lowering flows in dynamically scheduled HLS tools rely on conservative assumptions on their input program due to both the intermediate representations (IR) utilised as well as the lack of formal specifications on the translation into hardware. These assumptions cause suboptimal hardware performance. In this work, we lift these assumptions by proposing a new and efficient abstraction for hardware mapping, namely h-GSA, an extension of the Gated Single Static Assignment (GSA) IR. Using this abstraction, we propose a lowering flow that transforms GSA into h-GSA and maps h-GSA into dynamically scheduled hardware circuits. We compare the schedules generated by our approach to those by the state-of-the-art dynamic-scheduling HLS tool, Dynamatic, and illustrate the potential performance improvement from hardware mapping using the proposed abstraction. Comment: Presented at the 19th International Summer School on Advanced Computer Architecture and Compilation for High-performance Embedded Systems (ACACES 2023)

arXiv.org e-Print Archive

SEER: Super-Optimization Explorer for HLS using E-graph Rewriting with MLIR

Author: Barbalho Rafael
Chelini Lorenzo
Cheng Jianyi
Coward Samuel
Drane Theo
Publication date: 15/08/2023

High-level synthesis (HLS) is a process that automatically translates a software program in a high-level language into a low-level hardware description. However, the hardware designs produced by HLS tools still suffer from a significant performance gap compared to manual implementations. This is because the input HLS programs must still be written using hardware design principles. Existing techniques either leave the program source unchanged or perform a fixed sequence of source transformation passes, potentially missing opportunities to find the optimal design. We propose a super-optimization approach for HLS that automatically rewrites an arbitrary software program into efficient HLS code that can be used to generate an optimized hardware design. We developed a toolflow named SEER, based on the e-graph data structure, to efficiently explore equivalent implementations of a program at scale. SEER provides an extensible framework, orchestrating existing software compiler passes and hardware synthesis optimizers. Our work is the first attempt to exploit e-graph rewriting for large software compiler frameworks, such as MLIR. Across a set of open-source benchmarks, we show that SEER achieves up to 38x the performance within 1.4x the area of the original program. Via an Intel-provided case study, SEER demonstrates the potential to outperform manually optimized designs produced by hardware experts.

arXiv.org e-Print Archive

Fast Prototyping Next-Generation Accelerators for New ML Models using MASE: ML Accelerator System Exploration

Author: Bouganis Christos-Savvas
Cheng Jianyi
Montgomerie-Corcoran Alex
Xiao Can
Yu Zhewen
Zhang Cheng
Zhao Yiren
Publication date: 28/07/2023

Machine learning (ML) accelerators have been studied and used extensively to compute ML models with high performance and low power. However, designing such accelerators normally takes a long time and requires significant effort. Unfortunately, the pace of development of ML software models is much faster than the accelerator design cycle, leading to frequent and drastic modifications in the model architecture, thus rendering many accelerators obsolete. Existing design tools and frameworks can provide quick accelerator prototyping, but only for a limited range of models that can fit into a single hardware device, such as an FPGA. Furthermore, with the emergence of large language models, such as GPT-3, there is an increased need for hardware prototyping of these large models within a many-accelerator system to ensure the hardware can scale with the ever-growing model sizes. In this paper, we propose an efficient and scalable approach for exploring accelerator systems to compute large ML models. We developed a tool named MASE that can directly map large ML models onto an efficient streaming accelerator system. Over a set of ML models, we show that MASE can achieve better energy efficiency than GPUs when computing inference for recent transformer models. Our tool will be open-sourced upon publication.

arXiv.org e-Print Archive

Non-Hermitian topological whispering gallery

Author: Cheng Ying
Christensen Johan
Hu Bolun
Liu Xiaojun
Wang Xiaoyu
Xiong Wei
Xu Jianyi
Yue Zichong
Zhang Haixiao
Zhang Zhiwang
Zheng Li-Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/09/2021

In 1878, Lord Rayleigh observed the highly celebrated phenomenon of sound waves that creep around the curved gallery of St Paul's Cathedral in London [1,2]. These whispering-gallery waves scatter efficiently with little diffraction around an enclosure and have since found applications in ultrasonic fatigue and crack testing, and in the optical sensing of nanoparticles or molecules using silica microscale toroids. Recently, intense research efforts have focused on exploring non-Hermitian systems with cleverly matched gain and loss, facilitating unidirectional invisibility and exotic characteristics of exceptional points [3,4]. Likewise, the surge in physics using topological insulators comprising non-trivial symmetry-protected phases has laid the groundwork in reshaping highly unconventional avenues for robust and reflection-free guiding and steering of both sound and light [5,6]. Here we construct a topological gallery insulator using sonic crystals made of thermoplastic rods that are decorated with carbon nanotube films, which act as a sonic gain medium by virtue of electro-thermoacoustic coupling. By engineering specific non-Hermiticity textures to the activated rods, we are able to break the chiral symmetry of the whispering-gallery modes, which enables the out-coupling of topological "audio lasing" modes with the desired handedness. We foresee that these findings will stimulate progress in non-destructive testing and acoustic sensing.

This work was supported by the National Basic Research Program of China (2017YFA0303702), NSFC (12074183, 11922407, 11904035, 11834008, 11874215 and 12104226) and the Fundamental Research Funds for the Central Universities (020414380181). Z.Z. acknowledges the support from the China National Postdoctoral Program for Innovative Talents (BX20200165) and the China Postdoctoral Science Foundation (2020M681541). L.Z. acknowledges support from the CONEX-Plus programme funded by Universidad Carlos III de Madrid and the European Union's Horizon 2020 research and innovation programme under Marie Sklodowska-Curie grant agreement 801538. J.C. acknowledges support from the European Research Council (ERC) through the Starting Grant 714577 PHONOMETA and from the MINECO through a Ramón y Cajal grant (grant number RYC-2015-17156).

Universidad Carlos III de Madrid e-Archivo

Research on Temperature Field and Stress Field of Prefabricate Block Electric Furnace Roof

Author: Fuwei Cheng
Gongfa Li
Guozhang Jiang
Jia Liu
Jianyi Kong
Shaoyang Shi
Tao He
Wentao Xiao
Yikun Zhang
Publication venue: IFSA Publishing, S.L.
Publication date: 01/11/2013

This paper establishes CAD/CAE models of a high-aluminum brick furnace cover and a precast furnace cover (cast in three, eight, or twelve blocks) based on a real 30 t electric furnace roof from a steel factory, and simulates the temperature and stress fields of the firebrick roof and the prefabricated block roof with ANSYS. The calculation results indicate that the contact stress between the furnace cover and the precast blocks affects the performance of the furnace cover, and that the furnace cover assembled from three cast precast blocks has lower stress levels and a longer service life, providing a quantitative reference for selecting a casting scheme.

Directory of Open Access Journals

Influence Factors on Stress Distribution of Electric Furnace Roof

Author: Fu Wei Cheng
Gongfa Li
Guozhang Jiang
Jia Liu
Jianyi Kong
Shao Yang Shi
Tao He
Wentao Xiao
Yikun Zhang
Publication venue: IFSA Publishing, S.L.
Publication date: 01/11/2013

An electric furnace roof is an important device in electric steelmaking; its heat preservation performance and life-span have a direct impact on the economic benefits of iron and steel enterprises. This paper investigates the relationship between the stress level of the electric furnace roof and the material parameters. The research indicates that they tend to change in the same direction.

Directory of Open Access Journals

China’s 10-year progress in DC gas-insulated equipment: From basic research to industry perspective

Author: Bo Liu
Changhong Zhang
Cheng Pan
Chuanyang Li
Davide Fabiani
Fangwei Liang
Geng Chen
Giovanni Mazzanti
Hucheng Liang
Jianyi Xue
Jianying Zhong
Jingen Tang
Jinliang He
Jinzhuang Lv
Lei Zhang
Peng Liu
Shaohua Cao
Uwe Riechert
Weijian Zhuang
Wu Lu
Xianhao Fan
Yuan Deng
Zheming Wang
Zhen Li
Zhenle Nan
Zijun Pan
Zuodong Liang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022

The construction of China's future energy structure under the 2050 carbon-neutral vision requires compact direct current (DC) gas-insulated equipment as important nodes and solutions to support long-distance, large-capacity electric power transmission and distribution. This paper reviews China's 10-year progress in DC gas-insulated equipment. Important advances in basic research and from an industry perspective are presented, and related scientific issues and technical bottlenecks are discussed. Progress in DC gas-insulated equipment worldwide (Europe, Japan, America) is also briefly reported.

Directory of Open Access Journals

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna