922 research outputs found

    From FPGA to ASIC: A RISC-V processor experience

    Get PDF
    This work document a correct design flow using these tools in the Lagarto RISC- V Processor and the RTL design considerations that must be taken into account, to move from a design for FPGA to design for ASIC

    Energy challenges for ICT

    Get PDF
    The energy consumption from the expanding use of information and communications technology (ICT) is unsustainable with present drivers, and it will impact heavily on the future climate change. However, ICT devices have the potential to contribute signi - cantly to the reduction of CO2 emission and enhance resource e ciency in other sectors, e.g., transportation (through intelligent transportation and advanced driver assistance systems and self-driving vehicles), heating (through smart building control), and manu- facturing (through digital automation based on smart autonomous sensors). To address the energy sustainability of ICT and capture the full potential of ICT in resource e - ciency, a multidisciplinary ICT-energy community needs to be brought together cover- ing devices, microarchitectures, ultra large-scale integration (ULSI), high-performance computing (HPC), energy harvesting, energy storage, system design, embedded sys- tems, e cient electronics, static analysis, and computation. In this chapter, we introduce challenges and opportunities in this emerging eld and a common framework to strive towards energy-sustainable ICT

    Field-based branch prediction for packet processing engines

    Get PDF
    Network processors have exploited many aspects of architecture design, such as employing multi-core, multi-threading and hardware accelerator, to support both the ever-increasing line rates and the higher complexity of network applications. Micro-architectural techniques like superscalar, deep pipeline and speculative execution provide an excellent method of improving performance without limiting either the scalability or flexibility, provided that the branch penalty is well controlled. However, it is difficult for traditional branch predictor to keep increasing the accuracy by using larger tables, due to the fewer variations in branch patterns of packet processing. To improve the prediction efficiency, we propose a flow-based prediction mechanism which caches the branch histories of packets with similar header fields, since they normally undergo the same execution path. For packets that cannot find a matching entry in the history table, a fallback gshare predictor is used to provide branch direction. Simulation results show that the our scheme achieves an average hit rate in excess of 97.5% on a selected set of network applications and real-life packet traces, with a similar chip area to the existing branch prediction architectures used in modern microprocessors

    SpecLLM: Exploring Generation and Review of VLSI Design Specification with Large Language Model

    Full text link
    The development of architecture specifications is an initial and fundamental stage of the integrated circuit (IC) design process. Traditionally, architecture specifications are crafted by experienced chip architects, a process that is not only time-consuming but also error-prone. Mistakes in these specifications may significantly affect subsequent stages of chip design. Despite the presence of advanced electronic design automation (EDA) tools, effective solutions to these specification-related challenges remain scarce. Since writing architecture specifications is naturally a natural language processing (NLP) task, this paper pioneers the automation of architecture specification development with the advanced capabilities of large language models (LLMs). Leveraging our definition and dataset, we explore the application of LLMs in two key aspects of architecture specification development: (1) Generating architecture specifications, which includes both writing specifications from scratch and converting RTL code into detailed specifications. (2) Reviewing existing architecture specifications. We got promising results indicating that LLMs may revolutionize how these critical specification documents are developed in IC design nowadays. By reducing the effort required, LLMs open up new possibilities for efficiency and accuracy in this crucial aspect of chip design

    An automated OpenCL FPGA compilation framework targeting a configurable, VLIW chip multiprocessor

    Get PDF
    Modern system-on-chips augment their baseline CPU with coprocessors and accelerators to increase overall computational capacity and power efficiency, and thus have evolved into heterogeneous systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This thesis discusses a unified compilation environment to enable heterogeneous system design through the use of OpenCL and a customised VLIW chip multiprocessor (CMP) architecture, known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on the LE1 CPU. The framework fully automates the compilation flow and supports work-item coalescing to better utilise the CPU cores and alleviate the effects of thread divergence. This thesis discusses in detail both the software stack and target hardware architecture and evaluates the scalability of the proposed framework on a highly precise cycle-accurate simulator. This is achieved through the execution of 12 benchmarks across 240 different machine configurations, as well as further results utilising an incomplete development branch of the compiler. It is shown that the problems generally scale well with the LE1 architecture, up to eight cores, when the memory system becomes a serious bottleneck. Results demonstrate superlinear performance on certain benchmarks (x9 for the bitonic sort benchmark with 8 dual-issue cores) with further improvements from compiler optimisations (x14 for bitonic with the same configuration

    Chameleon: a heterogeneous and disaggregated accelerator system for retrieval-augmented language models

    Full text link
    A Retrieval-Augmented Language Model (RALM) augments a generative language model by retrieving context-specific knowledge from an external database. This strategy facilitates impressive text generation quality even with smaller models, thus reducing orders of magnitude of computational demands. However, RALMs introduce unique system design challenges due to (a) the diverse workload characteristics between LM inference and retrieval and (b) the various system requirements and bottlenecks for different RALM configurations such as model sizes, database sizes, and retrieval frequencies. We propose Chameleon, a heterogeneous accelerator system that integrates both LM and retrieval accelerators in a disaggregated architecture. The heterogeneity ensures efficient acceleration of both LM inference and retrieval, while the accelerator disaggregation enables the system to independently scale both types of accelerators to fulfill diverse RALM requirements. Our Chameleon prototype implements retrieval accelerators on FPGAs and assigns LM inference to GPUs, with a CPU server orchestrating these accelerators over the network. Compared to CPU-based and CPU-GPU vector search systems, Chameleon achieves up to 23.72x speedup and 26.2x energy efficiency. Evaluated on various RALMs, Chameleon exhibits up to 2.16x reduction in latency and 3.18x speedup in throughput compared to the hybrid CPU-GPU architecture. These promising results pave the way for bringing accelerator heterogeneity and disaggregation into future RALM systems
    corecore