260 research outputs found

    High-Level Synthesis for Embedded Systems

    Get PDF

    High-level synthesis of fine-grained weakly consistent C concurrency

    Get PDF
    High-level synthesis (HLS) is the process of automatically compiling high-level programs into a netlist (collection of gates). Given an input program, HLS tools exploit its inherent parallelism and pipelining opportunities to generate efficient customised hardware. C-based programs are the most popular input for HLS tools, but these tools historically only synthesise sequential C programs. As the appeal for software concurrency rises, HLS tools are beginning to synthesise concurrent C programs, such as C/C++ pthreads and OpenCL. Although supporting software concurrency leads to better hardware parallelism, shared memory synchronisation is typically serialised to ensure correct memory behaviour, via locks. Locks are safety resources that ensure exclusive access of shared memory, eliminating data races and providing synchronisation guarantees for programmers.  As an alternative to lock-based synchronisation, the C memory model also defines the possibility of lock-free synchronisation via fine-grained atomic operations (`atomics'). However, most HLS tools either do not support atomics at all or implement atomics using locks. Instead, we treat the synthesis of atomics as a scheduling problem. We show that we can augment the intra-thread memory constraints during memory scheduling of concurrent programs to support atomics. On average, hardware generated by our method is 7.5x faster than the state-of-the-art, for our set of experiments. Our method of synthesising atomics enables several unique possibilities. Chiefly, we are capable of supporting weakly consistent (`weak') atomics, which necessitate fewer ordering constraints compared to sequentially consistent (SC) atomics. However, implementing weak atomics is complex and error-prone and hence we formally verify our methods via automated model checking to ensure our generated hardware is correct. Furthermore, since the C memory model defines memory behaviour globally, we can globally analyse the entire program to generate its memory constraints. Additionally, we can also support loop pipelining by extending our methods to generate inter-iteration memory constraints. On average, weak atomics, global analysis and loop pipelining improve performance by 1.6x, 3.4x and 1.4x respectively, for our set of experiments. Finally, we present a case study of a real-world example via an HLS-based Google PageRank algorithm, whose performance improves by 4.4x via lock-free streaming and work-stealing.Open Acces

    A performance driven approach for hardware synthesis of guarded atomic actions

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.Includes bibliographical references (p. 137-140).Hardware designers are facing new challenges in the design of complex ASIC's and processors as their sizes approach up to 100 million logic gates. We believe no adequate solution exists that allows designers to specify hardware which takes full advantage of the available resources in these devices. The hardware design specification languages are either too low level to support efficient large scale design (for example, Verilog), or the language and synthesis methodology is so high-level that the designer's micro-architectural ingenuity is lost in the design process. This results in circuits that oftentimes do not match the designer's expectations (for example, C-based behavioral synthesis). 'This thesis presents a design methodology and related synthesis algorithms that address several of the key issues of hardware design specification and high-level synthesis while avoiding the pitfalls of past approaches. The areas we focus on are modular compilation and performance specification. The modular flow allows for the separate compilation of modules and ensures the correct usage of module interfaces by attaching annotations with well defined semantics to them. We also introduce performance specifications as a core part of a design description.(cont.) This allows a designer to more easily achieve the expected design performance and it allows for rapid micro-architectural exploration. We chose guarded atomic actions as the foundation of this research because of their clean execution semantics. These semantics allow for easy design transformation (either manual or compiler driven) while ensuring that the correctness of the design is maintained. We demonstrate the practicality and power of this methodology using several examples, such as a processor which from a single design description can automatically be transformed into an unpipelined processor or a superscalar processor simply by changing a single-line performance specification.by Daniel L. Rosenband.Ph.D

    Optimizing Dataflow Programs for Hardware Synthesis

    Get PDF
    • …
    corecore