
    Using ACL2 to Verify Loop Pipelining in Behavioral Synthesis

    Behavioral synthesis involves compiling an Electronic System-Level (ESL) design into its Register-Transfer Level (RTL) implementation. Loop pipelining is one of the most critical and complex transformations employed in behavioral synthesis. Certifying the loop pipelining algorithm is challenging because there is a huge semantic gap between the input sequential design and the output pipelined implementation, making it infeasible to verify their equivalence with automated sequential equivalence checking techniques. We discuss our ongoing effort to certify the loop pipelining transformation using ACL2. The proof is still work in progress, but some of the insights developed so far may already be of value to the ACL2 community. In particular, we discuss the key invariant we formalized, which is very different from that used in most pipeline proofs. We discuss the need for this invariant, its formalization in ACL2, and our envisioned proof using it. We also discuss some trade-offs, challenges, and insights developed in the course of the project.
    Comment: In Proceedings ACL2 2014, arXiv:1406.123
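    To make the flavour of such an invariant concrete, below is a minimal Python sketch, not the authors' ACL2 formalization: it models a three-stage loop body (the stage functions are invented, and cross-iteration dependences, which real loop pipelining must also handle, are deliberately absent) and checks at every clock cycle that each in-flight iteration of the pipelined execution holds exactly the intermediate value the sequential loop would have produced after the same stages.

```python
# Illustration only: a pipelining correspondence invariant checked by simulation.
STAGES = [lambda v: v + 1,   # stage 0 of the (invented) loop body
          lambda v: v * 2,   # stage 1
          lambda v: v - 3]   # stage 2

def sequential(inputs):
    """Run every stage of each iteration to completion, in program order."""
    return [STAGES[2](STAGES[1](STAGES[0](v))) for v in inputs]

def pipelined(inputs):
    """Overlap iterations, one stage per cycle, checking the invariant."""
    n, depth = len(inputs), len(STAGES)
    vals = list(inputs)    # current value carried by each iteration
    stage = [0] * n        # next stage each iteration will execute
    done = [None] * n
    for cycle in range(n + depth - 1):
        for i in range(n):
            # iteration i enters at cycle i and advances one stage per cycle
            if i <= cycle < i + depth:
                vals[i] = STAGES[stage[i]](vals[i])
                stage[i] += 1
                if stage[i] == depth:
                    done[i] = vals[i]
        # Invariant: every in-flight value equals the sequential execution's
        # intermediate value after the same number of completed stages.
        for i in range(n):
            if 0 < stage[i] < depth:
                ref = inputs[i]
                for s in range(stage[i]):
                    ref = STAGES[s](ref)
                assert vals[i] == ref
    return done

assert pipelined([0, 5, 10, 7]) == sequential([0, 5, 10, 7])
```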

    Survey on Combinatorial Register Allocation and Instruction Scheduling

    Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible, at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization.
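    As a toy illustration of the "optimal according to a model" idea, the sketch below uses plain enumeration, the simplest of the four techniques the survey covers, to allocate registers. The variables, interference pairs, and two-register file are invented; real allocators model far more (live ranges, coalescing, rematerialization).

```python
# Toy "optimal register allocation by enumeration" (invented example).
from itertools import product

VARS = ["a", "b", "c", "d"]
REGS = ["r0", "r1"]
INTERFERE = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")}  # live together

def valid(assign):
    """Model constraint: interfering variables never share a register."""
    return not any(assign[x] == assign[y] and assign[x] in REGS
                   for x, y in INTERFERE)

best = None
for choice in product(REGS + ["mem"], repeat=len(VARS)):
    assign = dict(zip(VARS, choice))
    if not valid(assign):
        continue
    spills = sum(1 for loc in choice if loc == "mem")  # objective to minimize
    if best is None or spills < best[0]:
        best = (spills, assign)

# a, b, c mutually interfere, so with two registers one spill is optimal.
print(best)
```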

    On-Chip Transparent Wire Pipelining (invited paper)

    Wire pipelining has been proposed as a viable means to bridge the gap between decreasing gate delays and increasing wire delays in deep-submicron technologies. Far from being a straightforwardly applicable technique, this methodology requires a number of design modifications in order to insert it seamlessly into the current design flow. In this paper we briefly survey the methods presented by other researchers in the field and then thoroughly analyze the solutions we recently proposed, ranging from system-level wire pipelining to physical design aspects.
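    A back-of-the-envelope calculation shows why the technique exists; the delay numbers below are invented for illustration and are not from the paper.

```python
# Illustrative arithmetic only; numbers are invented.
import math

clock_period_ps = 500    # target clock period
wire_delay_ps = 1800     # end-to-end delay of one long global wire

# A signal needing more than one period to traverse the wire must be broken
# into stages by inserting flip-flops along the route.
stages = math.ceil(wire_delay_ps / clock_period_ps)
print(f"{stages} register stages -> {stages} cycles of wire latency")  # 4
```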

    Modulo scheduling with integrated register spilling for clustered VLIW architectures

    Clustering is a technique to decentralize the design of future wide-issue VLIW cores and enable them to meet technology constraints in terms of cycle time, area, and power dissipation. In a clustered design, registers and functional units are grouped in clusters, so new instructions are needed to move data between them. New aggressive instruction scheduling techniques are required to minimize the negative effects of resource clustering and of the delays in moving data around. In this paper we present a novel software pipelining technique that performs instruction scheduling with reduced register requirements, register allocation, register spilling, and inter-cluster communication in a single step. The algorithm uses limited backtracking to reconsider previously taken decisions, which provides additional possibilities for obtaining high-throughput schedules with low spill-code requirements for clustered architectures. We show that the proposed approach outperforms previously proposed techniques and that it scales well regardless of the number of clusters, the number of communication buses, and the communication latency. The paper also includes an exploration of some parameters in the design of future clustered VLIW cores.
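    The sketch below gives a deliberately simplified, runnable picture of modulo scheduling without the paper's contributions (clustering, spilling, inter-cluster communication, and fine-grained backtracking on individual decisions): operations are greedily placed in a modulo reservation table for a single shared functional unit, and on failure the initiation interval is increased and scheduling is retried. The dependence graph and latencies are hypothetical.

```python
# Simplified modulo scheduling sketch (invented ops; one shared functional unit).
OPS = ["load", "mul", "add", "store"]          # already in topological order
DEPS = {"mul": ["load"], "add": ["mul"], "store": ["add"]}
LATENCY = {"load": 2, "mul": 3, "add": 1, "store": 1}

def schedule(ii):
    """Greedily place ops in a modulo reservation table with period ii;
    return op -> start cycle, or None if some op cannot be placed."""
    table = {}   # cycle mod ii -> op occupying that slot
    start = {}
    for op in OPS:
        earliest = max((start[p] + LATENCY[p] for p in DEPS.get(op, [])),
                       default=0)
        for t in range(earliest, earliest + ii):   # one full wrap of slots
            if t % ii not in table:
                table[t % ii] = op
                start[op] = t
                break
        else:
            return None   # no free slot at this ii
    return start

ii = 1
while (sched := schedule(ii)) is None:
    ii += 1               # failed: relax the initiation interval and retry
print(f"II = {ii}, schedule = {sched}")
```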

    Combining dynamic and static scheduling in high-level synthesis

    Field Programmable Gate Arrays (FPGAs) are starting to become mainstream devices for custom computing, particularly deployed in data centres. However, using these FPGA devices requires familiarity with digital design at a low abstraction level. In order to enable software engineers without a hardware background to design custom hardware, high-level synthesis (HLS) tools automatically transform a high-level program, for example in C/C++, into a low-level hardware description. A central task in HLS is scheduling: the allocation of operations to clock cycles. The classic approach to scheduling is static, in which each operation is mapped to a clock cycle at compile time, but recent years have seen the emergence of dynamic scheduling, in which an operation's clock cycle is determined only at run time. Both approaches have their merits: static scheduling can lead to simpler circuitry and more resource sharing, while dynamic scheduling can lead to faster hardware when the computation has non-trivial control flow.

    This thesis proposes a scheduling approach that combines the best of both worlds. My idea is to use existing program analysis techniques from software design, such as probabilistic analysis and formal verification, to optimize the HLS hardware. First, this thesis proposes a tool named DASS that uses a heuristic-based approach to identify the code regions in the input program that are amenable to static scheduling, and synthesises them into statically scheduled components, also known as static islands, leaving the top-level hardware dynamically scheduled. Second, this thesis addresses a problem of this approach: the analyses of static islands and of their dynamically scheduled surroundings are separate, and each treats the other as a black box. We apply static analysis, including dependence analysis between static islands and their dynamically scheduled surroundings, to optimize the offsets of static islands for high performance. We also apply probabilistic analysis to estimate the performance of the dynamically scheduled part and use this information to optimize the static islands for high area efficiency. Finally, this thesis addresses the problem of conservatism in sequential control-flow designs, which can limit the throughput of the hardware. We show that this challenge can be solved by formally proving that certain control flows can be safely parallelised for high performance. This thesis demonstrates how to use automated formal verification to find out-of-order loop pipelining solutions and multi-threading solutions for a sequential program.
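    As a hypothetical sketch of the first step, the DASS-style split between static islands and dynamically scheduled surroundings, the toy classifier below marks a region as a static island only if its latencies and memory dependences are compile-time analyzable; the real heuristics are considerably more involved than these two flags.

```python
# Hypothetical static-island classifier (illustration, not DASS's criteria).
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    variable_latency: bool     # e.g. cache-dependent loads, division
    data_dependent_deps: bool  # e.g. writes to a[f(i)] with run-time f

def schedule_mode(r: Region) -> str:
    if r.variable_latency or r.data_dependent_deps:
        return "dynamic"        # handshaking resolves timing at run time
    return "static island"      # fixed schedule: less circuitry, more sharing

for r in [Region("fir_filter", False, False),
          Region("sparse_accumulate", False, True),
          Region("histogram", True, True)]:
    print(f"{r.name}: {schedule_mode(r)}")
```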

    Indexed Labels for Loop Iteration Dependent Costs

    We present an extension to the labelling approach, a technique for lifting resource consumption information from compiled code to source code. This approach, which is at the core of the annotating compiler from a large fragment of C to 8051 assembly in the CerCo project, loses precision when the same portion of code has different costs in different contexts, whether due to code transformations such as loop optimisations or to advanced architectural features (e.g. caches). We propose to address this weakness by formally indexing cost labels with the iterations of the containing loops in which they occur. These indexes can be transformed during compilation, and when lifted back to source code they produce dependent costs. The proposed changes have been implemented in CerCo's untrusted prototype compiler from a large fragment of C to 8051 assembly.
    Comment: In Proceedings QAPL 2013, arXiv:1306.241
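    The following small sketch, an illustration rather than CerCo's implementation, shows the core idea of a dependent cost: after 2x loop unrolling, a source-level label L is accounted for with an index-dependent cost rather than a constant, and the totals still agree. The cycle counts are invented.

```python
# Dependent-cost sketch (invented cycle counts).
# Source loop: for i in range(n): <label L>.
# After 2x unrolling, iteration i of the new loop performs source iterations
# 2*i and 2*i+1, so label L's lifted cost becomes a function of i.

def cost_L(i):
    return 10 if i % 2 == 0 else 14   # cycles, depending on the iteration

def source_cost(n):
    return sum(cost_L(i) for i in range(n))

def unrolled_cost(n):   # n assumed even for simplicity
    # indexed label: iteration i of the unrolled loop costs L(2i) + L(2i+1)
    return sum(cost_L(2 * i) + cost_L(2 * i + 1) for i in range(n // 2))

# The dependent cost stays exact across the transformation:
assert source_cost(8) == unrolled_cost(8) == 4 * (10 + 14)
```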

    Enhancing Predicate Pairing with Abstraction for Relational Verification

    Relational verification is a technique that aims at proving properties that relate two different program fragments, or two different runs of the same program. It has been shown that constrained Horn clauses (CHCs) can effectively be used for relational verification by applying a CHC transformation, called predicate pairing, which allows the CHC solver to infer relations among the arguments of different predicates. In this paper we study how the effects of the predicate pairing transformation can be enhanced by using various abstract domains based on linear arithmetic (i.e., the domain of convex polyhedra and some of its subdomains) during the transformation. After presenting an algorithm for predicate pairing with abstraction, we report on experiments we have performed on over a hundred relational verification problems using various abstract domains. The experiments were performed with the VeriMAP transformation and verification system, together with the Parma Polyhedra Library (PPL) and the Z3 solver for CHCs.
    Comment: Pre-proceedings paper presented at the 27th International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR 2017), Namur, Belgium, 10-12 October 2017 (arXiv:1708.07854)
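    The toy below is far simpler than the paper's CHC setting, but it shows the two ingredients in miniature: analysing two (invented) program fragments jointly makes a relational fact visible that neither fragment exhibits alone, and an interval abstraction, a subdomain of convex polyhedra, over-approximates the paired behaviour.

```python
# Toy relational analysis (illustration only; no CHCs involved).

def step_p(x): return x + 1   # one loop iteration of the first fragment
def step_q(y): return y + 1   # one loop iteration of the second fragment

# Analysed separately, each fragment only yields "counter >= 0".
# Analysed as a *paired* system, the relation x == y becomes visible:
paired = [(0, 0)]
for _ in range(5):
    x, y = paired[-1]
    paired.append((step_p(x), step_q(y)))

# Interval abstraction (a subdomain of convex polyhedra) of x - y:
diffs = [x - y for x, y in paired]
print("x - y in", [min(diffs), max(diffs)])   # [0, 0], i.e. x == y
```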

    Research of synthesis algorithms for special-purpose multiprocessing systems with task-dependent architecture

    A new method and a framework tool have been developed for designing a special multiprocessing structure that makes pipelined operation possible as a form of parallel processing, even if there is no efficiently exploitable parallelism in the task description. The synthesis starts from a task description written in a high-level language (C, Java, etc.). A decomposition algorithm then generates proper segments of this program. The desired number of segments, the main properties of the processors implementing the segments, and the estimated communication time demand can be given as input parameters. To construct a favourable pipeline structure, the high-level synthesis (HLS) methodology of pipelined datapaths is applied: these tools attempt to optimize, through scheduling and allocation, the dataflow graph generated from the segments. The resulting multiprocessor structure is therefore not a uniform processor grid but is shaped by the task to be solved, i.e. it can be called task-dependent. The modularity of the method permits the decomposition algorithm and the HLS tool to be replaced or modified depending on the requirements of the application. For evaluating the method, a specific HLS tool is applied which accepts the desired pipeline restart period as an input parameter and generates an optimized time-shared, arbitration-free bus system between the processing units. In this structure, no extra software support is needed to organize the communication if the processing units can transfer data directly.
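    As a back-of-the-envelope sketch of the pipeline arithmetic involved (the segment names and times below are invented, not outputs of the project's tool): once the task is decomposed into segments, each mapped to its own processor, the restart period is bounded by the slowest segment plus its communication time, and that bound sets the steady-state throughput.

```python
# Invented segment times, for illustration of the pipeline arithmetic only.
segments = {                 # segment -> (compute time, communication time)
    "parse":  (120, 10),
    "filter": (200, 15),
    "encode": (180, 15),
    "emit":   ( 90, 10),
}

# The restart period is bounded by the slowest segment (plus its
# communication); the end-to-end latency is the sum over all segments.
restart = max(c + t for c, t in segments.values())
latency = sum(c + t for c, t in segments.values())
print(f"restart period {restart}, pipeline latency {latency}")
# One result every 215 time units in steady state, though one pass takes 640.
```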

    Loop splitting for efficient pipelining in high-level synthesis

    Loop pipelining is widely adopted as a key optimization in high-level synthesis (HLS). However, when complex memory dependencies appear in a loop, commercial HLS tools are still unable to maximize pipeline performance. In this paper, we leverage parametric polyhedral analysis to reason about memory dependence patterns that are uncertain (i.e., parameterised by an undetermined variable) and/or nonuniform (i.e., varying between loop iterations). We develop an automated source-to-source code transformation that splits the loop into pieces, which are then synthesised with Vivado HLS as the hardware generation back-end. Our technique allows the generated loops to run with a minimal initiation interval, automatically inserting statically determined parametric pipeline breaks at the iterations that violate dependencies. Our experiments on seven representative benchmarks show that, compared to default loop pipelining, our parametric loop splitting improves pipeline performance by 4.3× in terms of clock cycles per iteration. The optimized pipelines consume 2.0× as many LUTs, 1.8× as many registers, and 1.1× as many DSP blocks, so the area-time product is improved by nearly a factor of 2.
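    The schematic model below, which stands in for the paper's polyhedral machinery rather than reproducing it, quantifies why splitting pays off: inside each split piece the pipeline runs at an initiation interval of 1, paying only a fill/drain penalty at each statically computed break, whereas an unsplit conservative schedule must slow every iteration down. The pipeline depth and break positions are invented.

```python
# Schematic cost model (invented depth and break positions; the paper's
# polyhedral analysis is what actually computes the breaks).
D = 4   # hypothetical pipeline depth in cycles

def cycles(n, breaks):
    """Cycles to run n iterations with pipeline flushes at `breaks`,
    assuming an initiation interval of 1 inside each split piece."""
    total, chunk_start = 0, 0
    for b in sorted(breaks) + [n]:
        chunk = b - chunk_start
        if chunk > 0:
            total += chunk + (D - 1)   # fill/drain penalty per piece
        chunk_start = b
    return total

# Say the analysis finds that only iterations 10 and 20 violate a dependence:
print(cycles(30, [10, 20]))   # 3 pieces of 10 -> 3 * (10 + 3) = 39 cycles
# Without splitting, a conservative tool schedules every iteration at II = D:
print(30 * D)                 # 120 cycles
```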