Search CORE

1,718 research outputs found

Using ACL2 to Verify Loop Pipelining in Behavioral Synthesis

Author: Hao Kecheng
Puri Disha
Ray Sandip
Xie Fei
Publication venue: 'Open Publishing Association'
Publication date: 01/01/2014
Field of study

Behavioral synthesis involves compiling an Electronic System-Level (ESL) design into its Register-Transfer Level (RTL) implementation. Loop pipelining is one of the most critical and complex transformations employed in behavioral synthesis. Certifying the loop pipelining algorithm is challenging because there is a huge semantic gap between the input sequential design and the output pipelined implementation making it infeasible to verify their equivalence with automated sequential equivalence checking techniques. We discuss our ongoing effort using ACL2 to certify loop pipelining transformation. The completion of the proof is work in progress. However, some of the insights developed so far may already be of value to the ACL2 community. In particular, we discuss the key invariant we formalized, which is very different from that used in most pipeline proofs. We discuss the needs for this invariant, its formalization in ACL2, and our envisioned proof using the invariant. We also discuss some trade-offs, challenges, and insights developed in course of the project.Comment: In Proceedings ACL2 2014, arXiv:1406.123

arXiv.org e-Print Archive

Directory of Open Access Journals

PDXScholar (Portland State University)

Survey on Combinatorial Register Allocation and Instruction Scheduling

Author: Lozano Roberto Castañeda
Schulte Christian
Publication venue
Publication date: 01/01/2018
Field of study

Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

On-Chip Transparent Wire Pipelining (invited paper)

Author: Casu Mario Roberto
Macchiarulo Luca
Publication venue: IEEE Computer Society
Publication date: 01/01/2004
Field of study

Wire pipelining has been proposed as a viable mean to break the discrepancy between decreasing gate delays and increasing wire delays in deep-submicron technologies. Far from being a straightforwardly applicable technique, this methodology requires a number of design modifications in order to insert it seamlessly in the current design flow. In this paper we briefly survey the methods presented by other researchers in the field and then we thoroughly analyze the solutions we recently proposed, ranging from system-level wire pipelining to physical design aspects

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Modulo scheduling with integrated register spilling for clustered VLIW architectures

Author: Ayguadé Parra Eduard
Llosa Espuny José Francisco
Valero Cortés Mateo
Zalamea León Francisco Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001
Field of study

Clustering is a technique to decentralize the design of future wide issue VLIW cores and enable them to meet the technology constraints in terms of cycle time, area and power dissipation. In a clustered design, registers and functional units are grouped in clusters so that new instructions are needed to move data between them. New aggressive instruction scheduling techniques are required to minimize the negative effect of resource clustering and delays in moving data around. In this paper we present a novel software pipelining technique that performs instruction scheduling with reduced register requirements, register allocation, register spilling and inter-cluster communication in a single step. The algorithm uses limited backtracking to reconsider previously taken decisions. This backtracking provides the algorithm with additional possibilities for obtaining high throughput schedules with low spill code requirements for clustered architectures. We show that the proposed approach outperforms previously proposed techniques and that it is very scalable independently of the number of clusters, the number of communication buses and communication latency. The paper also includes an exploration of some parameters in the design of future clustered VLIW cores.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Combining dynamic and static scheduling in high-level synthesis

Author: Cheng Jianyi
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/07/2023
Field of study

Field Programmable Gate Arrays (FPGAs) are starting to become mainstream devices for custom computing, particularly deployed in data centres. However, using these FPGA devices requires familiarity with digital design at a low abstraction level. In order to enable software engineers without a hardware background to design custom hardware, high-level synthesis (HLS) tools automatically transform a high-level program, for example in C/C++, into a low-level hardware description. A central task in HLS is scheduling: the allocation of operations to clock cycles. The classic approach to scheduling is static, in which each operation is mapped to a clock cycle at compile time, but recent years have seen the emergence of dynamic scheduling, in which an operation’s clock cycle is only determined at run-time. Both approaches have their merits: static scheduling can lead to simpler circuitry and more resource sharing, while dynamic scheduling can lead to faster hardware when the computation has a non-trivial control flow. This thesis proposes a scheduling approach that combines the best of both worlds. My idea is to use existing program analysis techniques in software designs, such as probabilistic analysis and formal verification, to optimize the HLS hardware. First, this thesis proposes a tool named DASS that uses a heuristic-based approach to identify the code regions in the input program that are amenable to static scheduling and synthesises them into statically scheduled components, also known as static islands, leaving the top-level hardware dynamically scheduled. Second, this thesis addresses a problem of this approach: that the analysis of static islands and their dynamically scheduled surroundings are separate, where one treats the other as black boxes. We apply static analysis including dependence analysis between static islands and their dynamically scheduled surroundings to optimize the offsets of static islands for high performance. We also apply probabilistic analysis to estimate the performance of the dynamically scheduled part and use this information to optimize the static islands for high area efficiency. Finally, this thesis addresses the problem of conservatism in using sequential control flow designs which can limit the throughput of the hardware. We show this challenge can be solved by formally proving that certain control flows can be safely parallelised for high performance. This thesis demonstrates how to use automated formal verification to find out-of-order loop pipelining solutions and multi-threading solutions from a sequential program.Open Acces

Spiral - Imperial College Digital Repository

Indexed Labels for Loop Iteration Dependent Costs

Author: Tranquilli Paolo
Publication venue: 'Open Publishing Association'
Publication date: 01/06/2013
Field of study

We present an extension to the labelling approach, a technique for lifting resource consumption information from compiled to source code. This approach, which is at the core of the annotating compiler from a large fragment of C to 8051 assembly of the CerCo project, looses preciseness when differences arise as to the cost of the same portion of code, whether due to code transformation such as loop optimisations or advanced architecture features (e.g. cache). We propose to address this weakness by formally indexing cost labels with the iterations of the containing loops they occur in. These indexes can be transformed during the compilation, and when lifted back to source code they produce dependent costs. The proposed changes have been implemented in CerCo's untrusted prototype compiler from a large fragment of C to 8051 assembly.Comment: In Proceedings QAPL 2013, arXiv:1306.241

arXiv.org e-Print Archive

Directory of Open Access Journals

Open Access Repository

Enhancing Predicate Pairing with Abstraction for Relational Verification

Author: De Angelis Emanuele
Fioravanti Fabio
Pettorossi Alberto
Proietti Maurizio
Publication venue
Publication date: 01/01/2017
Field of study

Relational verification is a technique that aims at proving properties that relate two different program fragments, or two different program runs. It has been shown that constrained Horn clauses (CHCs) can effectively be used for relational verification by applying a CHC transformation, called predicate pairing, which allows the CHC solver to infer relations among arguments of different predicates. In this paper we study how the effects of the predicate pairing transformation can be enhanced by using various abstract domains based on linear arithmetic (i.e., the domain of convex polyhedra and some of its subdomains) during the transformation. After presenting an algorithm for predicate pairing with abstraction, we report on the experiments we have performed on over a hundred relational verification problems by using various abstract domains. The experiments have been performed by using the VeriMAP transformation and verification system, together with the Parma Polyhedra Library (PPL) and the Z3 solver for CHCs.Comment: Pre-proceedings paper presented at the 27th International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR 2017), Namur, Belgium, 10-12 October 2017 (arXiv:1708.07854

arXiv.org e-Print Archive

ART

Feladatfüggő felépítésű többprocesszoros célrendszerek szintézis algoritmusainak kutatása = Research of synthesis algorithms for special-purpose multiprocessing systems with task-dependent architecture

Author: Arató Péter
Loványi István
Móczár Géza
Pilászy György
Publication venue: OTKA
Publication date: 01/01/2013
Field of study

Új módszert és egy keretrendszert fejlesztettünk ki olyan speciális többprocesszoros struktúra tervezésére, amely lehetővé teszi a pipeline működtetést akkor is, ha a feladat-leírásban nincs hatékonyan kihasználható párhuzamosság. A szintézis egy magas szintű nyelven (C, Java, stb.) adott feladatleírásból indul ki. Ezután dekompozíciós algoritmus megfelelő szegmenseket képez a program alapján. A szegmensek kívánt száma, a szegmenseket megvalósító processzorok főbb tulajdonságai és a becsült kommunikációs időigények megadhatók bemeneti paraméterekként. Kedvező pipeline felépítés céljából a pipeline adatfolyamok magas szintű szintézisének (HLS) módszertanát alkalmaztuk. Ezek az eszközök az ütemezés és az allokáció révén kísérlik meg az optimalizálást a szegmensekből képzett adatfolyam gráfon. Ezért a kiadódó többprocesszoros felépítés nem egy uniformizált processzor-rács, hanem a megoldandó feladatra formált struktúra, így feladatfüggőnek nevezhető. A módszer modularitása lehetővé teszi a dekompozíciós algoritmusnak és a HLS eszköznek a cseréjét, módosítását az alkalmazási igényektől függően. A módszer kiértékelése céljából olyan HLS eszközt alkalmaztunk, amely a kívánt pipeline újraindítási periódust bemeneti adatként tudja kezelni, és processzorok között egy optimalizált időosztásos, arbitráció-mentes sínrendszert hoz létre. Ebben a struktúrában a kommunikáció szervezéséhez nincs szükség külön szoftver támogatásra, ha a processzorok képesek közvetlen adatforgalomra. | A new method and a framework tool has been developed for designing a special multiprocessing structure for making the pipeline function possible as a special parallel processing, even if there is no efficiently exploitable parallelism in the task description. The synthesis starts from a task description written in a high level language (C, Java, etc). A decomposing algorithm generates proper segments of this program. The desired number of the segments, the main properties of the processor set implementing the segments and the estimated communication time-demand can be given as input parameters. For constructing a pipeline structure, the high-level synthesis (HLS) methodology of pipelined datapaths is applied. These tools attempt to optimize by scheduling and allocating the dataflow graph generated from the segments Thus, the resulted structure is not a uniform processor grid, but it is shaped depending on the task, i.e. it can be called task-dependent. The modularity of the method permits the decomposition algorithm and the HLS tool to be replaced by other ones depending on the requirements of the application. For evaluating the method, a specific HLS tool is applied, which can accept the desired pipeline restart time as input parameter, and generates an optimized time shared simple arbitration-free bus system between the processing units. Therefore, there is no need for extra efforts to organize the communication, if the processing units can transfer data directly

Repository of the Academy's Library

Loop splitting for efficient pipelining in high-level synthesis

Author: Constantinides GA
Liu J
Wickerson J
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2016
Field of study

Loop pipelining is widely adopted as a key optimization method in high-level synthesis (HLS). However, when complex memory dependencies appear in a loop, commercial HLS tools are still not able to maximize pipeline performance. In this paper, we leverage parametric polyhedral analysis to reason about memory dependence patterns that are uncertain (i.e., parameterised by an undetermined variable) and/or nonuniform (i.e., varying between loop iterations). We develop an automated source-to-source code transformation to split the loop into pieces, which are then synthesised by Vivado HLS as the hardware generation back-end. Our technique allows generated loops to run with a minimal interval, automatically inserting statically-determined parametric pipeline breaks at those iterations violating dependencies. Our experiments on seven representative benchmarks show that, compared to default loop pipelining, our parametric loop splitting improves pipeline performance by 4:3 in terms of clock cycles per iteration. The optimized pipelines consume 2:0 as many LUTs, 1:8 as many registers, and 1:1 as many DSP blocks. Hence the area-time product is improved by nearly a factor of 2

Spiral - Imperial College Digital Repository