62,600 research outputs found
Combining dynamic and static scheduling in high-level synthesis
Field Programmable Gate Arrays (FPGAs) are starting to become mainstream devices for custom computing, particularly deployed in data centres. However, using these FPGA devices requires familiarity with digital design at a low abstraction level. In order to enable software engineers without a hardware background to design custom hardware, high-level synthesis (HLS) tools automatically transform a high-level program, for example in C/C++, into a low-level hardware description.
A central task in HLS is scheduling: the allocation of operations to clock cycles. The classic approach to scheduling is static, in which each operation is mapped to a clock cycle at compile time, but recent years have seen the emergence of dynamic scheduling, in which an operation’s clock cycle is only determined at run-time. Both approaches have their merits: static scheduling can lead to simpler circuitry and more resource sharing, while dynamic scheduling can lead to faster hardware when the computation has a non-trivial control flow.
This thesis proposes a scheduling approach that combines the best of both worlds. My idea is to use existing program analysis techniques in software designs, such as probabilistic analysis and formal verification, to optimize the HLS hardware. First, this thesis proposes a tool named DASS that uses a heuristic-based approach to identify the code regions in the input program that are amenable to static scheduling and synthesises them into statically scheduled components, also known as static islands, leaving the top-level hardware dynamically scheduled. Second, this thesis addresses a problem of this approach: that the analysis of static islands and their dynamically scheduled surroundings are separate, where one treats the other as black boxes. We apply static analysis including dependence analysis between static islands and their dynamically scheduled surroundings to optimize the offsets of static islands for high performance. We also apply probabilistic analysis to estimate the performance of the dynamically scheduled part and use this information to optimize the static islands for high area efficiency. Finally, this thesis addresses the problem of conservatism in using sequential control flow designs which can limit the throughput of the hardware. We show this challenge can be solved by formally proving that certain control flows can be safely parallelised for high performance. This thesis demonstrates how to use automated formal verification to find out-of-order loop pipelining solutions and multi-threading solutions from a sequential program.Open Acces
Synthesis of Embedded Software using Dataflow Schedule Graphs
In the design and implementation of digital signal processing (DSP) systems,
dataflow is recognized as a natural model for specifying applications, and
dataflow enables useful model-based methodologies for analysis, synthesis, and
optimization of implementations. A wide range of embedded signal processing
applications can be designed efficiently using the high level abstractions that
are provided by dataflow programming models. In addition to their use in
parallelizing computations for faster execution, dataflow graphs have
additional advantages that stem from their modularity and formal foundation.
An important problem in the development of dataflow-based design tools is the
automated synthesis of software from dataflow representations.
In this thesis, we develop new software synthesis techniques for dataflow based
design and implementation of signal processing systems. An important task in
software synthesis from dataflow graphs is that of {\em scheduling}. Scheduling
refers to the assignment of actors to processing resources and the ordering of
actors that share the same resource. Scheduling typically involves very complex
design spaces, and has a significant impact on most relevant implementation
metrics, including latency, throughput, energy consumption, and memory
requirements. In this thesis, we integrate a model-based representation,
called the {\em dataflow schedule graph} ({\em DSG}), into the software
synthesis process. The DSG approach allows designers to model a schedule for a
dataflow graph as a separate dataflow graph, thereby providing a formal,
abstract (platform- and language-independent) representation for the schedule.
While we demonstrate this DSG-integrated software synthesis capability by
translating DSGs into OpenCL implementations, the use of a model-based schedule
representation makes the approach readily retargetable to other implementation
languages. We also investigate a number of optimization techniques to improve
the efficiency of software that is synthesized from DSGs.
Through experimental evaluation of the generated software, we demonstrate the
correctness and efficiency of our new techniques for dataflow-based
software synthesis and optimization
Using ACL2 to Verify Loop Pipelining in Behavioral Synthesis
Behavioral synthesis involves compiling an Electronic System-Level (ESL)
design into its Register-Transfer Level (RTL) implementation. Loop pipelining
is one of the most critical and complex transformations employed in behavioral
synthesis. Certifying the loop pipelining algorithm is challenging because
there is a huge semantic gap between the input sequential design and the output
pipelined implementation making it infeasible to verify their equivalence with
automated sequential equivalence checking techniques. We discuss our ongoing
effort using ACL2 to certify loop pipelining transformation. The completion of
the proof is work in progress. However, some of the insights developed so far
may already be of value to the ACL2 community. In particular, we discuss the
key invariant we formalized, which is very different from that used in most
pipeline proofs. We discuss the needs for this invariant, its formalization in
ACL2, and our envisioned proof using the invariant. We also discuss some
trade-offs, challenges, and insights developed in course of the project.Comment: In Proceedings ACL2 2014, arXiv:1406.123
High-level synthesis under I/O Timing and Memory constraints
The design of complex Systems-on-Chips implies to take into account
communication and memory access constraints for the integration of dedicated
hardware accelerator. In this paper, we present a methodology and a tool that
allow the High-Level Synthesis of DSP algorithm, under both I/O timing and
memory constraints. Based on formal models and a generic architecture, this
tool helps the designer to find a reasonable trade-off between both the
required I/O timing behavior and the internal memory access parallelism of the
circuit. The interest of our approach is demonstrated on the case study of a
FFT algorithm
A Multi-objective Perspective for Operator Scheduling using Fine-grained DVS Architecture
The stringent power budget of fine grained power managed digital integrated
circuits have driven chip designers to optimize power at the cost of area and
delay, which were the traditional cost criteria for circuit optimization. The
emerging scenario motivates us to revisit the classical operator scheduling
problem under the availability of DVFS enabled functional units that can
trade-off cycles with power. We study the design space defined due to this
trade-off and present a branch-and-bound(B/B) algorithm to explore this state
space and report the pareto-optimal front with respect to area and power. The
scheduling also aims at maximum resource sharing and is able to attain
sufficient area and power gains for complex benchmarks when timing constraints
are relaxed by sufficient amount. Experimental results show that the algorithm
that operates without any user constraint(area/power) is able to solve the
problem for most available benchmarks, and the use of power budget or area
budget constraints leads to significant performance gain.Comment: 18 pages, 6 figures, International journal of VLSI design &
Communication Systems (VLSICS
A Novel SAT-Based Approach to the Task Graph Cost-Optimal Scheduling Problem
The Task Graph Cost-Optimal Scheduling Problem consists in scheduling a certain number of interdependent tasks onto a set of heterogeneous processors (characterized by idle and running rates per time unit), minimizing the cost of the entire process. This paper provides a novel formulation for this scheduling puzzle, in which an optimal solution is computed through a sequence of Binate Covering Problems, hinged within a Bounded Model Checking paradigm. In this approach, each covering instance, providing a min-cost trace for a given schedule depth, can be solved with several strategies, resorting to Minimum-Cost Satisfiability solvers or Pseudo-Boolean Optimization tools. Unfortunately, all direct resolution methods show very low efficiency and scalability. As a consequence, we introduce a specialized method to solve the same sequence of problems, based on a traditional all-solution SAT solver. This approach follows the "circuit cofactoring" strategy, as it exploits a powerful technique to capture a large set of solutions for any new SAT counter-example. The overall method is completed with a branch-and-bound heuristic which evaluates lower and upper bounds of the schedule length, to reduce the state space that has to be visited. Our results show that the proposed strategy significantly improves the blind binate covering schema, and it outperforms general purpose state-of-the-art tool
Formal and Informal Methods for Multi-Core Design Space Exploration
We propose a tool-supported methodology for design-space exploration for
embedded systems. It provides means to define high-level models of applications
and multi-processor architectures and evaluate the performance of different
deployment (mapping, scheduling) strategies while taking uncertainty into
account. We argue that this extension of the scope of formal verification is
important for the viability of the domain.Comment: In Proceedings QAPL 2014, arXiv:1406.156
A Design Methodology for Space-Time Adapter
This paper presents a solution to efficiently explore the design space of
communication adapters. In most digital signal processing (DSP) applications,
the overall architecture of the system is significantly affected by
communication architecture, so the designers need specifically optimized
adapters. By explicitly modeling these communications within an effective
graph-theoretic model and analysis framework, we automatically generate an
optimized architecture, named Space-Time AdapteR (STAR). Our design flow inputs
a C description of Input/Output data scheduling, and user requirements
(throughput, latency, parallelism...), and formalizes communication constraints
through a Resource Constraints Graph (RCG). The RCG properties enable an
efficient architecture space exploration in order to synthesize a STAR
component. The proposed approach has been tested to design an industrial data
mixing block example: an Ultra-Wideband interleaver.Comment: ISBN : 978-1-59593-606-
Optimizing construction of scheduled data flow graph for on-line testability
The objective of this work is to develop a new methodology for behavioural synthesis using a flow of synthesis, better suited to the scheduling of independent calculations and non-concurrent online testing. The traditional behavioural synthesis process can be defined as the compilation of an algorithmic specification into an architecture composed of a data path and a controller. This stream of synthesis generally involves scheduling, resource allocation, generation of the data path and controller synthesis. Experiments showed that optimization started at the high level synthesis improves the performance of the result, yet the current tools do not offer synthesis optimizations that from the RTL level. This justifies the development of an optimization methodology which takes effect from the behavioural specification and accompanying the synthesis process in its various stages. In this paper we propose the use of algebraic properties (commutativity, associativity and distributivity) to transform readable mathematical formulas of algorithmic specifications into mathematical formulas evaluated efficiently. This will effectively reduce the execution time of scheduling calculations and increase the possibilities of testability
- …