
    Autotuning the Intel HLS Compiler using the OpenTuner Framework

    High-level synthesis (HLS) tools can improve design flow and decrease verification time for field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) design. The Intel HLS Compiler is a high-level synthesis tool that takes untimed C/C++ as input and generates production-quality register transfer level (RTL) code optimized for Intel FPGAs. The translation does, however, require multiple iterations and manual optimizations to achieve synthesized results comparable to a solution written in a hardware description language. Synthesis results can vary greatly with coding style and optimization technique, and fully optimizing the translation typically requires in-depth knowledge of FPGAs, which limits the tool's audience. The extra abstraction the C/C++ source code introduces can also make it difficult to meet more specific design requirements, such as resource-usage or performance targets. To improve the quality of results generated by the Intel HLS Compiler without a manual iterative process requiring in-depth FPGA knowledge, this research proposes automating some of these optimization techniques through an autotuning process. The proposed approach uses the PyCParser library to parse C source files and the OpenTuner Framework to autotune the synthesis, producing results that better meet the designer's requirements through lower FPGA resource usage or increased design performance. Such functionality is not currently available in Intel's commercial tools. The approach was tested with the CHStone benchmarking suite of C programs as well as a standard digital signal processing finite impulse response filter. The results show that the commercial HLS tool can be tuned automatically through placeholder injection, using a source parser for C code and the OpenTuner Framework to drive the search. For designs that are small and contain structures conducive to autotuning, the results indicate resource-usage reductions and/or performance increases of up to 40% compared to the default Intel HLS Compiler results. The method developed in this research also allows additional design targets to be specified through the autotuner for consideration in the synthesized design, which can yield results better matched to a design's requirements.
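
    A minimal sketch of the placeholder-injection loop this abstract describes, assuming a hypothetical /*UNROLL_0*/ placeholder token, a hypothetical compile_and_report.sh wrapper around the Intel HLS Compiler, and a made-up report format; only the OpenTuner API usage is standard:

```python
# Sketch of autotuning HLS pragmas via placeholder injection with OpenTuner.
# The placeholder token, compile command, and report parsing are illustrative
# assumptions; only the OpenTuner API calls are real.
import re
import subprocess

import opentuner
from opentuner import ConfigurationManipulator, IntegerParameter
from opentuner import MeasurementInterface, Result


class HlsPragmaTuner(MeasurementInterface):
    def manipulator(self):
        # One tunable knob per placeholder injected into the C source.
        m = ConfigurationManipulator()
        m.add_parameter(IntegerParameter('unroll_0', 1, 32))
        return m

    def run(self, desired_result, input, limit):
        cfg = desired_result.configuration.data
        src = open('fir.c.tmpl').read()
        # Replace the placeholder comment with a concrete pragma.
        src = src.replace('/*UNROLL_0*/',
                          '#pragma unroll %d' % cfg['unroll_0'])
        open('fir.c', 'w').write(src)
        # Hypothetical wrapper that invokes the HLS compile and writes report.txt.
        subprocess.check_call(['./compile_and_report.sh', 'fir.c'])
        # Minimize FPGA resource usage (ALMs) parsed from the report.
        alms = int(re.search(r'ALMs:\s*(\d+)',
                             open('report.txt').read()).group(1))
        return Result(time=alms)  # OpenTuner minimizes 'time' by default


if __name__ == '__main__':
    HlsPragmaTuner.main(opentuner.default_argparser().parse_args())
```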

    Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping

    The multi-pumping resource-sharing technique can overcome limitations commonly found in single-clocked FPGA designs by allowing hardware components to operate at a higher clock frequency than the surrounding system. However, this optimization cannot be expressed at high levels of abstraction such as HLS, requiring hand-optimized RTL instead. In this paper we show how to leverage multiple clock domains for computational subdomains on reconfigurable devices through data-movement analysis on high-level programs. We offer a novel view of multi-pumping as a compiler optimization: a superclass of traditional vectorization. As multiple data elements are fed and consumed, the computations are packed temporally rather than spatially. The optimization is applied automatically via an intermediate representation that maps high-level code to HLS. Internally, the optimization injects modules into the generated designs, incorporating RTL for fine-grained control over the clock domains. We obtain a reduction in resource consumption of up to 50% on critical components and 23% on average. For scalable designs, this can enable further parallelism, increasing overall performance.
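
    One way to picture temporal packing, as a toy simulation rather than the paper's compiler pass: a single functional unit in a 2x clock domain matches the throughput of two spatially replicated units.

```python
# Conceptual model of multi-pumping as temporal vectorization: one multiplier
# clocked at 2x the system clock consumes two operand pairs per system cycle,
# matching two spatially replicated multipliers. Illustrative only; this is
# not the paper's intermediate representation.

def spatially_vectorized(a, b):
    # Two multipliers: one system cycle yields two results.
    return [(a[i] * b[i], a[i + 1] * b[i + 1]) for i in range(0, len(a), 2)]

def temporally_vectorized(a, b):
    # One multiplier in a 2x clock domain: two fast-clock ticks fit inside
    # each system cycle, so the same hardware yields two results per cycle.
    out = []
    for i in range(0, len(a), 2):
        fast_tick_0 = a[i] * b[i]          # fast-clock tick 1
        fast_tick_1 = a[i + 1] * b[i + 1]  # fast-clock tick 2
        out.append((fast_tick_0, fast_tick_1))
    return out

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert spatially_vectorized(a, b) == temporally_vectorized(a, b)
```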

    The design of aircraft using the decision support problem technique

    The Decision Support Problem Technique for unified design, manufacturing, and maintenance is being developed at the Systems Design Laboratory at the University of Houston. This involves the development of a domain-independent method (and the associated software) that can be used to process domain-dependent information and thereby provide support for human judgment. In a computer-assisted environment, this support is provided in the form of optimal solutions to Decision Support Problems.

    A Compilation Flow for Parametric Dataflow: Programming Model, Scheduling, and Application to Heterogeneous MPSoC

    Efficient programming of signal processing applications on embedded systems is a complex problem. High-level models such as synchronous dataflow (SDF) have been privileged candidates for dealing with this complexity: they can express inherent application parallelism and support analysis for both verification and optimization. Parametric dataflow models aim to provide enough dynamicity to model new applications while maintaining the high level of analyzability needed for efficient real-life implementations. This paper presents a new compilation flow that targets parametric dataflow. Built on the LLVM compiler infrastructure, it offers an actor-based C++ programming model for describing parametric graphs, a compilation front-end providing graph analysis features, and a retargetable back-end to map the application onto real hardware. The paper gives an overview of this flow, with a specific focus on scheduling. It takes into account the crucial gap between dataflow models and real hardware, on which actor firing is not atomic, as well as the consequences for FIFO sizing and execution pipelining. The experimental results illustrate the compilation flow applied to 3GPP LTE-Advanced demodulation on a heterogeneous MPSoC with distributed scheduling features, achieving performance similar to time-consuming manual optimization.
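
    For context, the static analysis such a flow generalizes can be sketched on plain SDF: solve the balance equations q[src] * produced = q[dst] * consumed for the smallest integer repetition vector q. The three-actor graph below is invented for illustration.

```python
# Classic SDF repetition-vector computation via balance equations.
from fractions import Fraction
from math import lcm

# Edges: (producer, consumer, tokens produced per firing, tokens consumed).
edges = [('src', 'fir', 2, 3), ('fir', 'sink', 1, 2)]
actors = ['src', 'fir', 'sink']

# Propagate relative firing rates from an arbitrary root actor.
rate = {actors[0]: Fraction(1)}
for a, b, p, c in edges:          # assumes edges are topologically ordered
    rate[b] = rate[a] * p / c     # balance: rate[a] * p == rate[b] * c

# Scale to the smallest all-integer repetition vector.
scale = lcm(*(r.denominator for r in rate.values()))
q = {a: int(r * scale) for a, r in rate.items()}
print(q)  # {'src': 3, 'fir': 2, 'sink': 1}: 3*2 == 2*3 and 2*1 == 1*2
```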

    Decoupling algorithms from schedules for easy optimization of image processing pipelines

    Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism. We propose a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high performance without sacrificing code clarity. This decoupling simplifies the algorithm specification: images and intermediate buffers become functions over an infinite integer domain, with no explicit storage or boundary conditions. Imaging pipelines are compositions of functions. Programmers separately specify scheduling strategies for the various functions composing the algorithm, which allows them to efficiently explore different optimizations without changing the algorithmic code. We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain-specific language called Halide, and compiling them for ARM, x86, and GPUs. Our compiler targets SIMD units, multiple cores, and complex memory hierarchies. We demonstrate that it can handle algorithms such as a camera raw pipeline, the bilateral grid, fast local Laplacian filtering, and image segmentation. The algorithms expressed in our language are both shorter and faster than state-of-the-art implementations. (Funding: National Science Foundation grants 0964004, 0964218, and 0832997; U.S. Department of Energy award DE-SC0005288; Cognex Corporation; Adobe Systems.)
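
    A minimal sketch of the algorithm/schedule split, assuming Halide's Python bindings (API details vary slightly across versions); the 3x1 blur and the specific schedule choices are illustrative, not taken from the paper:

```python
import halide as hl

x, y = hl.Var("x"), hl.Var("y")

# Algorithm: pure functions over an infinite integer domain, with no
# storage, boundary, or traversal-order decisions.
f = hl.Func("f")
f[x, y] = x + y                      # stand-in for a real input image

blur = hl.Func("blur")
blur[x, y] = (f[x - 1, y] + f[x, y] + f[x + 1, y]) / 3

# Schedule: tiling, vectorization, parallelism, and a storage decision,
# specified separately and swappable without touching the algorithm above.
xo, yo, xi, yi = hl.Var("xo"), hl.Var("yo"), hl.Var("xi"), hl.Var("yi")
blur.tile(x, y, xo, yo, xi, yi, 64, 8).vectorize(xi, 8).parallel(yo)
f.compute_at(blur, xo)               # recompute f per tile, never store globally

out = blur.realize([256, 256])       # materialize a finite window of the domain
```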

    System Energy-Efficient Hybrid Beamforming for mmWave Multi-user Systems

    This paper develops energy-efficient hybrid beamforming designs for mmWave multi-user systems in which analog precoding is realized by switches and phase shifters, so that radio frequency (RF) chain-to-transmit-antenna connections can be switched off to save energy. By explicitly considering the effect of each connection on the power required for baseband and RF signal processing, we express the total power consumption in a sparsity form of the analog precoding matrix. However, these sparsity terms and the sparsity-modulus constraints of the analog precoding make the system energy-efficiency maximization problem non-convex and challenging to solve. To tackle this problem, we first transform it into a subtractive-form weighted sum-rate and power problem. A compressed-sensing-based re-weighted quadratic-form relaxation method is employed to handle the sparsity terms and the sparsity-modulus constraints. We then exploit alternating minimization of the mean-squared error to solve the equivalent problem, in which the digital precoding vectors and the analog precoding matrix are updated sequentially. An energy-efficiency upper bound and a heuristic algorithm are also examined for comparison. Numerical results confirm the superior performance of the proposed algorithm over benchmark energy-efficient hybrid precoding algorithms and heuristic schemes.
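
    The subtractive-form transformation mentioned above is, in spirit, a Dinkelbach-style reformulation of a fractional objective; stated generically under assumed notation (R for the sum rate, P_tot for the total power including the sparsity term, eta for the energy-efficiency parameter), not necessarily in the paper's symbols:

```latex
% Generic Dinkelbach-style reformulation (notation assumed, not the paper's).
% The fractional EE objective is replaced by a sequence of subtractive
% problems; \eta is updated until R - \eta P_{\mathrm{tot}} = 0 at the optimum.
\max_{\mathbf{F}_{\mathrm{BB}},\,\mathbf{F}_{\mathrm{RF}}}
    \frac{R(\mathbf{F}_{\mathrm{BB}},\mathbf{F}_{\mathrm{RF}})}
         {P_{\mathrm{tot}}(\mathbf{F}_{\mathrm{RF}})}
\quad\Longleftrightarrow\quad
\max_{\mathbf{F}_{\mathrm{BB}},\,\mathbf{F}_{\mathrm{RF}}}
    R(\mathbf{F}_{\mathrm{BB}},\mathbf{F}_{\mathrm{RF}})
    - \eta\, P_{\mathrm{tot}}(\mathbf{F}_{\mathrm{RF}}),
\qquad
P_{\mathrm{tot}}(\mathbf{F}_{\mathrm{RF}})
    = P_{\mathrm{fix}} + \lambda\,\lVert \mathbf{F}_{\mathrm{RF}} \rVert_0 ,
```

    where the non-convex l0 term is what the re-weighted quadratic-form relaxation approximates.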

    Stochastic Optimization of Bioreactor Control Policies Using a Markov Decision Process Model

    Biopharmaceuticals are the fastest-growing segment of the pharmaceutical industry. Their manufacture is complicated by the uncertainty inherent in the process. Scholars have studied the planning and operation of such production systems under individual uncertainties, but the simultaneous consideration of fermentation and resin-yield uncertainty has so far been lacking. Studying the optimal operation of biopharmaceutical production and purification systems under these uncertainties requires a stochastic, dynamic approach. This thesis provides such a model by extending an existing discrete state-space, infinite-horizon Markov decision process model of upstream fermentation. Tissue plasminogen activator fermentation and chromatography were implemented as a case study and used to discuss the optimal policy for operating different fermentation setups. The average per-cycle operating profit of a serial setup was $1,272; the parallel setup produced a negative average reward. Managerial insights were derived from a comparison with a basic, titer-maximizing policy and from process sensitivities. In conclusion, the integrated stochastic optimization of biopharmaceutical production and purification control aids decision making, although the model assumptions leave room for further study.
    Keywords: Markov decision process; biopharmaceutical production; fermentation uncertainty; chromatography resin; stochastic performance decay
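
    A toy value-iteration sketch for a discounted, infinite-horizon MDP of the kind used in the thesis; the states, actions, rewards, and transition probabilities below are invented stand-ins, not the fermentation model:

```python
# Synchronous value iteration on a two-state toy MDP, then greedy policy
# extraction. All numbers are illustrative placeholders.
states = ['healthy', 'decayed']          # e.g. resin/fermentation condition
actions = ['continue', 'replace']
gamma = 0.95                             # discount factor

# P[s][a] = list of (next_state, probability); R[s][a] = expected reward.
P = {'healthy': {'continue': [('healthy', 0.8), ('decayed', 0.2)],
                 'replace':  [('healthy', 1.0)]},
     'decayed': {'continue': [('decayed', 1.0)],
                 'replace':  [('healthy', 1.0)]}}
R = {'healthy': {'continue': 100.0, 'replace': 40.0},
     'decayed': {'continue': 10.0,  'replace': 40.0}}

V = {s: 0.0 for s in states}
for _ in range(1000):                    # Bellman backups to a fixed point
    V = {s: max(R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
                for a in actions)
         for s in states}

# Greedy policy with respect to the converged value function.
policy = {s: max(actions,
                 key=lambda a: R[s][a]
                 + gamma * sum(p * V[t] for t, p in P[s][a]))
          for s in states}
print(policy)                            # e.g. replace once decayed
```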