    The synthesis of application-specific machines using the Euler language

    This work describes SAMUEL, a rapid prototyping environment for creating custom computing machines. A compiler synthesizes each machine from a program written in a general-purpose algorithmic language, using a library of Verilog opcode circuits that implement the interpretation rules defined for that language. The compiler outputs a Verilog description of the custom computing machine, which can be used for simulation or for synthesis with commercial tools.

    By raising the semantic level of the level 0 circuits, the opcode library distinguishes SAMUEL from other documented research. SAMUEL is also distinctive because its algorithmic language is not a hardware description language and has not been modified in any way from the original language definition. Finally, the chosen language supports dynamic procedure definition, which allows a procedure to transform into a completely different procedure at runtime. This language-supported reconfigurability enhances current research trends in reconfigurable devices.

    In Milutinovic's classification scheme, custom computing machines generated by SAMUEL are software-translated, language-corresponding, complex, directly executing architectural support for the high-level language Euler (1). Unlike other work, however, the approach exploits the field programmability of gate arrays (and the freedom afforded by a simulation environment) to create custom computing machines that support only the language opcodes a program requires. This matters given the limited real estate of programmable logic: by not implementing support for the entire language on every custom computing machine, real-estate savings can be achieved on average.
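    The opcode-subset idea can be illustrated with a small Python sketch. The opcode names, area figures, and the `required_opcodes`/`area` helpers are hypothetical and do not come from SAMUEL's actual Verilog library; the sketch only shows how scanning a compiled program for the opcodes it uses bounds the logic that must be instantiated.

```python
# Hypothetical opcode library: opcode name -> assumed area cost (arbitrary units).
# Names and numbers are illustrative only, not SAMUEL's actual library.
OPCODE_LIBRARY = {
    "add": 32, "sub": 32, "mul": 210, "div": 480,
    "load": 40, "store": 40, "jump": 12, "call": 60,
}

def required_opcodes(program):
    """Collect the opcodes a compiled program actually uses.

    `program` is assumed to be a list of tuples whose first element is the opcode.
    """
    used = {instr[0] for instr in program}
    missing = used - set(OPCODE_LIBRARY)
    if missing:
        raise ValueError(f"no opcode circuit for: {sorted(missing)}")
    return used

def area(opcodes):
    """Total area of the opcode circuits that would be instantiated."""
    return sum(OPCODE_LIBRARY[op] for op in opcodes)

program = [("load", "x"), ("load", "y"), ("add",), ("store", "z"), ("jump", 0)]
used = required_opcodes(program)
print(sorted(used))                      # ['add', 'jump', 'load', 'store']
print(area(OPCODE_LIBRARY), area(used))  # 906 124
```

    Here the machine for this program would need only four opcode circuits instead of the full library; the real savings depend on the actual circuit sizes and on which opcodes each program uses.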

    Minimum entropy restoration using FPGAs and high-level techniques

    One of the greatest perceived barriers to the widespread use of FPGAs in image processing is the difficulty application specialists face in developing algorithms for reconfigurable hardware. Minimum entropy deconvolution (MED) techniques have been shown to be effective in restoring star-field images. This paper reports on an attempt to implement an MED algorithm using simulated annealing, first on a microprocessor and then on an FPGA. The FPGA implementation uses DIME-C, a C-to-gates compiler, coupled with a low-level core library to simplify the design task. Analysis of the C code and of the DIME-C compiler output guided the code optimisation. The paper reports on the design effort this entailed and the resulting performance improvements.
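    As a rough illustration of minimum entropy restoration by simulated annealing, the following Python sketch restores a toy 1-D "star field". The blur kernel, the objective (entropy plus a data-misfit penalty with an assumed weight `lam`), and the cooling schedule are all assumptions chosen for the example, not the paper's formulation; the names `blur`, `entropy`, `objective`, and `anneal` are hypothetical.

```python
import math
import random

KERNEL = (0.25, 0.5, 0.25)  # assumed point-spread function (symmetric blur)

def blur(x):
    """1-D convolution of x with KERNEL, zero-padded at the borders."""
    n = len(x)
    out = [0.0] * n
    for i in range(n):
        for k, w in enumerate(KERNEL):
            j = i + k - 1  # KERNEL is centred on index 1
            if 0 <= j < n:
                out[i] += w * x[j]
    return out

def entropy(x):
    """Shannon entropy of the non-negative intensities, normalised to sum to 1."""
    total = sum(x)
    if total <= 0:
        return 0.0
    return -sum((v / total) * math.log(v / total) for v in x if v > 0)

def objective(x, y, lam=10.0):
    """Entropy of estimate x plus lam times its squared misfit to observation y."""
    misfit = sum((a - b) ** 2 for a, b in zip(blur(x), y))
    return entropy(x) + lam * misfit

def anneal(y, steps=20000, t0=1.0, cooling=0.9995, step=0.2, seed=0):
    """Simulated annealing over pixel values; returns the best estimate found."""
    rng = random.Random(seed)
    x = list(y)  # start from the blurred observation
    cost = objective(x, y)
    best, best_cost = list(x), cost
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(x))
        old = x[i]
        x[i] = max(0.0, old + rng.uniform(-step, step))  # perturb, stay non-negative
        new_cost = objective(x, y)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost  # accept: always if better, sometimes if worse
            if cost < best_cost:
                best, best_cost = list(x), cost
        else:
            x[i] = old  # reject: undo the perturbation
        t *= cooling
    return best, best_cost

true_signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0]  # two point "stars"
observed = blur(true_signal)
estimate, cost = anneal(observed)
print(objective(observed, observed), cost)
```

    Low entropy favours a few bright pixels over a smooth spread, which suits sparse star fields, while the misfit term keeps the estimate consistent with the observed data.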

    Coarse-grained reconfigurable array architectures

    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops and execute them more efficiently. This chapter discusses the basic principles of CGRAs and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, compiler support, and manual fine-tuning of source code.
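    Compiler support for CGRAs such as ADRES commonly relies on modulo scheduling of inner loops. Assuming that technique, the sketch below computes the standard lower bound on the initiation interval (MII) as the maximum of a resource bound and a recurrence bound. It assumes homogeneous functional units and ignores interconnect and register-file constraints, which a real CGRA scheduler must also handle; the loop sizes are invented for illustration.

```python
import math

def res_mii(num_ops, num_fus):
    """Resource bound: each operation of one iteration needs one FU slot."""
    return math.ceil(num_ops / num_fus)

def rec_mii(recurrences):
    """Recurrence bound; each cycle is (total latency, total iteration distance)."""
    return max((math.ceil(lat / dist) for lat, dist in recurrences), default=1)

def mii(num_ops, num_fus, recurrences):
    """Minimum initiation interval: the larger of the two bounds."""
    return max(res_mii(num_ops, num_fus), rec_mii(recurrences))

# Hypothetical loop body: 40 operations and one accumulator recurrence
# with latency 2 that carries a value to the next iteration (distance 1).
loop_ops, recurrences = 40, [(2, 1)]
print(mii(loop_ops, 16, recurrences))  # 4x4 CGRA, 16 FUs -> 3
print(mii(loop_ops, 8, recurrences))   # 8-issue VLIW -> 5
```

    With 16 units the resource bound still dominates the recurrence bound, which is why a wide array pays off only for loops with enough operations and short recurrences.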

    Pipeline synthesis and optimization for reconfigurable custom computing machines

    This paper presents a pipeline synthesis and optimization technique for high-level language programming of reconfigurable Custom Computing Machines. The circuit synthesis generates, from a sequential program, hardware accelerators that exploit the reconfigurable hardware's parallelism. Program loops are transformed into structural hardware specifications. The optimization algorithm uses integer linear programming to balance and pipeline the circuit's registers. This global optimization determines the minimum number of flip-flops necessary for optimal pipeline throughput. It also accounts for the irregular distribution of flip-flops on FPGAs. Standard interface circuitry and a runtime system connect the accelerator unit to its host computer. An integrated compiler invokes the synthesis and produces a program that automatically downloads, calls, and controls its hardware accelerators.
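    The register-balancing problem can be pictured with a simpler, non-optimal baseline. The Python sketch below (using the standard-library `graphlib` module, Python 3.9+) schedules a hypothetical dataflow graph for `(a * b) + c` as soon as possible and inserts delay registers on each edge so that all operands of a node arrive in the same cycle. The operator latencies and 16-bit width are assumptions, and unlike the integer linear program described in the abstract, this baseline neither minimizes flip-flops nor models FPGA flip-flop distribution.

```python
from graphlib import TopologicalSorter

# Hypothetical dataflow graph for y = (a * b) + c: node -> predecessor nodes.
preds = {"a": [], "b": [], "c": [], "mul": ["a", "b"], "add": ["mul", "c"]}
latency = {"a": 0, "b": 0, "c": 0, "mul": 3, "add": 1}  # assumed pipeline stages
WIDTH = 16  # assumed bit width of every edge

def asap_balance(preds, latency):
    """ASAP schedule plus per-edge delay registers that balance all input paths."""
    start = {}
    for v in TopologicalSorter(preds).static_order():
        start[v] = max((start[u] + latency[u] for u in preds[v]), default=0)
    regs = {(u, v): start[v] - (start[u] + latency[u])
            for v, us in preds.items() for u in us}
    return start, regs

start, regs = asap_balance(preds, latency)
total_ff = WIDTH * sum(regs.values())
print(regs)      # only the c -> add edge needs delay: 3 stages
print(total_ff)  # 48
```

    An optimizer can improve on this baseline, for example by sharing delay registers among fanout edges or moving them to narrower edges, which is the kind of trade-off a global ILP formulation can capture.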

    P4CEP: Towards In-Network Complex Event Processing

    In-network computing using programmable networking hardware is a strong trend in networking that promises to reduce latency and server resource consumption by offloading work to network elements (programmable switches and smart NICs). In particular, the data plane programming language P4, together with powerful P4 networking hardware, has spawned projects that offload services into the network, e.g., consensus or caching services. In this paper, we present a novel case for in-network computing, namely Complex Event Processing (CEP). CEP processes streams of basic events, e.g., from networked sensors, into meaningful complex events. Traditionally, CEP has been performed on servers or overlay networks. However, we argue that CEP is a good candidate for in-network computing along the communication path: it avoids detouring streams to distant servers, minimizing communication latency, while also exploiting the processing capabilities of novel networking hardware. We show that it is feasible to express CEP operations in P4 and present a tool that compiles CEP operations, formulated in our P4CEP rule specification language, to P4 code. Moreover, we identify the challenges and problems we encountered in order to point out future research directions for implementing full-fledged in-network CEP systems. Comment: 6 pages. Author's version
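    To make CEP semantics concrete, the Python sketch below implements one typical operator: a sequence in which an event of kind `first` is followed by one of kind `second` within a time window. It is not P4 code and does not use the P4CEP rule language; the `Event` type, the `seq` function, and the event names are hypothetical. It keeps a single slot of state (the most recent `first` event), loosely mirroring the small, fixed-size state available in a switch pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple

@dataclass(frozen=True)
class Event:
    kind: str   # basic event type, e.g. a sensor reading class
    ts: float   # timestamp in seconds

def seq(events: Iterable[Event], first: str, second: str,
        window: float) -> Iterator[Tuple[Event, Event]]:
    """Yield (a, b) when an event of kind `second` follows one of kind `first` within `window`."""
    pending: Optional[Event] = None  # single slot of state: latest `first` event
    for e in events:
        if e.kind == first:
            pending = e  # a newer `first` replaces the older one
        elif e.kind == second and pending is not None:
            if e.ts - pending.ts <= window:
                yield (pending, e)  # complex event detected
            pending = None  # consume the pending event either way

stream = [Event("smoke", 0.0), Event("heat", 1.0),
          Event("smoke", 10.0), Event("heat", 20.0)]
matches = list(seq(stream, "smoke", "heat", window=5.0))
print(len(matches))  # 1: only smoke@0.0 -> heat@1.0 falls inside the window
```

    Bounding the operator's state to one stored event is a deliberate simplification; richer CEP patterns (counting, aggregation, multiple open matches) need more state, which is one of the constraints that in-network implementations must work within.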