273 research outputs found

    High performance computing with FPGAs

    Get PDF
    Field-programmable gate arrays represent an army of logical units which can be organized in a highly parallel or pipelined fashion to implement an algorithm in hardware. The flexibility of this new medium creates new challenges to find the right processing paradigm which takes into account of the natural constraints of FPGAs: clock frequency, memory footprint and communication bandwidth. In this paper first use of FPGAs as a multiprocessor on a chip or its use as a highly functional coprocessor are compared, and the programming tools for hardware/software codesign are discussed. Next a number of techniques are presented to maximize the parallelism and optimize the data locality in nested loops. This includes unimodular transformations, data locality improving loop transformations and use of smart buffers. Finally, the use of these techniques on a number of examples is demonstrated. The results in the paper and in the literature show that, with the proper programming tool set, FPGAs can speedup computation kernels significantly with respect to traditional processors

    A proposition of a manufactronic network approach for intelligent and flexible manufacturing systems

    Get PDF
    The XPRESS project introduces a completely new scalable concept of a manufactronic networked factory, which is composed by a co-ordinated team of specialized autonomous objects (Manufactrons), each knowing how to do a certain process optimally. This knowledge based concept integrated the complete chain: production configuration (decrease of ramp-up time of at least 50%), multi-variant production line (varying types and volumes on a single line) and 100% quality monitoring. The manufactronic networked architecture allows continuous process improvement, and will be able to anticipate and to respond to rapidly changing consumer needs, producing high-quality products in adequate quantities while reducing costs. This concept is demonstrated in the automotive, aeronautics and electrical industry but can be transferred to nearly all production processes

    Second year technical report on-board processing for future satellite communications systems

    Get PDF
    Advanced baseband and microwave switching techniques for large domestic communications satellites operating in the 30/20 GHz frequency bands are discussed. The nominal baseband processor throughput is one million packets per second (1.6 Gb/s) from one thousand T1 carrier rate customer premises terminals. A frequency reuse factor of sixteen is assumed by using 16 spot antenna beams with the same 100 MHz bandwidth per beam and a modulation with a one b/s per Hz bandwidth efficiency. Eight of the beams are fixed on major metropolitan areas and eight are scanning beams which periodically cover the remainder of the U.S. under dynamic control. User signals are regenerated (demodulated/remodulated) and message packages are reformatted on board. Frequency division multiple access and time division multiplex are employed on the uplinks and downlinks, respectively, for terminals within the coverage area and dwell interval of a scanning beam. Link establishment and packet routing protocols are defined. Also described is a detailed design of a separate 100 x 100 microwave switch capable of handling nonregenerated signals occupying the remaining 2.4 GHz bandwidth with 60 dB of isolation, at an estimated weight and power consumption of approximately 400 kg and 100 W, respectively

    Interconnect-aware scheduling and resource allocation for high-level synthesis

    Get PDF
    A high-level architectural synthesis can be described as the process of transforming a behavioral description into a structural description. The scheduling, processor allocation, and register binding are the most important tasks in the high-level synthesis. In the past, it has been possible to focus simply on the delays of the processing units in a high-level synthesis and neglect the wire delays, since the overall delay of a digital system was dominated by the delay of the logic gates. However, with the process technology being scaled down to deep-submicron region, the global interconnect delays can no longer be neglected in VLSI designs. It is, therefore, imperative to include in high-level synthesis the delays on wires and buses used to communicate data between the processing units i.e., inter-processor communication delays. Furthermore, the way the process of register binding is performed also has an impact on the complexity of the interconnect paths required to transfer data between the processing units. Hence, the register binding can no longer ignore its effect on the wiring complexity of resulting designs. The objective of this thesis is to develop techniques for an interconnect-aware high-level synthesis. Under this common theme, this thesis has two distinct focuses. The first focus of this thesis is on developing a new high-level synthesis framework while taking the inter-processor communication delay into consideration. The second focus of this thesis is on the developing of a technique to carry out the register binding and a scheme to reduce the number of registers while taking the complexity of the interconnects into consideration. A novel scheduling and processor allocation technique taking into consideration the inter-processor communication delay is presented. In the proposed technique, the communication delay between a pair of nodes of different types is treated as a non-computing node, whereas that between a pair of nodes of the same type is taken into account by re-adjusting the firing times of the appropriate nodes of the data flow graph (DFG). Another technique for the integration of the placement process into the scheduling and processor allocation in order to determine the actual positions of the processing units in the placement space is developed. The proposed technique makes use of a hybrid library of functional units, which includes both operation-specific and reconfigurable multiple-operation functional units, to maximize the local data transfer. A technique for register binding that results in a reduced number of registers and interconnects is developed by appropriately dividing the lifetime of a token into multiple segments and then binding those having the same source and/or destination into a single register. A node regeneration scheme, in which the idle processing units are utilized to generate multiple copies of the nodes in a given DFG, is devised to reduce the number of registers and interconnects even further. The techniques and schemes developed in this thesis are applied to the synthesis of architectures for a number of benchmark DSP problems and compared with various other commonly used synthesis methods in order to assess their effectiveness. It is shown that the proposed techniques provide superior performance in terms of the iteration period, placement area, and the numbers of the processing units, registers and interconnects in the synthesized architectur

    A Study of Dynamic Optimization Techniques: Lessons and Directions in Kernel Design

    Get PDF
    The Synthesis kernel [21,22,23,27,28] showed that dynamic code generation, software feedback, and fine-grain modular kernel organization are useful implementation techniques for improving the performance of operating system kernels. In addition, and perhaps more importantly, we discovered that there are strong interactions between the techniques. Hence, a careful and systematic combination of the techniques can be very powerful even though each one by itself may have serious limitations. By identifying these interactions we illustrate the problems of applying each technique in isolation to existing kernels. We also highlight the important common under-pinnings of the Synthesis experience and present our ideas on future operating system design and implementation. Finally, we outline a more uniform approach to dynamic optimizations called incremental partial evaluation

    The ALICE TPC, a large 3-dimensional tracking device with fast readout for ultra-high multiplicity events

    Get PDF
    The design, construction, and commissioning of the ALICE Time-Projection Chamber (TPC) is described. It is the main device for pattern recognition, tracking, and identification of charged particles in the ALICE experiment at the CERN LHC. The TPC is cylindrical in shape with a volume close to 90 m^3 and is operated in a 0.5 T solenoidal magnetic field parallel to its axis. In this paper we describe in detail the design considerations for this detector for operation in the extreme multiplicity environment of central Pb--Pb collisions at LHC energy. The implementation of the resulting requirements into hardware (field cage, read-out chambers, electronics), infrastructure (gas and cooling system, laser-calibration system), and software led to many technical innovations which are described along with a presentation of all the major components of the detector, as currently realized. We also report on the performance achieved after completion of the first round of stand-alone calibration runs and demonstrate results close to those specified in the TPC Technical Design Report.Comment: 55 pages, 82 figure

    A novel parallel algorithm for surface editing and its FPGA implementation

    Get PDF
    A thesis submitted to the University of Bedfordshire in partial fulfilment of the requirements for the degree of Doctor of PhilosophySurface modelling and editing is one of important subjects in computer graphics. Decades of research in computer graphics has been carried out on both low-level, hardware-related algorithms and high-level, abstract software. Success of computer graphics has been seen in many application areas, such as multimedia, visualisation, virtual reality and the Internet. However, the hardware realisation of OpenGL architecture based on FPGA (field programmable gate array) is beyond the scope of most of computer graphics researches. It is an uncultivated research area where the OpenGL pipeline, from hardware through the whole embedded system (ES) up to applications, is implemented in an FPGA chip. This research proposes a hybrid approach to investigating both software and hardware methods. It aims at bridging the gap between methods of software and hardware, and enhancing the overall performance for computer graphics. It consists of four parts, the construction of an FPGA-based ES, Mesa-OpenGL implementation for FPGA-based ESs, parallel processing, and a novel algorithm for surface modelling and editing. The FPGA-based ES is built up. In addition to the Nios II soft processor and DDR SDRAM memory, it consists of the LCD display device, frame buffers, video pipeline, and algorithm-specified module to support the graphics processing. Since there is no implementation of OpenGL ES available for FPGA-based ESs, a specific OpenGL implementation based on Mesa is carried out. Because of the limited FPGA resources, the implementation adopts the fixed-point arithmetic, which can offer faster computing and lower storage than the floating point arithmetic, and the accuracy satisfying the needs of 3D rendering. Moreover, the implementation includes Bézier-spline curve and surface algorithms to support surface modelling and editing. The pipelined parallelism and co-processors are used to accelerate graphics processing in this research. These two parallelism methods extend the traditional computation parallelism in fine-grained parallel tasks in the FPGA-base ESs. The novel algorithm for surface modelling and editing, called Progressive and Mixing Algorithm (PAMA), is proposed and implemented on FPGA-based ES’s. Compared with two main surface editing methods, subdivision and deformation, the PAMA can eliminate the large storage requirement and computing cost of intermediated processes. With four independent shape parameters, the PAMA can be used to model and edit freely the shape of an open or closed surface that keeps globally the zero-order geometric continuity. The PAMA can be applied independently not only FPGA-based ESs but also other platforms. With the parallel processing, small size, and low costs of computing, storage and power, the FPGA-based ES provides an effective hybrid solution to surface modelling and editing
    corecore