16,801 research outputs found
Workshop on Verification and Theorem Proving for Continuous Systems (NetCA Workshop 2005)
Oxford, UK, 26 August 200
A framework for FPGA functional units in high performance computing
FPGAs make it practical to speed up a program by defining
hardware functional units that perform calculations faster than can be achieved in software. Specialised digital circuits avoid the overhead of executing sequences of instructions, and they make available the massive parallelism of the components. The FPGA operates as a coprocessor controlled by a conventional computer. An application that combines software with hardware in
this way needs an interface between a communications port to the processor and the signals connected to the functional units. We present a framework that supports the design of such systems. The framework consists of a generic controller circuit defined in VHDL that can be configured by the user according to the needs of the functional units and the I/O channel. The controller
contains a register file and a pipelined programmable register transfer machine, and it supports the design of both stateless and stateful functional units. Two examples are described: the implementation of a set of basic stateless arithmetic functional units, and the implementation of a stateful algorithm that exploits circuit parallelism
Interpretive computer simulator for the NASA Standard Spacecraft Computer-2 (NSSC-2)
An Interpretive Computer Simulator (ICS) for the NASA Standard Spacecraft Computer-II (NSSC-II) was developed as a code verification and testing tool for the Annular Suspension and Pointing System (ASPS) project. The simulator is written in the higher level language PASCAL and implented on the CDC CYBER series computer system. It is supported by a metal assembler, a linkage loader for the NSSC-II, and a utility library to meet the application requirements. The architectural design of the NSSC-II is that of an IBM System/360 (S/360) and supports all but four instructions of the S/360 standard instruction set. The structural design of the ICS is described with emphasis on the design differences between it and the NSSC-II hardware. The program flow is diagrammed, with the function of each procedure being defined; the instruction implementation is discussed in broad terms; and the instruction timings used in the ICS are listed. An example of the steps required to process an assembly level language program on the ICS is included. The example illustrates the control cards necessary to assemble, load, and execute assembly language code; the sample program to to be executed; the executable load module produced by the loader; and the resulting output produced by the ICS
A 16-bit CORDIC rotator for high-performance wireless LAN
In this paper we propose a novel 16-bit low power CORDIC rotator that is used for high-speed wireless LAN. The algorithm converges to the final target angle by adaptively selecting appropriate iteration steps while keeping the scale factor virtually constant. The VLSI architecture of the proposed design eliminates the entire arithmetic hardware in the angle approximation datapath and reduces the number of iterations by 50% on an average. The cell area of the processor is 0.7 mm2 and it dissipates 7 mW power at 20 MHz frequency
Pipelining Saturated Accumulation
Aggressive pipelining and spatial parallelism allow integrated circuits (e.g., custom VLSI, ASICs, and FPGAs) to achieve high throughput on many Digital Signal Processing applications. However, cyclic data dependencies in the computation can limit parallelism and reduce the efficiency and speed of an implementation. Saturated accumulation is an important example where such a cycle limits the throughput of signal processing applications. We show how to reformulate saturated addition as an associative operation so that we can use a parallel-prefix calculation to perform saturated accumulation at any data rate supported by the device. This allows us, for example, to design a 16-bit saturated accumulator which can operate at 280 MHz on a Xilinx Spartan-3(XC3S-5000-4) FPGA, the maximum frequency supported by the component's DCM
Design of multimedia processor based on metric computation
Media-processing applications, such as signal processing, 2D and 3D graphics
rendering, and image compression, are the dominant workloads in many embedded
systems today. The real-time constraints of those media applications have
taxing demands on today's processor performances with low cost, low power and
reduced design delay. To satisfy those challenges, a fast and efficient
strategy consists in upgrading a low cost general purpose processor core. This
approach is based on the personalization of a general RISC processor core
according the target multimedia application requirements. Thus, if the extra
cost is justified, the general purpose processor GPP core can be enforced with
instruction level coprocessors, coarse grain dedicated hardware, ad hoc
memories or new GPP cores. In this way the final design solution is tailored to
the application requirements. The proposed approach is based on three main
steps: the first one is the analysis of the targeted application using
efficient metrics. The second step is the selection of the appropriate
architecture template according to the first step results and recommendations.
The third step is the architecture generation. This approach is experimented
using various image and video algorithms showing its feasibility
- …