1,440 research outputs found

    Self-timed field programmable gate array architectures

    SpinLink: An interconnection system for the SpiNNaker biologically inspired multi-computer

    SpiNNaker is a large-scale biologically-inspired multi-computer designed to model very heavily distributed problems, with the flagship application being the simulation of large neural networks. The project goal is to have one million processors included in a single machine, which will consequently span many thousands of circuit boards. A computer of this scale imposes large communication requirements between these boards, and requires an extensible method of connecting to external equipment such as sensors, actuators and visualisation systems. This paper describes two systems that can address each of these problems. Firstly, SpinLink is a proposed method of connecting the SpiNNaker boards by using time-division multiplexing (TDM) to allow eight SpiNNaker links to run at maximum bandwidth between two boards. SpinLink will be deployed on Spartan-6 FPGAs and uses a locally generated clock that can be paused while the asynchronous links from SpiNNaker are sending data, thus ensuring a fast and glitch-free response. Secondly, SpiNNterceptor is a separate system, currently in the early stages of design, that will build upon SpinLink to address the important external I/O issues faced by SpiNNaker. Specifically, spare resources in the FPGAs will be used to implement the debugging and I/O interfacing features of SpiNNterceptor.
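
The time-division multiplexing idea in this abstract can be illustrated with a minimal Python sketch. It assumes a fixed round-robin slot order over the eight links and an idle marker for empty slots. The frame layout, the `IDLE` marker and both function names are hypothetical and are not taken from SpinLink itself.

```python
from collections import deque

NUM_LINKS = 8   # eight SpiNNaker links share one board-to-board channel
IDLE = None     # hypothetical marker for a slot with nothing to send


def tdm_multiplex(queues):
    """Interleave per-link queues into one slot stream, round-robin.

    Slot i of every frame belongs to link i, so the receiver can
    recover the source link from the slot position alone.
    """
    pending = [deque(q) for q in queues]
    stream = []
    while any(pending):                      # an empty deque is falsy
        for q in pending:
            stream.append(q.popleft() if q else IDLE)
    return stream


def tdm_demultiplex(stream, num_links=NUM_LINKS):
    """Split a slot stream back into per-link lists, dropping idle slots."""
    links = [[] for _ in range(num_links)]
    for slot, symbol in enumerate(stream):
        if symbol is not IDLE:
            links[slot % num_links].append(symbol)
    return links


# Illustrative traffic: link i queues (i % 3) symbols.
traffic = [[f"L{i}-{n}" for n in range(i % 3)] for i in range(NUM_LINKS)]
assert tdm_demultiplex(tdm_multiplex(traffic)) == traffic
```

Fixed slot ownership costs bandwidth when a link is idle, but it keeps the receiver trivial because no per-symbol addressing is needed.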

    The NSR processor

    Journal Article. The NSR (Non-Synchronous RISC) processor is a general-purpose computer structured as a collection of self-timed blocks that operate concurrently and communicate over bundled data channels in the style of micropipelines [3, 16]. These blocks correspond to standard synchronous pipeline stages such as Instruction Fetch, Instruction Decode, Execute, Memory and Register File, but each operates concurrently as a separate self-timed process. In addition to being internally self-timed, the units are decoupled through self-timed FIFO queues between each of the units, which allows a high degree of overlap in instruction execution. Branches, jumps, and memory accesses are also decoupled through the use of additional FIFO queues, which can hide the execution latency of these instructions. A prototype implementation of the NSR processor has been constructed using Actel FPGAs (Field Programmable Gate Arrays).

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is employed throughout for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity of up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced by 12%. These advantages are traded off against error correction performance, thus making the solution attractive only for long codes, such as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse, in charge of symbol (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operand bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementation results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and the same system-level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm, whose goal is building scalable communication infrastructures connecting hundreds of cores. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on a 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify by up to three orders of magnitude.
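
The accuracy-driven choice between arithmetic models described in this abstract can be illustrated with a rough Python sketch. It is not the thesis's core compiler: it quantizes a single block of samples rather than an FFT datapath, and the function names, the sample block and the error budget are all hypothetical. It only shows why a shared block exponent can meet the same accuracy budget with fewer bits than plain fixed-point when the signal is small.

```python
import math


def quantize_fixed(values, bits, full_scale=1.0):
    """Signed fixed-point: `bits` bits spanning [-full_scale, full_scale)."""
    step = full_scale / 2 ** (bits - 1)
    lo, hi = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    return [max(lo, min(hi, round(v / step))) * step for v in values]


def quantize_block_fp(values, bits):
    """Block floating point: one shared exponent, `bits`-bit mantissas."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return list(values)
    _, exponent = math.frexp(peak)          # guarantees 2**exponent > peak
    return quantize_fixed(values, bits, full_scale=2.0 ** exponent)


def min_bits(values, budget, quantize, max_bits=32):
    """Smallest bit-width whose worst-case error stays within `budget`."""
    for bits in range(2, max_bits + 1):
        err = max(abs(v - q) for v, q in zip(values, quantize(values, bits)))
        if err <= budget:
            return bits
    return None                             # budget not reachable


block = [0.001, 0.002, -0.0015]             # illustrative small-amplitude samples
fixed_bits = min_bits(block, 1e-4, quantize_fixed)
bfp_bits = min_bits(block, 1e-4, quantize_block_fp)
assert bfp_bits < fixed_bits
```

The loop in `min_bits` plays the role of the closed-loop search: it keeps widening the operands until the user's accuracy budget is met and then stops.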

    A comparison of modular self-timed design styles

    Technical report. State-machine sequencing methods in modular 2-phase and 4-phase asynchronous handshake control are compared. Design styles are discussed, and the sequencers are tested against each other using a medium-scale minicomputer test design implemented in FPGAs. Seven 4-phase sequencers are tested. In these comparisons, 2-phase control is faster than 4-phase. Within the 4-phase designs, speed is enhanced when work is overlapped with handshake restoration. Performance of asynchronous and synchronous designs is compared.
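
As a rough illustration of the two protocols compared here, the following Python sketch lists the request/acknowledge wire events for each transfer. It models only the signalling order, not gate delays or the report's sequencer circuits, so it does not reproduce the measured results. The function names are hypothetical.

```python
def two_phase(transfers):
    """Transition signalling: every edge on req or ack is an event."""
    req = ack = 0
    events = []
    for _ in range(transfers):
        req ^= 1                      # sender toggles request
        events.append(("req", req))
        ack ^= 1                      # receiver toggles acknowledge
        events.append(("ack", ack))
    return events


def four_phase(transfers):
    """Return-to-zero signalling: both wires go low again after each transfer."""
    events = []
    for _ in range(transfers):
        events += [("req", 1), ("ack", 1)]   # active phase
        events += [("req", 0), ("ack", 0)]   # handshake restoration
    return events


assert len(two_phase(3)) == 6
assert len(four_phase(3)) == 12
```

The last two events of each 4-phase transfer are the restoration phase, which is the part the abstract says can be overlapped with useful work.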

    Low latency self-timed flow-through FIFOs

    Journal Article. Self-timed flow-through FIFOs are constructed easily using only a single C-element as control for each stage of the FIFO. Throughput can be very high in this type of FIFO as the communication required to send new data to the FIFO is local to only the first element of the FIFO. Circuit density can also be high because the control overhead is very small. However, because data must travel through every cell in the FIFO when moving from input to output, latencies can be long. This paper describes some alternative approaches to building self-timed flow-through FIFOs that reduce the latency while retaining the high throughput and relative simplicity of a flow-through design. Five designs are presented: a standard linear flow-through FIFO in which the data pass through every latch in the FIFO, a parallel FIFO in which data are delivered in turn to a set of parallel flow-through FIFOs, a tree FIFO in which data are fanned out into a tree of simple FIFOs, a square FIFO in which the tree is organized as a square array to achieve better layout packing, and a folded FIFO in which data will try to skip as many of the empty FIFO cells as possible to find the shortest path to the output.
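
The latency problem of the linear design can be sketched in a few lines of Python. The model below assumes 2-phase, micropipeline-style control in which each stage's C-element sees the previous stage's output as its request and the inverted output of the next stage as its acknowledge. That wiring, the unit-delay update and the function names are assumptions made for illustration and are not taken from the paper.

```python
def c_element(a, b, prev):
    """Muller C-element: copy the inputs when they agree, else hold."""
    return a if a == b else prev


def steps_until_output(depth):
    """Count C-element delays for one request to cross an empty linear FIFO."""
    c = [0] * depth            # control outputs, all stages empty
    req_in, ack_out = 1, 0     # one new request; consumer has not acknowledged
    steps = 0
    while c[-1] == 0:
        nxt = []
        for i in range(depth):
            req = req_in if i == 0 else c[i - 1]
            ack = ack_out if i == depth - 1 else c[i + 1]
            nxt.append(c_element(req, 1 - ack, c[i]))
        c = nxt                # all stages update together (unit delay)
        steps += 1
    return steps


assert steps_until_output(1) == 1
assert steps_until_output(8) == 8
```

In this model the request ripples through one stage per step, so latency grows linearly with depth, which is the cost the parallel, tree, square and folded designs aim to reduce.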

    Design of Timer for Application in ATM using FPGA and VHDL

    A watchdog timer is a computer hardware timing device that triggers a system reset if the main program, due to some fault condition such as a hang, neglects to regularly service the watchdog (by writing a “service pulse” to it, also referred to as “petting the dog”). The intention is to bring the system back from the hung state into normal operation. Such a timer has various important applications, one of them being in ATMs (Automated Teller Machines), which we have studied and designed in our project.
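
The behaviour described in this abstract can be modelled in software with a minimal Python sketch. The project itself targets an FPGA with VHDL, so this is only a behavioural illustration. The class name, the tick-based timeout and the reset counter are hypothetical.

```python
class WatchdogTimer:
    """Behavioural model: reset fires if no service pulse arrives in time."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.count = 0
        self.resets = 0

    def service(self):
        """'Pet the dog': restart the countdown."""
        self.count = 0

    def tick(self):
        """Advance one clock tick; return True if a reset is triggered."""
        self.count += 1
        if self.count >= self.timeout_ticks:
            self.resets += 1
            self.count = 0      # the system restarts with a fresh countdown
            return True
        return False


wd = WatchdogTimer(timeout_ticks=3)

# Healthy program: services the watchdog before the timeout expires.
for _ in range(5):
    wd.tick()
    wd.tick()
    wd.service()
assert wd.resets == 0

# Hung program: no service pulse, so the third tick forces a reset.
assert [wd.tick() for _ in range(3)] == [False, False, True]
```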

    A self-timed multipurpose delay sensor for field programmable gate arrays (FPGAs)

    This paper presents a novel self-timed multi-purpose sensor especially conceived for Field Programmable Gate Arrays (FPGAs). The aim of the sensor is to measure performance variations during the life-cycle of the device, such as process variability, critical path timing and temperature variations. The proposed topology, through the use of both combinational and sequential FPGA elements, amplifies the time of a signal traversing a delay chain to produce a pulse whose width is the sensor’s measurement. The sensor is fully self-timed, avoiding the need for clock distribution networks and eliminating the limitations imposed by the system clock. One single off- or on-chip time-to-digital converter is able to perform digitization of several sensors in a single operation. These features allow for a simplified approach for designers wanting to intertwine a multi-purpose sensor network with their application logic. Employed as a temperature sensor, it has been measured to have an error of ±0.67 °C, over the range of 20–100 °C, employing 20 logic elements with a 2-point calibration.
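
A 2-point calibration of this kind can be sketched in Python as a straight line through two reference measurements. The sketch assumes the pulse width varies linearly with temperature over the calibrated range. The pulse-width values and the function name are hypothetical and are not measurements from the paper.

```python
def two_point_calibration(w_lo, t_lo, w_hi, t_hi):
    """Return a function mapping pulse width to temperature.

    (w_lo, t_lo) and (w_hi, t_hi) are the two reference points:
    a measured pulse width and the known temperature at that point.
    """
    if w_lo == w_hi:
        raise ValueError("reference pulse widths must differ")
    slope = (t_hi - t_lo) / (w_hi - w_lo)

    def to_temperature(width):
        return t_lo + slope * (width - w_lo)

    return to_temperature


# Hypothetical references: 1000 ns at 20 degC and 1400 ns at 100 degC.
to_temp = two_point_calibration(1000.0, 20.0, 1400.0, 100.0)
assert abs(to_temp(1200.0) - 60.0) < 1e-9
```

Any departure of the real sensor from a straight line between the two reference points shows up as residual error after calibration.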