214 research outputs found

    An A-FPGA architecture for relative timing based asynchronous designs

    Pre-print: This paper presents an asynchronous FPGA architecture capable of implementing relative-timing-based asynchronous designs. The architecture uses the Xilinx 7-Series architecture as a starting point and proposes modifications that make it capable of hosting asynchronous designs while keeping it fully functional for synchronous designs. Although the architecture requires additional components, when implemented on the 64-nm node the slice area is observed to increase only marginally, by 7%. The architecture leaves the configurable routing structures untouched and does not compromise the performance of the synchronous architecture.

    A cell set for self-timed design using Actel FPGAs

    Technical report: Asynchronous or self-timed systems, which do not rely on a global clock to keep system components synchronized, can offer significant advantages over traditional clocked circuits in a variety of applications. However, such systems require that suitable self-timed circuit primitives are available for building the system. This report describes a cell set designed for building self-timed circuits and systems using Actel field-programmable gate arrays (FPGAs). The cells use a two-phase transition-signalling protocol for control signals and a bundled-data protocol for data signals. The library of macro cells is designed to be used with the Workview tool suite from Viewlogic and the Action Logic System (ALS) from Actel.
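    As an illustration of the kind of primitive such a cell set provides, the following is a minimal behavioural VHDL sketch of a Muller C-element, the basic state-holding gate used in two-phase transition-signalled control; the entity name and ports are illustrative assumptions, not cells taken from the report.

        library ieee;
        use ieee.std_logic_1164.all;

        entity c_element is
          port (
            a, b : in  std_logic;
            y    : out std_logic
          );
        end entity;

        architecture behavioural of c_element is
          signal state : std_logic := '0';
        begin
          -- The output changes only when both inputs agree; otherwise it holds
          -- its value, which is what lets transition-signalled control events
          -- be stored between stages.
          process (a, b)
          begin
            if a = '1' and b = '1' then
              state <= '1';
            elsif a = '0' and b = '0' then
              state <= '0';
            end if;
          end process;
          y <= state;
        end architecture;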

    Self-timed field programmable gate array architectures


    SpinLink: An interconnection system for the SpiNNaker biologically inspired multi-computer

    SpiNNaker is a large-scale biologically-inspired multi-computer designed to model very heavily distributed problems, with the flagship application being the simulation of large neural networks. The project goal is to have one million processors in a single machine, which consequently span many thousands of circuit boards. A computer of this scale imposes large communication requirements between these boards and requires an extensible method of connecting to external equipment such as sensors, actuators and visualisation systems. This paper describes two systems that address each of these problems. Firstly, SpinLink is a proposed method of connecting the SpiNNaker boards by using time-division multiplexing (TDM) to allow eight SpiNNaker links to run at maximum bandwidth between two boards. SpinLink will be deployed on Spartan-6 FPGAs and uses a locally generated clock that can be paused while the asynchronous links from SpiNNaker are sending data, thus ensuring a fast and glitch-free response. Secondly, SpiNNterceptor is a separate system, currently in the early stages of design, that will build upon SpinLink to address the important external I/O issues faced by SpiNNaker. Specifically, spare resources in the FPGAs will be used to implement the debugging and I/O interfacing features of SpiNNterceptor.
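    A minimal sketch of the time-division multiplexing idea described above, assuming eight byte-wide link channels sampled round-robin onto a single board-to-board channel; the entity name, port widths and slot counter are assumptions for illustration only, not details of the actual SpinLink design.

        library ieee;
        use ieee.std_logic_1164.all;
        use ieee.numeric_std.all;

        entity tdm_mux8 is
          port (
            clk     : in  std_logic;                      -- locally generated (pausable) clock
            link_in : in  std_logic_vector(63 downto 0);  -- eight 8-bit link flits, packed
            slot    : out std_logic_vector(2 downto 0);   -- current time slot, sent with the data
            tdm_out : out std_logic_vector(7 downto 0)    -- multiplexed board-to-board channel
          );
        end entity;

        architecture rtl of tdm_mux8 is
          signal sel : unsigned(2 downto 0) := (others => '0');
        begin
          -- advance to the next link on every clock cycle (round-robin)
          process (clk)
          begin
            if rising_edge(clk) then
              sel <= sel + 1;
            end if;
          end process;

          -- select the byte belonging to the current time slot
          process (sel, link_in)
            variable idx : integer range 0 to 7;
          begin
            idx := to_integer(sel);
            tdm_out <= link_in(8 * idx + 7 downto 8 * idx);
          end process;

          slot <= std_logic_vector(sel);
        end architecture;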

    An integrated soft- and hard-programmable multithreaded architecture


    Petri nets based components within globally asynchronous locally synchronous systems

    Dissertation presented to the Faculdade de Ciências e Tecnologias da Universidade Nova de Lisboa for the degree of Master in Electrical and Computer Engineering. The main goal is to develop a solution for the interconnection of the components that constitute a GALS (Globally Asynchronous, Locally Synchronous) system. The components, which execute in parallel, are obtained by partitioning a model expressed as a Petri net (PN); the partition is performed using the PN editor SNOOPY-IOPT together with the Split tool and with tools that automatically generate VHDL code from the PNML representations of the models resulting from the partition (these tools were developed under the FORDESIGN project and are available at http://www.uninova.pt/FORDESIGN). Typical solutions for ensuring proper communication between the components of the GALS system are analysed, and an appropriate solution for interconnecting the components associated with the PN sub-models is characterized and developed. The final goal (not attained in this thesis) would be a tool that generates the code for the interconnection solution of the associated components for a specific application. The proposed component-interconnection solution was coded in VHDL, and the implementation platforms used for testing include Xilinx Spartan-3 and Virtex-II FPGAs.
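    As background for the interconnection problem discussed above, the sketch below shows one common GALS technique: a two-flip-flop synchroniser that brings a request from one locally synchronous component safely into the receiving component's clock domain. It is a generic illustration under assumed names, not the solution developed in the dissertation.

        library ieee;
        use ieee.std_logic_1164.all;

        entity sync2ff is
          port (
            clk_rx  : in  std_logic;   -- receiver's local clock
            req_in  : in  std_logic;   -- request from the sending domain
            req_out : out std_logic    -- request synchronised into the receiver's domain
          );
        end entity;

        architecture rtl of sync2ff is
          signal ff1, ff2 : std_logic := '0';
        begin
          process (clk_rx)
          begin
            if rising_edge(clk_rx) then
              ff1 <= req_in;  -- first stage may go metastable
              ff2 <= ff1;     -- second stage gives it a cycle to resolve
            end if;
          end process;
          req_out <= ff2;
        end architecture;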

    Pipelined Asynchronous High Level Synthesis for General Programs

    High-level synthesis (HLS) translates algorithms from a software programming language into hardware. We use the dataflow HLS methodology to translate programs into asynchronous circuits by implementing them with asynchronous dataflow elements as hardware building blocks. We extend prior work in dataflow synthesis in the following aspects: i) we propose Fluid to synthesize pipelined dataflow circuits for real-world programs with complex control flows, which are not supported by previous work; ii) we propose PipeLink to permit pipelined access to shared resources in the dataflow circuit. A dataflow circuit results in distributed control and an implicitly pipelined implementation; however, resource sharing in the presence of pipelining is challenging in this context due to the absence of a global scheduler. Traditional solutions to this problem impose restrictions on pipelining to guarantee mutually exclusive access to the shared resource, but PipeLink removes such restrictions and can generate pipelined asynchronous dataflow circuits for shared function calls, pipelined memory accesses and function pointers; iii) we apply several dataflow optimizations to improve the quality of the synthesized dataflow circuits; iv) we implement our system (Fluid + PipeLink) on the LLVM compiler framework, which allows us to take advantage of the optimization efforts of the compiler community; v) we compare our system with a widely-used academic HLS tool and two commercial HLS tools. Compared to the commercial (academic) HLS tools, our system achieves a 12X (20X) reduction in energy, a 1.29X (1.64X) improvement in throughput and a 1.27X (1.61X) improvement in latency, at the cost of a 2.4X (1.61X) increase in area.

    Dynamically reconfigurable asynchronous processor

    The main design requirements for today's mobile applications are high throughput performance, high energy efficiency and high programmability. Until now, the choice of platform has often been limited to Application-Specific Integrated Circuits (ASICs), due to their best-of-breed performance and power consumption. The economies of scale possible in these high-volume markets have traditionally been able to hide the high Non-Recurring Engineering (NRE) costs required for designing and fabricating new ASICs. However, with NREs and design time escalating with each generation of mobile applications, this practice may be reaching its limit. Designers today are looking at programmable solutions, so that they can respond more rapidly to changes in the market and spread costs over several generations of mobile applications. However, there have been few feasible alternatives to ASICs: Digital Signal Processors (DSPs) and microprocessors cannot meet the throughput requirements, whereas Field-Programmable Gate Arrays (FPGAs) require too much area and power. Coarse-grained dynamically reconfigurable architectures offer better solutions for high-throughput applications when power and area considerations are taken into account. One promising example is the Reconfigurable Instruction Cell Array (RICA). RICA consists of an array of cells with an interconnect that can be dynamically reconfigured on every cycle. This allows quite complex datapaths to be rendered onto the fabric and executed in a single configuration, making these architectures particularly suitable for stream processing. Furthermore, RICA can be programmed from C, making it a good fit with existing design methodologies. However, the RICA architecture has a drawback: poor scalability in terms of area and power. As the core gets bigger, the number of sequential elements in the array must be increased significantly to maintain the ability to achieve high throughputs through pipelining. As a result, a larger clock tree is required to synchronise the increased number of sequential elements, and the clock tree therefore takes up a larger percentage of the area and power consumption of the core. This thesis presents a novel Dynamically Reconfigurable Asynchronous Processor (DRAP), aimed at high-throughput mobile applications. DRAP is based on the RICA architecture but uses asynchronous design techniques, i.e. methods of designing digital systems without clocks. The absence of a global clock signal makes DRAP more scalable in terms of power and area overhead than its synchronous counterpart. The DRAP architecture maintains most of the benefits of custom asynchronous design whilst also providing programmability via conventional high-level languages. Results show that DRAP delivers considerably lower power consumption than a market-leading Very Long Instruction Word (VLIW) processor and a low-power ARM processor. For example, DRAP reduced power consumption by 20 times compared to the ARM7 processor and by 29 times compared to the TI C64x VLIW, when running the same benchmark capped to the same throughput and on the same process technology (0.13 µm). When compared to an equivalent RICA design, DRAP was up to 22% larger but reduced power by up to 1.9 times. It was also capable of achieving up to 2.8 times higher throughputs than RICA for the same benchmarks.

    Design and Implementation of FPGA Configuration Logic Block Using Asynchronous Static NCL

    This paper proposes the design of an FPGA configurable logic block (CLB) using an asynchronous static NULL Convention Logic (NCL) library. The proposed design uses three static LUTs to implement NCL logic functions. Each LUT can be configured to function as any one of the 27 fundamental static NCL gates. The proposed CLB supports ten inputs and three different outputs, each with resettable and inverting variations. The CLB has two modes: configuration mode and operation mode. The static NCL FPGA CLB is simulated at the transistor level using the 1.8 V, 180 nm TSMC CMOS process.
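    For context on the behaviour each LUT must be able to emulate, the sketch below is a behavioural VHDL model of one of the 27 fundamental NCL gates, a TH23 threshold gate: its output asserts once at least two of its three inputs are DATA and releases only when all inputs return to NULL, which is the hysteresis that static NCL gates provide. The entity and signal names are illustrative assumptions, and the paper's transistor-level implementation is not reproduced here.

        library ieee;
        use ieee.std_logic_1164.all;

        entity th23 is
          port (
            a, b, c : in  std_logic;
            z       : out std_logic
          );
        end entity;

        architecture behavioural of th23 is
          signal state : std_logic := '0';
        begin
          process (a, b, c)
            variable ones : integer;
          begin
            ones := 0;
            if a = '1' then ones := ones + 1; end if;
            if b = '1' then ones := ones + 1; end if;
            if c = '1' then ones := ones + 1; end if;
            if ones >= 2 then
              state <= '1';   -- threshold reached: assert DATA
            elsif ones = 0 then
              state <= '0';   -- all inputs NULL: release to NULL
            end if;           -- otherwise hold the previous value (hysteresis)
          end process;
          z <= state;
        end architecture;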