
    Hiding State in CλaSH Hardware Descriptions

    Synchronous hardware can be modelled as a mapping from input and state to output and a new state; such mappings are referred to as transition functions. It is natural to use a functional language to implement transition functions. The CλaSH compiler is capable of translating transition functions to VHDL. Modelling hardware using multiple components is convenient. Components in CλaSH can be considered as instantiations of functions. To avoid packing and unpacking state when composing components, functions are lifted to arrows. Using arrows reduces the chance of errors, as the state no longer has to be (un)packed manually. Furthermore, the Haskell do-syntax for arrows increases the readability of hardware designs. This is demonstrated using a realistic example of a circuit that consists of multiple components.
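
    The idea of a transition function, and of composing components without manually threading their state, can be sketched in a few lines. This is a minimal illustration in Python, not CλaSH or Haskell arrow code; the names `Component`, `counter`, `accumulator`, and `simulate` are hypothetical and do not come from the paper.

```python
from typing import Any, Callable, List, Tuple

# A transition function maps (state, input) to (new_state, output).
Transition = Callable[[Any, Any], Tuple[Any, Any]]


def counter(state: int, enable: bool) -> Tuple[int, int]:
    """Up-counter: outputs the current count, increments when enabled."""
    new_state = state + 1 if enable else state
    return new_state, state


def accumulator(state: int, value: int) -> Tuple[int, int]:
    """Running sum: outputs the sum including the current input."""
    new_state = state + value
    return new_state, new_state


class Component:
    """A transition function bundled with its own hidden state."""

    def __init__(self, transition: Transition, initial_state: Any) -> None:
        self._transition = transition
        self._state = initial_state

    def step(self, value: Any) -> Any:
        """Advance one clock cycle and return the output."""
        self._state, output = self._transition(self._state, value)
        return output

    def __rshift__(self, other: "Component") -> "Component":
        """Chain two components; the pair of states is packed here, once."""
        first, second = self._transition, other._transition

        def chained(state: Tuple[Any, Any], value: Any) -> Tuple[Any, Any]:
            s1, s2 = state
            s1, middle = first(s1, value)
            s2, output = second(s2, middle)
            return (s1, s2), output

        return Component(chained, (self._state, other._state))


def simulate(component: Component, inputs: List[Any]) -> List[Any]:
    """Feed one input per clock cycle and collect the outputs."""
    return [component.step(value) for value in inputs]


circuit = Component(counter, 0) >> Component(accumulator, 0)
print(simulate(circuit, [True, True, False, True]))  # [0, 1, 3, 5]
```

    The `>>` operator plays a role loosely comparable to arrow composition: the user of `circuit` never sees the tuple of internal states, which is the packing and unpacking the abstract says arrows avoid.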

    A Dual Digital Signal Processor VME Board For Instrumentation And Control Applications

    A Dual Digital Signal Processing VME Board was developed for the Continuous Electron Beam Accelerator Facility (CEBAF) Beam Current Monitor (BCM) system at Jefferson Lab. It is a versatile general-purpose digital signal processing board using an open architecture, which allows for adaptation to various applications. The base design uses two independent Texas Instruments (TI) TMS320C6711 devices, which are 900 MFLOPS floating-point digital signal processors (DSPs). Applications that require a fixed-point DSP can be implemented by replacing the baseline DSP with the pin-for-pin compatible TMS320C6211. The design can be manufactured with a reduced chip set without redesigning the printed circuit board. For example, it can be implemented as a single-channel DSP with no analog I/O.

    Design, development and use of the finite element machine

    Some of the considerations that went into the design of the Finite Element Machine, a research asynchronous parallel computer, are described. The present status of the system is also discussed, along with some indication of the type of results that were obtained.

    Reducing Power Consumption Using Customized Numerical Representations in Digital Hearing Aids

    This thesis examines the effects of changing the numerical representation of audio signals in digital hearing aids to minimize power consumption. Within the hearing aid design, a majority of the power is consumed in the many finite impulse response filters. The main processing involved in these filters is a multiply-accumulate function. We examine the power consumption of 12 different multiply-accumulate units that use the following numerical representations: a 16-bit linear representation, a 9-bit logarithmic representation, and 10 different floating-point representations ranging from 9 to 13 bits. A selection of the multiply-accumulators is simulated using a continuous-circuit simulator. The power estimates from this are compared with signal transition counts from a discrete event simulator to quantify the relationship between transition counts and power consumption. This relationship is then used to examine other numerical representations.
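
    To make the role of the multiply-accumulate operation concrete, here is a minimal sketch of a direct-form finite impulse response filter in Python. Plain Python integers stand in for the number formats studied in the thesis; the function name `fir_filter` and the example coefficients are hypothetical and are not taken from the work itself.

```python
from typing import List


def fir_filter(samples: List[int], coefficients: List[int]) -> List[int]:
    """Direct-form FIR filter: one multiply-accumulate per tap per sample."""
    # Delay line holding the most recent inputs, newest first.
    history = [0] * len(coefficients)
    outputs = []
    for sample in samples:
        # Shift the new sample into the delay line, dropping the oldest.
        history = [sample] + history[:-1]
        accumulator = 0
        for coefficient, delayed in zip(coefficients, history):
            accumulator += coefficient * delayed  # multiply-accumulate step
        outputs.append(accumulator)
    return outputs


# An impulse input reproduces the coefficients as the output.
print(fir_filter([1, 0, 0], [3, 2, 1]))  # [3, 2, 1]
```

    Each output sample costs as many multiply-accumulate steps as the filter has taps, which is why the cost of a single such unit matters so much for the total.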

    A standard-cell self-timed multiplier for energy and area critical synchronous systems

    This paper describes the design of a standard-cell self-timed multiplier for use in energy and area critical synchronous systems. The area of this multiplier is bounded by N rather than N², as seen in more traditional combinational parallel array designs, where N is the word size. Energy has a polynomial growth with word size, but has a coefficient that is much smaller than that seen in a combinational array design. Although the multiplier is self-timed, it can be embedded in a synchronous system, appearing as a combinational element. This paper presents latency, area, and energy estimates for the multiplier implemented at various word sizes, and compares these numbers with a traditional combinational array multiplier. The self-timed multiplier uses 1/3 the energy and 1/7 the area of the combinational design for a 24-bit word size.

    Low-Power and Reconfigurable Asynchronous ASIC Design Implementing Recurrent Neural Networks

    Artificial intelligence (AI) has experienced a tremendous surge in recent years, resulting in high demand for a wide array of implementations of algorithms in the field. With the rise of Internet-of-Things devices, the need for artificial intelligence algorithms implemented in hardware with tight design restrictions has become even more prevalent. In terms of low power and area, ASIC implementations offer the best case. However, these implementations suffer from high non-recurring engineering costs, long time-to-market, and a complete lack of flexibility, which significantly hurts their appeal in an environment where time-to-market is so critical. The time-to-market gap can be shortened through the use of reconfigurable solutions, such as FPGAs, but these come with high cost per unit and significant power and area deficiencies compared with their ASIC counterparts. To bridge these gaps, this dissertation work develops two methodologies to improve the usability of ASIC implementations of neural networks in these applications. The first methodology achieves substantial reductions in design time for asynchronous implementations of a set of AI algorithms known as Recurrent Neural Networks (RNNs) by analyzing the possible architectures and implementing a library of generic or easily altered components that can be used to quickly implement a chosen RNN architecture. A tapeout of this method was completed using as few as 112 hours of labor by the designer from RNN selection to a DRC/LVS clean chip layout ready for fabrication. The second methodology develops a flow to implement a set of RNNs in a single reconfigurable ASIC, offering a middle ground between fully reconfigurable solutions and completely application-specific implementations. This reconfigurable design is capable of representing thousands of possible RNN configurations in a single IC. A tapeout of this design was also completed, with both tapeouts using the TSMC 65nm bulk CMOS process.