190 research outputs found
Finite State Machines With Input Multiplexing: A Performance Study
Finite state machines with input multiplexing (FSMIMs)
have been proposed in previous works as a technique for efficient
mapping FSMs into ROM memory. In this paper, we propose a new
architecture for implementing FSMIMs, called FSMIM with state-based
input selection, whose goal is to achieve a further reduction in memory
usage. This paper also describes in detail the algorithms for generating
FSMIMs used by the tool FSMIM-Gen, which has been developed
and made available on the Internet for free public use. A comparative
study in terms of speed and area between FSMIM approaches
and other field programmable gate array-based techniques is presented.
The results show that the FSMIM approaches obtain huge
reductions in the look-up table (LUT) usage by using a small number
of embedded memory blocks. In addition, speed improvements
over conventional LUT-based implementations have been obtained in
many cases
ROM-Based Finite State Machine Implementation in Low Cost FPGAs
This work presents a technique for the resource optimization of input multiplexed ROM-based Finite State Machines. This technique exploits the don't care value of the inputs to reduce the memory size as well as multiplexer complexity. This technique has been applied to a publicly available FSM benchmarks and implemented in a low-cost FPGA. Results have been compared with tools supported ROM and standard logic cells implementations. In a significant number of test cases, the proposed technique is the best design alternative, both in resource requirements and speed
Generic low power reconfigurable distributed arithmetic processor
Higher performance, lower cost, increasingly minimizing integrated circuit components, and
higher packaging density of chips are ongoing goals of the microelectronic and computer
industry. As these goals are being achieved, however, power consumption and flexibility are
increasingly becoming bottlenecks that need to be addressed with the new technology in Very
Large-Scale Integrated (VLSI) design.
For modern systems, more energy is required to support the powerful computational capability
which accords with the increasing requirements, and these requirements cause the change of
standards not only in audio and video broadcasting but also in communication such as wireless
connection and network protocols. Powerful flexibility and low consumption are repellent, but
their combination in one system is the ultimate goal of designers.
A generic domain-specific low-power reconfigurable processor for the distributed
arithmetic algorithm is presented in this dissertation. This domain reconfigurable processor
features high efficiency in terms of area, power and delay, which approaches the
performance of an ASIC design, while retaining the flexibility of programmable platforms.
The architecture not only supports typical distributed arithmetic algorithms which can be
found in most still picture compression standards and video conferencing standards, but
also offers implementation ability for other distributed arithmetic algorithms found in
digital signal processing, telecommunication protocols and automatic control.
In this processor, a simple reconfigurable low power control unit is implemented with
good performance in area, power and timing. The generic characteristic of the architecture
makes it applicable for any small and medium size finite state machines which can be used
as control units to implement complex system behaviour and can be found in almost all
engineering disciplines. Furthermore, to map target applications efficiently onto the
proposed architecture, a new algorithm is introduced for searching for the best common
sharing terms set and it keeps the area and power consumption of the implementation at
low level. The software implementation of this algorithm is presented, which can be used
not only for the proposed architecture in this dissertation but also for all the
implementations with adder-based distributed arithmetic algorithms. In addition, some low
power design techniques are applied in the architecture, such as unsymmetrical design
style including unsymmetrical interconnection arranging, unsymmetrical PTBs selection
and unsymmetrical mapping basic computing units. All these design techniques achieve
extraordinary power consumption saving. It is believed that they can be extended to more
low power designs and architectures.
The processor presented in this dissertation can be used to implement complex, high
performance distributed arithmetic algorithms for communication and image processing
applications with low cost in area and power compared with the traditional
methods
Decomposition tool targeting FPGA architectures
The growing interest in the field of logic synthesis targeting Field Programmable Gate Arrays (FPGA) and the active research carried out by a number of research groups in the area of functional decomposition is the prime motivation for this thesis. Logic synthesis has been an area of interest in many universities all over the world. The work involves the study and implementation of techniques and methods in logic synthesis. In this work, a logic synthesis tool has been developed implementing the aspects of general and complete Decomposition method based on functional decomposition techniques [4]. The tool is aimed at producing outputs faster and more efficient than the available software. C++ Standard template library is used to develop this tool. The output of this tool is designed to be compatible with the available vendor software. The tool has been tested on MCNC benchmarks and those created keeping in mind the industry requirements
Hardware implementation of non-bonded forces in molecular dynamics simulations
Molecular Dynamics is a computational method based on classical mechanics to describe the behavior of a molecular system. This method is used in biomolecular simulations, which are intended to contribute to the study and advance of nanotechnology, medicine, chemistry and biology. Software implementations of Molecular Dynamics simulations can spend most of time computing the non-bonded interactions. This work presents the design and implementation of an FPGA-based coprocessor that accelerates MD simulations by computing in parallel the non-bonded interactions, specifically, the van der Waals and the electrostatic interactions. These interactions are modeled as the Lennard-Jones 6-12 potential and the direct-space Ewald summation, respectively. In addition, this work introduces a novel variable transformation of the potential energy functions, and a novel interpolation method with pseudo-floating-point representation to compute the short-range forces. Also, it uses a combination of fixed-point and floating-point arithmetic to obtain the best of both representations. The FPGA coprocessor is a memory-mapped system connected to a host by PCI Express, and is provided with interruption capabilities to improve parallelization. Its main block is based on a single functional pipeline, and is connected via Avalon Bus to other peripherals such as the PCIe Hard-IP and the SG-DMA. It is implemented on an Altera¿s EP2AGX125EF35C4 device, can process 16k particles, and is configured to store up to 16 different types of particles. Simulations in a custom C-application for MD that only computes non-bonded forces become up to 12.5x faster using the FPGA coprocessor when considering 12500 atoms.PregradoINGENIERO(A) EN ELECTRÓNIC
- …