22 research outputs found
Embedded demonstrator for audio manipulation
Demonstration of embedded systems is a good way to motivate and recruit students to a future career in electronics. For Department of Electronics and Telecommunication at the Norwegian University of Science and Technology (NTNU), it is thus desirable to have an embedded demonstrator that gives the pupils an insight in what is actually possible when studying electronics at the university, a system that the department may present at different occasions. A good embedded demonstrator provides an interesting presentation of one or more topics related to electronics, and should be presented together with relevant theory in order to provide a level of education to the user.This report covers the implementation of an embedded demonstrator for audio manipulation on Altera's DE2 development and education board. The system is specified to demonstrate signal processing subjects like sampling and filtering through manipulation of analog audio signals. The main modules in the system are the Cyclone II 2C35 FPGA from Altera, running a Nios II soft-CPU, and a Wolfson WM8731 audio-codec. The specification of their operation is made with background in pedagogics theory in order to make the most interesting demonstration. To realize this specification, the system incorporates several design features for both activation and motivation of the user.The audio manipulator provides possibilities for comparison between different sample rates and filter characteristics in real-time operation. This makes the system well suited for practical demonstration of signal processing theory. Due to the presentation of perceivable results, in addition to the implementation of a user interface for interaction, the implemented audio demonstrator is considered to be a well suited platform for demonstration of topics related to electronics
A space-efficient quantum computer simulator suitable for high-speed FPGA implementation
Conventional vector-based simulators for quantum computers are quite limited
in the size of the quantum circuits they can handle, due to the worst-case
exponential growth of even sparse representations of the full quantum state
vector as a function of the number of quantum operations applied. However, this
exponential-space requirement can be avoided by using general space-time
tradeoffs long known to complexity theorists, which can be appropriately
optimized for this particular problem in a way that also illustrates some
interesting reformulations of quantum mechanics. In this paper, we describe the
design and empirical space-time complexity measurements of a working software
prototype of a quantum computer simulator that avoids excessive space
requirements. Due to its space-efficiency, this design is well-suited to
embedding in single-chip environments, permitting especially fast execution
that avoids access latencies to main memory. We plan to prototype our design on
a standard FPGA development board.Comment: 12 pages, 6 figures, presented at Quantum Information and Computation
VII, Orlando, April 2009. Author reprint of final submitted manuscrip
Hardware-Based Particle Filter with Evolutionary Resampling Stage
Autonomous systems require, in most of the cases, reasoning and decision-making capabilities. Moreover, the decision process has to occur in real time. Real-time computing means that every situation or event has to have an answer before a temporal deadline. In complex applications, these deadlines are usually in the order of milliseconds or even microseconds if the application is very demanding. In order to comply with these timing requirements, computing tasks have to be performed as fast as possible. The problem arises when computations are no longer simple, but very time-consuming operations.
A good example can be found in autonomous navigation systems with visual-tracking submodules where Kalman filtering is the most extended solution. However, in recent years, some interesting new approaches have been developed. Particle filtering, given its more general problem-solving features, has reached an important position in the field.
The aim of this thesis is to design, implement and validate a hardware platform that constitutes itself an embedded intelligent system. The proposed system would combine particle filtering and evolutionary computation algorithms to generate intelligent behavior.
Traditional approaches to particle filtering or evolutionary computation have been developed in software platforms, including parallel capabilities to some extent. In this work, an additional goal is fully exploiting hardware implementation advantages. By using the computational resources available in a FPGA device, better performance results in terms of computation time are expected. These hardware resources will be in charge of extensive repetitive computations. With this hardware-based implementation, real-time features are also expected
On High-Performance Parallel Fixed-Point Decimal Multiplier Designs
High-performance, area-efficient hardware implementation of decimal multiplication is preferred to slow software simulations in a number of key scientific and financial application areas, where errors caused by converting decimal numbers into their approximate binary representations are not acceptable.
Multi-digit parallel decimal multipliers involve two major stages: (i) the partial product generation (PPG) stage, where decimal partial products are determined by selecting the right versions of the pre-computed multiples of the multiplicand, followed by (ii) the partial product accumulation (PPA) stage, where all the partial products are shifted and then added together to obtain the final multiplication product. In this thesis, we propose a parallel architecture for fixed-point decimal multiplications based on the 8421-5421 BCD representation. In essence, we apply a hybrid 8421-5421 recoding scheme to help simplify the computation logic of the PPG. In the following PPA stage, these generated partial products are accumulated using 8421 carry-lookahead adders (CLAs) organized as a tree structure; this organization is a significant departure from the traditional carry-save-adder-based (CSA) approach, which suffers from the problems introduced by extra recoding logic and/or addition circuits needed. In addition to the proposed 8421-5421-based decimal multiplier, we also propose a 4221-based decimal multi-plier that is built upon a novel full adder for 4221 BCD codes; in this design, expensive 4221-to-8421 conversions are no longer needed, and as a result, the operands of this 4221 multiplier can be directly represented in 4221 BCD.
The proposed 16x16 decimal multipliers are compared against other best known decimal multiplier designs in terms of delays and delay-area products with a TSMC 90nm technology. The evaluation results have confirmed that the proposed 8421-5421 multiplier achieves the lowest delay and is the most time-area efficient design among all the existing hardware-based BCD multipliers
Performance and area evaluations of processor-based benchmarks on FPGA devices
The computing system on SoCs is being long-term research since the FPGA technology has emerged due to its personality of re-programmable fabric, reconfigurable computing, and fast development time to market. During the last decade, uni-processor in a SoC is no longer to deal with the high growing market for complex applications such as Mobile Phones audio and video encoding, image and network processing. Due to the number of transistors on a silicon wafer is increasing, the recent FPGAs or embedded systems are advancing toward multi-processor-based design to meet tremendous performance and benefit this kind of systems are possible. Therefore, is an upcoming age of the MPSoC. In addition, most of the embedded processors are soft-cores, because they are flexible and reconfigurable for specific software functions and easy to build homogenous multi-processor systems for parallel programming. Moreover, behavioural synthesis tools are becoming a lot more powerful and enable to create datapath of logic units from high-level algorithms such as C to HDL and available for partitioning a HW/SW concurrent methodology.
A range of embedded processors is able to implement on a FPGA-based prototyping to integrate the CPUs on a programmable device. This research is, firstly represent different types of computer architectures in modern embedded processors that are followed in different type of software applications (eg. Multi-threading Operations or Complex Functions) on FPGA-based SoCs; and secondly investigate their capability by executing a wide-range of multimedia software codes (Integer-algometric only) in different models of the processor-systems (uni-processor or multi-processor or Co-design), and finally compare those results in terms of the benchmarks and resource utilizations within FPGAs. All the examined programs were written in standard C and executed in a variety numbers of soft-core processors or hardware units to obtain the execution times. However, the number of processors and their customizable configuration or hardware datapath being generated are limited by a target FPGA resource, and designers need to understand the FPGA-based tradeoffs that have been considered - Speed versus Area.
For this experimental purpose, I defined benchmarks into DLP / HLS catalogues, which are "data" and "function" intensive respectively. The programs of DLP will be executed in LEON3 MP and LE1 CMP multi-processor systems and the programs of HLS in the LegUp Co-design system on target FPGAs. In preliminary, the performance of the soft-core processors will be examined by executing all the benchmarks. The whole story of this thesis work centres on the issue of the execute times or the speed-up and area breakdown on FPGA devices in terms of different programs
Profile-directed specialisation of custom floating-point hardware
We present a methodology for generating
floating-point arithmetic hardware
designs which are, for suitable applications, much reduced in size, while still
retaining performance and IEEE-754 compliance. Our system uses three
key parts: a profiling tool, a set of customisable
floating-point units and a
selection of system integration methods.
We use a profiling tool for
floating-point behaviour to identify arithmetic
operations where fundamental elements of IEEE-754
floating-point may be
compromised, without generating erroneous results in the common case.
In the uncommon case, we use simple detection logic to determine when
operands lie outside the range of capabilities of the optimised hardware.
Out-of-range operations are handled by a separate, fully capable,
floatingpoint
implementation, either on-chip or by returning calculations to a host
processor. We present methods of system integration to achieve this errorcorrection.
Thus the system suffers no compromise in IEEE-754 compliance,
even when the synthesised hardware would generate erroneous results.
In particular, we identify from input operands the shift amounts required
for input operand alignment and post-operation normalisation. For operations
where these are small, we synthesise hardware with reduced-size
barrel-shifters. We also propose optimisations to take advantage of other
profile-exposed behaviours, including removing the hardware required to
swap operands in a floating-point adder or subtractor, and reducing the
exponent range to fit observed values.
We present profiling results for a range of applications, including a selection
of computational science programs, Spec FP 95 benchmarks and the
FFMPEG media processing tool, indicating which would be amenable to
our method. Selected applications which demonstrate potential for optimisation
are then taken through to a hardware implementation. We show up
to a 45% decrease in hardware size for a
floating-point datapath, with a
correctable error-rate of less then 3%, even with non-profiled datasets
2D Digital Filter Implementation on a FPGA
The use of two dimensional (2D) digital filters for real-time 2D data processing has found important practical applications in many areas, such as aerial surveillance, satellite
imaging and pattern recognition. In the case of military operations, real-time image pro-cessing is extensively used in target acquisition and tracking, automatic target recognition and identi cation, and guidance of autonomous robots. Furthermore, equal opportunities exist in civil industries such as vacuum cleaner path recognition and mapping and car collision detection and avoidance. Many of these applications require dedicated hardware for signal processing. It is not efficient to implement 2D digital filters using a single processor for real-time applications due to the large amount of data. A multiprocessor
implementation can be used in order to reduce processing time.
Previous work explored several realizations of 2D denominator separable digital filters
with minimal throughput delay by utilizing parallel processors. It was shown that regardless of the order of the filter, a throughput delay of one adder and one multiplier can be
achieved. The proposed realizations have high regularity due to the nature of the processors. In this thesis, all four realizations are implemented in a Field Programming Gate
Array (FPGA) with floating point adders, multipliers and shift registers. The implementation details and design trade-offs are discussed. Simulation results in terms of performance, area and power are compared.
From the experimental results, realization four is the ideal candidate for implementation on an Application Specific Integrated Circuit (ASIC) since it has the best performance, dissipates the lowest power, and uses the least amount of logic when compared to other realizations of the same filter size. For a filter size of 5 by 5, realization four can produce a throughput of 16.3 million pixels per second, which is comparable to realization one and about 34% increase in performance compared to realization one and two. For the given
filter size, realization four dissipates the same amount of dynamic power as realization one, and roughly 54% less than realization three and 140% less than realization two. Furthermore, area reduction can be applied by converting floating point algorithms to fixed point algorithms. Alternatively, the denormalization and normalization stage of the floating point pipeline can be eliminated and fused together in order to save hardware resources
FPGA based reconfigurable body area network using Nios II and uClinux
This research is focused on identifying an appropriate design for a reconfigurable
Body Area Network (BAN).
In order to investigate the benefits and drawbacks of the proposed design, a BAN
system prototype was built. This system consists of two distinct node types: a slave
node and a master node. These nodes communicate using ZigBee radio transceivers.
The microcontroller-based slave node acquires sensor data and transmits digitized
samples to the master node. The master node is FPGA-based and runs uClinux on
a soft-core microcontroller. The purpose of the master node is to receive, process
and store digitized sensor data. In order to verify the operation of the BAN system
prototype and demonstrate reconfigurability, a specific application was required.
Pattern recognition in electrocardiogram (ECG) data was the application used in
this work and the MIT-BIH Arrhythmia Database was used as the known data source
for verification. A custom test platform was designed and built for the purpose of
injecting data from the MIT-BIH Arrhythmia Database into the BAN system.
The BAN system designed and built in this work demonstrates the ability to record
raw ECG data, detect R-peaks, calculate and record R-R intervals, detect premature
ventricular and atrial contractions. As this thesis will identify, many aspects of this
BAN system were designed to be highly reconfigurable allowing it to be used for a
wide range of BAN applications, in addition to pattern recognition of ECG data
Embedded electronic systems driven by run-time reconfigurable hardware
Abstract
This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen
Esta tesis doctoral abarca el diseño de sistemas electrónicos embebidos basados en tecnologÃa hardware dinámicamente reconfigurable –disponible a través de dispositivos lógicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguración que proporcione a la FPGA la capacidad de reconfiguración dinámica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicación particionada en tareas multiplexadas en tiempo y en espacio, optimizando asà su implementación fÃsica –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estático (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalúa el flujo de diseño de dicha tecnologÃa a través del prototipado de varias aplicaciones de ingenierÃa (sistemas de control, coprocesadores aritméticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotación en la industria.Resum
Aquesta tesi doctoral està orientada al disseny de sistemes electrònics empotrats basats en tecnologia hardware dinà micament reconfigurable –disponible mitjançant dispositius lògics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguració que proporcioni a la FPGA la capacitat de reconfiguració dinà mica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicació particionada en tasques multiplexades en temps i en espai, optimizant aixà la seva implementació fÃsica –à rea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potència dissipada– comparada amb altres alternatives basades en hardware està tic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalúa el fluxe de disseny d’aquesta tecnologia a través del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmètics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotació a la indústria
Implementation of a structured PL/I subset compiler
The thesis describes the design and implementation of a PL/I subset compiler which produces a hypothetical stack code as output. The compiler was based on a Pascal compiler developed by N. Wirth and U. Amman of Eidgenössische Technische Hochschule, Zurich, and was itself written in the Pascal language. PL/I programs using the compiler can now be compiled and executed (interpretively) on the UNIVAC 1106 computer at U.C.T. The compiler was designed mainly as a teaching system. Its lesson is that structured programming is a powerful technique which facilitated its design and implementation