22 research outputs found

    Embedded demonstrator for audio manipulation

    Get PDF
    Demonstration of embedded systems is a good way to motivate and recruit students to a future career in electronics. For Department of Electronics and Telecommunication at the Norwegian University of Science and Technology (NTNU), it is thus desirable to have an embedded demonstrator that gives the pupils an insight in what is actually possible when studying electronics at the university, a system that the department may present at different occasions. A good embedded demonstrator provides an interesting presentation of one or more topics related to electronics, and should be presented together with relevant theory in order to provide a level of education to the user.This report covers the implementation of an embedded demonstrator for audio manipulation on Altera's DE2 development and education board. The system is specified to demonstrate signal processing subjects like sampling and filtering through manipulation of analog audio signals. The main modules in the system are the Cyclone II 2C35 FPGA from Altera, running a Nios II soft-CPU, and a Wolfson WM8731 audio-codec. The specification of their operation is made with background in pedagogics theory in order to make the most interesting demonstration. To realize this specification, the system incorporates several design features for both activation and motivation of the user.The audio manipulator provides possibilities for comparison between different sample rates and filter characteristics in real-time operation. This makes the system well suited for practical demonstration of signal processing theory. Due to the presentation of perceivable results, in addition to the implementation of a user interface for interaction, the implemented audio demonstrator is considered to be a well suited platform for demonstration of topics related to electronics

    A space-efficient quantum computer simulator suitable for high-speed FPGA implementation

    Full text link
    Conventional vector-based simulators for quantum computers are quite limited in the size of the quantum circuits they can handle, due to the worst-case exponential growth of even sparse representations of the full quantum state vector as a function of the number of quantum operations applied. However, this exponential-space requirement can be avoided by using general space-time tradeoffs long known to complexity theorists, which can be appropriately optimized for this particular problem in a way that also illustrates some interesting reformulations of quantum mechanics. In this paper, we describe the design and empirical space-time complexity measurements of a working software prototype of a quantum computer simulator that avoids excessive space requirements. Due to its space-efficiency, this design is well-suited to embedding in single-chip environments, permitting especially fast execution that avoids access latencies to main memory. We plan to prototype our design on a standard FPGA development board.Comment: 12 pages, 6 figures, presented at Quantum Information and Computation VII, Orlando, April 2009. Author reprint of final submitted manuscrip

    Hardware-Based Particle Filter with Evolutionary Resampling Stage

    Get PDF
    Autonomous systems require, in most of the cases, reasoning and decision-making capabilities. Moreover, the decision process has to occur in real time. Real-time computing means that every situation or event has to have an answer before a temporal deadline. In complex applications, these deadlines are usually in the order of milliseconds or even microseconds if the application is very demanding. In order to comply with these timing requirements, computing tasks have to be performed as fast as possible. The problem arises when computations are no longer simple, but very time-consuming operations. A good example can be found in autonomous navigation systems with visual-tracking submodules where Kalman filtering is the most extended solution. However, in recent years, some interesting new approaches have been developed. Particle filtering, given its more general problem-solving features, has reached an important position in the field. The aim of this thesis is to design, implement and validate a hardware platform that constitutes itself an embedded intelligent system. The proposed system would combine particle filtering and evolutionary computation algorithms to generate intelligent behavior. Traditional approaches to particle filtering or evolutionary computation have been developed in software platforms, including parallel capabilities to some extent. In this work, an additional goal is fully exploiting hardware implementation advantages. By using the computational resources available in a FPGA device, better performance results in terms of computation time are expected. These hardware resources will be in charge of extensive repetitive computations. With this hardware-based implementation, real-time features are also expected

    On High-Performance Parallel Fixed-Point Decimal Multiplier Designs

    Full text link
    High-performance, area-efficient hardware implementation of decimal multiplication is preferred to slow software simulations in a number of key scientific and financial application areas, where errors caused by converting decimal numbers into their approximate binary representations are not acceptable. Multi-digit parallel decimal multipliers involve two major stages: (i) the partial product generation (PPG) stage, where decimal partial products are determined by selecting the right versions of the pre-computed multiples of the multiplicand, followed by (ii) the partial product accumulation (PPA) stage, where all the partial products are shifted and then added together to obtain the final multiplication product. In this thesis, we propose a parallel architecture for fixed-point decimal multiplications based on the 8421-5421 BCD representation. In essence, we apply a hybrid 8421-5421 recoding scheme to help simplify the computation logic of the PPG. In the following PPA stage, these generated partial products are accumulated using 8421 carry-lookahead adders (CLAs) organized as a tree structure; this organization is a significant departure from the traditional carry-save-adder-based (CSA) approach, which suffers from the problems introduced by extra recoding logic and/or addition circuits needed. In addition to the proposed 8421-5421-based decimal multiplier, we also propose a 4221-based decimal multi-plier that is built upon a novel full adder for 4221 BCD codes; in this design, expensive 4221-to-8421 conversions are no longer needed, and as a result, the operands of this 4221 multiplier can be directly represented in 4221 BCD. The proposed 16x16 decimal multipliers are compared against other best known decimal multiplier designs in terms of delays and delay-area products with a TSMC 90nm technology. The evaluation results have confirmed that the proposed 8421-5421 multiplier achieves the lowest delay and is the most time-area efficient design among all the existing hardware-based BCD multipliers

    Performance and area evaluations of processor-based benchmarks on FPGA devices

    Get PDF
    The computing system on SoCs is being long-term research since the FPGA technology has emerged due to its personality of re-programmable fabric, reconfigurable computing, and fast development time to market. During the last decade, uni-processor in a SoC is no longer to deal with the high growing market for complex applications such as Mobile Phones audio and video encoding, image and network processing. Due to the number of transistors on a silicon wafer is increasing, the recent FPGAs or embedded systems are advancing toward multi-processor-based design to meet tremendous performance and benefit this kind of systems are possible. Therefore, is an upcoming age of the MPSoC. In addition, most of the embedded processors are soft-cores, because they are flexible and reconfigurable for specific software functions and easy to build homogenous multi-processor systems for parallel programming. Moreover, behavioural synthesis tools are becoming a lot more powerful and enable to create datapath of logic units from high-level algorithms such as C to HDL and available for partitioning a HW/SW concurrent methodology. A range of embedded processors is able to implement on a FPGA-based prototyping to integrate the CPUs on a programmable device. This research is, firstly represent different types of computer architectures in modern embedded processors that are followed in different type of software applications (eg. Multi-threading Operations or Complex Functions) on FPGA-based SoCs; and secondly investigate their capability by executing a wide-range of multimedia software codes (Integer-algometric only) in different models of the processor-systems (uni-processor or multi-processor or Co-design), and finally compare those results in terms of the benchmarks and resource utilizations within FPGAs. All the examined programs were written in standard C and executed in a variety numbers of soft-core processors or hardware units to obtain the execution times. However, the number of processors and their customizable configuration or hardware datapath being generated are limited by a target FPGA resource, and designers need to understand the FPGA-based tradeoffs that have been considered - Speed versus Area. For this experimental purpose, I defined benchmarks into DLP / HLS catalogues, which are "data" and "function" intensive respectively. The programs of DLP will be executed in LEON3 MP and LE1 CMP multi-processor systems and the programs of HLS in the LegUp Co-design system on target FPGAs. In preliminary, the performance of the soft-core processors will be examined by executing all the benchmarks. The whole story of this thesis work centres on the issue of the execute times or the speed-up and area breakdown on FPGA devices in terms of different programs

    Profile-directed specialisation of custom floating-point hardware

    No full text
    We present a methodology for generating floating-point arithmetic hardware designs which are, for suitable applications, much reduced in size, while still retaining performance and IEEE-754 compliance. Our system uses three key parts: a profiling tool, a set of customisable floating-point units and a selection of system integration methods. We use a profiling tool for floating-point behaviour to identify arithmetic operations where fundamental elements of IEEE-754 floating-point may be compromised, without generating erroneous results in the common case. In the uncommon case, we use simple detection logic to determine when operands lie outside the range of capabilities of the optimised hardware. Out-of-range operations are handled by a separate, fully capable, floatingpoint implementation, either on-chip or by returning calculations to a host processor. We present methods of system integration to achieve this errorcorrection. Thus the system suffers no compromise in IEEE-754 compliance, even when the synthesised hardware would generate erroneous results. In particular, we identify from input operands the shift amounts required for input operand alignment and post-operation normalisation. For operations where these are small, we synthesise hardware with reduced-size barrel-shifters. We also propose optimisations to take advantage of other profile-exposed behaviours, including removing the hardware required to swap operands in a floating-point adder or subtractor, and reducing the exponent range to fit observed values. We present profiling results for a range of applications, including a selection of computational science programs, Spec FP 95 benchmarks and the FFMPEG media processing tool, indicating which would be amenable to our method. Selected applications which demonstrate potential for optimisation are then taken through to a hardware implementation. We show up to a 45% decrease in hardware size for a floating-point datapath, with a correctable error-rate of less then 3%, even with non-profiled datasets

    2D Digital Filter Implementation on a FPGA

    Get PDF
    The use of two dimensional (2D) digital filters for real-time 2D data processing has found important practical applications in many areas, such as aerial surveillance, satellite imaging and pattern recognition. In the case of military operations, real-time image pro-cessing is extensively used in target acquisition and tracking, automatic target recognition and identi cation, and guidance of autonomous robots. Furthermore, equal opportunities exist in civil industries such as vacuum cleaner path recognition and mapping and car collision detection and avoidance. Many of these applications require dedicated hardware for signal processing. It is not efficient to implement 2D digital filters using a single processor for real-time applications due to the large amount of data. A multiprocessor implementation can be used in order to reduce processing time. Previous work explored several realizations of 2D denominator separable digital filters with minimal throughput delay by utilizing parallel processors. It was shown that regardless of the order of the filter, a throughput delay of one adder and one multiplier can be achieved. The proposed realizations have high regularity due to the nature of the processors. In this thesis, all four realizations are implemented in a Field Programming Gate Array (FPGA) with floating point adders, multipliers and shift registers. The implementation details and design trade-offs are discussed. Simulation results in terms of performance, area and power are compared. From the experimental results, realization four is the ideal candidate for implementation on an Application Specific Integrated Circuit (ASIC) since it has the best performance, dissipates the lowest power, and uses the least amount of logic when compared to other realizations of the same filter size. For a filter size of 5 by 5, realization four can produce a throughput of 16.3 million pixels per second, which is comparable to realization one and about 34% increase in performance compared to realization one and two. For the given filter size, realization four dissipates the same amount of dynamic power as realization one, and roughly 54% less than realization three and 140% less than realization two. Furthermore, area reduction can be applied by converting floating point algorithms to fixed point algorithms. Alternatively, the denormalization and normalization stage of the floating point pipeline can be eliminated and fused together in order to save hardware resources

    FPGA based reconfigurable body area network using Nios II and uClinux

    Get PDF
    This research is focused on identifying an appropriate design for a reconfigurable Body Area Network (BAN). In order to investigate the benefits and drawbacks of the proposed design, a BAN system prototype was built. This system consists of two distinct node types: a slave node and a master node. These nodes communicate using ZigBee radio transceivers. The microcontroller-based slave node acquires sensor data and transmits digitized samples to the master node. The master node is FPGA-based and runs uClinux on a soft-core microcontroller. The purpose of the master node is to receive, process and store digitized sensor data. In order to verify the operation of the BAN system prototype and demonstrate reconfigurability, a specific application was required. Pattern recognition in electrocardiogram (ECG) data was the application used in this work and the MIT-BIH Arrhythmia Database was used as the known data source for verification. A custom test platform was designed and built for the purpose of injecting data from the MIT-BIH Arrhythmia Database into the BAN system. The BAN system designed and built in this work demonstrates the ability to record raw ECG data, detect R-peaks, calculate and record R-R intervals, detect premature ventricular and atrial contractions. As this thesis will identify, many aspects of this BAN system were designed to be highly reconfigurable allowing it to be used for a wide range of BAN applications, in addition to pattern recognition of ECG data

    Embedded electronic systems driven by run-time reconfigurable hardware

    Get PDF
    Abstract This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen Esta tesis doctoral abarca el diseño de sistemas electrónicos embebidos basados en tecnología hardware dinámicamente reconfigurable –disponible a través de dispositivos lógicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguración que proporcione a la FPGA la capacidad de reconfiguración dinámica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicación particionada en tareas multiplexadas en tiempo y en espacio, optimizando así su implementación física –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estático (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalúa el flujo de diseño de dicha tecnología a través del prototipado de varias aplicaciones de ingeniería (sistemas de control, coprocesadores aritméticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotación en la industria.Resum Aquesta tesi doctoral està orientada al disseny de sistemes electrònics empotrats basats en tecnologia hardware dinàmicament reconfigurable –disponible mitjançant dispositius lògics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguració que proporcioni a la FPGA la capacitat de reconfiguració dinàmica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicació particionada en tasques multiplexades en temps i en espai, optimizant així la seva implementació física –àrea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potència dissipada– comparada amb altres alternatives basades en hardware estàtic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalúa el fluxe de disseny d’aquesta tecnologia a través del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmètics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotació a la indústria

    Implementation of a structured PL/I subset compiler

    Get PDF
    The thesis describes the design and implementation of a PL/I subset compiler which produces a hypothetical stack code as output. The compiler was based on a Pascal compiler developed by N. Wirth and U. Amman of Eidgenössische Technische Hochschule, Zurich, and was itself written in the Pascal language. PL/I programs using the compiler can now be compiled and executed (interpretively) on the UNIVAC 1106 computer at U.C.T. The compiler was designed mainly as a teaching system. Its lesson is that structured programming is a powerful technique which facilitated its design and implementation
    corecore