104 research outputs found

    Active Noise Control using Variable step-size Griffiths’ LMS (VGLMS) algorithm on Real-Time platform

    This paper proposes an implementation of Griffiths’ variable step-size algorithm for Active Noise Control (ANC) on the ADSP-TS201 EZ-Kit Lite. The processor's dual computational units and its ability to execute up to four instructions per cycle, special features that set it apart from other processors, are exploited to generate optimized code. The VGLMS provides improved secondary-path estimation at marginal computational cost, since the same gradient is used for both step-size computation and coefficient adaptation. The improved secondary-path estimate in turn improves the ANC performance. Further, a variable step-size algorithm is used for the main path to achieve faster convergence. For a duct, the attenuation achieved is 25 dB for narrowband noise (a fundamental and its harmonics) and 15 dB for broadband noise. The program execution time was only 1.25% for an input sampling rate of 1 kHz, which indicates the utility of the processor's special features. These features have also reduced the program memory required to implement the algorithm
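
    The idea of reusing one gradient for both the step-size update and the coefficient update can be sketched as follows. This is a generic variable step-size LMS system-identification example, not the VGLMS recursion from the paper: the step-size rule (a smoothed, clipped function of the squared gradient norm) and all names and parameter values (`vss_lms_identify`, `alpha`, `gamma`, `mu_min`, `mu_max`) are assumptions chosen for illustration.

```python
import random


def vss_lms_identify(h_true, n_samples=5000, mu_min=0.005, mu_max=0.05,
                     alpha=0.97, gamma=0.01, seed=0):
    """Estimate the FIR filter h_true from white-noise input (hypothetical helper).

    The instantaneous gradient estimate e * x is computed once per sample
    and used twice: for the step-size update and for the weight update.
    """
    rng = random.Random(seed)
    taps = len(h_true)
    w = [0.0] * taps          # adaptive filter coefficients
    x_buf = [0.0] * taps      # most recent input samples, newest first
    mu = mu_max               # start with the largest allowed step size

    for _ in range(n_samples):
        x_buf = [rng.gauss(0.0, 1.0)] + x_buf[:-1]
        d = sum(h * x for h, x in zip(h_true, x_buf))   # desired signal
        y = sum(wi * x for wi, x in zip(w, x_buf))      # filter output
        e = d - y                                       # error signal

        grad = [e * x for x in x_buf]                   # shared gradient estimate
        g2 = sum(g * g for g in grad)

        # Assumed step-size rule: grow with gradient energy, clip to a safe range.
        mu = min(mu_max, max(mu_min, alpha * mu + gamma * g2))

        # Coefficient adaptation with the same gradient.
        w = [wi + mu * g for wi, g in zip(w, grad)]
    return w
```

    Calling `vss_lms_identify([0.5, -0.3, 0.2, 0.1])` returns coefficients close to the true filter; the step size stays large while the error is large and decays toward `mu_min` as the estimate converges.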

    Digital signal processor fundamentals and system design

    Digital Signal Processors (DSPs) have been used in accelerator systems for more than fifteen years and have largely contributed to the evolution towards digital technology of many accelerator systems, such as machine protection, diagnostics and control of beams, power supply and motors. This paper aims at familiarising the reader with DSP fundamentals, namely DSP characteristics and processing development. Several DSP examples are given, in particular on Texas Instruments DSPs, as they are used in the DSP laboratory companion of the lectures this paper is based upon. The typical system design flow is described; common difficulties, problems and choices faced by DSP developers are outlined; and hints are given on the best solution

    Parallelised max-log-MAP model

    A parallelised max-Log-MAP model (P-max-Log-MAP) that exploits the sub-word parallelism and very long instruction word architecture of a microprocessor or a digital signal processor (DSP) is presented. The proposed model reduces considerably the computational complexity of the max-Log-MAP algorithm and therefore facilitates easy implementation
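
    For context, the max-Log-MAP simplification replaces the exact Jacobian logarithm, log(e^a + e^b) = max(a, b) + log(1 + e^-|a-b|), with max(a, b) alone. The sketch below only illustrates that approximation on a list of path metrics; it is not the parallelised model from the paper, and the function names are hypothetical.

```python
import math


def max_star(a, b):
    """Exact Jacobian logarithm: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """max-Log-MAP approximation: drop the correction term."""
    return max(a, b)


def combine_metrics(metrics, combine):
    """Fold a list of log-domain path metrics with the given pairwise rule."""
    acc = metrics[0]
    for m in metrics[1:]:
        acc = combine(acc, m)
    return acc
```

    Because the approximation needs only compare and select operations, many metrics can be processed side by side in packed sub-words, which is the property a sub-word-parallel implementation relies on.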

    Design and synthesis of a high-performance, hyper-programmable DSP on an FPGA

    In the field of high-performance digital signal processing, DSPs and FPGAs provide the most flexibility. Due to the extensive customization available on FPGAs, DSP algorithm implementation on an FPGA exhibits an increased development time over programming a processor. Because of this, traditional DSPs typically yield a faster time to market than an FPGA design. However, it is often desirable to have the ASIC-like performance that is attainable through the additional customization and parallel computation available through an FPGA. This can be achieved through the class of processors known as hyper-programmable DSPs. A hyper-programmable DSP is a DSP in which multiple aspects of the architecture are programmable. This thesis contributes such a DSP, targeted for high performance and realized in hardware using an FPGA. The design consists of both a scalar datapath and a vector datapath capable of parallel operations, both of which are extensively customizable. To aid in the design of the datapaths, graphical tools are introduced as an efficient way to modify the design. A tool was also created to supply a graphical interface to help write instructions for the vector datapath. Additionally, an adaptive assembler was created to convert assembly programs to machine code for any datapath design. The resulting design was synthesized for a Cyclone III FPGA. The synthesis resulted in a design capable of running at 135 MHz with 61% of the logic used by processing elements. Benchmarks were run on the design to evaluate its performance. The benchmarks showed similar performance between the proposed design and commercial DSPs for the simple benchmarks but significant improvement for the more complex ones

    Analysis and Architecture Design of DSPACE, a Digital Signal Processor for space applications

    The demand for digital signal processing performed on satellites and spacecraft has greatly increased in recent years; however, the European Space Agency (ESA) does not have a suitable European-made device for these applications. ESA is currently forced to turn to United States (US) made alternatives, but the export of those devices is restricted by the International Traffic in Arms Regulations (ITAR), which places ESA in a dependent position. The DSPACE project aims to fill this gap by providing a new Digital Signal Processor (DSP), as an intellectual property, and a software tool-chain to exploit its features. The first part of this thesis work was an analysis of the state of the art and of practical solutions in order to identify a target technology and a reference architecture. The second part concerned a detailed definition of the DSPACE core architecture and features. Moreover, a complete decode & dispatch VHDL model, with formal functional verification, was realized. The third part concerned the modelling of two caches, the instruction cache and the data cache, which are two essential components of the DSPACE core. The thesis work concluded with the first functional simulations of the DSPACE model and considerations about the resource occupation of the core

    Modulo scheduling for a fully-distributed clustered VLIW architecture

    Clustering is an approach that many microprocessors are adopting in recent times in order to mitigate the increasing penalties of wire delays. We propose a novel clustered VLIW architecture which has all its resources partitioned among clusters, including the cache memory. A modulo scheduling scheme for this architecture is also proposed. This algorithm takes into account both register and memory inter-cluster communications so that the final schedule results in a cluster assignment that favors cluster locality in cache references and register accesses. It has been evaluated for both 2- and 4-cluster configurations and for differing numbers and latencies of inter-cluster buses. The proposed algorithm produces schedules with very low communication requirements and outperforms previous cluster-oriented schedulers. Peer reviewed. Postprint (published version)
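
    As background, modulo schedulers in general start from a lower bound on the initiation interval (II), the number of cycles between the starts of consecutive loop iterations. The sketch below computes the standard bound MII = max(ResMII, RecMII); it is not the cluster-aware algorithm proposed in the paper, and the function names, resource names and example numbers are assumptions.

```python
from math import ceil


def res_mii(op_counts, unit_counts):
    """Resource-constrained bound: the busiest resource class limits the II."""
    return max(ceil(op_counts[r] / unit_counts[r]) for r in op_counts)


def rec_mii(cycles):
    """Recurrence-constrained bound.

    `cycles` lists each dependence cycle as (total latency, total iteration
    distance). A loop without recurrences yields the trivial bound of 1.
    """
    return max((ceil(lat / dist) for lat, dist in cycles), default=1)


def min_ii(op_counts, unit_counts, cycles):
    """Minimum initiation interval: the larger of the two bounds."""
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))
```

    For a hypothetical loop body with 6 ALU operations on 2 ALUs, 3 memory operations on 1 memory port, and one recurrence of latency 4 spanning 1 iteration, `min_ii({"alu": 6, "mem": 3}, {"alu": 2, "mem": 1}, [(4, 1)])` gives 4: the recurrence, not the resources, is the bottleneck.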

    Small Microphone Array: Algorithms and Hardware

    This report describes the processing algorithms and gives an overview of the hardware for the small microphone array unit in the IM2.RTMAP (Real-time Microphone Array Processing) project. The algorithms include techniques for speech enhancement, speaker localisation and speaker segmentation. The hardware consists of a DSP platform with 8 audio inputs and outputs, as well as a FireWire interface for communication with a PC or other modules

    A unified modulo scheduling and register allocation technique for clustered processors

    This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more effective than traditional approaches based on sequentially performing some (or all) of the three steps, since it allows optimizing the global code generation problem instead of searching for optimal solutions to each individual step. Besides, it avoids the iterative nature of traditional approaches, which require repeated applications of the three steps until a valid solution is found. The proposed framework includes a mechanism to insert spill code on-the-fly and heuristics to evaluate the quality of partial schedules considering simultaneously inter-cluster communications, memory pressure and register pressure. Transformations that allow trading pressure on a type of resource for another resource are also included. We show that the proposed technique outperforms previously proposed techniques. For instance, the average speed-up for the SPECfp95 is 36% for a 4-cluster configuration. Peer reviewed. Postprint (published version)

    An evaluation of different DLP alternatives for the embedded media domain

    The importance of media processing has produced a revolution in the design of embedded processors. To meet the high computational and technological demands of near-future media applications, new embedded processors are including features that were previously restricted to the general-purpose and supercomputing domains. In this paper we evaluate the performance of various DLP (Data Level Parallelism) oriented embedded architectures and analyze quantitative data in order to determine the strengths and weaknesses of each approach. Additionally, we analyze the differences between the explicitly parallel versions of code (often based on the standard algorithms) and the highly tuned, non-vectorizable versions usually found in real multimedia programs. We show that sub-word SIMD architectures (like MMX) are a very cost-effective solution and that, while long vector architectures provide few improvements at a very high cost, a smart combination of vector and SIMD-like architectures is the alternative that delivers the best performance at a reasonable cost. We also show that the memory latency tolerance typical of vector architectures is partially offset by the worse spatial locality found when executing vector code. Postprint (author's final draft)
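
    To make "sub-word SIMD" concrete, the sketch below emulates in software what a packed-add instruction does in hardware: four independent 8-bit additions carried out on one 32-bit word, with wrap-around in each lane and no carry crossing lane boundaries. It does not model MMX itself or any architecture evaluated in the paper; the function names and the 4 x 8-bit layout are assumptions.

```python
LOW7 = 0x7F7F7F7F   # low seven bits of every 8-bit lane
HIGH1 = 0x80808080  # top bit of every 8-bit lane


def packed_add8(a, b):
    """Add four 8-bit lanes packed in 32-bit words, wrapping per lane."""
    # Add the low 7 bits of each lane; the carry can reach bit 7 of the
    # same lane but never spills into the neighbouring lane.
    low = (a & LOW7) + (b & LOW7)
    # Fold in the top bits with XOR, discarding the carry out of each lane.
    return low ^ ((a ^ b) & HIGH1)


def lanewise_add8(a, b):
    """Reference version: unpack, add each byte modulo 256, repack."""
    result = 0
    for shift in (0, 8, 16, 24):
        lane = (((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)) & 0xFF
        result |= lane << shift
    return result
```

    Both functions return the same word, but `packed_add8` needs a fixed handful of word-wide operations regardless of the lane count, which is why sub-word parallelism is cheap to add to an existing datapath.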