7 research outputs found

    The JM-Filter to detect specific frequency in monitored signal

    The Discrete Fourier Transform (DFT) is a mathematical procedure that stands at the center of the processing inside a digital signal processor. It has been widely argued in the relevant literature that the Fast Fourier Transform (FFT) is useless for detecting specific frequencies in a monitored signal of length N, because most of the computed results are ignored. In this paper, we present an efficient FFT-based method to detect specific frequencies in a monitored signal, and compare it to the most frequently used method, the recursive Goertzel algorithm, which detects and analyses one selectable frequency component of a discrete signal. The proposed JM-Filter algorithm reduces the number of iterations compared to the first- and second-order Goertzel algorithm by a factor of r, where r is the radix of the JM-Filter. The obtained results are significant in terms of computational reduction and accuracy in fixed-point implementation. Gains of 15 dB and 19 dB in signal-to-quantization-noise ratio (SQNR) were observed for the proposed first- and second-order radix-8 JM-Filter, respectively, in comparison to the Goertzel algorithm.
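
    For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal floating-point version of the standard second-order Goertzel recursion, not the JM-Filter itself, whose details are not given in the abstract. The function name `goertzel_power` and the choice to return the squared magnitude of a single DFT bin are assumptions made for illustration.

```python
import math

def goertzel_power(samples, k):
    """Squared magnitude |X[k]|^2 of DFT bin k for a block of real samples.

    Illustrative sketch of the textbook second-order Goertzel recursion;
    one multiply-accumulate iteration is performed per input sample.
    """
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0  # s[n-1] and s[n-2]
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Combine the last two states into the bin power (no complex arithmetic needed).
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
```

    The loop runs once per sample, so a block of N samples costs N iterations per monitored bin; this is the iteration count that the abstract says the JM-Filter reduces by the radix r.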

    The Design and Implementation of FFTW3


    Toatie : functional hardware description with dependent types

    Describing correct circuits remains a tall order, despite four decades of evolution in Hardware Description Languages (HDLs). Many enticing circuit architectures require recursive structures or complex compile-time computation, two patterns that prove difficult to capture in traditional HDLs. In a signal processing context, the Fast FIR Algorithm (FFA) structure for efficient parallel filtering proves to be naturally recursive, and most Multiple Constant Multiplication (MCM) blocks decompose multiplications into graphs of simple shifts and adds using demanding compile-time computation. Generalised versions of both remain mostly in academic folklore. The implementations which do exist are often ad hoc circuit generators, written in software languages. These pose challenges for verification and are resistant to composition. Embedded functional HDLs, which represent circuits as data, allow for these descriptions at the cost of forcing the designer to work at the gate level. A promising alternative is to use a stand-alone compiler, representing circuits as plain functions, exemplified by the CλaSH HDL. This, however, raises new challenges in capturing a circuit’s staging: which expressions in the single language should be reduced during compile-time elaboration, and which should remain in the circuit’s run-time? To better reflect the physical separation between circuit phases, this work proposes a new functional HDL (representing circuits as functions) with first-class staging constructs. Orthogonal to this, there are also long-standing challenges in the verification of parameterised circuit families. Industry surveys have consistently reported that only a slim minority of FPGA projects reach production without non-trivial bugs. While a healthy growth in the adoption of automatic formal methods is also reported, the majority of testing remains dynamic, which presents difficulties for testing entire circuit families at once.
This research offers an alternative verification methodology via the combination of dependent types and automatic synthesis of user-defined data types. Given precise enough types for synthesisable data, this environment can be used to develop circuit families with full functional verification in a correct-by-construction fashion. This approach allows for verification of entire circuit families (not just one concrete member) and side-steps the state-space explosion of model checking methods. Beyond the existing work, this research offers synthesis of combinatorial circuits, not just a software model of their behaviour. This additional step requires careful consideration of staging, erasure & irrelevance, deriving bit representations of user-defined data types, and a new synthesis scheme. This thesis contributes steps towards HDLs with sufficient expressivity for awkward, combinatorial signal processing structures, allowing for a correct-by-construction approach, and a prototype compiler for netlist synthesis.
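
    The "shifts and adds" decomposition mentioned in the abstract can be illustrated for a single constant. The sketch below is plain Python rather than Toatie or any HDL, and it uses canonical signed digit (CSD) recoding as an assumed, commonly used decomposition; the thesis's own MCM scheme is not described in the abstract. The names `csd_digits` and `shift_add_multiply` are hypothetical. The recoding step plays the role of compile-time computation, while the shift/add loop stands in for the run-time circuit.

```python
def csd_digits(c):
    """Signed digits (-1, 0, 1) of a positive integer c, least significant first.

    Canonical signed digit form: no two adjacent digits are non-zero,
    which keeps the number of adders/subtractors small.
    """
    digits = []
    while c > 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def shift_add_multiply(x, c):
    """Compute x * c using only shifts, additions, and subtractions."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

    For example, the constant 7 recodes to 8 - 1, so multiplying by 7 needs one shift and one subtraction instead of two additions. A real MCM block goes further by sharing intermediate terms across several constants, which this single-constant sketch does not attempt.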

    Towards efficient exploitation of GPUs : a methodology for mapping index-digit algorithms

    General-purpose GPU computing was a major step forward, bringing high-performance computing to commodity hardware. Feature-rich parallel languages like CUDA and OpenCL reduced the programming complexity. However, to take full advantage of the computing power of GPUs, specialized parallel algorithms are required. Moreover, the complex GPU memory hierarchy and highly threaded architecture make programming a difficult task even for experienced programmers. Due to the novelty of GPU programming, common general-purpose libraries are scarce and parallel versions of the algorithms are not always readily available. Instead of focusing on the parallelization of particular algorithms, in this thesis we propose a general methodology applicable to most divide-and-conquer problems with a butterfly structure that can be formulated through the Index-Digit representation. First, we analyze the different performance factors of the GPU architecture. Next, we study several optimization techniques and design a series of modular and reusable building blocks, which are used to create the different algorithms. Finally, we study the optimal resource balance, and through a mapping vector representation and operator algebra, we tune the algorithms for the desired configurations. Despite the focus on programmability and flexibility, the resulting implementations offer very competitive performance, surpassing other well-known state-of-the-art libraries.
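
    As a concrete instance of a divide-and-conquer problem with a butterfly structure, the sketch below shows a sequential radix-2 FFT. It is an assumed, textbook CPU illustration, not the thesis's GPU methodology or its building blocks. Bit reversal is used as the simplest example of an index-digit permutation (digit reversal with radix 2); the function names are hypothetical.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` binary digits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_radix2(x):
    """Iterative decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    bits = n.bit_length() - 1
    assert n == 1 << bits, "length must be a power of two"
    # Reorder the input with the bit-reversal (radix-2 digit) permutation.
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    half = 1
    while half < n:  # one pass per butterfly stage, log2(n) stages in total
        w_step = cmath.exp(-1j * cmath.pi / half)
        for start in range(0, n, 2 * half):
            w = 1 + 0j
            for j in range(start, start + half):
                t = w * a[j + half]      # twiddled lower input
                a[j + half] = a[j] - t   # butterfly: difference
                a[j] = a[j] + t          # butterfly: sum
                w *= w_step
        half *= 2
    return a
```

    Within a stage, every butterfly touches a disjoint pair of elements, so the inner loops are independent of each other; that independence is what makes this family of algorithms a natural candidate for mapping onto many GPU threads.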