397 research outputs found

    VLSI architectures for high speed Fourier transform processing

    DFT algorithms for bit-serial GaAs array processor architectures

    Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, from both an architectural and an implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.

    Efficient mapping of EEG algorithms

    Faster binary-field multiplication and faster binary-field MACs

    This paper shows how to securely authenticate messages using just 29 bit operations per authenticated bit, plus a constant overhead per message. The authenticator is a standard type of "universal" hash function providing information-theoretic security; what is new is computing this type of hash function at very high speed. At a lower level, this paper shows how to multiply two elements of a field of size 2^128 using just 9062 ≈ 71 × 128 bit operations, and how to multiply two elements of a field of size 2^256 using just 22164 ≈ 87 × 256 bit operations. This performance relies on a new representation of field elements and new FFT-based multiplication techniques. This paper's constant-time software uses just 1.89 Core 2 cycles per byte to authenticate very long messages. On a Sandy Bridge it takes 1.43 cycles per byte, without using Intel's PCLMULQDQ polynomial-multiplication hardware. This is much faster than the speed records for constant-time implementations of GHASH without PCLMULQDQ (over 10 cycles/byte), even faster than Intel's best Sandy Bridge implementation of GHASH with PCLMULQDQ (1.79 cycles/byte), and almost as fast as state-of-the-art 128-bit prime-field MACs using Intel's integer-multiplication hardware (around 1 cycle/byte). Keywords: Performance, FFTs, Polynomial multiplication, Universal hashing, Message authentication.
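
    For orientation, the kind of universal hash described above can be sketched as polynomial evaluation over a binary field. The sketch below is a plain schoolbook baseline, not the paper's FFT-based method: it multiplies in a field of size 2^128 by shift-and-XOR and evaluates the message polynomial with Horner's rule. The reduction polynomial (the one used by GCM, here in plain bit order) and the names `gf_mul` and `poly_hash` are assumptions chosen for illustration, and Python integers are not constant-time.

```python
# Assumed reduction polynomial: x^128 + x^7 + x^2 + x + 1 (plain bit order).
MOD = (1 << 128) | 0x87


def gf_mul(a, b):
    """Multiply two field elements (ints below 2^128) by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if a >> 128:
            a ^= MOD             # reduce modulo the field polynomial
    return result


def poly_hash(key, blocks):
    """Horner evaluation of the message polynomial at the point `key`.

    `blocks` is a list of 128-bit integers. A real authenticator would
    also add a one-time mask to the result; that step is omitted here.
    """
    acc = 0
    for block in blocks:
        acc = gf_mul(acc ^ block, key)
    return acc


if __name__ == "__main__":
    key = 0x0123456789ABCDEF0123456789ABCDEF
    print(hex(poly_hash(key, [1, 2, 3])))
```

    Each `gf_mul` call here costs on the order of 128 × 128 bit operations, which is the baseline that the 9062-operation figure in the abstract improves on.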

    Fast multiplication of multiple-precision integers

    Multiple-precision multiplication algorithms are of fundamental interest for both theoretical and practical reasons. The conventional method requires O(n^2) bit operations, whereas the fastest known multiplication algorithm is of order O(n log n log log n). The price that has to be paid for the increase in speed is a much more sophisticated theory and programming code. This work presents an extensive study of the best known multiple-precision multiplication algorithms. Different algorithms are implemented in C, and their performance is analyzed in detail and compared. The break-even points, which are essential for selecting the fastest algorithm for a particular task, are determined for a given hardware/software combination.
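
    The role of a break-even point can be illustrated with Karatsuba multiplication, one well-known algorithm that is faster than the conventional method for large operands. The following is a minimal Python sketch, not the C code studied in the work above: below a cutoff it falls back to the built-in product, and above it it splits each operand in two and uses three recursive products instead of four. The function name `karatsuba` and the parameter `cutoff_bits` (with its default of 64) are hypothetical; a real break-even value has to be measured on the target hardware and software.

```python
def karatsuba(x, y, cutoff_bits=64):
    """Multiply non-negative integers x and y with Karatsuba's method.

    Operands at or below `cutoff_bits` bits use the built-in product,
    which stands in for the conventional method in this sketch.
    """
    if x < 0 or y < 0:
        raise ValueError("operands must be non-negative")
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y

    # Split both operands at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three recursive products instead of four.
    low = karatsuba(x_lo, y_lo, cutoff_bits)
    high = karatsuba(x_hi, y_hi, cutoff_bits)
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi, cutoff_bits) - low - high

    return (high << (2 * half)) + (mid << half) + low


if __name__ == "__main__":
    a, b = 3 ** 500 + 12345, 7 ** 400 + 1
    print(karatsuba(a, b) == a * b)
```

    Timing this function for several values of `cutoff_bits` is one simple way to locate a break-even point empirically.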

    KAVUAKA: a low-power application-specific processor architecture for digital hearing aids

    The power budget of digital hearing aids is tightly restricted by their small physical size, and the hardware resources available for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve speech intelligibility for the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low power consumption and high processing performance. The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized for state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming, and speech recognition. Specialized, application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors, and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16% fewer cycles to compute a 128-point fast Fourier transform (FFT) than related programmable digital signal processors. The power consumption during register file accesses is decreased by 6% to 17% with isolation and bypass techniques. The hardware-induced audio latency is 34% lower than that of related audio interfaces for a frame size of 64 samples. The final hearing aid system-on-chip (SoC), with four KAVUAKA processor cores and ten co-processors, is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm^2. Each of the processors and co-processors contains individual customizations and hardware features, with a datapath width varying between 24 and 64 bits. The core area of the 64-bit processor configuration is 0.134 mm^2. The processors are organized in two clusters that share memory, an audio interface, co-processors, and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for the SoC and 0.6 mW for the 64-bit processor. Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging, instruction scheduling, and register allocation. The KAVUAKA processor architecture is compared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements.

    Faster polynomial multiplication over finite fields

    Let p be a prime, and let M_p(n) denote the bit complexity of multiplying two polynomials in F_p[X] of degree less than n. For n large compared to p, we establish the bound M_p(n) = O(n log n 8^(log^* n) log p), where log^* is the iterated logarithm. This is the first known Fürer-type complexity bound for F_p[X], and improves on the previously best known bound M_p(n) = O(n log n log log n log p).
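
    To make the iterated logarithm in the bound above concrete, the sketch below counts how many times log2 must be applied before the value drops to 1 or below, and returns the two factors in which the bounds differ, 8^(log^* n) and log log n. The base-2 convention and the names `log_star` and `bound_factors` are assumptions for illustration; the constants hidden by the O() notation are ignored, so the numbers say nothing about practical running times.

```python
import math


def log_star(n):
    """Iterated logarithm (base 2): number of log2 steps until the value is <= 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


def bound_factors(n):
    """Return (8 ** log_star(n), log2(log2(n))) for an integer n > 2."""
    return 8 ** log_star(n), math.log2(math.log2(n))


if __name__ == "__main__":
    for n in (16, 65536, 2 ** 1024):
        print(n.bit_length(), log_star(n), bound_factors(n))
```

    Because log^* grows extremely slowly, 8^(log^* n) is asymptotically smaller than log log n, but only for values of n far beyond any practical input size.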

    New FFT/IFFT Factorizations with Regular Interconnection Pattern Stage-to-Stage Subblocks

    FFT (Fast Fourier Transform) factorizations that present a regular interconnection pattern between factors or stages are known as parallel algorithms, or Pease algorithms, since they were first proposed by Pease. In this paper, new FFT/IFFT (Inverse FFT) factorizations are derived with blocks that exhibit the regular Pease interconnection pattern. It is shown that these blocks can be obtained at a previously selected scale. The new factorizations for both the FFT and the IFFT have two kinds of factors: a few Cooley-Tukey type factors, and new factors that provide the same Pease interconnection pattern in blocks. For a given factorization, the blocks share dimensions and the stage-to-stage interconnection pattern, and each of them can be calculated independently of the others.
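
    The regular stage-to-stage interconnection that characterizes Pease algorithms can be illustrated with the classic constant-geometry radix-2 FFT. The sketch below is that textbook form, assuming a power-of-two length and a decimation-in-frequency arrangement; it is not one of the new block factorizations derived in the paper. Every stage reads the pair (i, i + N/2) and writes the pair (2i, 2i + 1), so only the twiddle factors change between stages. The names `pease_fft` and `bit_reverse` are hypothetical.

```python
import cmath


def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def pease_fft(x):
    """Constant-geometry radix-2 FFT of a power-of-two-length sequence."""
    n = len(x)
    stages = n.bit_length() - 1
    if n < 1 or (1 << stages) != n:
        raise ValueError("length must be a power of two")
    half = n // 2
    data = [complex(v) for v in x]
    for s in range(stages):
        nxt = [0j] * n
        for i in range(half):
            a, b = data[i], data[i + half]       # same read pattern every stage
            k = (i >> s) << s                    # twiddle exponent for this stage
            w = cmath.exp(-2j * cmath.pi * k / n)
            nxt[2 * i] = a + b                   # same write pattern every stage
            nxt[2 * i + 1] = (a - b) * w
        data = nxt
    # The stages leave the spectrum in bit-reversed order; undo that here.
    return [data[bit_reverse(k, stages)] for k in range(n)]


if __name__ == "__main__":
    print(pease_fft([1, 0, 0, 0]))
```

    Because the data movement is identical in every stage, a single fixed wiring between butterflies can be reused, which is what makes this family attractive for parallel hardware.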