Search CORE

1,091 research outputs found

AUDIO PROCESSING ANALYZER

Author: rizwan Sana Rizwan
Publication venue: International Scientific Research and Researchers Association (ISRRA)
Publication date: 14/12/2020
Field of study

The project emphasizes simulation of various DSP effects using elementary phenomenon of audio processing, and by manipulating audio using various filters in order to enhance the quality. There are many commercially available systems, which provide facilities such as channel equalizers, karaoke systems, and a few audio processors based on Digital Signal Processing. Software systems are also available which provide a fairly good and cost effective solution to audio enhancement. Yet they are limited due to resources issues and thus make a trade-off between performance and quality. The project at first studies and analyses proceeds as study and analysis of audio processing phenomena and various effects involved in it. In the second phase algorithms have been developed for these phenomena and their simulation in MATLAB.

GSSRR.ORG: International Journals: Publishing Research Papers in all Fields

System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

Author: L'INSALATA NICOLA EUGENIO
Publication venue: 'Pisa University Press'
Publication date: 09/06/2048
Field of study

This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

Electronic Thesis and Dissertation Archive - Università di Pisa

REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS

Author: Suarez Hernan
Publication venue
Publication date: 01/12/2015
Field of study

New radar applications need to perform complex algorithms and process a large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, and is based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such matrix multiplication and matrix inversion, which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms

SHAREOK repository

Using System-on-a-Programmable-Chip Technology to Design Embedded Systems

Author: Hall Tyson S.
Hamblen James O.
Publication venue: 'IEREK Research Enrichment and Knowledge Exchange'
Publication date: 01/09/2006
Field of study

This paper describes the tools, techniques, and devices used to design embedded products with system–on-a-chip (SoC) type solutions using a large Field Programmable Gate Array (FPGA) with an internal processor core. This new FPGA-based approach is called system-on-a-programmable-chip (SoPC ). The performance tradeoffs present in SoPC systems is compared to more traditional design approaches. Commercial devices, processor cores, and CAD tool flows are described. The issues in SoPC hardware/software design tradeoffs are examined and three example SoPC designs are presented as case studies

Southern Adventist University

Recommended from our members

Complexity-reduced hardware-based track-trigger for CMS upgrade

Author: Ghorbani Maziar
Publication venue: Brunel University London
Publication date: 01/01/2022
Field of study

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonThe Compact Muon Solenoid (CMS) detector at the Large Hadron Collider (LHC) is designed to study the results of proton-proton collisions. The Tracker sub-detector is designed to detect and reconstruct the trajectories of charged particles produced by the collisions. During the lifetime of the CMS detector, there have been several upgrades aimed at increasing the chance of discovering new physics through increased luminosity levels and instrumentation of advanced technology. The High-Luminosity upgrade optimises the LHC to accelerate high-energy particles with an average of 200 proton-proton interactions per bunch crossing. The Level-1 Trigger system promptly analyses and filters collisions using hardware to reduce the data volume in real-time. For the upgrade, the trigger mechanism will use a particle trajectory estimator that discriminates between particles based on their transverse momentum (pT ). Particles with pT ≥ 2 GeV/c will be transmitted to the Level-1 Track-Trigger system for trajectory reconstruction within a fixed 3 μs latency. This thesis presents a novel Hardware-based Multivariate Linear Fitter (MVLF) system focusing on robustness in tracking efficiency and reduction in logic resource usage within the specified latency. The system components are implemented in Field Programmable Gate Arrays (FPGA), targeting 16 nm FinFET UltraScale+ silicon technology. The development was performed using the High-Level Synthesis (HLS) automation tools and the Hardware acceleration platform for Application-Specific Integrated Circuits (ASIC). A firmware demonstrator has been assembled to verify the feasibility and compatibility of the scaled system with the CMS Level-1 Track-Trigger infrastructure. The system’s performance is compared to past and current system developments, and the results are presented accordingly

Brunel University Research Archive

FPGA Implementation of Higher Order FIR Filter

Author: R. Prakash Neelam
Singh Gurpadam
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/08/2017
Field of study

The digital Finite-Impulse-Response (FIR) filters are mainly employed in digital signal processing applications. The main components of digital FIR filters designed on FPGAs are the register bank to save the samples of signals, adder to implement sum operations and multiplier for multiplication of filter coefficients to signal samples. Although, design and implementation of digital FIR filters seem simple but the design bottleneck is multiplier block for speed, power consumption and FPGA chip area occupation. The multipliers are an integral part in FIR structures and these use a large part of the chip area. This limits the number of processing elements (PE) available on the chip to realize a higher order of filter. A model is developed in the Matlab/Simulink environment to investigate the performance of the desired higher order FIR filter. An equivalent FIR filter representation is designed by the Xilinx FIR Compiler by using the exported FIR filter coefficients. The Xilinx implementation flow is completed with the help of Xilinx ISE 14.5. It is observed how the use of higher order FIR filter impacts the resource utilization of the FPGA and it’s the maximum operating frequency

Crossref

ZENODO

Institute of Advanced Engineering and Science

High sample-rate Givens rotations for recursive least squares

Author: Walke Richard Lewis
Publication venue
Publication date
Field of study

The design of an application-specific integrated circuit of a parallel array processor is considered for recursive least squares by QR decomposition using Givens rotations, applicable in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using standard arithmetic operators, add, multiply and divide is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic considered. New redundant multiplier architectures are presented enabling reductions in area of up to 25%, whilst maintaining low delay. A technique is presented enabling the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed enabling the sample-rate to be increased by a factor of approximately 6 and offering area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of 136 MHz could be achieved using a standard cell approach and O.35pm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation on a parallel array of general purpose (GP) DSP chips, it is estimated that a single application specific chip could offer up to 1,500 times the computation obtained from a single OP DSP chip

Warwick Research Archives Portal Repository

Nanosecond machine learning event classification with boosted decision trees in FPGA for high energy physics

Author: Carlson Benjamin
Eubanks Brandon
Hong Tae Min
Racz Stephen
Roche Stephen
Stelzer Joerg
Stumpp Daniel
Publication venue: 'IOP Publishing'
Publication date: 17/08/2021
Field of study

We present a novel implementation of classification using the machine learning / artificial intelligence method called boosted decision trees (BDT) on field programmable gate arrays (FPGA). The firmware implementation of binary classification requiring 100 training trees with a maximum depth of 4 using four input variables gives a latency value of about 10 ns, independent of the clock speed from 100 to 320 MHz in our setup. The low timing values are achieved by restructuring the BDT layout and reconfiguring its parameters. The FPGA resource utilization is also kept low at a range from 0.01% to 0.2% in our setup. A software package called fwXmachina achieves this implementation. Our intended user is an expert of custom electronics-based trigger systems in high energy physics experiments or anyone that needs decisions at the lowest latency values for real-time event classification. Two problems from high energy physics are considered, in the separation of electrons vs. photons and in the selection of vector boson fusion-produced Higgs bosons vs. the rejection of the multijet processes.Comment: 66 pages, 27 figures, 13 tables, JINST versio

arXiv.org e-Print Archive