492 research outputs found

    FPGA Implementation of Spectral Subtraction for In-Car Speech Enhancement and Recognition

    Get PDF
    The use of speech recognition in noisy environments requires the use of speech enhancement algorithms in order to improve recognition performance. Deploying these enhancement techniques requires significant engineering to ensure algorithms are realisable in electronic hardware. This paper describes the design decisions and process to port the popular spectral subtraction algorithm to a Virtex-4 field-programmable gate array (FPGA) device. Resource analysis shows the final design uses only 13% of the total available FPGA resources. Waveforms and spectrograms presented support the validity of the proposed FPGA design

    FPGA-based architectures for acoustic beamforming with microphone arrays : trends, challenges and research opportunities

    Get PDF
    Over the past decades, many systems composed of arrays of microphones have been developed to satisfy the quality demanded by acoustic applications. Such microphone arrays are sound acquisition systems composed of multiple microphones used to sample the sound field with spatial diversity. The relatively recent adoption of Field-Programmable Gate Arrays (FPGAs) to manage the audio data samples and to perform the signal processing operations such as filtering or beamforming has lead to customizable architectures able to satisfy the most demanding computational, power or performance acoustic applications. The presented work provides an overview of the current FPGA-based architectures and how FPGAs are exploited for different acoustic applications. Current trends on the use of this technology, pending challenges and open research opportunities on the use of FPGAs for acoustic applications using microphone arrays are presented and discussed

    Adaptive Audio Classification Framework for in-Vehicle Environment with Dynamic Noise Characteristics

    Get PDF
    With ever-increasing number of car-mounted electric devices that are accessed, managed, and controlled with smartphones, car apps are becoming an important part of the automotive industry. Audio classification is one of the key components of car apps as a front-end technology to enable human-app interactions. Existing approaches for audio classification, however, fall short as the unique and time-varying audio characteristics of car environments are not appropriately taken into account. Leveraging recent advances in mobile sensing technology that allows for an active and accurate driving environment detection, in this thesis, we develop an audio classification framework for mobile apps that categorizes an audio stream into music, speech, speech and music, and noise, adaptability depending on different driving environments. A case study is performed with four different driving environments, i.e., highway, local road, crowded city, and stopped vehicle. More than 420 minutes of audio data are collected including various genres of music, speech, speech and music, and noise from the driving environments

    Non-linear echo cancellation - a Bayesian approach

    Get PDF
    Echo cancellation literature is reviewed, then a Bayesian model is introduced and it is shown how how it can be used to model and fit nonlinear channels. An algorithm for cancellation of echo over a nonlinear channel is developed and tested. It is shown that this nonlinear algorithm converges for both linear and nonlinear channels and is superior to linear echo cancellation for canceling an echo through a nonlinear echo-path channel

    In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms

    Full text link
    The recent ubiquitous adoption of remote conferencing has been accompanied by omnipresent frustration with distorted or otherwise unclear voice communication. Audio enhancement can compensate for low-quality input signals from, for example, small true wireless earbuds, by applying noise suppression techniques. Such processing relies on voice activity detection (VAD) with low latency and the added capability of discriminating the wearer's voice from others - a task of significant computational complexity. The tight energy budget of devices as small as modern earphones, however, requires any system attempting to tackle this problem to do so with minimal power and processing overhead, while not relying on speaker-specific voice samples and training due to usability concerns. This paper presents the design and implementation of a custom research platform for low-power wireless earbuds based on novel, commercial, MEMS bone-conduction microphones. Such microphones can record the wearer's speech with much greater isolation, enabling personalized voice activity detection and further audio enhancement applications. Furthermore, the paper accurately evaluates a proposed low-power personalized speech detection algorithm based on bone conduction data and a recurrent neural network running on the implemented research platform. This algorithm is compared to an approach based on traditional microphone input. The performance of the bone conduction system, achieving detection of speech within 12.8ms at an accuracy of 95\% is evaluated. Different SoC choices are contrasted, with the final implementation based on the cutting-edge Ambiq Apollo 4 Blue SoC achieving 2.64mW average power consumption at 14uJ per inference, reaching 43h of battery life on a miniature 32mAh li-ion cell and without duty cycling

    An efficient implementation of lattice-ladder multilayer perceptrons in field programmable gate arrays

    Get PDF
    The implementation efficiency of electronic systems is a combination of conflicting requirements, as increasing volumes of computations, accelerating the exchange of data, at the same time increasing energy consumption forcing the researchers not only to optimize the algorithm, but also to quickly implement in a specialized hardware. Therefore in this work, the problem of efficient and straightforward implementation of operating in a real-time electronic intelligent systems on field-programmable gate array (FPGA) is tackled. The object of research is specialized FPGA intellectual property (IP) cores that operate in a real-time. In the thesis the following main aspects of the research object are investigated: implementation criteria and techniques. The aim of the thesis is to optimize the FPGA implementation process of selected class dynamic artificial neural networks. In order to solve stated problem and reach the goal following main tasks of the thesis are formulated: rationalize the selection of a class of Lattice-Ladder Multi-Layer Perceptron (LLMLP) and its electronic intelligent system test-bed – a speaker dependent Lithuanian speech recognizer, to be created and investigated; develop dedicated technique for implementation of LLMLP class on FPGA that is based on specialized efficiency criteria for a circuitry synthesis; develop and experimentally affirm the efficiency of optimized FPGA IP cores used in Lithuanian speech recognizer. The dissertation contains: introduction, four chapters and general conclusions. The first chapter reveals the fundamental knowledge on computer-aideddesign, artificial neural networks and speech recognition implementation on FPGA. In the second chapter the efficiency criteria and technique of LLMLP IP cores implementation are proposed in order to make multi-objective optimization of throughput, LLMLP complexity and resource utilization. The data flow graphs are applied for optimization of LLMLP computations. The optimized neuron processing element is proposed. The IP cores for features extraction and comparison are developed for Lithuanian speech recognizer and analyzed in third chapter. The fourth chapter is devoted for experimental verification of developed numerous LLMLP IP cores. The experiments of isolated word recognition accuracy and speed for different speakers, signal to noise ratios, features extraction and accelerated comparison methods were performed. The main results of the thesis were published in 12 scientific publications: eight of them were printed in peer-reviewed scientific journals, four of them in a Thomson Reuters Web of Science database, four articles – in conference proceedings. The results were presented in 17 scientific conferences

    Interfacing PDM MEMS Microphones with PFM Spiking Systems: Application for Neuromorphic Auditory Sensors

    Get PDF
    Neuromorphic computation processes sensors output in the spiking domain, which presents constraints in many cases when converting information to spikes, loosing, as example, temporal accuracy. This paper presents a spike-based system to adapt audio information from low-power pulse-density modulation (PDM) microelectromechanical systems microphones into rate coded spike frequencies. These spikes could be directly used by the neuromorphic auditory sensor (NAS) for frequency decomposition in different bands, avoiding the analog or digital conversion to spike streams. This improves the time response of the NAS, allowing its use in more time restrictive applications. This adaptation was conducted in VHDL as an interface for PDM microphones, converting their pulses into temporal distributed spikes following a pulse-frequency modulation scheme with an accurate inter-spike-interval, known as PDM to spikes interface (PSI). We introduce a new architecture of spike-based band-pass filter to reject DC components and distribute spikes in time. This was tested in two scenarios, first as a stand-alone circuit for its characterization, and then integrated with a NAS for verification. The PSI achieves a total harmonic distortion of −46.18 dB and a signal-to-noise ratio of 63.47 dB, demands less than 1% of the resources of a Spartan-6 FPGA and its power consumption is around 7 mW.Agencia Estatal de Investigación PID2019-105556GB-C33/AEI/10.13039/501100011033 (MINDROB)Ministerio de Ciencia, Innovación y Universidades PCI2019-111841-2 (CHIST-ERA SMALL

    Hardware Acceleration of the Robust Header Compression (RoHC) Algorithm

    Get PDF
    With the proliferation of Long Term Evolution (LTE) networks, many cellular carriers are embracing the emerging eld of mobile Voice over Internet Protocol (VoIP). The robust header compression (RoHC) framework was introduced as a part of the LTE Layer 2 stack to compress the large headers of the VoIP packets before transmitted over LTE IP-based architectures. The headers, which are encapsulated Real-time Transport Protocol (RTP)/User Datagram Protocol (UDP)/Internet Protocol (IP) stack, are large compared to the small payload. This header-compression scheme is especially useful for ecient utilization of the radio bandwidth and network resources. In an LTE base-station implementation, RoHC is a processing-intensive algorithm that may be the bottleneck of the system, and thus, may be the limiting factor when it comes to number of users served. In this thesis, a hardware-software and a full-hardware solution are proposed, targeting LTE base-stations to accelerate this computationally intensive algorithm and enhance the throughput and the capacity of the system. The results of both solutions are discussed and compared with respect to design metrics like throughput, capacity, power consumption, chip area and exibility. This comparison is instrumental in taking architectural level trade-o decisions in-order to meet the present day requirements and also be ready to support future evolution. In terms of throughput, a gain of 20% (6250 packets/sec can be processed at a frequency of 150 MHz) is achieved in the HW-SW solution compared to the SW-Only solution by implementing the Cyclic Redundancy Check (CRC) and the Least Signicant Bit(LSB) encoding blocks as hardware accelerators . Whereas, a Full-HW implementation leads to a throughput of 45 times (244000 packets/sec can be processed at a frequency of 100 MHz) the throughput of the SW-Only solution. However, the full-HW solution consumes more Lookup Tables (LUTs) when it is synthesized on an Field-Programmable Gate Array (FPGA) platform compared to the HW-SW solution. In Arria II GX, the HW-SW and the full-HW solutions use 2578 and 7477 LUTs and consume 1.5 and 0.9 Watts, respectively. Finally, both solutions are synthesized and veried on Altera's Arria II GX FPGA
    corecore