73 research outputs found

DEVELOPMENT AND EVALUATION OF ENVELOPE, SPECTRAL AND TIME ENHANCEMENT ALGORITHMS FOR AUDITORY NEUROPATHY

Author: Morgan Benjamin Robert
Publication venue: Scholarship@Western
Publication date: 01/01/2011

Auditory neuropathy (AN) is a hearing disorder that reduces the ability to detect temporal cues in speech, leading to impaired speech perception. Traditional amplification and frequency shifting techniques used in modern hearing aids are not suitable for individuals with AN because of the unique symptoms of the disorder. This study proposes a method that combines speech envelope enhancement and time scaling to obtain the proven benefits of each algorithm. In addition, spectral enhancement is cascaded with envelope and time enhancement to address the poor frequency discrimination in AN. The proposed speech enhancement strategy was evaluated using an AN simulator with normal-hearing listeners under varying degrees of AN severity. The results showed a significant increase in word recognition scores for combined time scaling and envelope enhancement over envelope enhancement alone. Furthermore, the addition of spectral enhancement resulted in a further increase in word recognition at profound AN severity.

Scholarship@Western

Non-intrusive identification of speech codecs in digital audio signals

Author: Jenner Frank
Publication venue: RIT Scholar Works
Publication date: 01/11/2011

Speech compression has become an integral component of all modern telecommunications networks. Numerous codecs have been developed and deployed for efficiently transmitting voice signals while maintaining high perceptual quality. Because of the diversity of speech codecs used by different carriers and networks, the ability to distinguish between codecs lends itself to a wide variety of practical applications, including determining call provenance, enhancing network diagnostic metrics, and improving automated speaker recognition. However, few research efforts have attempted to provide a methodology for identifying the speech codec used in an audio signal. In this research, we demonstrate a novel approach for accurately determining the presence of several contemporary speech codecs in a non-intrusive manner. The methodology analyzes an audio signal such that the subtle noise components introduced by the codec processing are accentuated while most of the original speech content is eliminated. Using these techniques, an audio signal may be profiled to gather a set of values that effectively characterize the codec present in the signal. This procedure is first applied to a large data set of audio signals from known codecs to develop a set of trained profiles. Thereafter, signals from unknown codecs may be similarly profiled, and the profiles compared to each of the known training profiles in order to decide which codec is the best match for the unknown signal. Overall, the proposed strategy generates extremely favorable results, with codecs being identified correctly in nearly 95% of all test signals. In addition, the profiling process is shown to require a very short analysis length of less than 4 seconds of audio to achieve these results. Both the identification rate and the small analysis window represent dramatic improvements over previous efforts in speech codec identification.
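The matching step in this abstract (profile an unknown signal, then compare it with each trained profile) can be sketched as a nearest-profile lookup. This is only an illustration: the codec labels, the feature values, and the choice of Euclidean distance are all assumptions, since the abstract does not state which features or metric the thesis uses.

```python
import math

def nearest_profile(unknown, trained):
    """Return the label of the trained profile closest to `unknown`.

    `trained` maps a codec label to a feature vector. Euclidean distance
    is an assumption made for illustration, not the thesis's metric.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(trained, key=lambda label: distance(unknown, trained[label]))

# Hypothetical trained profiles; the labels and numbers are made up.
trained_profiles = {
    "codec_a": [0.10, 0.80, 0.30],
    "codec_b": [0.70, 0.20, 0.50],
}

# Profile of a signal from an unknown codec (also made up).
best_match = nearest_profile([0.65, 0.25, 0.45], trained_profiles)
```

Here `best_match` is the label whose stored profile lies closest to the unknown profile.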

RIT Scholar Works

Variable bit rate voice over ATM using compression and silence removal

Author: Yearwood Mario A. (Mario Anton)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1997

Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (leaves 45-46). By Mario A. Yearwood.

DSpace@MIT

DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition

Authors: Chen C., Chng E. S., Guo Z.
Publication date: 01/08/2022

The performance of automatic speech recognition (ASR) systems degrades drastically under noisy conditions. Explicit distortion modelling (EDM), as a feature compensation step, can enhance ASR systems under such conditions by simulating in-domain noisy speech from its clean counterpart. Yet existing distortion models are either non-trainable or unexplainable, and they often lack controllability and generalization ability. In this paper, we propose a fully explainable and controllable model, DENT-DDSP, to achieve EDM. DENT-DDSP utilizes novel differentiable digital signal processing (DDSP) components and requires only 10 seconds of training data to achieve high fidelity. The experiment shows that the simulated noisy data from DENT-DDSP achieves the highest simulation fidelity compared to other baseline models in terms of multi-scale spectral loss (MSSL). Moreover, to validate whether the data simulated by DENT-DDSP can replace the scarce in-domain noisy data in noise-robust ASR tasks, several downstream ASR models with the same architecture are trained using the simulated data and the real data. The experiment shows that the model trained with the simulated noisy data from DENT-DDSP achieves performance similar to the benchmark, with a 2.7% difference in word error rate (WER). The code of the model is released online.
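For readers unfamiliar with the multi-scale spectral loss named in this abstract, the sketch below follows a formulation common in the DDSP literature: for several FFT sizes, sum the L1 distance between magnitude spectrograms and the L1 distance between their logarithms. The FFT sizes, hop length, `alpha` weighting, and `eps` floor are assumptions chosen to keep the example small, and the paper's exact settings may differ. A naive DFT is used so that the code needs only the standard library.

```python
import cmath
import math

def magnitude_frames(signal, n_fft, hop):
    """Magnitude spectra of Hann-windowed frames (naive DFT, small inputs only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        spectrum = []
        for k in range(n_fft // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n in range(n_fft))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

def multi_scale_spectral_loss(x, y, fft_sizes=(16, 32, 64), alpha=1.0, eps=1e-7):
    """Unnormalised sum of linear and log magnitude L1 distances over FFT sizes."""
    total = 0.0
    for n_fft in fft_sizes:
        hop = n_fft // 4  # assumed 75% overlap
        for fx, fy in zip(magnitude_frames(x, n_fft, hop),
                          magnitude_frames(y, n_fft, hop)):
            for a, b in zip(fx, fy):
                total += abs(a - b)
                total += alpha * abs(math.log(a + eps) - math.log(b + eps))
    return total

# Toy signals: a low-frequency tone, and the same tone plus a weak high tone.
clean = [math.sin(2 * math.pi * 5 * n / 128) for n in range(128)]
noisy = [s + 0.1 * math.sin(2 * math.pi * 40 * n / 128) for n, s in enumerate(clean)]

loss_same = multi_scale_spectral_loss(clean, clean)
loss_diff = multi_scale_spectral_loss(clean, noisy)
```

A lower value means the two signals have more similar spectra at every analysis scale; identical signals give a loss of zero.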

arXiv.org e-Print Archive

Implementation and testing of an improved echo canceller and an ADPCM speech codec

Author: Besselink R.J.C.
Publication date: 31/08/1998

Pure OAI Repository

Continuous-Time and Companding Digital Signal Processors Using Adaptivity and Asynchronous Techniques

Author: Vezyrtzis Christos
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

The fully synchronous approach has been the norm for digital signal processors (DSPs) for many decades. Due to its simplicity, the classical DSP structure has been used in many applications. However, due to its rigid discrete-time operation, a classical DSP has limited efficiency or inadequate resolution for some emerging applications, such as processing of multimedia and biological signals. This thesis proposes fundamentally new approaches to designing DSPs, which are different from the classical scheme. The defining characteristic of all new DSPs examined in this thesis is the notion of "adaptivity" or "adaptability." Adaptive DSPs dynamically change their behavior to adjust to some property of their input stream, for example the rate of change of the input. This thesis presents both enhancements to existing adaptive DSPs and new adaptive DSPs. The main class of DSPs examined throughout the thesis is continuous-time (CT) DSPs. CT DSPs are clock-less and event-driven; they naturally adapt their activity and power consumption to the rate of their inputs. The absence of a clock also completely avoids aliasing in the frequency domain, which improves signal fidelity. The core of this thesis deals with the complete and systematic design of a truly general-purpose CT DSP. A scalable design methodology for CT DSPs is presented. This leads to the main contribution of this thesis, namely a new CT DSP chip. This chip is the first general-purpose CT DSP chip, able to process many different classes of CT and synchronous signals. The chip can handle various types of signals, i.e. various digital modulations, both synchronous and asynchronous, without requiring any reconfiguration; this property is presented for the first time in CT DSPs and is impossible for classical DSPs.
As opposed to previous CT DSPs, which were limited to using only one type of digital format, and whose design was hard to scale for different bandwidths and bit-widths, this chip has a formal, robust and scalable design, due to the systematic usage of asynchronous design techniques. The second contribution of this thesis is a complete methodology to design adaptive delay lines. In particular, it is shown how to make the granularity, i.e. the number of stages, adaptive in a real-time delay line. Adaptive granularity brings about a significant improvement in the line's power consumption, up to 70% as reported by simulations on two design examples. This enhancement can have a large, direct impact on the power of any CT DSP, since a delay line consumes the majority of a CT DSP's power. The robust methodology presented in this thesis allows safe dynamic reconfiguration of the line's granularity, on the fly and according to the input traffic. As a final contribution, the thesis also examines two additional DSPs: one operating in the CT domain and one using the companding technique. The former operates only on level-crossing samples; the proposed methodology shows a potential for high-quality outputs by using a complex interpolation function. Finally, a companding DSP is presented for MPEG audio. Companding DSPs adapt their dynamic range to the amplitude of their input; the resulting DSP can offer high-quality outputs even for small inputs. By applying companding to MPEG DSPs, it is shown how the DSP distortion can be made almost inaudible, without requiring complex arithmetic hardware.
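The general idea of companding, giving small inputs more of the available resolution, can be illustrated with the classic mu-law curve. This is a swapped-in textbook technique, not the thesis's design: the abstract does not say which companding scheme its DSP uses, and the 8-bit uniform quantizer and `MU = 255` below are assumptions for the example.

```python
import math

MU = 255.0  # assumed companding parameter (the value used in 8-bit mu-law telephony)

def compress(x, mu=MU):
    """Mu-law compression of a sample in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def expand(y, mu=MU):
    """Inverse of `compress`."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize(x, bits=8):
    """Uniform quantizer on [-1, 1] with a step of 2 ** -(bits - 1)."""
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

small = 0.003  # an input well below one quantizer step (1/128)

plain = quantize(small)                         # quantize directly
companded = expand(quantize(compress(small)))   # compress, quantize, expand

plain_error = abs(plain - small)
companded_error = abs(companded - small)
```

With these assumed settings the directly quantized sample collapses to zero, while the companded path keeps the small input close to its original value, which is the effect the abstract describes for small inputs.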

Columbia University Academic Commons

Comparison of CELP speech coder with a wavelet method

Author: Nagaswamy Sriram
Publication venue: UKnowledge
Publication date: 01/01/2006

This thesis compares the speech quality of the Code Excited Linear Prediction (CELP, Federal Standard 1016) speech coder with a new wavelet method for compressing speech. The performance of both is compared through subjective listening tests. The test signals are clean signals (i.e. with no background noise), speech signals with room noise, and speech signals with artificial noise added. Results indicate that for clean signals and signals with predominantly voiced components the CELP standard performs better than the wavelet method, but for signals with room noise the wavelet method performs much better than CELP. For signals with artificial noise added, the results are mixed and depend on the noise level: CELP performs better at low noise levels and the wavelet method performs better at higher noise levels.

University of Kentucky

Design of a smartphone with a Digital Signal Processor

Author: Lecluse Joep
Publication date: 01/01/1996

Repository TU/e

Pure OAI Repository

AUDIO PROCESSING ANALYZER

Author: rizwan Sana Rizwan
Publication venue: International Scientific Research and Researchers Association (ISRRA)
Publication date: 14/12/2020

The project emphasizes the simulation of various DSP effects using elementary phenomena of audio processing, and the manipulation of audio with various filters in order to enhance its quality. Many commercially available systems provide facilities such as channel equalizers, karaoke systems, and a few audio processors based on digital signal processing. Software systems are also available that provide a fairly good and cost-effective solution to audio enhancement. Yet they are limited by resource constraints and thus trade off performance against quality. The project first studies and analyses audio processing phenomena and the various effects involved in them. In the second phase, algorithms were developed for these phenomena and simulated in MATLAB.

GSSRR.ORG: International Journals: Publishing Research Papers in all Fields