98 research outputs found

    Reconfigurable Multiband Dynamic Range Compression-based FRM Filter for Hearing Aid

    Get PDF
    In this research, we present an innovative method for enhancing the performance of hearing aids using a Multiband Dynamic Range Compression-based Reconfigurable Frequency Response Masking (FRM) Filterbank. First, a unform16-band reconfigurable filter bank, which is reconfigurable, is designed utilizing the FRM scheme. The strategic arrangement of each sub-band within the proposed filter bank is meticulously prepared to optimize the matching performance. Based on the hearing characteristics of patients, the sub-bands can be distributed in low, medium, and high-frequency regions. Also, the gain can be adjusted per the patient's hearing profile from their audiogram for better auditory compensation. Further, the Multiband Dynamic Range Compression (MBDRC) technique is applied to address the specific needs of individuals with different frequency-dependent hearing impairments. It involves using dynamic range compression independently to different frequency sub-bands within a filter bank. In MBDRC, the compression parameters, such as compression threshold and ratio, can be adjusted independently for every subband. It allows for a more tailored approach to address the specific hearing needs of different frequency regions. If an individual has more severe hearing loss in high-frequency regions, higher compression ratios and lower compression thresholds can be applied to those subbands to amplify and improve audibility for high-frequency sounds. Once dynamic range compression is applied to each sub-band, the resultant sub-bands are reassembled to yield the ultimate output signal, which can subsequently be transmitted to the speaker or receiver of the hearing aid. A GUI can be helpful for better visualization and parameter control, including gain adjustment and compression parameters of this entire process. With this aim in mind, a GUI has been developed on MATLAB. Different audio files can be imported, and their frequency response can be generated and observed. Based on a person's audiogram, the control parameters can be set to low, medium, or high. Their sub-band distribution in low, medium, and high-frequency regions can be visualized. Further, the filter bank makes automatic gain adjustments, as seen in the GUI. The gain points for each band can also be manually adjusted according to users' hearing characteristics to minimize the error. Also, the compression parameters can be set separately for each subband as per the hearing requirement of the patient. Further, the processed output can be visualized in the output frequency response tab, and the input and output audio signals can be analyzed

    The use of spectral information in the development of novel techniques for speech-based cognitive load classification

    Full text link
    The cognitive load of a user refers to the amount of mental demand imposed on the user when performing a particular task. Estimating the cognitive load (CL) level of the users is necessary to adjust the workload imposed on them accordingly in order to improve task performance. The current speech based CL classification systems are not adequate for commercial use due to their low performance particularly in noisy environments. This thesis proposes many techniques to improve the performance of the speech based cognitive load classification system in both clean and noisy conditions. This thesis analyses and presents the effectiveness of speech features such as spectral centroid frequency (SCF) and spectral centroid amplitude (SCA) for CL classification. Sub-systems based on SCF and SCA features were developed and fused with the traditional Mel frequency cepstral coefficients (MFCC) based system, producing an 8.9% and 31.5% relative error rate reduction respectively when compared to the MFCC-based system alone. The Stroop test corpus was used in these experiments. The investigation into cognitive load information in the form of spectral distribution in different subbands shows that the information distributed in the low frequency subband is significantly higher than the high frequency subband. Two different methods are proposed to utilize this finding. The first method, called the multi-band approach, uses a weighting scheme to emphasize the speech features in low frequency subbands. The cognitive load classification accuracy of this approach is shown to be higher than a system based on a non-weighting scheme. The second method is to design an effective filterbank based on the spectral distribution of cognitive load information using the Kullback-Leibler distance measure. It is shown that the designed filterbank consistently provides higher classification accuracies than other existing filterbanks such as mel, Bark, and equivalent rectangular bandwidth. A discrete cosine transform based speech enhancement technique is proposed in order to increase the robustness of the CL classification system and found to be more suitable than other methods investigated. This proposed method provides a 3.0% average relative error rate reduction for the seven types of noise and five levels of SNR used. In particular, it provides a maximum of 7.5% relative error rate reduction for the F16 noise (in NOISEX-92 database) at 20 dB SNR

    Wavelet theory and applications:a literature study

    Get PDF

    A software based, 13 kbits/s real-time internet codec

    Get PDF
    Due to the character of the original source materials and the nature of batch digitization, quality control issues may be present in this document. Please report any quality issues you encounter to [email protected], referencing the URI of the item.Includes bibliographical references: p. 50-53.Issued also on microfiche from Lange Micrographics.Bandwidth usage is a prime concern to many on the Internet, especially for users on low bit rate channels. As video conferencing becomes more popular, the need for efficient software based compression of video and audio becomes more important. This work develops a scalable, real-time, software based speech codec for use on desktop computers. The system is based on subband coding, adaptive prediction, and Huffman coding, and is capable of bit rates below 13 kbits/s for communications quality audio. The quality may be 'scaled" up by allocating additional bits to the subbands. This coder has been successfully implemented in real-time on a Sun Sparc 10 platform

    Efficient Algorithms for Immersive Audio Rendering Enhancement

    Get PDF
    Il rendering audio immersivo è il processo di creazione di un’esperienza sonora coinvolgente e realistica nello spazio 3D. Nei sistemi audio immersivi, le funzioni di trasferimento relative alla testa (head-related transfer functions, HRTFs) vengono utilizzate per la sintesi binaurale in cuffia poiché esprimono il modo in cui gli esseri umani localizzano una sorgente sonora. Possono essere introdotti algoritmi di interpolazione delle HRTF per ridurre il numero di punti di misura e per creare un movimento del suono affidabile. La riproduzione binaurale può essere eseguita anche dagli altoparlanti. Tuttavia, il coinvolgimento di due o più gli altoparlanti causa il problema del crosstalk. In questo caso, algoritmi di cancellazione del crosstalk (CTC) sono necessari per eliminare i segnali di interferenza indesiderati. In questa tesi, partendo da un'analisi comparativa di metodi di misura delle HRTF, viene proposto un sistema di rendering binaurale basato sull'interpolazione delle HRTF per applicazioni in tempo reale. Il metodo proposto mostra buone prestazioni rispetto a una tecnica di riferimento. L'algoritmo di interpolazione è anche applicato al rendering audio immersivo tramite altoparlanti, aggiungendo un algoritmo di cancellazione del crosstalk fisso, che considera l'ascoltatore in una posizione fissa. Inoltre, un sistema di cancellazione crosstalk adattivo, che include il tracciamento della testa dell'ascoltatore, è analizzato e implementato in tempo reale. Il CTC adattivo implementa una struttura in sottobande e risultati sperimentali dimostrano che un maggiore numero di bande migliora le prestazioni in termini di errore totale e tasso di convergenza. Il sistema di riproduzione e le caratteristiche dell'ambiente di ascolto possono influenzare le prestazioni a causa della loro risposta in frequenza non ideale. L'equalizzazione viene utilizzata per livellare le varie parti dello spettro di frequenze che compongono un segnale audio al fine di ottenere le caratteristiche sonore desiderate. L'equalizzazione può essere manuale, come nel caso dell'equalizzazione grafica, dove il guadagno di ogni banda di frequenza può essere modificato dall'utente, o automatica, la curva di equalizzazione è calcolata automaticamente dopo la misurazione della risposta impulsiva della stanza. L'equalizzazione della risposta ambientale può essere applicata anche ai sistemi multicanale, che utilizzano due o più altoparlanti e la zona di equalizzazione può essere ampliata misurando le risposte impulsive in diversi punti della zona di ascolto. In questa tesi, GEQ efficienti e un sistema adattativo di equalizzazione d'ambiente. In particolare, sono proposti e approfonditi tre equalizzatori grafici a basso costo computazionale e a fase lineare e quasi lineare. Gli esperimenti confermano l'efficacia degli equalizzatori proposti in termini di accuratezza, complessità computazionale e latenza. Successivamente, una struttura adattativa in sottobande è introdotta per lo sviluppo di un sistema di equalizzazione d'ambiente multicanale. I risultati sperimentali verificano l'efficienza dell'approccio in sottobande rispetto al caso a banda singola. Infine, viene presentata una rete crossover a fase lineare per sistemi multicanale, mostrando ottimi risultati in termini di risposta in ampiezza, bande di transizione, risposta polare e risposta in fase. I sistemi di controllo attivo del rumore (ANC) possono essere progettati per ridurre gli effetti dell'inquinamento acustico e possono essere utilizzati contemporaneamente a un sistema audio immersivo. L'ANC funziona creando un'onda sonora in opposizione di fase rispetto all'onda sonora in arrivo. Il livello sonoro complessivo viene così ridotto grazie all'interferenza distruttiva. Infine, questa tesi presenta un sistema ANC utilizzato per la riduzione del rumore. L’approccio proposto implementa una stima online del percorso secondario e si basa su filtri adattativi in sottobande applicati alla stima del percorso primario che mirano a migliorare le prestazioni dell’intero sistema. La struttura proposta garantisce un tasso di convergenza migliore rispetto all'algoritmo di riferimento.Immersive audio rendering is the process of creating an engaging and realistic sound experience in 3D space. In immersive audio systems, the head-related transfer functions (HRTFs) are used for binaural synthesis over headphones since they express how humans localize a sound source. HRTF interpolation algorithms can be introduced for reducing the number of measurement points and creating a reliable sound movement. Binaural reproduction can be also performed by loudspeakers. However, the involvement of two or more loudspeakers causes the problem of crosstalk. In this case, crosstalk cancellation (CTC) algorithms are needed to delete unwanted interference signals. In this thesis, starting from a comparative analysis of HRTF measurement techniques, a binaural rendering system based on HRTF interpolation is proposed and evaluated for real-time applications. The proposed method shows good performance in comparison with a reference technique. The interpolation algorithm is also applied for immersive audio rendering over loudspeakers, by adding a fixed crosstalk cancellation algorithm, which assumes that the listener is in a fixed position. In addition, an adaptive crosstalk cancellation system, which includes the tracking of the listener's head, is analyzed and a real-time implementation is presented. The adaptive CTC implements a subband structure and experimental results prove that a higher number of bands improves the performance in terms of total error and convergence rate. The reproduction system and the characteristics of the listening room may affect the performance due to their non-ideal frequency response. Audio equalization is used to adjust the balance of different audio frequencies in order to achieve desired sound characteristics. The equalization can be manual, such as in the case of graphic equalization, where the gain of each frequency band can be modified by the user, or automatic, where the equalization curve is automatically calculated after the room impulse response measurement. The room response equalization can be also applied to multichannel systems, which employ two or more loudspeakers, and the equalization zone can be enlarged by measuring the impulse responses in different points of the listening zone. In this thesis, efficient graphic equalizers (GEQs), and an adaptive room response equalization system are presented. In particular, three low-complexity linear- and quasi-linear-phase graphic equalizers are proposed and deeply examined. Experiments confirm the effectiveness of the proposed GEQs in terms of accuracy, computational complexity, and latency. Successively, a subband adaptive structure is introduced for the development of a multichannel and multiple positions room response equalizer. Experimental results verify the effectiveness of the subband approach in comparison with the single-band case. Finally, a linear-phase crossover network is presented for multichannel systems, showing great results in terms of magnitude flatness, cutoff rates, polar diagram, and phase response. Active noise control (ANC) systems can be designed to reduce the effects of noise pollution and can be used simultaneously with an immersive audio system. The ANC works by creating a sound wave that has an opposite phase with respect to the sound wave of the unwanted noise. The additional sound wave creates destructive interference, which reduces the overall sound level. Finally, this thesis presents an ANC system used for noise reduction. The proposed approach implements an online secondary path estimation and is based on cross-update adaptive filters applied to the primary path estimation that aim at improving the performance of the whole system. The proposed structure allows for a better convergence rate in comparison with a reference algorithm

    Biorthogonality in lapped transforms : a study in high-quality audio compression

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996.Includes bibliographical references (leaves 76-82).by Shiufun Cheung.Ph.D

    Survey of error concealment schemes for real-time audio transmission systems

    Get PDF
    This thesis presents an overview of the main strategies employed for error detection and error concealment in different real-time transmission systems for digital audio. The “Adaptive Differential Pulse-Code Modulation (ADPCM)”, the “Audio Processing Technology Apt-x100”, the “Extended Adaptive Multi-Rate Wideband (AMR-WB+)”, the “Advanced Audio Coding (AAC)”, the “MPEG-1 Audio Layer II (MP2)”, the “MPEG-1 Audio Layer III (MP3)” and finally the “Adaptive Transform Coder 3 (AC3)” are considered. As an example of error management, a simulation of the AMR-WB+ codec is included. The simulation allows an evaluation of the mechanisms included in the codec definition and enables also an evaluation of the different bit error sensitivities of the encoded audio payload.Ingeniería Técnica en Telemátic

    Estimation and Modeling Problems in Parametric Audio Coding

    Get PDF
    corecore