
    Audio Processing and Loudness Estimation Algorithms with iOS Simulations

    The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined, and improved algorithms have been proposed to overcome the limitations of these methods. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have also been implemented. A software implementation of loudness estimation on iOS devices is also reported in this thesis. In addition to the loudness estimation algorithms and software, this thesis project also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated into the award-winning educational iOS app 'iJDSP'. These functions are described in detail in this thesis. Several enhancements to the architecture of the application have also been introduced to provide the supporting framework for speech/audio processing. Frame-by-frame processing and visualization functionalities have been developed to facilitate speech/audio processing. In addition, facilities for easy sound recording, processing and audio rendering have been developed to provide students, practitioners and researchers with an enriched DSP simulation tool. Simulations and assessments have also been developed for use in classes and in the training of practitioners and students. Dissertation/Thesis, M.S. Electrical Engineering, 201
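    The frame-by-frame processing mentioned above is a generic front end for per-frame auditory analysis. A rough sketch of that idea follows; this is not the thesis or iJDSP code, and the function names, window choice and frame sizes are illustrative assumptions:

```python
import numpy as np

def frame_signal(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Split a mono signal into overlapping Hann-windowed frames.

    Frame and hop sizes are illustrative; a real auditory-model
    front end picks them to match the model's temporal resolution.
    """
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    window = np.hanning(frame_len)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] * window
                     for i in range(n_frames)])

def frame_spectra(frames):
    """Per-frame magnitude spectra: the usual input to an
    excitation-pattern/loudness model evaluated frame by frame."""
    return np.abs(np.fft.rfft(frames, axis=1))

# Usage: one second of noise at 44.1 kHz
fs = 44100
x = np.random.randn(fs)
print(frame_spectra(frame_signal(x, fs)).shape)  # (n_frames, n_bins)
```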

    Perceptual aspects of voice-source parameters


    Implementing loudness models in Matlab

    In the field of psychoacoustic analysis, the goal is to construct a transformation that maps a time waveform into a domain that best captures the response of a human perceiving sound. A key element of such transformations is the mapping between sound intensity in decibels and its actual perceived loudness. A number of different loudness models exist to achieve this mapping. This paper examines implementation strategies for some of the more well-known models in the Matlab software environment.
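    A building block common to these models is the mapping from loudness level in phons to loudness in sones: by Stevens' rule, loudness doubles for each 10-phon increase above 40 phons. A minimal Python sketch of just that step, not a port of any of the Matlab implementations the paper examines:

```python
import numpy as np

def phons_to_sones(loudness_level_phon):
    """Stevens' rule: loudness in sones doubles for each 10-phon
    increase; 40 phons is defined as 1 sone. The rule is a
    reasonable approximation above about 40 phons."""
    return 2.0 ** ((np.asarray(loudness_level_phon, dtype=float) - 40.0) / 10.0)

# For a 1 kHz pure tone, the loudness level in phons equals the SPL
# in dB by definition, so the dB-to-sones mapping is direct there:
for spl in (40, 50, 60, 70, 80):
    print(f"{spl} dB SPL at 1 kHz -> {phons_to_sones(spl):.0f} sone(s)")
```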

    Objective and Subjective Evaluation of Wideband Speech Quality

    Traditional landline and cellular communications use a bandwidth of 300-3400 Hz for transmitting speech. This narrow bandwidth impacts the quality, intelligibility and naturalness of transmitted speech. There is an impending change within the telecommunication industry towards using wider-bandwidth speech, but the enlarged bandwidth also introduces a few challenges in speech processing. Echo and noise are two challenging issues in wideband telephony, due to increased perceptual sensitivity by users. Subjective and/or objective measurements of speech quality are important in benchmarking speech processing algorithms and evaluating the effect of parameters like noise, echo, and delay in wideband telephony. Subjective measures include ratings of speech quality by listeners, whereas objective measures compute a metric based on the reference and degraded speech samples. While subjective quality ratings are the "gold standard," they are also time- and resource-consuming. An objective metric that correlates highly with subjective data is attractive, as it can act as a substitute for subjective quality scores in gauging the performance of different algorithms and devices. This thesis reports results from a series of experiments on subjective and objective speech quality evaluation for wideband telephony applications. First, a custom wideband noise reduction database was created that contained speech samples corrupted by different background noises at different signal-to-noise ratios (SNRs) and processed by six different noise reduction algorithms. Comprehensive subjective evaluation of this database revealed an interaction between algorithm performance, noise type and SNR. Several auditory-based objective metrics, such as the Loudness Pattern Distortion (LPD) measure based on the Moore-Glasberg auditory model, were evaluated in predicting the subjective scores. In addition, the performance of Bayesian Multivariate Regression Splines (BMLS) was evaluated in mapping the scores calculated by the objective metrics to the true quality scores. The combination of LPD and BMLS resulted in high correlation with the subjective scores and was used as a substitute for subjective testing in fine-tuning the noise reduction algorithms. Second, the effect of echo and delay on wideband speech was evaluated in both listening and conversational contexts, through both subjective and objective measures. A database containing speech samples corrupted by echo with different delay and frequency response characteristics was created and later used to collect subjective quality ratings. The LPD-BMLS objective metric was then validated using the subjective scores. Third, to evaluate the effect of echo and delay in a conversational context, a real-time simulator was developed. Pairs of subjects conversed over the simulated system and rated the quality of their conversations, which were degraded by different amounts of echo and delay. The quality scores were analysed, and the LPD-BMLS combination was found to be effective in predicting subjective impressions of quality for condition-averaged data.
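    The validation loop described here — map objective-metric outputs to quality scores, then correlate with subjective ratings — can be sketched in a few lines. The arrays below are made-up placeholder data, and an ordinary cubic polynomial fit stands in for the BMLS regression, purely to show the shape of the procedure:

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder data: an LPD-style distortion value and the mean
# opinion score (MOS) for each of six processing conditions.
objective = np.array([0.12, 0.30, 0.45, 0.58, 0.71, 0.85])
mos = np.array([4.5, 4.0, 3.4, 2.9, 2.3, 1.8])

# Stand-in for the learned BMLS mapping: fit a cubic polynomial
# from metric values to subjective scores.
coeffs = np.polyfit(objective, mos, deg=3)
predicted = np.polyval(coeffs, objective)

# Benchmark the metric: correlation of mapped scores with MOS.
r, _ = pearsonr(predicted, mos)
print(f"Pearson correlation after mapping: {r:.3f}")
```

    In practice the mapping would be trained and evaluated on separate conditions; the sketch only illustrates the metric-to-MOS pipeline.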

    Peripheral auditory processing and speech reception in impaired hearing


    The effects of background noise and test subject on the perceived amount of bass in phase-modified harmonic complex tones

    The perception of timbre is closely related to the relative levels produced by a sound in each frequency band, called a 'critical band', in the cochlea. The magnitude spectrum defines the relative levels, and the phase spectrum the relative phases, of the frequency components in a complex sound. Thus, the timbre of a sound often depends only on the magnitude spectrum. However, several studies have shown that the timbre of certain complex sounds can be affected by modifying only the phase spectrum. Moreover, a recent study has shown that with certain modifications of only the phase spectrum of a 'phase-sensitive' harmonic complex tone, the perceived level of bass changes. That experiment was conducted using two synthetic harmonic complex tones in which adjacent frequency components have a phase shift of -90° and 90°, respectively, and the fundamental component is in cosine phase. The greatest difference in perceived level of bass was found at the fundamental frequency of 50 Hz, and it corresponds to a 2-4 dB amplification of the magnitude spectrum at low frequencies. However, this effect was reported to vary substantially between individuals. Moreover, the differences were found to be easier to detect in the presence of background noise.
    The aim of this thesis was to investigate further the roles of background noise and individual differences in the perceived level of bass in such phase-sensitive tones. Two formal listening tests were conducted accordingly using headphones. First, the effect of background noise on the discrimination of the phase-sensitive tones based on the perceived level of bass was studied. The effect of increasing background noise level on the perceived loudness difference was found not to be statistically significant, but a trend towards an increasing loudness difference could be seen. Additionally, the results indicate that the overall perceived loudness of the test tones decreases with increasing level of background noise. Second, an experiment was conducted to find, for different individuals, the value of the constant phase shift between adjacent components that produces the tone with the perceptually loudest bass. The results show that individuals differ, statistically significantly, in which phase spectrum they hear as producing the loudest bass.
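    The stimuli described above are straightforward to reproduce approximately: a harmonic complex whose adjacent components differ by a constant phase shift, with the fundamental in cosine phase. A sketch under those assumptions (sampling rate, duration and number of harmonics are arbitrary choices, not values from the thesis):

```python
import numpy as np

def phase_complex(f0, phase_step_deg, fs=48000, dur=1.0, n_harmonics=20):
    """Harmonic complex tone whose adjacent components differ by a
    constant phase shift; the fundamental (n = 1) gets zero extra
    phase and is therefore in cosine phase."""
    t = np.arange(int(fs * dur)) / fs
    step = np.deg2rad(phase_step_deg)
    x = np.zeros_like(t)
    for n in range(1, n_harmonics + 1):
        x += np.cos(2 * np.pi * n * f0 * t + (n - 1) * step)
    return x / np.max(np.abs(x))    # normalize to +/-1 full scale

# The two phase-sensitive tones discussed above, at a 50 Hz fundamental:
tone_minus90 = phase_complex(50.0, -90.0)
tone_plus90 = phase_complex(50.0, +90.0)
```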

    Calculation of Unsteady Loudness in the Presence of Gaps Through Application of the Multiple Look Theory

    Experimental studies have shown that for short gaps of between 2 and 5 ms, the perceived loudness is higher than for uninterrupted noise presented to the ear. Other studies have also shown that present temporal integration models for the calculation of time-varying loudness do not adequately account for short-duration phenomena. It has been proposed that the multiple look approach is a more applicable method for describing these short-term circumstances. This approach breaks a sound into small durations, or 'looks', of 1 ms length, which allows for intelligent processing of the looks and decision making depending on the nature of the stimulus. However, present technologies (i.e. the FFT) are not adequate for dealing with short-duration sounds across the entire frequency spectrum. A compromise approach is taken here to account for perceived loudness levels of sounds in the presence of gaps while still using an integration model. This approach is referred to as a multiple look gap adjustment model. A model and software code were developed to take a recorded sound presented to the ear and process it into individual looks, which are then examined for the presence of gaps ranging in length from 1 to 10 ms. If gaps are found, an appropriate gap adjustment is applied to the sound. The modified stimulus is subsequently evaluated for loudness level using a model that relies on temporal integration. The multiple look model was tested using several sounds, including mechanical and speech sounds, and was found to perform as intended. While recommendations for improvement and further study are included, the application of the model has shown particular merit for the perceptual analysis of sounds involving speech.
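    A minimal sketch of the gap-detection half of such a model: slice the signal into 1 ms looks, flag quiet looks, and measure run lengths in the 1-10 ms range. The silence threshold is an assumption, and the actual gap adjustment applied afterwards comes from the thesis model, so it is omitted here:

```python
import numpy as np

def find_gaps(x, fs, look_ms=1.0, thresh_db=-40.0):
    """Divide a signal into 1 ms 'looks' and return the lengths, in
    looks (= ms), of quiet runs between 1 and 10 ms long."""
    look_len = int(fs * look_ms / 1000)
    n_looks = len(x) // look_len
    looks = x[:n_looks * look_len].reshape(n_looks, look_len)
    rms_db = 20 * np.log10(np.sqrt(np.mean(looks ** 2, axis=1)) + 1e-12)
    quiet = rms_db < thresh_db

    gaps, run = [], 0
    for is_quiet in quiet:
        if is_quiet:
            run += 1
        else:
            if 1 <= run <= 10:   # only 1-10 ms gaps receive an adjustment
                gaps.append(run)
            run = 0
    return gaps

# Usage: 100 ms of noise with a 4 ms gap inserted at 50 ms
fs = 48000
x = 0.1 * np.random.randn(fs // 10)
x[2400:2400 + 4 * (fs // 1000)] = 0.0
print(find_gaps(x, fs))          # -> [4]
```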

    Tracking cortical entrainment in neural activity: auditory processes in human temporal cortex.

    A primary objective for cognitive neuroscience is to identify how features of the sensory environment are encoded in neural activity. Current auditory models of loudness perception can be used to make detailed predictions about the neural activity of the cortex as an individual listens to speech. We used two such models (loudness-sones and loudness-phons), varying in their psychophysiological realism, to predict the instantaneous loudness contours produced by 480 isolated words. These two sets of 480 contours were used to search for electrophysiological evidence of loudness processing in whole-brain recordings of electro- and magneto-encephalographic (EMEG) activity, recorded while subjects listened to the words. The technique identified a bilateral sequence of loudness processes, predicted by the more realistic loudness-sones model, that begin in auditory cortex at ~80 ms and subsequently reappear, tracking progressively down the superior temporal sulcus (STS) at lags from 230 to 330 ms. The technique was then extended to search for regions sensitive to the fundamental frequency (F0) of the voiced parts of the speech. It identified a bilateral F0 process in auditory cortex at a lag of ~90 ms, which was not followed by activity in STS. The results suggest that loudness information is being used to guide the analysis of the speech stream as it proceeds beyond auditory cortex down STS toward the temporal pole.
    This work was supported by an EPSRC grant to William D. Marslen-Wilson and Paula Buttery (EP/F030061/1), an ERC Advanced Grant (Neurolex) to William D. Marslen-Wilson, and by MRC Cognition and Brain Sciences Unit (CBU) funding to William D. Marslen-Wilson (U.1055.04.002.00001.01). Computing resources were provided by the MRC-CBU and the University of Cambridge High Performance Computing Service (http://www.hpc.cam.ac.uk/). Andrew Liu and Phil Woodland helped with the HTK speech recogniser, and Russell Thompson with the Matlab code. We thank Asaf Bachrach, Cai Wingfield, Isma Zulfiqar, Alex Woolgar, Jonathan Peelle, Li Su, Caroline Whiting, Olaf Hauk, Matt Davis, Niko Kriegeskorte, Paul Wright, Lorraine Tyler, Rhodri Cusack, Brian Moore, Brian Glasberg, Rik Henson, Howard Bowman, Hideki Kawahara, and Matti Stenroos for invaluable support and suggestions.
    This is the final published version. The article was originally published in Frontiers in Computational Neuroscience, 10 February 2015 | doi: 10.3389/fncom.2015.0000
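    The search procedure, at its core, asks how well a model-predicted loudness contour matches neural activity at each candidate lag. A schematic stand-in follows (not the published EMEG pipeline; the sampling rate, lag range and toy signals are assumptions):

```python
import numpy as np

def lagged_correlation(stimulus, response, fs, max_lag_ms=400):
    """Correlate a predicted loudness contour with a neural time
    series at each positive lag (response delayed relative to
    stimulus), returning one correlation per lag."""
    max_lag = int(fs * max_lag_ms / 1000)
    corrs = []
    for lag in range(max_lag + 1):
        s = stimulus[:len(stimulus) - lag] if lag else stimulus
        r = response[lag:lag + len(s)]
        corrs.append(np.corrcoef(s, r)[0, 1])
    return np.array(corrs)

# Toy usage: a 'neural' signal that is the contour delayed
# (circularly, for simplicity) by 80 ms plus noise.
fs = 1000                                  # 1 kHz, so lag index = ms
contour = np.abs(np.random.randn(fs))      # stand-in loudness contour
neural = np.roll(contour, 80) + 0.5 * np.random.randn(fs)
corrs = lagged_correlation(contour, neural, fs)
print(f"peak correlation at lag {np.argmax(corrs)} ms")
```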