Over-Sampling for Accurate Masking Threshold Calculation in Wavelet Packet Audio Coders
Many existing audio coders use a critically sampled discrete wavelet transform (DWT) for the decomposition of audio signals. While the aliasing present in the wavelet coefficients is cancelled in the decoder, these coders normally calculate the simultaneous masking threshold directly on these aliased coefficients. This paper uses over-sampling in the wavelet packet decomposition to provide alias-free coefficients for accurate simultaneous masking threshold calculation. The proposed technique is compared with masking threshold calculation based upon the FFT and upon critically sampled wavelet coefficients, and the results show that a bit rate saving of up to 16 kbit/s can be achieved using over-sampling.
Analysis of orientation error of triaxial accelerometers on the assessment of energy expenditure
This paper investigates the effects of orientation error in the positioning of triaxial accelerometers on the assessment of energy expenditure. Four subjects walked on a treadmill at velocities ranging from 4 km·h⁻¹ to 5 km·h⁻¹. During each test, a triaxial accelerometer was attached to the lower back at arbitrary orientations to record body accelerations. Energy expenditure was estimated by the sum of the integrals of the absolute value of the accelerometer output from all three measurement directions. Based on theoretical analysis and experimental observations, it is concluded that small orientation errors (< 3°) have no distinguishable effect on the estimation of energy expenditure. We propose an efficient method to compensate for larger orientation errors. The experimental results verified the effectiveness of the proposed compensation method. ©2005 IEEE
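The estimator described in this abstract (the sum, over the three axes, of the integral of the absolute accelerometer output) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the trapezoidal integration rule, and the rotation about a single axis are hypothetical choices, not the authors' implementation, and neither the calibration from this quantity to energy units nor the paper's compensation method is shown.

```python
import math


def integrate_abs(samples, dt):
    """Trapezoidal integral of |x(t)| for uniformly spaced samples (assumed rule)."""
    if len(samples) < 2:
        return 0.0
    mags = [abs(s) for s in samples]
    return dt * (sum(mags) - 0.5 * (mags[0] + mags[-1]))


def activity_integral(ax, ay, az, dt):
    """Sum of the integrals of |a| over the three measurement directions."""
    return sum(integrate_abs(axis, dt) for axis in (ax, ay, az))


def rotate_about_z(ax, ay, az, angle_deg):
    """Simulate an orientation error as a rotation of the sensor about its z axis."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    rx = [c * x - s * y for x, y in zip(ax, ay)]
    ry = [s * x + c * y for x, y in zip(ax, ay)]
    return rx, ry, list(az)


if __name__ == "__main__":
    # Synthetic signals for illustration only; not data from the paper.
    dt = 0.01
    t = [i * dt for i in range(1000)]
    ax = [0.8 * math.sin(2 * math.pi * 2.0 * ti) for ti in t]
    ay = [0.3 * math.sin(2 * math.pi * 1.0 * ti + 0.5) for ti in t]
    az = [0.5 * math.cos(2 * math.pi * 2.0 * ti) for ti in t]

    reference = activity_integral(ax, ay, az, dt)
    tilted = activity_integral(*rotate_about_z(ax, ay, az, 3.0), dt)
    print("relative change for a 3 degree error:", abs(tilted - reference) / reference)
```

Because the absolute value is taken per axis, the quantity is not invariant under rotation of the sensor frame, which is why orientation error can affect the estimate at all.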
Estimation of walking energy expenditure by using support vector regression
This paper develops a new predictor of walking energy expenditure from wireless measurements of body movements using triaxial accelerometers. Reliable data were collected from repeated walking experiments in different conditions on a treadmill with simultaneous measurement of expired oxygen and carbon dioxide. Support vector regression, a powerful non-linear regression method, was used to process and model the data. This novel processing method sets this investigation apart from existing papers. Good results were achieved in the robust estimation of walking related energy expenditure from a number of variables derived from triaxial accelerometer and treadmill speed. ©2005 IEEE
Auditory modelling for speech processing in the perceptual domain
The human hearing system is the most robust speech processor, even in noisy environments. This work presents a new computational model of the auditory system that exploits psychoacoustic masking properties. The model is then applied to speech coding in the perceptual domain. The coding algorithm is capable of producing high-quality coded speech and audio that accounts for temporal as well as spectral details. The proposed filterbank is also applied to speech denoising in the perceptual domain. The enhanced speech is of good perceptual quality.
Multimodal Affect Models: An Investigation of Relative Salience of Audio and Visual Cues for Emotion Prediction
People perceive emotions via multiple cues, predominantly speech and visual cues, and a number of emotion recognition systems utilize both audio and visual cues. Moreover, the static aspects of emotion (the speaker's arousal level is high/low) and the dynamic aspects of emotion (the speaker is becoming more aroused) might be perceived via different expressive cues, and these two aspects are integrated to provide a unified sense of emotion state. However, existing multimodal systems focus on only a single aspect of emotion perception, and the contributions of different modalities toward modeling static and dynamic emotion aspects are not well explored. In this paper, we investigate the relative salience of the audio and video modalities for emotion state prediction and emotion change prediction using a Multimodal Markovian affect model. Experiments conducted on the RECOLA database showed that the audio modality is better at modeling the emotion state of arousal and video is better for the emotion state of valence, whereas audio outperforms video in modeling emotion changes for both arousal and valence.
Forward Masking Threshold Estimation Using Neural Networks and Its Application to Parallel Speech Enhancement
Forward masking models have been used successfully in speech enhancement and audio coding. Presently, forward masking thresholds are estimated using simplified masking models developed for these applications. In this paper, an accurate approximation of forward masking threshold estimation using neural networks is proposed. A performance comparison with other existing masking models in a speech enhancement application is presented. Objective measures using PESQ demonstrate that the proposed forward masking model provides significant improvements (5-15%) over four existing models when tested with speech signals corrupted by various noises at very low signal-to-noise ratios. Moreover, a parallel implementation of the speech enhancement algorithm was developed using the MATLAB Parallel Computing Toolbox.
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016
The 2016 speaker recognition evaluation (SRE'16) is the latest edition in the series of benchmarking events conducted by the National Institute of Standards and Technology (NIST). I4U is a joint entry to SRE'16 resulting from the collaboration and active exchange of information among researchers from sixteen institutes and universities across four continents. The joint submission and several of its 32 sub-systems were among the top-performing systems. Considerable effort was devoted to two major challenges, namely unlabeled training data and dataset shift from Switchboard-Mixer to the new Call My Net dataset. This paper summarizes the lessons learned and presents the shared view of the sixteen research groups on recent advances, major paradigm shifts, and the common tool chain used in speaker recognition, as witnessed in SRE'16. More importantly, we look into the intriguing question of fusing a large ensemble of sub-systems and the potential benefit of large-scale collaboration.
2014 OptoElectronics and Communication Conference, OECC 2014 and Australian Conference on Optical Fibre Technology, ACOFT 2014
Research on single-mode polymer fiber Bragg gratings (FBGs) and their applications has progressed considerably in recent years. In this paper, we report recent research developments in polymer FBG sensor applications. © 2014 Engineers Australia