4,748 research outputs found
Minimising latency of pitch detection algorithms for live vocals on low-cost hardware
A pitch estimation device was proposed for live vocals to output appropriate pitch data through the musical instrument digital interface (MIDI). The intention was to ideally achieve unnoticeable latency while maintaining estimation accuracy. The projected target platform was low-cost, standalone hardware based around a microcontroller such as the Microchip PIC series. This study investigated, optimised and compared the performance of suitable algorithms for this application.
Performance was determined by two key factors: accuracy and latency. Many papers have been published over the past six decades assessing and comparing the accuracy of pitch detection algorithms on various signals, including vocals. However, very little information is available concerning the latency of pitch detection algorithms and methods with which this can be minimised. Real-time audio introduces a further latency challenge that is sparsely studied, minimising the length of sampled audio required by the algorithms in order to reduce overall total latency.
Thorough testing was undertaken in order to determine the best-performing algorithm and optimal parameter combination. Software modifications were implemented to facilitate accurate, repeatable, automated testing in order to build a comprehensive set of results encompassing a wide range of test conditions.
The results revealed that the infinite-peak-clipping autocorrelation function (IACF) performed better than the other autocorrelation functions tested and also identified ideal parameter values or value ranges to provide the optimal latency/accuracy balance.
Although the results were encouraging, testing highlighted some fundamental issues with vocal pitch detection. Potential solutions are proposed for further development
Audio Inpainting
(c) 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Published version: IEEE Transactions on Audio, Speech and Language Processing 20(3): 922-932, Mar 2012. DOI: 10.1090/TASL.2011.2168211
Introducing SPAIN (SParse Audio INpainter)
A novel sparsity-based algorithm for audio inpainting is proposed. It is an
adaptation of the SPADE algorithm by Kiti\'c et al., originally developed for
audio declipping, to the task of audio inpainting. The new SPAIN (SParse Audio
INpainter) comes in synthesis and analysis variants. Experiments show that both
A-SPAIN and S-SPAIN outperform other sparsity-based inpainting algorithms.
Moreover, A-SPAIN performs on a par with the state-of-the-art method based on
linear prediction in terms of the SNR, and, for larger gaps, SPAIN is even
slightly better in terms of the PEMO-Q psychoacoustic criterion
An evaluation of intrusive instrumental intelligibility metrics
Instrumental intelligibility metrics are commonly used as an alternative to
listening tests. This paper evaluates 12 monaural intrusive intelligibility
metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and
. In addition, this paper investigates the ability of
intelligibility metrics to generalize to new types of distortions and analyzes
why the top performing metrics have high performance. The intelligibility data
were obtained from 11 listening tests described in the literature. The stimuli
included Dutch, Danish, and English speech that was distorted by additive
noise, reverberation, competing talkers, pre-processing enhancement, and
post-processing enhancement. SIIB and HASPI had the highest performance
achieving a correlation with listening test scores on average of
and , respectively. The high performance of SIIB may, in part, be
the result of SIIBs developers having access to all the intelligibility data
considered in the evaluation. The results show that intelligibility metrics
tend to perform poorly on data sets that were not used during their
development. By modifying the original implementations of SIIB and STOI, the
advantage of reducing statistical dependencies between input features is
demonstrated. Additionally, the paper presents a new version of SIIB called
, which has similar performance to SIIB and HASPI,
but takes less time to compute by two orders of magnitude.Comment: Published in IEEE/ACM Transactions on Audio, Speech, and Language
Processing, 201
Sound and noise
Sound and noise problems in space environment and human tolerance criteria at varying frequencies and intensitie
Categorisation of distortion profiles in relation to audio quality
Since digital audio is encoded as discrete samples of the audio waveform, much can be said about a recording by the statistical properties of these samples. In this paper, a dataset of CD audio samples is analysed; the probability mass function of each audio clip informs a feature set which describes attributes of the musical recording related to loudness, dynamics and distortion. This allows musical recordings to be classified according to their “distortion character”, a concept which describes the nature of amplitude distortion in mastered audio. A subjective test was designed in which such recordings were rated according to the perception of their audio quality. It is shown that participants can discern between three different distortion characters; ratings of audio quality were significantly different (F(1; 2) = 5:72; p < 0:001; eta^2 = 0:008) as were the words used to describe the attributes on which quality was assessed (�Chi^2(8; N = 547) = 33:28; p < 0:001).This expands upon previous work showing links between the effects of dynamic range compression and audio quality in musical
recordings, by highlighting perceptual differences
- …