4,748 research outputs found

    Minimising latency of pitch detection algorithms for live vocals on low-cost hardware

    Get PDF
    A pitch estimation device was proposed for live vocals to output appropriate pitch data through the musical instrument digital interface (MIDI). The intention was to ideally achieve unnoticeable latency while maintaining estimation accuracy. The projected target platform was low-cost, standalone hardware based around a microcontroller such as the Microchip PIC series. This study investigated, optimised and compared the performance of suitable algorithms for this application. Performance was determined by two key factors: accuracy and latency. Many papers have been published over the past six decades assessing and comparing the accuracy of pitch detection algorithms on various signals, including vocals. However, very little information is available concerning the latency of pitch detection algorithms and methods with which this can be minimised. Real-time audio introduces a further latency challenge that is sparsely studied, minimising the length of sampled audio required by the algorithms in order to reduce overall total latency. Thorough testing was undertaken in order to determine the best-performing algorithm and optimal parameter combination. Software modifications were implemented to facilitate accurate, repeatable, automated testing in order to build a comprehensive set of results encompassing a wide range of test conditions. The results revealed that the infinite-peak-clipping autocorrelation function (IACF) performed better than the other autocorrelation functions tested and also identified ideal parameter values or value ranges to provide the optimal latency/accuracy balance. Although the results were encouraging, testing highlighted some fundamental issues with vocal pitch detection. Potential solutions are proposed for further development

    Audio Inpainting

    Get PDF
    (c) 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Published version: IEEE Transactions on Audio, Speech and Language Processing 20(3): 922-932, Mar 2012. DOI: 10.1090/TASL.2011.2168211

    Introducing SPAIN (SParse Audio INpainter)

    Full text link
    A novel sparsity-based algorithm for audio inpainting is proposed. It is an adaptation of the SPADE algorithm by Kiti\'c et al., originally developed for audio declipping, to the task of audio inpainting. The new SPAIN (SParse Audio INpainter) comes in synthesis and analysis variants. Experiments show that both A-SPAIN and S-SPAIN outperform other sparsity-based inpainting algorithms. Moreover, A-SPAIN performs on a par with the state-of-the-art method based on linear prediction in terms of the SNR, and, for larger gaps, SPAIN is even slightly better in terms of the PEMO-Q psychoacoustic criterion

    An evaluation of intrusive instrumental intelligibility metrics

    Full text link
    Instrumental intelligibility metrics are commonly used as an alternative to listening tests. This paper evaluates 12 monaural intrusive intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and sEPSMcorr\text{sEPSM}^\text{corr}. In addition, this paper investigates the ability of intelligibility metrics to generalize to new types of distortions and analyzes why the top performing metrics have high performance. The intelligibility data were obtained from 11 listening tests described in the literature. The stimuli included Dutch, Danish, and English speech that was distorted by additive noise, reverberation, competing talkers, pre-processing enhancement, and post-processing enhancement. SIIB and HASPI had the highest performance achieving a correlation with listening test scores on average of ρ=0.92\rho=0.92 and ρ=0.89\rho=0.89, respectively. The high performance of SIIB may, in part, be the result of SIIBs developers having access to all the intelligibility data considered in the evaluation. The results show that intelligibility metrics tend to perform poorly on data sets that were not used during their development. By modifying the original implementations of SIIB and STOI, the advantage of reducing statistical dependencies between input features is demonstrated. Additionally, the paper presents a new version of SIIB called SIIBGauss\text{SIIB}^\text{Gauss}, which has similar performance to SIIB and HASPI, but takes less time to compute by two orders of magnitude.Comment: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 201

    Sound and noise

    Get PDF
    Sound and noise problems in space environment and human tolerance criteria at varying frequencies and intensitie

    Categorisation of distortion profiles in relation to audio quality

    Get PDF
    Since digital audio is encoded as discrete samples of the audio waveform, much can be said about a recording by the statistical properties of these samples. In this paper, a dataset of CD audio samples is analysed; the probability mass function of each audio clip informs a feature set which describes attributes of the musical recording related to loudness, dynamics and distortion. This allows musical recordings to be classified according to their “distortion character”, a concept which describes the nature of amplitude distortion in mastered audio. A subjective test was designed in which such recordings were rated according to the perception of their audio quality. It is shown that participants can discern between three different distortion characters; ratings of audio quality were significantly different (F(1; 2) = 5:72; p < 0:001; eta^2 = 0:008) as were the words used to describe the attributes on which quality was assessed (�Chi^2(8; N = 547) = 33:28; p < 0:001).This expands upon previous work showing links between the effects of dynamic range compression and audio quality in musical recordings, by highlighting perceptual differences
    corecore