    Minimising latency of pitch detection algorithms for live vocals on low-cost hardware

    A pitch estimation device was proposed for live vocals, outputting pitch data through the musical instrument digital interface (MIDI). The aim was to achieve imperceptible latency while maintaining estimation accuracy. The target platform was low-cost, standalone hardware based around a microcontroller such as the Microchip PIC series. This study investigated, optimised and compared the performance of suitable algorithms for this application. Performance was determined by two key factors: accuracy and latency. Many papers published over the past six decades assess and compare the accuracy of pitch detection algorithms on various signals, including vocals. However, very little information is available on the latency of pitch detection algorithms or on methods for minimising it. Real-time audio introduces a further, sparsely studied latency challenge: minimising the length of sampled audio that the algorithms require, in order to reduce total latency. Thorough testing was undertaken to determine the best-performing algorithm and the optimal parameter combination. Software modifications were implemented to facilitate accurate, repeatable, automated testing and so build a comprehensive set of results covering a wide range of test conditions. The results revealed that the infinite-peak-clipping autocorrelation function (IACF) performed better than the other autocorrelation functions tested, and identified ideal parameter values or value ranges that provide the optimal latency/accuracy balance. Although the results were encouraging, testing highlighted some fundamental issues with vocal pitch detection. Potential solutions are proposed for further development.
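
As a rough illustration of the technique named in this abstract, the sketch below reduces each sample to its sign (infinite peak clipping) and picks the autocorrelation lag with the highest score. It is a minimal sketch and not the study's implementation: the function names, the `f_min`/`f_max` search range, and the per-lag normalisation are assumptions made for the example, and there is no voicing or silence detection.

```python
import math


def infinite_peak_clip(samples):
    """Reduce each sample to its sign: +1 if non-negative, -1 if negative."""
    return [1 if s >= 0 else -1 for s in samples]


def iacf_pitch(samples, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate pitch in Hz from the autocorrelation of the sign-clipped signal.

    f_min and f_max are hypothetical search limits, not values from the study.
    Returns None when the window is too short to search any lag.
    """
    clipped = infinite_peak_clip(samples)
    n = len(clipped)
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(n - 1, int(sample_rate / f_min))

    best_lag, best_score = None, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        overlap = n - lag
        # Products of +/-1 values only; divide by the overlap so that
        # longer lags are not penalised for having fewer terms.
        score = sum(clipped[i] * clipped[i + lag] for i in range(overlap)) / overlap
        # Strict comparison keeps the shortest lag among equal maxima,
        # which avoids reporting a multiple of the true period.
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag is None:
        return None
    return sample_rate / best_lag
```

Because the clipped samples are only ever +1 or -1, each product is a sign comparison, which is one reason this variant suits small microcontrollers. The window length passed in sets a lower bound on latency: a 400-sample window at 8 kHz already costs 50 ms before any processing.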

    A robust speech enhancement method in noisy environments

    Speech enhancement aims to eliminate or reduce undesirable noise and distortion while preserving the features of the speech, so as to improve the quality and intelligibility of degraded speech signals. In this study, we investigated a combined approach using single-frequency filtering (SFF) and a modified spectral subtraction method to enhance single-channel speech. The SFF method divides the speech signal into uniform subband envelopes and then performs spectral over-subtraction on each envelope. A smoothing parameter, determined by the a-posteriori signal-to-noise ratio (SNR), is used to estimate and update the noise without the need to detect silence explicitly. To evaluate the performance of our algorithm, we employed objective measures: segmental SNR (segSNR), extended short-term objective intelligibility (ESTOI), and perceptual evaluation of speech quality (PESQ). We tested our algorithm with various types of noise at different SNR levels and achieved results ranging from 4.24 to 15.41 for segSNR, 0.57 to 0.97 for ESTOI, and 2.18 to 4.45 for PESQ. Compared to other standard and existing speech enhancement methods, our algorithm produces better results and performs well in reducing undesirable noise.
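
The two ideas mentioned in this abstract, over-subtraction and an SNR-driven noise update, can be sketched generically as follows. This is not the authors' SFF pipeline: it operates on plain lists of per-band power values, and the over-subtraction factor `alpha`, the spectral floor `beta`, and the sigmoid mapping with `slope` and `threshold` are all hypothetical choices made for illustration.

```python
import math


def over_subtract(noisy_power, noise_power, alpha=4.0, beta=0.01):
    """Subtract alpha times the noise estimate from each band.

    The result is floored at beta times the noise estimate so that
    no band ends up with negative power.
    """
    return [max(y - alpha * n, beta * n) for y, n in zip(noisy_power, noise_power)]


def update_noise(noise_power, noisy_power, slope=1.0, threshold=1.5):
    """Recursively smooth the noise estimate, band by band.

    The smoothing factor is a sigmoid of the a-posteriori SNR (noisy power
    divided by the current noise estimate). High SNR suggests speech is
    present, so the factor approaches 1 and the estimate is held; low SNR
    lets the estimate track the incoming power. No silence detector is used.
    """
    updated = []
    for n, y in zip(noise_power, noisy_power):
        post_snr = y / n if n > 0 else float("inf")
        lam = 1.0 / (1.0 + math.exp(-slope * (post_snr - threshold)))
        updated.append(lam * n + (1.0 - lam) * y)
    return updated
```

In a frame-by-frame loop, `update_noise` would run on every frame and its output would feed `over_subtract`, so the noise estimate adapts continuously instead of only during detected pauses.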

    Contributions to automatic multiple F0 detection in polyphonic music signals

    Multiple fundamental frequency estimation, or multi-pitch estimation (MPE), is a key problem in automatic music transcription (AMT) and many other related audio processing tasks. Applications of AMT are numerous, ranging from musical genre classification to automatic piano tutoring, and these form a significant part of musical information retrieval tasks. Current AMT systems still perform considerably below human experts, and there is a consensus that developing an automated system for full transcription of polyphonic music, regardless of its complexity, is still an open problem. The goal of this work is to propose contributions for the automatic detection of multiple fundamental frequencies in polyphonic music signals. A reference MPE method is first selected, studied and implemented, and a modification is proposed to improve the performance of the system. Lastly, three refinement strategies are proposed for incorporation into the modified method, in order to increase the quality of the results. Experimental tests reveal that these refinements improve the overall performance of the system, even though each one behaves differently depending on the signal characteristics.

    A spectral estimator of vocal jitter

    Final degree project carried out in collaboration with the Université libre de Bruxelles, Faculté des Sciences Appliquées. The purpose of this thesis is to study and implement a spectral method for short-time jitter estimation. Jitter consists of rapid perturbations of the vocal cycle lengths, which can be observed from one cycle to the next when they are sampled at a rate of at least the fundamental frequency. Jitter is analyzed for voice quality assessment because it correlates strongly with voice disorders. The method is based on a mathematical model that describes the association of two periodic spike trains. Jitter is modeled as the perturbation of one of those impulse trains with respect to the other. The proposed method computes this perturbation indirectly, by taking spectral properties into account. Counting the number of crossings between the harmonic and the inter-harmonic contours yields the perturbation in samples. A Matlab application is implemented to ascertain the validity and reliability of the method. The Praat software is used as a reference for assessing the jitter values. Given the references provided by Praat, a comparison is made with spectral jitter measurements in different situations. Experiments with ideally perturbed spike trains show that the suggested method produces accurate local estimates of jitter. Additional evaluation relies on testing and analyzing synthetic phonation and connected speech. A performance appraisal allows the method to be enhanced, that is, implemented so as to give more accurate estimates. The results are presented and the reliability of the spectral jitter estimator is analyzed.
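
For orientation, the kind of time-domain reference value that the spectral estimates are compared against can be sketched as below. This is not the thesis's spectral estimator: it computes local jitter as the mean absolute difference between consecutive cycle lengths divided by the mean cycle length, and the function name and the input format (a list of cycle lengths in samples or seconds) are assumptions for the example.

```python
def local_jitter(periods):
    """Return local jitter as a ratio (multiply by 100 for percent).

    periods: consecutive vocal cycle lengths, all in the same unit.
    """
    if len(periods) < 2:
        raise ValueError("need at least two cycle lengths")
    # Absolute change from each cycle to the next.
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period
```

A perfectly periodic spike train gives a jitter of zero, while cycle lengths alternating between 100 and 102 samples give 2/101, or roughly 2 percent.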