5 research outputs found

    Broadcast audio segmentation

    Audio segmentation is an active research topic: it is an essential preprocessing step in many audio processing applications and brings a significant performance improvement in many areas of speech technology. Hence the interest of this bachelor's thesis (TFG). It begins with a study of the state of the art in audio segmentation and then analyses the system designed and implemented for this project. Specifically, we present a simple and intuitive spectral feature that detects the presence of speech in broadcast audio signals with high precision. The algorithm rests on the observation that, when the spectrograms of speech, music and noise are compared, speech signals usually display patterns caused by the presence of several harmonics, shaped by the vocal tract, that do not appear in music or noise. We therefore propose to capture sustained harmonic trajectories which, in contrast to the partials of a note played on a musical instrument, vary in frequency, and thereby detect which regions of the signal contain speech. We then use other features of the speech signal to improve this initial algorithm; for example, fundamental frequency (pitch) information provides additional data that improves speech detection. Finally, the algorithm is evaluated by reporting its success and error rates on a database created by the Área de Tratamiento de Voz y Señales (ATVS). This database, whose construction is also part of this project, contains 20 hours of labelled audio from 4 real radio programmes featuring music, advertising, talk shows…
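
The core idea (voiced speech shows sustained harmonics whose frequency glides, whereas a held musical note keeps its partials at a fixed frequency) can be illustrated with a deliberately simplified Python sketch. It is not the thesis's algorithm: it assumes a single tracked component (the strongest spectral peak in each frame) instead of several harmonics, ignores pitch information, and uses hypothetical names (`peak_track`, `trajectory_variation`, `looks_like_speech`) and a hypothetical 4 Hz-per-frame threshold chosen only for the synthetic example.

```python
import cmath
import math


def peak_track(signal, sr, frame_len=256, hop=128):
    """Frequency (Hz) of the strongest spectral peak in each frame.

    Naive DFT with a Hann window: fine for a short illustration,
    far too slow for real broadcast audio (use an FFT library there).
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    # Twiddle factors e^{-2*pi*i*m/N}, indexed by (k*n) mod N.
    twiddle = [cmath.exp(-2j * math.pi * m / frame_len) for m in range(frame_len)]
    track = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        best_bin, best_mag = 1, -1.0
        # Bins 1..N/2-1: skip DC; a real signal's spectrum is symmetric.
        for k in range(1, frame_len // 2):
            coeff = sum(frame[n] * twiddle[(k * n) % frame_len]
                        for n in range(frame_len))
            if abs(coeff) > best_mag:
                best_bin, best_mag = k, abs(coeff)
        track.append(best_bin * sr / frame_len)
    return track


def trajectory_variation(track):
    """Mean absolute frame-to-frame change of the peak frequency (Hz/frame)."""
    if len(track) < 2:
        return 0.0
    steps = [abs(b - a) for a, b in zip(track, track[1:])]
    return sum(steps) / len(steps)


def looks_like_speech(signal, sr, threshold_hz=4.0):
    """Hypothetical decision rule: a gliding peak suggests speech,
    a frequency-stable peak suggests a sustained musical note."""
    return trajectory_variation(peak_track(signal, sr)) > threshold_hz


if __name__ == "__main__":
    sr = 8000
    t = [n / sr for n in range(int(0.4 * sr))]
    # Steady 300 Hz tone: stands in for a held musical note.
    note = [math.sin(2 * math.pi * 300 * x) for x in t]
    # Linear glide 200 -> 400 Hz: stands in for a pitch-varying voiced harmonic.
    glide = [math.sin(2 * math.pi * (200 * x + 250 * x * x)) for x in t]
    print(looks_like_speech(note, sr), looks_like_speech(glide, sr))
```

The steady tone and the linear glide only demonstrate the contrast the feature relies on. Real broadcast audio would need an FFT, simultaneous tracking of several harmonics, and a threshold tuned on labelled data such as the ATVS database.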

    Malay articulation system for early screening diagnostic using hidden markov model and genetic algorithm

    Speech recognition is an important technology that can greatly aid people with visual or hearing impairments. It has attracted extensive research and development over the past decades. In Malaysia, however, its use and exposure remain limited, even though there is demand from the medical and healthcare sector. The aim of this research is to assess the quality and impact of a computerized method for the early screening of speech articulation disorders among Malaysians, such as omissions, substitutions, additions and distortions in their speech. The study adopts a statistical, probabilistic approach based on the Hidden Markov Model (HMM), with a newly designed Malay corpus for articulation disorder cases that follows the SAMPA and IPA guidelines. The front-end processing for feature vector selection is improved by applying a silence region calibration algorithm for start and end point detection. The classifier is also significantly modified by incorporating a Genetic Algorithm (GA) into the Viterbi search, to obtain high recognition accuracy and to classify lexical units. The results were evaluated following the National Institute of Standards and Technology (NIST) benchmarking. The tests show that the GA technique improves recognition accuracy by 30% to 40% compared with the conventional technique. The new corpus was built with verification and validation from medical experts. In conclusion, a computerized method for early screening can reduce the human effort needed to tackle speech disorders, and the proposed GA technique has been shown to improve recognition performance in both the search and the classification tasks.
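
The abstract does not describe the silence region calibration algorithm itself. As a generic illustration of the underlying idea, a classic short-time-energy endpoint detector can calibrate its threshold from frames assumed to contain only silence at the start of a recording. The sketch below shows that generic technique, not the authors' method; `detect_endpoints`, the 10-frame calibration window, the 10 dB margin and the 3-frame minimum run are all hypothetical choices.

```python
import math


def frame_energies(signal, frame_len=160, hop=80):
    """Short-time log energy (dB) of each frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        e = sum(x * x for x in signal[start:start + frame_len]) / frame_len
        energies.append(10 * math.log10(e + 1e-12))
    return energies


def detect_endpoints(signal, frame_len=160, hop=80, calib_frames=10,
                     margin_db=10.0, min_run=3):
    """Return (start_sample, end_sample) of the detected utterance, or None.

    Assumes the first `calib_frames` frames contain only background silence;
    their mean log energy plus `margin_db` becomes the speech threshold.
    """
    energies = frame_energies(signal, frame_len, hop)
    if len(energies) <= calib_frames:
        return None
    noise_floor = sum(energies[:calib_frames]) / calib_frames
    threshold = noise_floor + margin_db
    active = [e > threshold for e in energies]

    # Start point: first frame that opens a run of `min_run` active frames,
    # so an isolated noisy frame is not mistaken for speech onset.
    start_frame = None
    for i in range(len(active) - min_run + 1):
        if all(active[i:i + min_run]):
            start_frame = i
            break
    if start_frame is None:
        return None

    # End point: last frame that closes a run of `min_run` active frames.
    end_frame = start_frame
    for i in range(len(active) - 1, min_run - 2, -1):
        if all(active[i - min_run + 1:i + 1]):
            end_frame = i
            break
    return start_frame * hop, end_frame * hop + frame_len
```

At 8 kHz, the default 160-sample frames and 80-sample hop correspond to 20 ms windows every 10 ms. Calibrating on the recording's own leading silence lets the threshold follow the background level of each session instead of relying on a fixed absolute value.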

    Segmentation and Classification of Broadcast News Audio
