    Corpus Based Reconstruction of Speech Degraded by Wind Noise

    Publication in the conference proceedings of EUSIPCO, Nice, France, 2015

    Quality analysis for coded images with loss

    A considerable compression rate can only be achieved with lossy algorithms, which means that the exact original image cannot be recovered. This loss of information may translate directly into a loss of quality and, depending on the application area, may also cause reliability problems. The main question is: how do we decide that a lossily compressed image is suitable for a given application? This question is answered by defining a subjective quality assessment and then relating it to objective values in order to find, by means of a statistical analysis of the data, a relationship between quality and compression ratio. In particular, the influence of the type of histogram and its relation to subjective loss and compression ratio are studied. This paper is related to the Magister Thesis “Análisis del Error en Algoritmos de Transmisión de Imágenes Comprimidas con Pérdida” ("Error Analysis in Transmission Algorithms of Images Compressed with Loss") by Lic. Ramón
    Area: Image Processing - Signal Processing - Computer Graphics - Visualization. Red de Universidades con Carreras en Informática (RedUNCI)

    Assessment of objective quality measures for speech intelligibility estimation

    This paper investigates the accuracy of automatic speech recognition (ASR) and six other well-reported objective quality measures for the task of estimating speech intelligibility. It is believed to be the first side-by-side assessment of such a range of measures in the context of intelligibility. A total of 39 degradation conditions are considered, including those from a newly proposed low bit rate (0.3 to 1.5 kbps) codec and a noise suppression system. They provide real and varied scenarios in which to assess the measures. The objective scores are compared to subjective listening scores, and their correlation is used to assess the approach. All tests are conducted on the European standard Aurora 2 corpus. Experiments show that ASR and perceptual evaluation of speech quality (PESQ) are potentially reliable estimators of intelligibility, with subjective correlations as high as 0.99 and 0.96 respectively. Furthermore, ASR gives a trend corresponding to that of subjective intelligibility assessment for the different configurations of the new codec, while most of the other measures fail to do so.
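The comparison of objective and subjective scores described above comes down to a correlation coefficient per measure. The abstract does not state which coefficient was used; the sketch below assumes the Pearson coefficient, and the function name `pearson` and the score lists are hypothetical placeholders, not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score lists.

    Assumption: the paper's 'correlation' is taken to be Pearson's r;
    the abstract does not say which coefficient was used.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up scores for four degradation conditions (illustration only).
    objective = [0.2, 0.5, 0.7, 0.9]
    subjective = [0.25, 0.45, 0.75, 0.95]
    print(round(pearson(objective, subjective), 3))
```

One score pair per degradation condition would be passed in, so with 39 conditions each list would hold 39 values.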

    Perceptual basis of cepstral distance measures for speech processing applications (Percepcijska utemeljenost kepstranih mjera udaljenosti za primjene u obradi govora)

    Currently, one of the most widely used distance measures in speech and speaker recognition is the Euclidean distance between mel frequency cepstral coefficients (MFCC). MFCCs are based on a filter bank algorithm whose filters are equally spaced on a perceptually motivated mel frequency scale. The value of the mel cepstral vector, as well as the properties of the corresponding cepstral distance, are determined by several parameters used in mel cepstral analysis. The aim of this work is to examine how well the MFCC measure agrees with human perception for different values of the analysis parameters. By analysing the mel filter bank parameters, it is found that a filter bank with 24 bands, a bandwidth of 220 mels and a band overlap coefficient equal to or greater than one gives optimal spectral distortion (SD) distance measures, that is, those that agree best with perception. For this kind of mel filter bank, the difference between vowels becomes audible when the full-length mel cepstral SD RMS measure exceeds 0.4 - 0.5 dB. We further show that using a truncated mel cepstral vector (12 coefficients) is justified for speech recognition, but may be questionable for speaker recognition. We also analysed the impact of aliasing in the cepstral domain on cepstral distortion measures. The results showed a high correlation between SD distances calculated from the aperiodic and the periodic mel cepstrum, leading to the conclusion that the impact of aliasing is generally minor. There are rare exceptions where aliasing is present, and these were analysed separately.
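To make the two ingredients of this abstract concrete (filters equally spaced on the mel scale, and a Euclidean distance between cepstral vectors), a minimal sketch follows. It assumes the commonly used conversion mel = 2595 log10(1 + f/700), which the abstract does not confirm; the 0 to 8000 Hz range in the usage lines and all function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # Assumed mel formula (a widely used variant); the paper may use another.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_centres(n_bands, f_lo_hz, f_hi_hz):
    """Centre frequencies (Hz) of n_bands filters equally spaced in mel.

    The range is split into n_bands + 1 equal mel steps and the interior
    points are used as centres.
    """
    if n_bands < 1 or f_hi_hz <= f_lo_hz:
        raise ValueError("need n_bands >= 1 and f_hi_hz > f_lo_hz")
    m_lo, m_hi = hz_to_mel(f_lo_hz), hz_to_mel(f_hi_hz)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_bands)]


def euclidean_distance(c1, c2):
    """Euclidean distance between two cepstral vectors of equal length."""
    if len(c1) != len(c2):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


if __name__ == "__main__":
    # 24 bands as in the abstract; the frequency range is an assumption.
    centres = mel_spaced_centres(24, 0.0, 8000.0)
    print([round(c) for c in centres[:4]])
    print(euclidean_distance([0.0, 3.0], [4.0, 0.0]))
```

The sketch covers only the spacing of the filter centres; the 220 mel bandwidth and the overlap coefficient studied in the paper are not modelled here.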

    New single-ended objective measure for non-intrusive speech quality evaluation

    This article proposes a new output-based method for non-intrusive assessment of the speech quality of voice communication systems and evaluates its performance. The method requires access to the processed (degraded) speech only. It is based on measuring perception-motivated objective auditory distances between the voiced parts of the output speech and appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records. The auditory distances are then mapped into objective Mean Opinion listening quality scores. An efficient data-mining tool known as the self-organizing map (SOM) performs the required clustering and mapping/reference matching processes. In order to obtain a perception-based, speaker-independent parametric representation of the speech, three domain transformation techniques have been investigated. The first is based on a perceptual linear prediction (PLP) model, the second uses a Bark spectrum (BS) analysis and the third uses mel-frequency cepstral coefficients (MFCC). Reported evaluation results show that the proposed method correlates highly with subjective listening quality scores, yielding accuracy similar to that of ITU-T P.563 while maintaining a relatively low computational complexity. Results also demonstrate that the method outperforms PESQ in a number of distortion conditions, such as those of speech degraded by channel impairments.
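The reference-matching step described above can be illustrated with a plain nearest-neighbour codebook lookup. This is a simplified stand-in, not the SOM-based procedure of the article: the Euclidean distance, the function names and the toy vectors are all assumptions, and the mapping from distance to a Mean Opinion Score is left out because the abstract does not specify it.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_codeword(frame, codebook):
    """Return (index, distance) of the codebook entry closest to frame.

    Assumption: plain nearest-neighbour search stands in for the SOM
    matching used in the article.
    """
    if not codebook:
        raise ValueError("codebook is empty")
    best_index, best_dist = -1, math.inf
    for i, codeword in enumerate(codebook):
        d = euclidean(frame, codeword)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


def mean_codebook_distance(frames, codebook):
    """Average distance from each degraded frame to its closest reference."""
    if not frames:
        raise ValueError("no frames given")
    total = sum(nearest_codeword(f, codebook)[1] for f in frames)
    return total / len(frames)


if __name__ == "__main__":
    # Toy two-dimensional vectors (illustration only).
    codebook = [[0.0, 0.0], [10.0, 10.0]]
    degraded_frames = [[1.0, 0.0], [10.0, 9.0]]
    print(mean_codebook_distance(degraded_frames, codebook))
```

In the article the frames would be PLP, Bark spectrum or MFCC vectors from voiced segments; here any fixed-length numeric vectors can be used.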