Corpus Based Reconstruction of Speech Degraded by Wind Noise

Author: Naylor P.A
Nelke Christoph
Vary P.

Publication in the conference proceedings of EUSIPCO, Nice, France, 201

ZENODO

Quality analysis for coded images with loss

Author: De Giusti Armando Eduardo
Ramón Hugo Dionisio
Russo Claudia Cecilia
Publication date: 05/11/2012

A considerable compression rate can only be achieved with lossy algorithms, which means that the exact original image cannot be recovered. This loss of information may be directly related to a loss of quality and, depending on the application area, may cause reliability problems. The main question is: how do we decide that a lossy-compressed image is suitable for a given application? This question is answered by defining a subjective quality assessment and then relating it to objective values in order to find, through statistical analysis of the data, a relationship between quality and compression ratio. In particular, the influence of the histogram type and its relation to subjective loss and compression ratio are studied. This paper is related to the Magister thesis “Análisis del Error en Algoritmos de Transmisión de Imágenes Comprimidas con Pérdida” ("Error Analysis in Transmission Algorithms of Images Compressed with Loss") by Lic. Ramón. Area: Image Processing - Signal Processing - Computer Graphics - Visualization. Red de Universidades con Carreras en Informática (RedUNCI)

Servicio de Difusión de la Creación Intelectual

Microphone array power ratio for quality assessment of reverberated speech

Publication venue: Springer
Publication date: 18/06/2015

Springer - Publisher Connector

Assessment of objective quality measures for speech intelligibility estimation

Author: John S D Mason
Keith A Jellyman
Nicholas W D Evans
Wei M Liu
Publication date: 01/01/2006

This paper investigates the accuracy of automatic speech recognition (ASR) and 6 other well-reported objective quality measures for the task of estimating speech intelligibility. It is believed to be the first side-by-side assessment of such a range of measures in the context of intelligibility. A total of 39 degradation conditions are considered, including those from a newly proposed low bit rate (0.3 to 1.5 kbps) codec and a noise suppression system. They provide real and varied scenarios for assessing the measures. The objective scores are compared to subjective listening scores, and their correlation is used to assess the approach. All tests are conducted on the European standard Aurora 2 corpus. Experiments show that ASR and perceptual evaluation of speech quality (PESQ) are potentially reliable estimators of intelligibility, with subjective correlations as high as 0.99 and 0.96 respectively. Furthermore, ASR gives a trend corresponding to that of subjective intelligibility assessment for the different configurations of the new codec, while most of the other measures fail to do so.
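The comparison of objective and subjective scores described in this abstract can be sketched as a correlation computation. This is a minimal illustration that assumes a Pearson correlation coefficient (the abstract does not say which coefficient was used); the function name and the score values are hypothetical and are not taken from the paper.

```python
import math


def pearson_correlation(objective, subjective):
    """Pearson correlation between two equal-length lists of scores."""
    if len(objective) != len(subjective) or len(objective) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(objective)
    mean_o = sum(objective) / n
    mean_s = sum(subjective) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(objective, subjective))
    var_o = sum((o - mean_o) ** 2 for o in objective)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_o == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_o * var_s)


# Hypothetical per-condition scores, for illustration only.
objective_scores = [1.0, 2.0, 3.0, 4.0]
subjective_scores = [2.0, 4.0, 6.0, 8.0]
r = pearson_correlation(objective_scores, subjective_scores)
```

In a setting like the one described, each list would hold one score per degradation condition, so a coefficient near 1 would indicate that the objective measure ranks and scales the conditions much as listeners do.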

CiteSeerX

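Perceptual basis of cepstral distance measures for speech processing applications (original title: Percepcijska utemeljenost kepstranih mjera udaljenosti za primjene u obradi govora)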

Author: Antonio Vasilijević
Davor Petrinović
Publication venue: KoREMA - Croatian Society for Communications, Computing, Electronics, Measurement and Control
Publication date: 01/01/2011

Currently, one of the most widely used distance measures in speech and speaker recognition is the Euclidean distance between mel frequency cepstral coefficients (MFCC). MFCCs are based on a filter bank algorithm whose filters are equally spaced on a perceptually motivated mel frequency scale. The value of the mel cepstral vector, as well as the properties of the corresponding cepstral distance, are determined by several parameters used in mel cepstral analysis. The aim of this work is to examine the compatibility of the MFCC measure with human perception for different values of the analysis parameters. By analysing the mel filter bank parameters, it is found that a filter bank with 24 bands, 220 mels bandwidth and a band overlap coefficient equal to or higher than one gives optimal spectral distortion (SD) distance measures, that is, those that agree best with perception. For this kind of mel filter bank, the difference between vowels can be recognised when the full-length mel cepstral SD RMS measure is higher than 0.4 - 0.5 dB. Further on, we show that the use of a truncated mel cepstral vector (12 coefficients) is justified for speech recognition, but may be arguable for speaker recognition. We also analysed the impact of aliasing in the cepstral domain on cepstral distortion measures. The results showed a high correlation of SD distances calculated from the aperiodic and periodic mel cepstrum, leading to the conclusion that the impact of aliasing is generally minor. There are rare exceptions where aliasing is present, and these were also analysed.
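The Euclidean distance between mel cepstral vectors mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the Hz-to-mel mapping is the commonly used 2595·log10(1 + f/700) formula, which may differ from the mel scale used by the authors, and the function names and the optional truncation parameter are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Map a frequency in Hz to mels (common formula; an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def cepstral_distance(cepstrum_a, cepstrum_b, n_coeffs=None):
    """Euclidean distance between two cepstral vectors.

    If n_coeffs is given, both vectors are truncated to their first
    n_coeffs entries before the distance is computed.
    """
    if len(cepstrum_a) != len(cepstrum_b):
        raise ValueError("cepstral vectors must have the same length")
    if n_coeffs is not None:
        cepstrum_a = cepstrum_a[:n_coeffs]
        cepstrum_b = cepstrum_b[:n_coeffs]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cepstrum_a, cepstrum_b)))


# Hypothetical vectors, for illustration only.
full = cepstral_distance([0.0, 0.0, 1.0], [3.0, 4.0, 1.0])
truncated = cepstral_distance([1.0, 2.0, 3.0], [1.0, 2.0, 10.0], n_coeffs=2)
```

The truncation argument mirrors the question raised in the abstract: two vectors that differ only in their higher-order coefficients become indistinguishable once the vector is cut short.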

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

New single-ended objective measure for non-intrusive speech quality evaluation

Author: A.W. Rix
Abdulhussain E. Mahdi
Dorel Picovici
H. Hermansky
J. Vesanto
J.G. Beerends
J.L. Hall
K. Gopalan
L. Malfait
M.R. Schroeder
P. Gray
S. Voran
S. Wang
T.E. Quatieri
Publication venue: Springer Science and Business Media LLC
Publication date: 21/10/2014

This article proposes a new output-based method for non-intrusive assessment of the speech quality of voice communication systems and evaluates its performance. The method requires access to the processed (degraded) speech only, and is based on measuring perception-motivated objective auditory distances between the voiced parts of the output speech and appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records. The auditory distances are then mapped into objective Mean Opinion listening quality scores. An efficient data-mining tool known as the self-organizing map (SOM) performs the required clustering and mapping/reference matching processes. In order to obtain a perception-based, speaker-independent parametric representation of the speech, three domain transformation techniques have been investigated. The first technique is based on a perceptual linear prediction (PLP) model, the second uses a Bark spectrum (BS) analysis and the third uses mel-frequency cepstrum coefficients (MFCC). Reported evaluation results show that the proposed method provides high correlation with subjective listening quality scores, yielding accuracy similar to that of ITU-T P.563 while maintaining a relatively low computational complexity. Results also demonstrate that the method outperforms PESQ in a number of distortion conditions, such as those of speech degraded by channel impairments.
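The reference-matching step described in this abstract can be sketched as a nearest-entry lookup in a codebook. This is a simplified illustration, not the authors' method: it uses a plain linear search with Euclidean distance in place of the self-organizing map, and the function names, codebook, and frame vectors are all hypothetical.

```python
import math


def nearest_codebook_entry(vector, codebook):
    """Return (index, distance) of the codebook entry closest to vector."""
    if not codebook:
        raise ValueError("codebook must not be empty")
    best_index, best_distance = -1, math.inf
    for index, reference in enumerate(codebook):
        distance = math.sqrt(sum((x - r) ** 2 for x, r in zip(vector, reference)))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


def mean_distance_to_codebook(frames, codebook):
    """Average distance from each frame vector to its closest reference."""
    if not frames:
        raise ValueError("frames must not be empty")
    total = sum(nearest_codebook_entry(frame, codebook)[1] for frame in frames)
    return total / len(frames)


# Hypothetical clean-speech codebook and degraded-speech frames.
codebook = [[0.0, 0.0], [10.0, 10.0]]
frames = [[0.0, 0.0], [10.0, 13.0]]
index, distance = nearest_codebook_entry([1.0, 1.0], codebook)
average = mean_distance_to_codebook(frames, codebook)
```

In the method described, a distance of this kind would then be mapped to a listening quality score; that mapping is not shown here because the abstract does not specify it.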

University of Limerick Institutional Repository

Crossref