Multi-objective Non-intrusive Hearing-aid Speech Assessment Model
Without the need for a clean reference, non-intrusive speech assessment
methods have caught great attention for objective evaluations. While deep
learning models have been used to develop non-intrusive speech assessment
methods with promising results, there is limited research on hearing-impaired
subjects. This study proposes a multi-objective non-intrusive hearing-aid
speech assessment model, called HASA-Net Large, which predicts speech quality
and intelligibility scores based on input speech signals and specified
hearing-loss patterns. Our experiments showed the utilization of pre-trained
SSL models leads to a significant boost in speech quality and intelligibility
predictions compared to using spectrograms as input. Additionally, we examined
three distinct fine-tuning approaches that resulted in further performance
improvements. Furthermore, we demonstrated that incorporating SSL models
resulted in greater transferability to an out-of-distribution (OOD) dataset. Finally, this study
introduces HASA-Net Large, which is a non-invasive approach for evaluating
speech quality and intelligibility. HASA-Net Large utilizes raw waveforms and
hearing-loss patterns to accurately predict speech quality and intelligibility
levels for individuals with normal and impaired hearing and demonstrates
superior prediction performance and transferability.
InQSS: a speech intelligibility and quality assessment model using a multi-task learning network
Speech intelligibility and quality assessment models are essential tools for
researchers to evaluate and improve speech processing models. However, only a
few studies have investigated multi-task models for intelligibility and quality
assessment due to the limitations of available data. In this study, we released
TMHINT-QI, the first Chinese speech dataset that records the quality and
intelligibility scores of clean, noisy, and enhanced utterances. Then, we
propose InQSS, a non-intrusive multi-task learning framework for
intelligibility and quality assessment. We evaluated InQSS with both
training-from-scratch and pretrained models. The experimental results
confirm the effectiveness of the InQSS framework. In addition, the resulting
model can predict not only the intelligibility scores but also the quality
scores of a speech signal. Comment: accepted by Interspeech 202
An evaluation of intrusive instrumental intelligibility metrics
Instrumental intelligibility metrics are commonly used as an alternative to
listening tests. This paper evaluates 12 monaural intrusive intelligibility
metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and
. In addition, this paper investigates the ability of
intelligibility metrics to generalize to new types of distortions and analyzes
why the top performing metrics have high performance. The intelligibility data
were obtained from 11 listening tests described in the literature. The stimuli
included Dutch, Danish, and English speech that was distorted by additive
noise, reverberation, competing talkers, pre-processing enhancement, and
post-processing enhancement. SIIB and HASPI had the highest performance
achieving a correlation with listening test scores on average of
and , respectively. The high performance of SIIB may, in part, be
the result of SIIB's developers having access to all the intelligibility data
considered in the evaluation. The results show that intelligibility metrics
tend to perform poorly on data sets that were not used during their
development. By modifying the original implementations of SIIB and STOI, the
advantage of reducing statistical dependencies between input features is
demonstrated. Additionally, the paper presents a new version of SIIB called
, which has similar performance to SIIB and HASPI,
but takes two orders of magnitude less time to compute. Comment: Published in IEEE/ACM Transactions on Audio, Speech, and Language
Processing, 201
Speech assessment and characterization for law enforcement applications
Speech signals acquired, transmitted or stored in non-ideal conditions are often degraded by
one or more effects including, for example, additive noise. These degradations alter the signal
properties in a manner that deteriorates the intelligibility or quality of the speech signal. In
the law enforcement context such degradations are commonplace due to the limitations in
the audio collection methodology, which is often required to be covert. In severe degradation
conditions, the acquired signal may become unintelligible, losing its value in an investigation
and in less severe conditions, a loss in signal quality may be encountered, which can lead to
higher transcription time and cost.
This thesis proposes a non-intrusive speech assessment framework from which algorithms for
speech quality and intelligibility assessment are derived, to guide the collection and transcription
of law enforcement audio. These methods are trained on a large database labelled using
intrusive techniques (whose performance is verified with subjective scores) and shown to perform
favorably when compared with existing non-intrusive techniques. Additionally, a non-intrusive
CODEC identification and verification algorithm is developed which can identify a CODEC with
an accuracy of 96.8 % and detect the presence of a CODEC with an accuracy higher than 97 %
in the presence of additive noise.
Finally, the speech description taxonomy framework is developed, with the aim of characterizing
various aspects of a degraded speech signal, including the mechanism that results in a signal
with particular characteristics, the vocabulary that can be used to describe those degradations
and the measurable signal properties that can characterize the degradations. The taxonomy is
implemented as a relational database that facilitates the modeling of the relationships between
various attributes of a signal and promises to be a useful tool for training and guiding audio
analysts.
Improved status following behavioural intervention in a case of severe dysarthria with stroke aetiology
There is little published intervention outcome literature concerning dysarthria acquired from stroke. Single case studies have the potential to provide more detailed specification and interpretation than is generally possible with larger participant numbers and are thus informative for clinicians who may deal with similar cases. Such research also contributes to the future planning of larger scale investigations. This paper describes a behavioural intervention carried out with a man with severe dysarthria following stroke, beginning at seven and ending at nine months after stroke. Pre-intervention stability between five and seven months contrasted with significant improvements post-intervention on listener-rated measures of word and reading intelligibility and communication effectiveness in conversation. A range of speech analyses were undertaken (comprising rate, pause and intonation characteristics in connected speech and phonetic transcription of single word production), with the aim of identifying components of speech which might explain the listeners’ perceptions of improvement. Pre- and post-intervention changes could be detected mainly in parameters related to utterance segmentation and intonation. The basis of improvement in dysarthria following intervention is complex, both in terms of the active therapeutic dimensions and also the specific speech alterations which account for changes to intelligibility and effectiveness. Single case results are not necessarily generalisable to other cases and outcomes may be affected by participant factors and therapeutic variables, which are not readily controllable.
E-model modification for case of cascade codecs arrangement
Speech quality assessment is one of the key matters of
voice services and every provider should ensure adequate connection
quality to end users. Speech quality has to be measured by a trusted
method and results have to correlate with intelligibility and clarity of
the speech, as perceived by the listener. It can be achieved by
subjective methods but in real life we must rely on objective
measurements based on reliable models. One of them is the E-model, which
can be considered the most widely adopted method in IP telephony. This
method is based on evaluation of transmission path impairments
influencing the speech signal, especially delays and packet losses. These
parameters, which are common in IP networks, can dramatically affect
speech quality. In this article, a new modification of the E-model that
takes into consideration the cascade codecs arrangement is
presented. The proposed correction function improves the current
computational non-intrusive approach that is described in
recommendation ITU-T G.107, the so-called E-model.
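For orientation, the following is a minimal sketch of two standard building blocks of the unmodified E-model as given in ITU-T G.107: the packet-loss-dependent effective equipment impairment factor Ie,eff and the mapping from the rating factor R to an estimated MOS. It is not the correction function proposed in the article, which the abstract does not specify. The function names and the example parameter values (Ie = 0, Bpl = 4.3, 1 % loss) are illustrative assumptions.

```python
def effective_equipment_impairment(ie: float, bpl: float, ppl: float,
                                   burst_r: float = 1.0) -> float:
    """Ie,eff = Ie + (95 - Ie) * Ppl / (Ppl / BurstR + Bpl), per ITU-T G.107.

    ie      -- codec equipment impairment factor at zero packet loss
    bpl     -- codec packet-loss robustness factor
    ppl     -- packet-loss probability in percent
    burst_r -- burst ratio (1.0 means random, non-bursty loss)
    """
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


def r_to_mos(r: float) -> float:
    """Map the rating factor R to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the article).
    ie_eff = effective_equipment_impairment(ie=0.0, bpl=4.3, ppl=1.0)
    print(f"Ie,eff at 1% random loss: {ie_eff:.2f}")
    print(f"MOS for R = 80: {r_to_mos(80.0):.2f}")
```

Packet loss enters R through Ie,eff and delay through a separate impairment term (Id), so a codec cascade changes the inputs to these formulas rather than the R-to-MOS mapping itself.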