Semantic Processing of Out-Of-Vocabulary Words in a Spoken Dialogue System
One of the most important causes of failure in spoken dialogue systems is
usually neglected: the problem of words that are not covered by the system's
vocabulary (out-of-vocabulary or OOV words). In this paper a methodology is
described for the detection, classification and processing of OOV words in an
automatic train timetable information system. We report the extensions that
had to be made to the various modules of the system, which led to the design
of appropriate dialogue strategies, and present encouraging evaluation
results for the new versions of the word recogniser and the linguistic
processor.
Comment: 4 pages, 2 eps figures, requires LaTeX2e, uses eurospeech.sty and
epsfi
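The detection step described above can be illustrated with a minimal sketch. The vocabulary, word list, confidence threshold, and function name below are illustrative assumptions, not taken from the paper; the idea is simply to flag words that fall outside the closed vocabulary or were recognised with low confidence.

```python
# Hypothetical closed vocabulary of a train timetable system (illustrative).
VOCABULARY = {"when", "does", "the", "train", "to", "leave", "at", "ten"}

def flag_oov(hypotheses, vocabulary=VOCABULARY, min_confidence=0.5):
    """Mark each recognised word as OOV if it is outside the vocabulary
    or was recognised with low acoustic confidence."""
    return [
        (word, word not in vocabulary or score < min_confidence)
        for word, score in hypotheses
    ]

hyps = [("when", 0.9), ("does", 0.8), ("the", 0.9),
        ("intercity", 0.3), ("leave", 0.7)]
flags = flag_oov(hyps)
# "intercity" is flagged: it is not in the (toy) vocabulary.
```

Once flagged, such positions can be routed to dedicated classification and dialogue-recovery strategies, as the paper proposes.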
Integrating Syntactic and Prosodic Information for the Efficient Detection of Empty Categories
We describe a number of experiments that demonstrate the usefulness of
prosodic information for a processing module which parses spoken utterances
with a feature-based grammar employing empty categories. We show that by
requiring certain prosodic properties from those positions in the input where
the presence of an empty category has to be hypothesized, a derivation can be
accomplished more efficiently. The approach has been implemented in the machine
translation project VERBMOBIL and results in a significant reduction of the
work-load for the parser.
Comment: To appear in the Proceedings of Coling 1996, Copenhagen. 6 pages
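The pruning idea can be sketched in a few lines. The position indices and the set-based boundary representation below are illustrative assumptions; the point is only that requiring a prosodic boundary at each hypothesized gap position shrinks the parser's search space.

```python
def prune_empty_categories(candidate_positions, prosodic_boundaries):
    """Keep only candidate empty-category (gap) positions that coincide
    with a prosodic boundary, discarding the rest before parsing."""
    return [p for p in candidate_positions if p in prosodic_boundaries]

# Toy utterance: gaps hypothesized at word positions 2, 5, and 7,
# but prosodic boundaries detected only at positions 5 and 9.
kept = prune_empty_categories([2, 5, 7], prosodic_boundaries={5, 9})
# Only position 5 survives the prosodic filter.
```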
Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment
Speech intelligibility assessment plays an important role in the therapy of
patients suffering from pathological speech disorders. Automatic and objective
measures are desirable to assist therapists in their traditionally subjective
and labor-intensive assessments. In this work, we investigate a novel approach
for obtaining such a measure using the divergence in disentangled latent speech
representations of a parallel utterance pair, obtained from a healthy reference
and a pathological speaker. Experiments on an English database of Cerebral
Palsy patients, using all available utterances per speaker, show high and
significant correlation values (R = -0.9) with subjective intelligibility
measures, with only minimal deviation (±0.01) across four different
reference speaker pairs. We also demonstrate the robustness of the proposed
method (R = -0.89, deviating ±0.02 over 1000 iterations) when considering a
significantly smaller number of utterances per speaker. Our results are among
the first to show that disentangled speech representations can be used for
automatic pathological speech intelligibility assessment, resulting in a
reference speaker pair invariant method, applicable in scenarios with only few
utterances available.
Comment: Submitted to INTERSPEECH202
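The core measure can be sketched as follows. The choice of mean frame-wise Euclidean distance, the latent dimensionality, and the function name are illustrative assumptions (the paper's encoder and divergence are not specified here); the sketch only shows how a divergence between time-aligned latent sequences of a parallel utterance pair could be scored.

```python
import numpy as np

def latent_divergence(ref_latents, pat_latents):
    """Mean Euclidean distance between time-aligned latent frames of a
    parallel utterance pair (illustrative divergence measure)."""
    ref = np.asarray(ref_latents, dtype=float)
    pat = np.asarray(pat_latents, dtype=float)
    assert ref.shape == pat.shape, "parallel utterances must be time-aligned"
    return float(np.linalg.norm(ref - pat, axis=-1).mean())

# Toy check: identical latents give zero divergence; perturbed latents
# give a positive score that grows with the mismatch.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 16))   # 50 frames, 16-dim latent space
d_same = latent_divergence(ref, ref)
d_pert = latent_divergence(ref, ref + 0.5)
```

A healthy/pathological pair would then be scored the same way, with the resulting divergence correlated against subjective intelligibility ratings.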
Federated learning for secure development of AI models for Parkinson's disease detection using speech from different languages
Parkinson's disease (PD) is a neurological disorder impacting a person's
speech. Among automatic PD assessment methods, deep learning models have gained
particular interest. Recently, the community has explored cross-pathology and
cross-language models which can improve diagnostic accuracy even further.
However, strict patient data privacy regulations largely prevent institutions
from sharing patient speech data with each other. In this paper, we employ
federated learning (FL) for PD detection using speech signals from 3 real-world
language corpora of German, Spanish, and Czech, each from a separate
institution. Our results indicate that the FL model outperforms all the local
models in terms of diagnostic accuracy and performs comparably to a model
trained on the centrally combined training sets, with the advantage of
not requiring any data sharing among collaborators. This simplifies
inter-institutional collaboration and can ultimately improve patient
outcomes.
Comment: Accepted for INTERSPEECH 202
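The aggregation at the heart of such a setup can be sketched with the standard FedAvg rule: each institution trains locally and only model parameters, never raw speech, reach the server. The client sizes and parameter vectors below are toy values for illustration.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors, with coefficients
    proportional to each institution's number of training samples."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    agg = np.zeros_like(np.asarray(client_weights[0], dtype=float))
    for c, w in zip(coeffs, client_weights):
        agg += c * np.asarray(w, dtype=float)
    return agg

# One toy round with three hypothetical sites (e.g. German, Spanish, Czech):
w = fedavg([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], client_sizes=[100, 100, 200])
# 0.25*[1,0] + 0.25*[0,1] + 0.5*[1,1] = [0.75, 0.75]
```

In a full system this averaging step would be repeated over many communication rounds, with each site re-training on its private corpus between rounds.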
A survey on perceived speaker traits: personality, likability, pathology, and the first challenge
The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits – the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state-of-the-art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks.