4 research outputs found
How to Measure Speech Recognition Performance in the Air Traffic Control Domain? The Word Error Rate is only half of the truth
Applying Automatic Speech Recognition (ASR) in the domain
of analogue voice communication between air traffic
controllers (ATCo) and pilots has more end user requirements
than just transforming spoken words into text. Perfect word
recognition is useless as long as the semantic interpretation is
wrong. For an ATCo it does not matter whether the words of a
greeting are correctly recognized; a misrecognized greeting
should, however, not disturb the correct recognition of, e.g., a
“descend” command. Recently, 14 European partners from the
Air Traffic Management (ATM) domain have agreed on a common
set of rules, i.e., an ontology for annotating ATCo speech
utterances. This paper
first extends the ontology to pilot utterances and then compares
different ASR implementations on the semantic level by
introducing command recognition, command recognition error,
and command rejection rates. The implementation used in this
paper achieves a command recognition rate better than 94% for
Prague Approach, even when the WER is above 2.5%.
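A minimal sketch of the distinction the abstract draws (not from the paper; the function names and toy transcripts are hypothetical, and the paper's ontology-based evaluation is more involved): a lost greeting word raises the WER, yet the extracted command can still be fully correct.

```python
def word_error_rate(ref, hyp):
    """Levenshtein distance between word sequences, normalized by |ref|."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

def command_recognition_rate(ref_cmds, hyp_cmds):
    """Fraction of reference commands whose semantic form was recognized."""
    return sum(1 for c in ref_cmds if c in hyp_cmds) / len(ref_cmds)

ref = "good morning lufthansa one two alfa descend flight level eight zero"
hyp = "morning lufthansa one two alfa descend flight level eight zero"
# The greeting word is lost, so the WER is non-zero ...
assert word_error_rate(ref, hyp) > 0
# ... but the semantic command extracted from both is identical.
assert command_recognition_rate(["DLH12A DESCEND 80 FL"],
                                ["DLH12A DESCEND 80 FL"]) == 1.0
```

The converse also holds: a single substituted digit in an otherwise perfect transcript yields a low WER but a wrong command, which is why the command-level rates are reported separately.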
Semi-supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic Control
Automatic Speech Recognition (ASR) can introduce higher levels
of automation into Air Traffic Control (ATC), where spoken
language is still the predominant form of communication.
While ATC uses standard phraseology and a limited vocabulary,
we need to adapt the speech recognition systems to local
acoustic conditions and vocabularies at each airport to reach
optimal performance. Due to the continuous operation of ATC systems,
a large and increasing amount of untranscribed speech
data is available, allowing for semi-supervised learning methods
to build and adapt ASR models. In this paper, we first identify
the challenges in building ASR systems for specific ATC
areas and propose to utilize out-of-domain data to build baseline
ASR models. Then we explore different methods of data
selection for adapting baseline models by exploiting the continuously
increasing untranscribed data. We develop a basic approach
capable of exploiting semantic representations of ATC
commands. We achieve relative improvements in both word error
rate (23.5%) and concept error rate (7%) when adapting
ASR models to different ATC conditions in a semi-supervised
manner.
Adaptation of Assistant Based Speech Recognition to New Domains and its Acceptance by Air Traffic Controllers
In air traffic control rooms, paper flight strips are more and more replaced by digital solutions. The digital systems, however, increase the workload for air traffic controllers: for instance, each voice command must be manually inserted into the system by the controller. Recently, the AcListant® project has validated that Assistant Based Speech Recognition (ABSR) can replace these manual inputs with automatically recognized voice commands. Adaptation of
ABSR to different environments, however, has proven to be expensive. The Horizon 2020 funded project MALORCA (MAchine Learning Of Speech Recognition Models for Controller Assistance) proposed a more effective adaptation solution integrating a machine learning framework. As a first showcase, ABSR was automatically adapted with radar data and voice recordings for Prague and Vienna. The system reaches command recognition error rates of 0.6% for Prague and 3.2% for Vienna. This paper describes the feedback trials with controllers from Vienna and Prague.
Semi-supervised Adaptation of Assistant Based Speech Recognition Models for different Approach Areas
Air Navigation Service Providers (ANSPs) are replacing paper flight strips with various digital solutions. The commands instructed by an air traffic controller (ATCo) are then available in computer-readable form. However, those systems require manual controller inputs, i.e., ATCo workload increases. The Active Listening Assistant (AcListant®) project has shown that Assistant Based Speech Recognition (ABSR) is a potential solution to reduce this additional workload. However, developing an ABSR application for a specific target domain usually requires a large amount of manually transcribed audio data in order to achieve task-sufficient recognition accuracy. The MALORCA project developed an initial basic ABSR system and semi-automatically tailored its recognition models for both the Prague and Vienna approaches by machine learning from automatically transcribed audio data. Command recognition error rates were reduced from 7.9% to under 0.6% for Prague and from 18.9% to 3.2% for Vienna.
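As a note on the arithmetic behind such figures (an illustration, not part of the abstract; the helper name is hypothetical): an absolute drop like 7.9% to 0.6% is often also reported as a relative improvement, i.e., the reduction expressed as a fraction of the baseline error.

```python
# Relative improvement of an error rate: how much of the baseline
# error was eliminated, as a fraction of that baseline.
def relative_improvement(baseline, adapted):
    return (baseline - adapted) / baseline

# The Prague figures above, 7.9% down to 0.6%, correspond to a
# relative error reduction of roughly 92%.
assert relative_improvement(7.9, 0.6) > 0.92
```

This is why a modest-looking absolute change can correspond to a large relative improvement, and vice versa, and why abstracts in this list state which of the two they report.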