8 research outputs found
How to Measure Speech Recognition Performance in the Air Traffic Control Domain? The Word Error Rate is only half of the truth
Applying Automatic Speech Recognition (ASR) in the domain of analogue voice communication between air traffic controllers (ATCos) and pilots involves more end-user requirements than just transforming spoken words into text. Perfect word recognition is useless as long as the semantic interpretation is wrong. For an ATCo it is of no importance whether the words of a greeting are correctly recognized; a wrong recognition of a greeting should, however, not disturb the correct recognition of, e.g., a “descend” command. Recently, 14 European partners from the Air Traffic Management (ATM) domain have agreed on a common set of rules, i.e., an ontology on how to annotate the speech utterances of an ATCo. This paper first extends the ontology to pilot utterances and then compares different ASR implementations on the semantic level by introducing command recognition, command recognition error, and command rejection rates. The implementation used in this paper achieves a command recognition rate better than 94% for Prague Approach, even when the word error rate is above 2.5%.
Measuring Speech Recognition And Understanding Performance in Air Traffic Control Domain Beyond Word Error Rates
Applying Automatic Speech Recognition (ASR) in the domain of analogue voice communication between air traffic controllers (ATCos) and pilots involves more end-user requirements than just transforming spoken words into text. It is useless for, e.g., read-back error detection support if word recognition is perfect but the semantic interpretation is wrong. For an ATCo it is of almost no importance whether the words of a greeting are correctly recognized. A wrong recognition of a greeting should, however, not disturb the correct recognition of, e.g., a “descend” command. More important is the correct semantic interpretation. What, however, is the correct semantic interpretation, especially when ATCos or pilots deviate more or less from published standard phraseology? For comparing the performance of different speech recognition applications, 14 European partners from the Air Traffic Management (ATM) domain have recently agreed on a common set of rules, i.e., an ontology on how to annotate the speech utterances of an ATCo on the semantic level. This paper first presents the new metric of “unclassified word rate”, extends the ontology to pilot utterances, and introduces the metrics of command recognition rate, command recognition error rate, and command recognition rejection rate. This enables the comparison of different speech recognition and understanding instances on the semantic level. The implementation used in this paper achieves a command recognition rate better than 96% for Prague Approach, even if the word error rate is above 2.5%, based on more than 12,000 ATCo commands recorded in both operational and lab environments. This outperforms previously published rates by 2% absolute.
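The distinction these abstracts draw between word-level and semantic-level evaluation can be sketched in code. The following Python snippet is a hypothetical illustration only: the command-tuple format is an assumption for demonstration and is not the PJ.16-04 ontology, and the command matcher is a deliberately naive exact-match stand-in for a real understanding component.

```python
# Hypothetical sketch: word error rate (WER) scores word sequences,
# while command recognition rate scores extracted semantic commands.
# The (callsign, type, value) tuple format is an assumed toy encoding.

def word_error_rate(ref_words, hyp_words):
    """Word-level Levenshtein distance, normalized by reference length."""
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[-1][-1] / len(ref_words)

def command_recognition_rate(gold_cmds, recognized_cmds):
    """Fraction of annotated commands recognized exactly (naive matcher)."""
    matched = sum(1 for cmd in gold_cmds if cmd in recognized_cmds)
    return matched / len(gold_cmds)

# A greeting word is misrecognized ("morning" -> "evening"), so WER > 0,
# yet the semantically relevant "descend" command is still extracted.
ref = "good morning lufthansa one two three descend flight level one two zero".split()
hyp = "good evening lufthansa one two three descend flight level one two zero".split()
gold = [("DLH123", "DESCEND", "FL120")]
rec  = [("DLH123", "DESCEND", "FL120")]

print(round(word_error_rate(ref, hyp), 3))  # nonzero WER
print(command_recognition_rate(gold, rec))  # 1.0
```

This mirrors the greeting example in the abstracts: a word-level error that leaves the command recognition rate untouched.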
Ontology for Transcription of ATC Speech Commands of SESAR 2020 Solution PJ.16-04
Nowadays Automatic Speech Recognition (ASR) applications are increasingly successful in the air traffic control (ATC) domain. Paramount to achieving this is collecting enough data for speech recognition model training. Thousands of hours of ATC communication are recorded every day. However, the transcription of these data sets is resource intensive, i.e., writing down the sequence of spoken words and, more importantly, interpreting the relevant semantics. Many different approaches, including CPDLC (Controller Pilot Data Link Communications), currently exist in the ATC community for command transcription, a fact that, e.g., complicates the exchange of transcriptions. The partners of the SESAR-funded solution PJ.16-04 are currently developing a common ontology for the transcription of controller-pilot communications, which will harmonize the integration of ASR into controller working positions. The resulting ontology is presented in this paper.
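To make the idea of a transcription ontology concrete, the following sketch shows what a machine-readable annotation of an ATC utterance might look like. This is an assumption for illustration only; the field names and encoding here are invented and are not the actual PJ.16-04 ontology, which is defined in the paper itself.

```python
# Hypothetical sketch only: one invented way to annotate an ATC utterance
# with semantic command fields, illustrating why a shared ontology helps
# exchange transcriptions between partners.
from dataclasses import dataclass

@dataclass
class AnnotatedCommand:
    callsign: str  # e.g. "DLH123" for "lufthansa one two three"
    type: str      # command type, e.g. "DESCEND" or "HEADING"
    value: str     # command value, e.g. "120"
    unit: str      # value unit, e.g. "FL" for flight level

utterance = "lufthansa one two three descend flight level one two zero"
annotation = [AnnotatedCommand("DLH123", "DESCEND", "120", "FL")]
print(annotation[0])
```

With every partner emitting the same structured form, transcriptions from different tools become directly comparable.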
Adaptation of Assistant Based Speech Recognition to New Domains and its Acceptance by Air Traffic Controllers
In air traffic control rooms, paper flight strips are more and more replaced by digital solutions. The digital systems, however, increase the workload for air traffic controllers: for instance, each voice command must be manually inserted into the system by the controller. Recently the AcListant® project has validated that Assistant Based Speech Recognition (ABSR) can replace the manual inputs by automatically recognized voice commands. Adaptation of ABSR to different environments, however, has proven to be expensive. The Horizon 2020 funded project MALORCA (MAchine Learning Of Speech Recognition Models for Controller Assistance) proposed a more effective adaptation solution integrating a machine learning framework. As a first showcase, ABSR was automatically adapted with radar data and voice recordings for Prague and Vienna. The system reaches command recognition error rates of 0.6% (Prague) and 3.2% (Vienna). This paper describes the feedback trials with controllers from Vienna and Prague.
Semi-supervised Adaptation of Assistant Based Speech Recognition Models for different Approach Areas
Air Navigation Service Providers (ANSPs) replace paper flight strips with different digital solutions. The commands instructed by air traffic controllers (ATCos) are then available in computer-readable form. However, those systems require manual controller inputs, i.e., the ATCos' workload increases. The Active Listening Assistant (AcListant®) project has shown that Assistant Based Speech Recognition (ABSR) is a potential solution to reduce this additional workload. However, the development of an ABSR application for a specific target domain usually requires a large amount of manually transcribed audio data in order to achieve task-sufficient recognition accuracies. The MALORCA project developed an initial basic ABSR system and semi-automatically tailored its recognition models for both Prague and Vienna approaches by machine learning from automatically transcribed audio data. Command recognition error rates were reduced from 7.9% to under 0.6% for Prague and from 18.9% to 3.2% for Vienna.