
    AcListant with Continuous Learning: Speech Recognition in Air Traffic Control (EIWAC 2019)

    Increasing air traffic creates many challenges for air traffic management (ATM). A general answer to these challenges is to increase automation. However, communication between air traffic controllers (ATCos) and pilots is still largely analogue and far removed from digital ATM components. Because the content of this communication is important for the ATM system, ATCos still enter commands manually so that the system can take it into account. This, however, creates additional workload for the ATCos. To avoid this additional effort, automatic speech recognition (ASR) can automatically analyze the communication and extract the content of spoken commands. DLR, together with Saarland University, developed the AcListant® system, the first assistant-based speech recognition (ABSR) system with both a high command recognition rate and a low command recognition error rate. Besides its high recognition performance, the AcListant® project also revealed shortcomings with respect to the costly adaptation of the speech recognizer to different environments. To counteract this disadvantage, machine learning algorithms for the automatic adaptation of ABSR to different airports were developed within the Single European Sky ATM Research Programme (SESAR) 2020 Exploratory Research project MALORCA. To support the standardization of speech recognition in ATM, an ontology for ATC command recognition at the semantic level was developed in the SESAR Industrial Research project PJ.16-04, enabling the reuse of manually transcribed ATC communication, which is expensive to produce. Finally, results and experiences are used in two further SESAR Wave-2 projects. This paper presents the evolution of ABSR from AcListant® via MALORCA and PJ.16-04 to the SESAR Wave-2 projects.

    Early Callsign Highlighting using Automatic Speech Recognition to Reduce Air Traffic Controller Workload

    The primary task of an air traffic controller (ATCo) is to issue instructions to pilots. However, the first contact is often initiated by the pilot. It is therefore useful to have a controller assistance system that can recognize and highlight the spoken callsign as early as possible, directly from the speech data. We propose to use an automatic speech recognition (ASR) system to obtain a speech-to-text transcription, from which we extract the spoken callsign. As high callsign recognition performance is required, we additionally use surveillance data, which significantly improves the performance. We obtain callsign recognition error rates of 6.2% and 8.3% for ATCo and pilot utterances, respectively, which improve to 2.8% and 4.5% when information from surveillance data is used.
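    As an illustration of how surveillance data can support callsign extraction, the sketch below matches an ASR transcript against the spoken forms of callsigns known from surveillance. It is a simplified, assumed mechanism, not the system evaluated in the paper; the word mappings and function names are illustrative.

```python
from difflib import SequenceMatcher

# Illustrative (partial) spoken forms of airline designators, NATO letters, and digits.
AIRLINE_WORDS = {"DLH": "lufthansa", "BAW": "speedbird", "RYR": "ryanair"}
NATO = {"A": "alfa", "B": "bravo", "C": "charlie", "K": "kilo", "T": "tango"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def spoken_form(callsign: str) -> str:
    """Expand an ICAO callsign such as 'DLH4TK' into a spoken word sequence."""
    airline, tail = callsign[:3], callsign[3:]
    words = [AIRLINE_WORDS.get(airline, airline.lower())]
    words += [DIGITS.get(c) or NATO.get(c, c.lower()) for c in tail]
    return " ".join(words)

def best_callsign(transcript: str, surveillance_callsigns: list[str]) -> str:
    """Pick the surveillance callsign whose spoken form best matches the transcript."""
    return max(surveillance_callsigns,
               key=lambda cs: SequenceMatcher(None, spoken_form(cs), transcript).ratio())

print(best_callsign("lufthansa four tango kilo descend flight level one two zero",
                    ["DLH4TK", "BAW23A", "RYR81C"]))  # -> DLH4TK
```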

    Automatic Speech Analysis Framework for ATC Communication in HAAWAII

    Over the past years, several SESAR-funded exploratory projects have focused on bringing speech and language technologies to the Air Traffic Management (ATM) domain and demonstrating their added value through successful applications. The recently ended HAAWAII project developed a generic architecture and framework, which was validated through several tasks such as callsign highlighting, pre-filling radar labels, and readback error detection. The primary goal was to support pilot and air traffic controller communication by deploying Automatic Speech Recognition (ASR) engines. Contextual information (if available) extracted from surveillance data, flight plan data, or previous communication can be exploited via entity boosting to further improve the recognition performance. HAAWAII proposed various design attributes for integrating the ASR engine into the ATM framework, often depending on concrete technical specifics of the target air navigation service providers (ANSPs). This paper gives a brief overview and provides an objective assessment of the speech processing components developed and integrated into the HAAWAII framework. Specifically, the following tasks are evaluated with respect to the application domain: (i) speech activity detection, (ii) speaker segmentation and speaker role classification, and (iii) ASR. To the best of our knowledge, the HAAWAII framework offers the best-performing speech technologies for ATM, reaching high recognition accuracy (i.e., error correction done by exploiting additional contextual data), robustness (i.e., models developed using large training corpora), and support for rapid domain transfer (i.e., to a new ATM sector with minimum investment). Two scenarios provided by ANSPs were used for testing, achieving callsign detection accuracies of about 96% and 95% for NATS and ISAVIA, respectively.
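    For stage (i), a frame-level energy threshold is the simplest form of speech activity detection. The sketch below is such a baseline and only a stand-in for whatever detector the HAAWAII framework actually uses; frame size and threshold are illustrative assumptions.

```python
import numpy as np

def detect_speech(signal: np.ndarray, sample_rate: int = 8000, frame_ms: int = 25,
                  threshold_db: float = -35.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds of regions whose frame energy exceeds a threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)  # log energy per frame
    active = energy_db > threshold_db
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                                   # a speech region begins
        elif not is_active and start is not None:
            segments.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None                                # the region ends
    if start is not None:                               # region runs to the end of the signal
        segments.append((start * frame_ms / 1000, n_frames * frame_ms / 1000))
    return segments
```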

    Ensuring Safety for Artificial-Intelligence-Based Automatic Speech Recognition in Air Traffic Control Environment

    This paper describes the safety assessment conducted in the SESAR2020 project PJ.10-W2-96 ASR on automatic speech recognition (ASR) technology implemented for air traffic control (ATC) centers. ASR already enables the automatic recognition of aircraft callsigns and various ATC commands, including command types, based on controller–pilot voice communications for presentation at the controller working position. The presented safety assessment process consists of defining design requirements for the ASR technology application in normal, abnormal, and degraded modes of ATC operations. A total of eight functional hazards were identified based on the analysis of four use cases. The safety assessment was supported by top-down and bottom-up modelling and analysis of the causes of the hazards to derive system design requirements for mitigating them. Assessment of achieving the specified design requirements was supported by evidence generated from two real-time simulations with pre-industrial ASR prototypes in approach and en-route operational environments. The simulations, focusing especially on the safety aspects of ASR application, also validated the hypotheses that ASR reduces controllers’ workload and increases situational awareness. The previously missing validation element, i.e., an analysis of the safety effects of ASR in ATC, is the focus of this paper. As a result of the safety assessment activities, mitigations were derived for each hazard, demonstrating that the use of ASR does not increase safety risks and is, therefore, ready for industrialization.

    Customization of Automatic Speech Recognition Engines for Rare Word Detection Without Costly Model Re-Training

    Thanks to Alexa, Siri, and Google Assistant, automatic speech recognition (ASR) has changed our daily lives during the last decade. Prototypic applications in the air traffic management (ATM) domain are available. Recently, pre-filling radar label entries with ASR support has reached the technology readiness level before industrialization (TRL6). However, seldom-spoken, airspace-related words that are relevant in the ATM context remain a challenge for sophisticated applications. Open-source ASR toolkits and large pre-trained models allow experts to tailor ASR to new domains, but typically require a certain amount of domain-specific training data, i.e., transcribed speech for adapting the acoustic and/or language models. In general, it is sufficient for a "universal" ASR engine to reliably recognize a few hundred words that form the vocabulary of the voice communications between air traffic controllers and pilots. However, for each airport, some hundred additional, seldom-spoken, airport-dependent words need to be integrated. These challenging word entities comprise special airline designators and waypoint names like "dexon" or "burok", which only appear in a specific region. When used, they are highly informative and thus require high recognition accuracies. Allowing plug-and-play customization with minimal expert manipulation assumes that no additional training, i.e., fine-tuning of the universal ASR, is required. This paper presents an innovative approach to automatically integrating new specific word entities into the universal ASR system. The recognition rate of these region-specific word entities increases by a factor of six compared to the universal ASR.
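    One common way to add new word entities to a finite-vocabulary recognizer without retraining is to extend its pronunciation lexicon (and bias the language model towards the new entries). The sketch below illustrates only the lexicon part with a naive grapheme-to-phoneme mapping; it is an assumed mechanism, not necessarily the approach taken in the paper, and the file name is illustrative.

```python
# Naive letter-to-phone mapping; a real system would use a proper G2P model.
G2P = {"a": "AH", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
       "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
       "o": "OW", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "UW",
       "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z"}

def lexicon_entry(word: str) -> str:
    """Build a 'word PHONE PHONE ...' lexicon line for a waypoint or airline designator."""
    phones = " ".join(G2P[ch] for ch in word.lower() if ch in G2P)
    return f"{word.lower()} {phones}"

# Region-specific waypoints mentioned in the abstract.
new_words = ["dexon", "burok"]
with open("lexicon_additions.txt", "w") as f:
    for w in new_words:
        f.write(lexicon_entry(w) + "\n")   # to be appended to the decoder's pronunciation lexicon
```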

    How to Measure Speech Recognition Performance in the Air Traffic Control Domain? The Word Error Rate is only half of the truth

    Applying Automatic Speech Recognition (ASR) in the domain of analogue voice communication between air traffic controllers (ATCos) and pilots involves more end-user requirements than just transforming spoken words into text. Perfect word recognition is useless as long as the semantic interpretation is wrong. For an ATCo it is of no importance whether words of greeting are correctly recognized; a misrecognized greeting should, however, not disturb the correct recognition of, e.g., a "descend" command. Recently, 14 European partners from the Air Traffic Management (ATM) domain agreed on a common set of rules, i.e., an ontology, for annotating the speech utterances of an ATCo. This paper first extends the ontology to pilot utterances and then compares different ASR implementations at the semantic level by introducing command recognition, command recognition error, and command rejection rates. The implementation used in this paper achieves a command recognition rate better than 94% for Prague Approach, even when the word error rate (WER) is above 2.5%.
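    The sketch below shows one plausible formalization of these semantic-level rates, comparing gold and hypothesized command annotations per utterance; the paper's exact definitions may differ, and the example commands are illustrative.

```python
def command_rates(gold: list[set[str]], hyp: list[set[str]]) -> dict[str, float]:
    """Semantic-level rates over per-utterance sets of command annotations."""
    total = sum(len(g) for g in gold)                            # all gold commands
    recognised = sum(len(g & h) for g, h in zip(gold, hyp))      # reproduced exactly
    errors = sum(len(h - g) for g, h in zip(gold, hyp))          # hypothesised but wrong
    rejected = sum(len(g - h) for g, h in zip(gold, hyp))        # gold commands with no output
    return {"recognition": recognised / total,
            "error": errors / total,
            "rejection": rejected / total}

gold = [{"DLH4TK DESCEND 120 FL"}, {"BAW23A REDUCE 180 KT"}]
hyp = [{"DLH4TK DESCEND 120 FL"}, set()]
print(command_rates(gold, hyp))  # {'recognition': 0.5, 'error': 0.0, 'rejection': 0.5}
```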

    Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition

    Automatic Speech Recognition (ASR) for air traffic control is generally trained by pooling Air Traffic Controller (ATCO) and pilot data. In practice, this is motivated by the proportion of annotated data from pilots being smaller than that from ATCOs. However, due to the data imbalance between ATCO and pilot speech and their varying acoustic conditions, ASR performance is usually significantly better for ATCO speech than for pilot speech. Obtaining the speaker roles requires manual effort when the voice recordings are collected using Very High Frequency (VHF) receivers, as the data is noisy and comes in a single channel without the push-to-talk (PTT) signal. In this paper, we propose to (1) split the ATCO and pilot data using an intuitive approach exploiting ASR transcripts and (2) consider ATCO and pilot ASR as two separate tasks for Acoustic Model (AM) training. The paper focuses on applying this approach to noisy data collected using VHF receivers, as this data is helpful for training despite its noisy nature. We also developed a simple yet efficient knowledge-based system for speaker role classification based on the grammar defined by the International Civil Aviation Organization (ICAO). Our system accepts text as input, i.e., either gold annotations or transcripts generated by an assistant-based speech recognition (ABSR) system. This approach provides an average speaker role identification accuracy of 83%. Finally, we show that training AMs separately for each task, or using a multitask approach, is better suited to the noisy data than the traditional ASR system, where all data is pooled together for AM training.
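    The sketch below gives a toy rule-based classifier in the spirit of ICAO phraseology: controllers normally open a transmission with the addressed callsign, while pilots normally close with their own callsign and use words such as "request" or "wilco". The rules, callsign pattern, and cue words are illustrative assumptions, not the classifier described in the paper.

```python
import re

# Toy callsign pattern: an airline word followed by 1-4 digit or NATO-letter words.
CALLSIGN = re.compile(
    r"\b(lufthansa|speedbird|ryanair)\s+((?:zero|one|two|three|four|five|six|"
    r"seven|eight|nine|alfa|bravo|charlie|delta|echo|tango|kilo)\s*){1,4}")

PILOT_CUES = ("request", "wilco", "with you", "approaching")

def speaker_role(transcript: str) -> str:
    t = transcript.lower().strip()
    m = CALLSIGN.search(t)
    if any(cue in t for cue in PILOT_CUES):
        return "PILOT"
    if m and m.start() == 0:
        return "ATCO"      # transmission opens with the addressed callsign
    if m and m.end() >= len(t) - 1:
        return "PILOT"     # transmission closes with the pilot's own callsign
    return "UNKNOWN"

print(speaker_role("lufthansa four tango kilo descend flight level one two zero"))     # ATCO
print(speaker_role("descending flight level one two zero lufthansa four tango kilo"))  # PILOT
```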

    Safety Aspects of Supporting Apron Controllers with Automatic Speech Recognition and Understanding Integrated into an Advanced Surface Movement Guidance and Control System

    The information that air traffic controllers (ATCos) communicate via radio telephony is valuable for digital assistants to provide additional safety. Yet, ATCos have to enter this information manually. Assistant-based speech recognition (ABSR) has proven to be a lightweight technology that automatically extracts the content of ATC communication and feeds it into digital systems without additional human effort. This article explains how ABSR can be integrated into an advanced surface movement guidance and control system (A-SMGCS). The described validations were performed in the complex apron simulation training environment of Frankfurt Airport with 14 apron controllers in a human-in-the-loop simulation in summer 2022. The integration significantly reduces the workload of controllers and increases safety as well as overall performance. Based on a word error rate of 3.1%, the command recognition rate was 91.8%, with a callsign recognition rate of 97.4%. This performance was enabled by the integration of A-SMGCS and ABSR: the command recognition rate improves by more than 15% absolute when A-SMGCS data is considered in ABSR.
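    A plausible (assumed) way in which A-SMGCS data can raise command recognition performance is to gate ABSR output against the callsigns currently known to A-SMGCS, so that commands for unknown traffic are rejected rather than pre-filled. The sketch below illustrates this idea with purely illustrative data structures; it is not necessarily the mechanism of the validated system.

```python
from dataclasses import dataclass

@dataclass
class RecognisedCommand:
    callsign: str      # e.g. "DLH4TK"
    command: str       # e.g. "TAXI VIA N"

def gate_by_asmgcs(recognised: list[RecognisedCommand],
                   asmgcs_active: set[str]) -> tuple[list[RecognisedCommand], list[RecognisedCommand]]:
    """Split ABSR output into commands for known surface traffic and rejected ones."""
    accepted = [c for c in recognised if c.callsign in asmgcs_active]
    rejected = [c for c in recognised if c.callsign not in asmgcs_active]
    return accepted, rejected

accepted, rejected = gate_by_asmgcs(
    [RecognisedCommand("DLH4TK", "TAXI VIA N"), RecognisedCommand("BAW23A", "HOLD POSITION")],
    asmgcs_active={"DLH4TK", "RYR81C"})
print([c.callsign for c in accepted], [c.callsign for c in rejected])  # ['DLH4TK'] ['BAW23A']
```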

    How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications

    Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AMs) that can later be fine-tuned on downstream tasks, e.g., automatic speech recognition (ASR). Yet, few works have investigated the impact on performance when the data substantially differs between the pre-training and downstream fine-tuning phases (i.e., domain shift). We target this scenario by analyzing the robustness of Wav2Vec2.0 and XLS-R models on downstream ASR for a completely unseen domain, i.e., air traffic control (ATC) communications. We benchmark these models on four challenging ATC test sets (signal-to-noise ratios varying between 5 and 20 dB). Relative word error rate (WER) reductions of 20% to 40% are obtained in comparison to hybrid-based state-of-the-art ASR baselines by fine-tuning the E2E acoustic models with a small fraction of labeled data. We also study the impact of fine-tuning data size on WERs, going from 5 minutes (few-shot) to 15 hours.
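    A minimal fine-tuning step for a pre-trained Wav2Vec2.0 CTC model using the Hugging Face transformers library could look like the sketch below; the model name, placeholder audio, transcript, and hyper-parameters are illustrative assumptions, not the paper's setup.

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.freeze_feature_encoder()   # common when only a few minutes/hours of labels are available
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One toy training step on a single (audio, transcript) pair sampled at 16 kHz.
audio = torch.randn(16000 * 3).numpy()   # 3 s of placeholder audio instead of an ATC recording
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("LUFTHANSA FOUR TANGO KILO DESCEND FLIGHT LEVEL ONE TWO ZERO",
                             return_tensors="pt").input_ids

loss = model(inputs.input_values, labels=labels).loss   # CTC loss against the transcript
loss.backward()
optimizer.step()
optimizer.zero_grad()
```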