Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding
Voice communication between air traffic controllers (ATCos) and pilots is
critical for ensuring safe and efficient air traffic control (ATC). This task
requires high levels of awareness from ATCos and can be tedious and
error-prone. Recent attempts have been made to integrate artificial
intelligence (AI) into ATC in order to reduce the workload of ATCos. However,
the development of data-driven AI systems for ATC demands large-scale annotated
datasets, which are currently lacking in the field. This paper explores the
lessons learned from the ATCO2 project, a project that aimed to develop a
unique platform to collect and preprocess large amounts of ATC data from
airspace in real time. Audio and surveillance data were collected from publicly
accessible radio frequency channels with VHF receivers owned by a community of
volunteers and later uploaded to OpenSky Network servers, which can be
considered an "unlimited source" of data. In addition, this paper reviews
previous work from ATCO2 partners, including (i) robust automatic speech
recognition, (ii) natural language processing, (iii) English language
identification of ATC communications, and (iv) the integration of surveillance
data such as ADS-B. We believe that the pipeline developed during the ATCO2
project, along with the open-sourcing of its data, will encourage research in
the ATC field. A sample of the ATCO2 corpus is available on the following
website: https://www.atco2.org/data, while the full corpus can be purchased
through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. We
demonstrated that ATCO2 is an appropriate dataset for developing ASR engines
when little or almost no in-domain ATC data is available. For instance, with the
CNN-TDNNf Kaldi model, we reached word error rates (WER) as low as 17.9% and
24.9% on public ATC datasets, which is 6.6/7.6% better than an "out-of-domain"
but supervised CNN-TDNNf model.
Comment: Manuscript under review
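The results above are reported as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch that computes WER as the word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the example utterances are illustrative assumptions; this is not the ATCO2 or Kaldi scoring pipeline, which typically also applies text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; splits on whitespace only and
    applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up utterances: one reference word ("flight") is missing.
    ref = "lufthansa four five six descend flight level eight zero"
    hyp = "lufthansa four five six descend level eight zero"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

Because WER is normalized by the reference length, it can exceed 100% when the hypothesis contains many insertions.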
Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition
Automatic Speech Recognition (ASR) for air traffic
control is generally trained by pooling Air Traffic Controller
(ATCO) and pilot data. In practice, this is motivated by the
smaller proportion of annotated data from pilots than from ATCOs.
However, due to the data imbalance between ATCOs and pilots and
their varying acoustic conditions, ASR performance is usually
significantly better for ATCO speech than for pilot speech. Obtaining the
speaker roles requires manual effort when the voice recordings
are collected using Very High Frequency (VHF) receivers and
the data is noisy and in a single channel without the push-to-talk (PTT) signal. In this paper, we propose to (1) split the
ATCO and pilot data using an intuitive approach exploiting
ASR transcripts and (2) consider ATCO and pilot ASR as two
separate tasks for Acoustic Model (AM) training. The paper
focuses on applying this approach to noisy data collected using
VHF receivers, as this data is helpful for training despite its
noisy nature. We also developed a simple yet efficient knowledge-based system for speaker role classification based on grammar
defined by the International Civil Aviation Organization (ICAO).
Our system accepts text as input, either gold annotations
or transcripts generated by an ASR system. This approach
provides an average accuracy in speaker role identification of
83%. Finally, we show that training AMs separately for each
task, or using a multitask approach, is well suited for the noisy
data compared to the traditional ASR system, where all data is
pooled together for AM training.
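To illustrate the general idea of a text-based, rule-driven speaker role classifier, here is a minimal sketch that counts role-specific cue words in a transcript. The cue lists `ATCO_CUES` and `PILOT_CUES`, the function `classify_speaker_role`, and the example utterances are all assumptions made for illustration (imperative commands suggesting a controller, "-ing" readback forms suggesting a pilot). They are not the ICAO-based grammar used by the authors, which the abstract does not spell out.

```python
import re

# Hypothetical cue words, chosen for illustration only.
# Controllers tend to issue imperative commands; pilots tend to read
# them back in the progressive form or make requests.
ATCO_CUES = {"cleared", "contact", "identified", "descend", "climb", "turn", "squawk"}
PILOT_CUES = {"wilco", "request", "requesting", "descending", "climbing", "turning"}


def classify_speaker_role(transcript: str) -> str:
    """Label a transcript as 'ATCO', 'PILOT', or 'UNKNOWN' by counting cue words.

    The input can be a gold annotation or an ASR hypothesis, since only
    the text is inspected.
    """
    words = re.findall(r"[a-z]+", transcript.lower())
    atco_score = sum(word in ATCO_CUES for word in words)
    pilot_score = sum(word in PILOT_CUES for word in words)

    if atco_score > pilot_score:
        return "ATCO"
    if pilot_score > atco_score:
        return "PILOT"
    return "UNKNOWN"  # tie or no cue words found


if __name__ == "__main__":
    # Made-up utterances in ATC-style phraseology.
    print(classify_speaker_role("lufthansa four five six descend flight level eight zero"))
    print(classify_speaker_role("descending flight level eight zero lufthansa four five six"))
```

A rule-based design like this needs no labeled training data, which fits the setting described above where speaker roles are missing from VHF recordings; the trade-off is that utterances without any cue word remain unclassified.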