Nuclear Confrontations
The effect of nuclear weapons has long been debated. Some argue that these weapons have a stabilizing effect on already volatile regions and rivals, while others fear that they will only further escalate tensions. This undergraduate thesis studies how relations between countries have changed once a country has attained nuclear weapons. Specifically, it asks whether Militarized Interstate Disputes (MIDs) increased or decreased after the acquisition of nuclear weapons. To do so, I look at the severity and occurrence of MIDs to see trends in changes of state attitudes because of nuclear weapons, through the lenses of the three most popular nuclear schools of thought: deterrence, the stability/instability paradox, and irrelevance.
South Asia: After the Bomb
The effect of nuclear weapons has long been debated. Some argue that these weapons have a stabilizing effect on already volatile regions and rivals, while others fear that they will only further escalate tensions. In their book debating a nuclear world, Waltz and Sagan take opposing positions on the effect that the atom bomb had on the territorial issues India has with its neighbors Pakistan and China. The case of South Asia and the bomb is unique in the sense that while they are all ancient civilizations, their current regimes are all the same age. We are thus offered a cocktail of ancient civilizations, young regimes, territorial conflict, and history’s most lethal weapon. This paper seeks to discover whether the presence of nuclear bombs has impacted the ability to resolve territorial disputes between these nations. After looking at the foreign policy of each country and the history of their development of nuclear weapons, I find that, yes, possessing nuclear weapons has delayed the resolution of this territorial issue.
On the Existence of Elementwise Invariant Vectors in Representations of Symmetric Groups
We determine when a permutation with cycle type admits a non-zero
invariant vector in the irreducible representation of the symmetric
group. We find that a majority of pairs have this property,
with only a few simple exceptions. Comment: 16 pages, 11 figures
Automatic Speech Analysis Framework for ATC Communication in HAAWAII
Over the past years, several SESAR-funded exploratory projects focused on bringing speech and language technologies to the Air Traffic Management (ATM) domain and demonstrating their added value through successful applications. The recently ended HAAWAII project developed a generic architecture and framework, which was validated through several tasks such as callsign highlighting, pre-filling radar labels, and readback error detection. The primary goal was to support pilot and air traffic controller communication by deploying Automatic Speech Recognition (ASR) engines. Contextual information (if available) extracted from surveillance data, flight plan data, or previous communication can be exploited via entity boosting to further improve the recognition performance. HAAWAII proposed various design attributes to integrate the ASR engine into the ATM framework, often depending on concrete technical specifics of target air navigation service providers (ANSPs). This paper gives a brief overview and provides an objective assessment of speech processing components developed and integrated into the HAAWAII framework. Specifically, the following tasks are evaluated with respect to the application domain: (i) speech activity detection, (ii) speaker segmentation and speaker role classification, as well as (iii) ASR. To the best of our knowledge, the HAAWAII framework offers the best performing speech technologies for ATM, reaching high recognition accuracy (i.e., error-correction done by exploiting additional contextual data), robustness (i.e., models developed using large training corpora) and support for rapid domain transfer (i.e., to a new ATM sector with minimum investment). Two scenarios provided by ANSPs were used for testing, achieving callsign detection accuracy of about 96% and 95% for NATS and ISAVIA, respectively.
Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding
Voice communication between air traffic controllers (ATCos) and pilots is
critical for ensuring safe and efficient air traffic control (ATC). This task
requires high levels of awareness from ATCos and can be tedious and
error-prone. Recent attempts have been made to integrate artificial
intelligence (AI) into ATC in order to reduce the workload of ATCos. However,
the development of data-driven AI systems for ATC demands large-scale annotated
datasets, which are currently lacking in the field. This paper explores the
lessons learned from the ATCO2 project, a project that aimed to develop a
unique platform to collect and preprocess large amounts of ATC data from
airspace in real time. Audio and surveillance data were collected from publicly
accessible radio frequency channels with VHF receivers owned by a community of
volunteers and later uploaded to OpenSky Network servers, which can be
considered an "unlimited source" of data. In addition, this paper reviews
previous work from ATCO2 partners, including (i) robust automatic speech
recognition, (ii) natural language processing, (iii) English language
identification of ATC communications, and (iv) the integration of surveillance
data such as ADS-B. We believe that the pipeline developed during the ATCO2
project, along with the open-sourcing of its data, will encourage research in
the ATC field. A sample of the ATCO2 corpus is available on the following
website: https://www.atco2.org/data, while the full corpus can be purchased
through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. We
demonstrated that ATCO2 is an appropriate dataset to develop ASR engines when
little or no in-domain ATC data is available. For instance, with the
CNN-TDNNf Kaldi model, we reached WERs as low as 17.9% and 24.9% on public
ATC datasets, which is 6.6/7.6% better than an "out-of-domain" but
supervised CNN-TDNNf model. Comment: Manuscript under review
Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition
Automatic Speech Recognition (ASR) for air traffic
control is generally trained by pooling Air Traffic Controller
(ATCO) and pilot data. In practice, this is motivated by the
smaller proportion of annotated pilot data compared with ATCO data.
However, due to the data imbalance between ATCOs and pilots and
their differing acoustic conditions, ASR performance is usually
significantly better for ATCO speech than for pilot speech. Obtaining the
speaker roles requires manual effort when the voice recordings
are collected using Very High Frequency (VHF) receivers and
the data is noisy and in a single channel without the push-to-talk (PTT) signal. In this paper, we propose to (1) split the
ATCO and pilot data using an intuitive approach exploiting
ASR transcripts and (2) consider ATCO and pilot ASR as two
separate tasks for Acoustic Model (AM) training. The paper
focuses on applying this approach to noisy data collected using
VHF receivers, as this data is helpful for training despite its
noisy nature. We also developed a simple yet efficient knowledge-based system for speaker role classification based on grammar
defined by the International Civil Aviation Organization (ICAO).
Our system accepts text as input, that is, either gold annotations
or transcripts generated by an ABSR system. This approach
provides an average accuracy in speaker role identification of
83%. Finally, we show that training AMs separately for each
task, or using a multitask approach, is well suited for the noisy
data compared to the traditional ASR system, where all data is
pooled together for AM training.
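As a rough illustration of how a text-only, rule-based speaker role classifier could work, the sketch below relies on one simplified phraseology convention, taken here as an assumption: controllers tend to begin a transmission with the aircraft callsign, while pilots tend to end a readback with it. The function name `classify_speaker_role`, the requirement that the callsign is already known, and the rule itself are hypothetical; this is not the grammar used in the paper.

```python
def classify_speaker_role(transcript: str, callsign: str) -> str:
    """Guess the speaker role from the position of the callsign.

    Assumption (not from the paper): an utterance that starts with the
    callsign is labeled "atco", one that ends with it is labeled "pilot",
    and anything else is labeled "unknown".
    """
    words = transcript.lower().split()
    callsign_words = callsign.lower().split()
    n = len(callsign_words)

    # Nothing to match against, or utterance shorter than the callsign.
    if n == 0 or len(words) < n:
        return "unknown"

    if words[:n] == callsign_words:
        return "atco"
    if words[-n:] == callsign_words:
        return "pilot"
    return "unknown"


# Hypothetical example utterances, written out as spoken words.
atco_utterance = "lufthansa three two one descend flight level eight zero"
pilot_utterance = "descending flight level eight zero lufthansa three two one"
atco_role = classify_speaker_role(atco_utterance, "lufthansa three two one")
pilot_role = classify_speaker_role(pilot_utterance, "lufthansa three two one")
```

Because the classifier only looks at text, it can be applied either to manual transcripts or to ASR output, although recognition errors inside the callsign would push utterances into the "unknown" class under this simple rule.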
How to Measure Speech Recognition Performance in the Air Traffic Control Domain? The Word Error Rate is only half of the truth
Applying Automatic Speech Recognition (ASR) in the domain
of analogue voice communication between air traffic
controllers (ATCo) and pilots has more end user requirements
than just transforming spoken words into text. Perfect word
recognition is useless as long as the semantic
interpretation is wrong. For an ATCo it is of no importance whether
the words of greeting are correctly recognized. A wrong
recognition of a greeting should, however, not disturb the
correct recognition of e.g. a “descend” command. Recently, 14
European partners from the Air Traffic Management (ATM)
domain have agreed on a common set of rules, i.e., an ontology
on how to annotate the speech utterances of an ATCo. This paper
first extends the ontology to pilot utterances and then compares
different ASR implementations on the semantic level by
introducing command recognition, command recognition error,
and command rejection rates. The implementation used in this
paper achieves a command recognition rate better than 94% for
Prague Approach, even when WER is above 2.5
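To make the difference between word-level and semantic-level scoring concrete, the following sketch computes a standard word error rate (word-level edit distance divided by reference length) next to a deliberately simplified command recognition rate. The function `command_recognition_rate`, the command strings, and the set-overlap definition are illustrative assumptions; the paper's own metric definitions are not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def command_recognition_rate(gold, predicted) -> float:
    """Fraction of gold commands that also appear in the prediction.

    Simplified assumption: each utterance is a set of command strings and
    a command counts as recognized only on an exact string match.
    """
    total = sum(len(set(g)) for g in gold)
    if total == 0:
        raise ValueError("gold annotations contain no commands")
    correct = sum(len(set(g) & set(p)) for g, p in zip(gold, predicted))
    return correct / total


# Hypothetical example: the greeting is misrecognized, the command is not.
reference = "good morning lufthansa three two one descend flight level eight zero"
hypothesis = "good moaning lufthansa three two one descend flight level eight zero"
wer = word_error_rate(reference, hypothesis)
rate = command_recognition_rate([{"DLH321 DESCEND 80 FL"}], [{"DLH321 DESCEND 80 FL"}])
```

In this example the greeting error raises the WER while the command-level score stays perfect, which is the kind of gap between the two views that the abstract describes.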
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications
Personal assistants, automatic speech recognizers and dialogue understanding
systems are becoming more critical in our interconnected digital world. A clear
example is air traffic control (ATC) communications. ATC aims at guiding
aircraft and controlling the airspace in a safe and optimal manner. These
voice-based dialogues are carried between an air traffic controller (ATCO) and
pilots via very-high frequency radio channels. In order to incorporate these
novel technologies into ATC (low-resource domain), large-scale annotated
datasets are required to develop the data-driven AI systems. Two examples are
automatic speech recognition (ASR) and natural language understanding (NLU). In
this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering
research on the challenging ATC field, which has lagged behind due to lack of
annotated data. The ATCO2 corpus covers 1) data collection and pre-processing,
2) pseudo-annotations of speech data, and 3) extraction of ATC-related named
entities. The ATCO2 corpus is split into three subsets. 1) ATCO2-test-set
corpus contains 4 hours of ATC speech with manual transcripts and a subset with
gold annotations for named-entity recognition (callsign, command, value). 2)
The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched
with automatic transcripts from an in-domain speech recognizer, contextual
information, speaker turn information, signal-to-noise ratio estimate and
English language detection score per sample. Both are available for purchase
through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3)
The ATCO2-test-set-1h corpus is a one-hour subset from the original test set
corpus, that we are offering for free at https://www.atco2.org/data. We expect
the ATCO2 corpus will foster research on robust ASR and NLU not only in the
field of ATC communications but also in the general research community. Comment: Manuscript under review; The code will be available at
https://github.com/idiap/atco2-corpu
How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications
Recent work on self-supervised pre-training focuses on leveraging large-scale
unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM)
that can be later fine-tuned on downstream tasks e.g., automatic speech
recognition (ASR). Yet, few works have investigated the impact on performance when
the data substantially differs between the pre-training and downstream
fine-tuning phases (i.e., domain shift). We target this scenario by analyzing
the robustness of Wav2Vec2.0 and XLS-R models on downstream ASR for a
completely unseen domain, i.e., air traffic control (ATC) communications. We
benchmark the proposed models on four challenging ATC test sets
(signal-to-noise ratio varies between 5 and 20 dB). Relative word error rate
(WER) reductions between 20% and 40% are obtained in comparison to hybrid-based
state-of-the-art ASR baselines by fine-tuning E2E acoustic models with a small
fraction of labeled data. We also study the impact of fine-tuning data size on
WERs, going from 5 minutes (few-shot) to 15 hours. Comment: This paper has been submitted to Interspeech 202
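A relative WER reduction, as opposed to an absolute one, is the improvement expressed as a fraction of the baseline WER. The short sketch below shows the arithmetic; the function name and the example numbers are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return (baseline - new) / baseline, e.g. 0.30 for a 30% relative reduction."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers: a baseline at 25.0% WER and a fine-tuned model at 17.5% WER.
reduction = relative_wer_reduction(25.0, 17.5)
```

With these made-up values the absolute gain is 7.5 WER points, while the relative reduction is 30%, which falls inside the 20% to 40% range quoted in the abstract.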