1,706 research outputs found
Review of Automatic Speech Recognition Methodologies
DTFACT-14-D-00004,692M152240001
This report highlights the crucial role of Automatic Speech Recognition (ASR) techniques in enhancing safety for air traffic control (ATC) in terminal environments. ASR techniques facilitate efficient and accurate transcription of verbal communications, reducing the likelihood of errors. The report also details the evolution of ASR technologies, from Hidden Markov Models (HMMs) through Deep Neural Networks (DNNs) to End-to-End machine learning models. Finally, the report details the latest advancements in ASR techniques, focusing on transformer-based models that have outperformed traditional ASR approaches and achieved state-of-the-art results on ASR benchmarks.
How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications
Recent work on self-supervised pre-training focuses on leveraging large-scale
unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM)
that can later be fine-tuned on downstream tasks, e.g., automatic speech
recognition (ASR). Yet, few works have investigated the impact on performance when
the data substantially differs between the pre-training and downstream
fine-tuning phases (i.e., domain shift). We target this scenario by analyzing
the robustness of Wav2Vec2.0 and XLS-R models on downstream ASR for a
completely unseen domain, i.e., air traffic control (ATC) communications. We
benchmark the proposed models on four challenging ATC test sets
(signal-to-noise ratios vary between 5 and 20 dB). Relative word error rate
(WER) reductions of 20% to 40% are obtained in comparison to hybrid-based
state-of-the-art ASR baselines by fine-tuning E2E acoustic models with a small
fraction of labeled data. We also study the impact of fine-tuning data size on
WERs, going from 5 minutes (few-shot) to 15 hours.
Comment: This paper has been submitted to Interspeech 202
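The WER figures reported above can be made concrete with a small sketch. This is an illustration of the standard metric, not the paper's evaluation code: word error rate is the word-level edit distance between a reference and a hypothesis, normalized by reference length, and a relative reduction compares a baseline WER to an improved one.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

def relative_wer_reduction(baseline: float, new: float) -> float:
    """Relative reduction, e.g. 0.30 means a 30% relative improvement."""
    return (baseline - new) / baseline

print(wer("cleared to land runway two seven", "cleared to land runway two seven"))  # 0.0
```

For instance, going from a 25% to a 20% WER is a 20% relative reduction, which is how ranges such as "20% to 40%" above should be read.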
Attention-Inspired Artificial Neural Networks for Speech Processing: A Systematic Review
Artificial Neural Networks (ANNs) were inspired by the neural networks in the human brain and have been widely applied in speech processing. The application areas of ANNs include speech recognition, speech emotion recognition, language identification, speech enhancement, and speech separation, amongst others. Likewise, given that speech processing performed by humans involves complex cognitive processes known as auditory attention, there has been a growing number of papers proposing ANNs supported by deep learning algorithms in conjunction with some mechanism to achieve symmetry with the human attention process. However, while these ANN approaches include attention, there is no categorization of the attention integrated into the deep learning algorithms and of its relation with human auditory attention. Therefore, we consider it necessary to review the different attention-inspired ANN approaches to show both academic and industry experts the available models for a wide variety of applications. Based on the PRISMA methodology, we present a systematic review of the literature published since 2000 in which deep learning algorithms are applied to diverse problems related to speech processing. In this paper, 133 research works are selected and the following aspects are described: (i) the most relevant features, (ii) the ways in which attention has been implemented, (iii) their hypothetical relationship with human attention, and (iv) the evaluation metrics used. Additionally, the four publications most closely related to human attention were analyzed and their strengths and weaknesses determined.
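The attention mechanisms the review categorizes share a common computational core. A minimal NumPy sketch of scaled dot-product attention follows; shapes and values are illustrative and not taken from any reviewed paper:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                      # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
    return weights @ v, weights                          # weighted sum of values

# Toy example: 2 queries attending over 3 key/value pairs of dimension 4.
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(q, k, v)
```

Each row of `w` is a distribution over the inputs, which is the loose analogue of "selective listening" that many of the surveyed papers draw on.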
Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding
Voice communication between air traffic controllers (ATCos) and pilots is
critical for ensuring safe and efficient air traffic control (ATC). This task
requires high levels of awareness from ATCos and can be tedious and
error-prone. Recent attempts have been made to integrate artificial
intelligence (AI) into ATC in order to reduce the workload of ATCos. However,
the development of data-driven AI systems for ATC demands large-scale annotated
datasets, which are currently lacking in the field. This paper explores the
lessons learned from the ATCO2 project, a project that aimed to develop a
unique platform to collect and preprocess large amounts of ATC data from
airspace in real time. Audio and surveillance data were collected from publicly
accessible radio frequency channels with VHF receivers owned by a community of
volunteers and later uploaded to Opensky Network servers, which can be
considered an "unlimited source" of data. In addition, this paper reviews
previous work from ATCO2 partners, including (i) robust automatic speech
recognition, (ii) natural language processing, (iii) English language
identification of ATC communications, and (iv) the integration of surveillance
data such as ADS-B. We believe that the pipeline developed during the ATCO2
project, along with the open-sourcing of its data, will encourage research in
the ATC field. A sample of the ATCO2 corpus is available on the following
website: https://www.atco2.org/data, while the full corpus can be purchased
through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. We
demonstrated that ATCO2 is an appropriate dataset for developing ASR engines when
little or no ATC in-domain data is available. For instance, with the
CNN-TDNNf Kaldi model, we reached WERs as low as 17.9% and 24.9%
on public ATC datasets, which is 6.6%/7.6% better than an "out-of-domain" but
supervised CNN-TDNNf model.
Comment: Manuscript under review
A MACHINE LEARNING FRAMEWORK FOR AUTOMATIC SPEECH RECOGNITION IN AIR TRAFFIC CONTROL USING WORD LEVEL BINARY CLASSIFICATION AND TRANSCRIPTION
Advances in Artificial Intelligence and Machine Learning have enabled a variety of new technologies. One such technology is Automatic Speech Recognition (ASR), where a machine is given audio and transcribes the words that were spoken. ASR can be applied in a variety of domains to improve general usability and safety. One such domain is Air Traffic Control (ATC), where ASR promises to improve safety in a mission-critical environment. ASR models have historically required a large amount of clean training data, but ATC environments are noisy and acquiring labeled data is a difficult, expertise-dependent task. This thesis attempts to solve these problems by presenting a machine learning framework which uses word-by-word audio samples to transcribe ATC speech. Instead of transcribing an entire speech sample, this framework transcribes every word individually; the overall transcription is then pieced together based on the word sequence. Each stage of the framework is trained and tested independently, and the overall performance is gauged. The framework was judged to be a feasible approach to ASR in ATC.
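The "piece the transcription together" step described above can be sketched as follows. The segment structure, confidence threshold, and helper names are our own illustration, not the thesis's implementation:

```python
from dataclasses import dataclass

@dataclass
class WordSegment:
    start: float       # segment start time (seconds)
    end: float         # segment end time (seconds)
    label: str         # word predicted by the per-word classifier
    confidence: float  # classifier confidence in [0, 1]

def assemble_transcript(segments, min_confidence=0.5):
    """Order per-word predictions by time and keep only confident ones."""
    kept = [s for s in segments if s.confidence >= min_confidence]
    kept.sort(key=lambda s: s.start)
    return " ".join(s.label for s in kept)

segs = [
    WordSegment(1.2, 1.6, "runway", 0.91),
    WordSegment(0.0, 0.5, "cleared", 0.88),
    WordSegment(0.6, 1.1, "takeoff", 0.35),  # low confidence, dropped
]
print(assemble_transcript(segs))  # cleared runway
```

The point of the sketch is that, once each word is classified in isolation, the overall transcription reduces to sorting and filtering the per-word outputs.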
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Automatic speech recognition (ASR) has recently become an important challenge
for deep learning (DL): it requires large-scale training datasets and
high computational and storage resources. Moreover, DL techniques, and machine
learning (ML) approaches in general, assume that training and testing data
come from the same domain, with the same input feature space and data
distribution characteristics. This assumption, however, does not hold in
some real-world artificial intelligence (AI) applications. There are also
situations where gathering real data is challenging, expensive, or rare,
so that the data requirements of DL models cannot be met. Deep transfer
learning (DTL) has been introduced to overcome these issues; it helps
develop high-performing models using real datasets that are small or slightly
different from, but related to, the training data. This paper presents a comprehensive
survey of DTL-based ASR frameworks to shed light on the latest developments and
helps academics and professionals understand current challenges. Specifically,
after presenting the DTL background, a well-designed taxonomy is adopted to
organize the state of the art. A critical analysis is then conducted to identify
the limitations and advantages of each framework. Finally, a comparative
study is introduced to highlight the current challenges before deriving
opportunities for future research
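The core DTL idea surveyed above, reusing a pre-trained representation and training only a small task head on limited in-domain data, can be illustrated with a toy NumPy example. The frozen "backbone" here is just a fixed random projection and the dataset is synthetic, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Pre-trained" feature extractor: these weights stay frozen while fine-tuning.
W_frozen = rng.normal(size=(8, 4))

def features(x):
    return np.tanh(x @ W_frozen)  # frozen backbone forward pass

# Small labeled target-domain dataset (the DTL setting: little in-domain data).
X = rng.normal(size=(32, 8))
y = (X.sum(axis=1) > 0).astype(float)

# Only the task head (w, b) is trained, with plain logistic regression.
w, b = np.zeros(4), 0.0
lr = 0.5
for _ in range(200):
    z = features(X) @ w + b
    p = 1.0 / (1.0 + np.exp(-z))           # sigmoid
    grad = p - y                           # gradient of the logistic loss
    w -= lr * features(X).T @ grad / len(X)
    b -= lr * grad.mean()

loss = -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()
accuracy = ((p > 0.5) == y).mean()
```

Training only `(w, b)` drives the loss below the chance-level value of ln 2 ≈ 0.693; the same division of labor (frozen pre-trained layers, small trainable head) is what makes the few-shot ASR fine-tuning results above possible.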
Counter-Terrorism in New Europe
In recent years the nature of terrorism has changed dramatically and has taken on a new combination of characteristics. The fight against this terrorism has become a global concern and a central issue of international government policies. Counter-terrorism policies have been transformed all around the world, and the importance states place on certain aspects of their counter-terrorist measures varies considerably. There is no agreement on how best to fight terrorism. Within the European Union (EU) this disagreement is most visible, with some countries supporting the United States in their military fight against terrorism, while others strongly oppose it. This paper will focus on five of the ten new EU members that joined in 2004 (Estonia, Poland, the Czech Republic, Slovenia and Malta) and review some of their existing counter-terrorism measures. In doing so, the paper will examine the strengths and weaknesses of each individual state's policy and highlight some of the general trends and patterns among them.
Automatic speech recognition for European Portuguese
Master's dissertation in Informatics Engineering
The process of Automatic Speech Recognition (ASR) opens doors to a vast amount of possible
improvements in customer experience. The use of this type of technology has increased
significantly in recent years, this change being the result of the recent evolution in ASR
systems. The opportunities to use ASR are vast, covering several areas, such as medical,
industrial, and business, among others. We must emphasize the use of these voice recognition
systems in telecommunications companies, namely in the automation of customer assistance
operators, allowing a call to be routed automatically to specialized operators
by detecting the matters to be dealt with through recognition of the spoken utterances. In
recent years, we have seen big technological breakthroughs in ASR, achieving unprecedented
accuracy results that are comparable to human performance. We are also seeing a move from what
is known as the Traditional approach of ASR systems, based on Hidden Markov Models
(HMM), to the newer End-to-End ASR systems that obtain benefits from the use of deep
neural networks (DNNs), large amounts of data and process parallelization.
The literature review showed us that the focus of this previous work was almost exclusively
on the English and Chinese languages, with little effort being made in the development of
other languages, as is the case with Portuguese. In the research carried out, we did not
find a model for the European Portuguese (EP) dialect that is freely available for general
use. Focused on this problem, this work describes the development of an End-to-End ASR
system for EP. To achieve this goal, a set of procedures was followed that allowed us to
present the concepts, characteristics and all the steps inherent to the construction of these
types of systems. Furthermore, since the transcribed speech needed to accomplish our goal
is very limited for EP, we also describe the process of collecting and formatting data from a
variety of different sources, most of them freely available to the public. To further try and
improve our results, a variety of different data augmentation techniques were implemented
and tested. The obtained models are based on a PyTorch implementation of the Deep Speech
2 model.
Our best model achieved a Word Error Rate (WER) of 40.5% on our main test corpus,
achieving slightly better results than those obtained by commercial systems on the same data.
Around 150 hours of transcribed EP speech were collected, so that they can be used to train other ASR
systems or models in different areas of investigation. We gathered a series of interesting
results on the use of different batch size values as well as the improvements provided by
the use of a large variety of data augmentation techniques. Nevertheless, the ASR theme is vast and there is still a variety of different methods and interesting concepts that we could
research in order to seek an improvement of the achieved results.
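Two of the common audio augmentation techniques the dissertation's abstract alludes to (additive noise at a target SNR, and speed perturbation) can be sketched in NumPy. The test signal and parameters are illustrative; this is not the dissertation's implementation:

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Mix in white noise at a target signal-to-noise ratio (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(scale=np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def speed_perturb(signal, factor):
    """Resample by linear interpolation to change speed (and pitch)."""
    n_out = int(len(signal) / factor)
    idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(idx, np.arange(len(signal)), signal)

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
noisy = add_noise(clean, snr_db=10, rng=rng)
fast = speed_perturb(clean, factor=1.1)   # ~10% faster, hence ~10% shorter

# Empirical SNR of the augmented signal, for sanity-checking the target.
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Applied to scarce transcribed speech, transforms like these multiply the effective training data without requiring any new annotation, which is why the dissertation reports gains from a large variety of them.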
Smart Home Personal Assistants: A Security and Privacy Review
Smart Home Personal Assistants (SPA) are an emerging innovation that is
changing the way in which home users interact with the technology. However,
there are a number of elements that expose these systems to various risks: i)
the open nature of the voice channel they use, ii) the complexity of their
architecture, iii) the AI features they rely on, and iv) their use of a
wide-range of underlying technologies. This paper presents an in-depth review
of the security and privacy issues in SPA, categorizing the most important
attack vectors and their countermeasures. Based on this, we discuss open
research challenges that can help steer the community to tackle and address
current security and privacy issues in SPA. One of our key findings is that
even though the attack surface of SPA is conspicuously broad and there has been
a significant amount of recent research efforts in this area, research has so
far focused on a small part of the attack surface, particularly on issues
related to the interaction between the user and the SPA devices. We also point
out that further research is needed to tackle issues related to authorization,
speech recognition or profiling, to name a few. To the best of our knowledge,
this is the first article to conduct such a comprehensive review and
characterization of the security and privacy issues and countermeasures of SPA.Comment: Accepted for publication in ACM Computing Survey