Inequity in Popular Voice Recognition Systems Regarding African Accents
With smart speakers such as the Amazon Echo Dot and Google Home now commonplace, everyone should have an equal opportunity to use them. Yet for many popular voice recognition systems, the only accents with wide support are those from Europe, Latin America, and Asia. This can be frustrating for users whose dialects or accents are poorly understood by common tools like Amazon's Alexa. As such devices become household appliances, researchers are becoming increasingly aware of bias and inequity in speech recognition, as in other subfields of artificial intelligence. The addition of African accents can potentially diversify smart speaker customer bases worldwide. My research project can help developers include accents from the African diaspora as they build these systems. In this work, we measure recognition accuracy for under-represented dialects across a variety of speech recognition systems and analyze the results in terms of standard performance metrics. After collecting audio files from different voices across the African diaspora, we discuss key findings and propose guidelines for making current voice recognition systems fairer for all users.
Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance
This paper analyzes gender representation in four major corpora of French broadcast speech. Because these corpora are widely used within the speech processing community, they are primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous natural language processing (NLP) applications, we study the impact of gender imbalance in TV and radio broadcasts on the performance of an ASR system. The analysis shows that women are under-represented in our data in terms of both speakers and speech turns. We introduce the notion of speaker role to refine the analysis and find that women are even scarcer within the Anchor category, which corresponds to prominent speakers. The disparity of available data between genders causes performance to degrade on women's speech. However, this global trend can be counterbalanced for speakers who regularly appear in the media, when a sufficient amount of data is available.
Comment: Accepted to ACM Workshop AI4T
A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos
Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications, including voice assistants and transcription services. However, it has been observed that state-of-the-art ASR systems, which deliver impressive benchmark results, struggle with speakers of certain regions or demographics due to variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of K technical lectures in the English language, along with their transcripts, delivered by instructors representing various parts of Indian demography. The dataset is sourced from the very popular NPTEL MOOC platform. We use the curated dataset to measure the existing disparity in YouTube Automatic Captions and OpenAI Whisper model performance across the diverse demographic traits of speakers in India. While there exists disparity due to gender, native region, age and speech rate of speakers, disparity based on caste is non-existent. We also observe statistically significant disparity across the disciplines of the lectures. These results indicate the need for more inclusive and robust ASR systems and for more representative datasets with which to evaluate such disparity.
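The disparity studies above compare groups on the standard word error rate (WER) metric. A minimal sketch of its edit-distance computation follows; the function name and example strings are illustrative, not taken from the paper (production audits typically use a library such as jiwer):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for edit distance between word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[-1][-1] / len(ref)

# One deleted word against a six-word reference: WER = 1/6
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why group comparisons are usually reported per-speaker and then aggregated.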
Performance Disparities Between Accents in Automatic Speech Recognition
Automatic speech recognition (ASR) services are ubiquitous, transforming
speech into text for systems like Amazon's Alexa, Google's Assistant, and
Microsoft's Cortana. However, researchers have identified biases in ASR
performance between particular English language accents by racial group and by
nationality. In this paper, we expand this discussion both qualitatively by
relating it to historical precedent and quantitatively through a large-scale
audit. The standardization of language, and the use of language to maintain global and political power, have played important roles throughout history, which we recount to show parallels in the ways that ASR services act on English-language
speakers today. Then, using a large and global data set of speech from The
Speech Accent Archive which includes over 2,700 speakers of English born in 171
different countries, we perform an international audit of some of the most
popular English ASR services. We show that performance disparities exist as a function of whether or not a speaker's first language is English and that, even when controlling for multiple linguistic covariates, these disparities have a statistically significant relationship to the political alignment of the speaker's birth country with respect to the United States' geopolitical power.
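The kind of audit described above amounts to regressing per-speaker error rates on a group indicator while controlling for linguistic covariates. A minimal sketch on synthetic data follows; every variable name, coefficient, and sample size here is illustrative and not drawn from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Synthetic audit data: a binary indicator of interest plus one covariate.
first_lang_english = rng.integers(0, 2, n)   # 1 if first language is English
speech_rate = rng.normal(4.0, 0.5, n)        # e.g. words per second

# Simulated per-speaker WER with a built-in -0.10 disparity for L1 English.
wer = 0.30 - 0.10 * first_lang_english + 0.02 * speech_rate \
      + rng.normal(0.0, 0.01, n)

# Ordinary least squares: intercept, group indicator, covariate.
X = np.column_stack([np.ones(n), first_lang_english, speech_rate])
coef, *_ = np.linalg.lstsq(X, wer, rcond=None)

# coef[1] estimates the disparity attributable to first-language status
# after controlling for speech rate; here it should recover roughly -0.10.
print(coef[1])
```

A real audit would add more covariates, cluster errors by speaker, and report significance tests, but the structure (error metric regressed on group membership plus controls) is the same.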
Assessing whether artificial intelligence is an enabler or an inhibitor of sustainability at indicator level
Since the early phase of the artificial intelligence (AI) era, expectations towards AI have been high, with experts believing that AI paves the way for managing and handling various global challenges. However, the significant enabling and inhibiting influence of AI on sustainable development needs to be assessed carefully, given that the technology diffuses rapidly and affects millions of people worldwide on a day-to-day basis. To address this challenge, a panel discussion was organized by the KTH Royal Institute of Technology, the AI Sustainability Center and the Massachusetts Institute of Technology (MIT), gathering a wide range of AI experts. This paper summarizes the insights from the panel discussion around the following themes: the role of AI in achieving the Sustainable Development Goals (SDGs); AI for a prosperous 21st century; transparency, automated decision-making processes, and personal profiling; and measuring the relevance of Digitalization and Artificial Intelligence (D&AI) at the indicator level of the SDGs. The research-backed panel discussion was dedicated to recognizing and prioritizing the agenda for addressing pressing research gaps for academic research, funding bodies, professionals, and industry, with an emphasis on the transportation sector. A common conclusion across these themes was the need to go beyond the development of AI in sectoral silos, so as to understand the impacts AI might have across societal, environmental, and economic outcomes. The recordings of the panel discussion can be found at: https://www.kth.se/en/2.18487/evenemang/the-role-of-ai-in-achieving-the-sdgs-enabler-or-inhibitor-1.1001364?date=2020-08-20&length=1&orglength=185&orgdate=2020-06-30 Short link: https://bit.ly/2Kap1tE © 2021
The authors acknowledge the KTH Sustainability Office and the KTH Digitalization Platform for their provided funding, which enabled the organization of this panel discussion. SG acknowledges the funding provided by the German Federal Ministry for Education and Research (BMBF) for the project "digitainable". SDL acknowledges support through the Spanish Government.
Language variation, automatic speech recognition and algorithmic bias
In this thesis, I situate the impacts of automatic speech recognition systems in relation to sociolinguistic theory (in particular drawing on concepts of language variation, language ideology
and language policy) and contemporary debates in AI ethics (especially regarding algorithmic
bias and fairness). In recent years, automatic speech recognition systems, alongside other
language technologies, have been adopted by a growing number of users and have been embedded in an increasing number of algorithmic systems. This expansion into new application
domains and language varieties can be understood as an expansion into new sociolinguistic
contexts. In this thesis, I am interested in how automatic speech recognition tools interact
with this sociolinguistic context, and how they affect speakers, speech communities and their
language varieties.
Focussing on commercial automatic speech recognition systems for British Englishes, I first
explore the extent and consequences of performance differences of these systems for different user groups depending on their linguistic background. When situating this predictive bias
within the wider sociolinguistic context, it becomes apparent that these systems reproduce and
potentially entrench existing linguistic discrimination and could therefore cause direct and indirect harms to already marginalised speaker groups. To understand the benefits and potentials
of automatic transcription tools, I highlight two case studies: transcribing sociolinguistic data
in English and transcribing personal voice messages in isiXhosa. The central role of the sociolinguistic context in developing these tools is emphasised in this comparison. Design choices,
such as the choice of training data, are particularly consequential because they interact with existing processes of language standardisation. To better understand the impacts of these choices, and the role of the developers making them, I draw on theory from language policy research
and critical data studies. These conceptual frameworks are intended to help practitioners and
researchers in anticipating and mitigating predictive bias and other potential harms of speech
technologies. Beyond looking at individual choices, I also investigate the discourses about language variation and linguistic diversity deployed in the context of language technologies. These
discourses put forward by researchers, developers and commercial providers not only have a
direct effect on the wider sociolinguistic context, but they also highlight how this context (e.g.,
existing beliefs about language(s)) affects technology development. Finally, I explore ways of
building better automatic speech recognition tools, focussing in particular on well-documented,
naturalistic and diverse benchmark datasets. However, inclusive datasets are not necessarily
a panacea, as they still raise important questions about the nature of linguistic data and language variation (especially in relation to identity), and may not mitigate or prevent all potential
harms of automatic speech recognition systems as embedded in larger algorithmic systems
and sociolinguistic contexts.
Private and Reliable Neural Network Inference
Reliable neural networks (NNs) provide important inference-time reliability
guarantees such as fairness and robustness. Complementarily, privacy-preserving
NN inference protects the privacy of client data. So far these two emerging
areas have been largely disconnected, yet their combination will be
increasingly important. In this work, we present the first system which enables
privacy-preserving inference on reliable NNs. Our key idea is to design
efficient fully homomorphic encryption (FHE) counterparts for the core
algorithmic building blocks of randomized smoothing, a state-of-the-art
technique for obtaining reliable models. The lack of required control flow in FHE makes this a demanding task, as naïve solutions lead to unacceptable runtime. We employ these building blocks to enable privacy-preserving NN inference with robustness and fairness guarantees in a system called Phoenix. Experimentally, we demonstrate that Phoenix achieves its goals without incurring prohibitive latencies. To our knowledge, this is the first work which bridges the areas of client data privacy and reliability guarantees for NNs.
Comment: In ACM Conference on Computer and Communications Security (CCS 2022)
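The randomized smoothing primitive that Phoenix ports to FHE can be sketched in the clear as a Monte Carlo majority vote over Gaussian perturbations of the input. The following is a minimal plaintext sketch only; the function names and parameters are illustrative and do not reflect Phoenix's actual encrypted implementation:

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000, seed=0):
    """Monte Carlo estimate of the randomized-smoothing prediction:
    the class the base classifier outputs most often when the input
    is perturbed by isotropic Gaussian noise of scale sigma."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    preds = np.array([base_classifier(x + eps) for eps in noise])
    classes, counts = np.unique(preds, return_counts=True)
    return classes[np.argmax(counts)]  # majority class under noise

# Toy base classifier: sign of the coordinate sum.
f = lambda v: int(v.sum() > 0)
print(smoothed_predict(f, np.array([1.0])))  # stable prediction: 1
```

The challenge the paper addresses is that the argmax/counting step above is simple control flow in plaintext but has no direct counterpart in FHE, where all computation must be expressed without data-dependent branching.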