13 research outputs found

    Inequity in Popular Voice Recognition Systems Regarding African Accents

    With the rise of smart speakers such as the Echo Dot and Google Home, everyone should have an equal opportunity to use them. Yet for many popular voice recognition systems, the only accents with wide support are those from Europe, Latin America, and Asia. This can be frustrating for users whose dialects or accents are poorly understood by common tools such as Amazon's Alexa. As these devices become household appliances, researchers are becoming increasingly aware of bias and inequity in speech recognition, as well as in other sub-fields of artificial intelligence. Adding African accents could diversify smart speaker customer bases worldwide. My research project can help developers include accents from the African diaspora as they build these systems. In this work, we measure recognition accuracy for under-represented dialects across a variety of speech recognition systems and analyze the results in terms of standard performance metrics. After collecting audio files from different voices across the African diaspora, we discuss key findings and generate guidelines for making current voice recognition systems fairer for all.

    Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance

    This paper analyzes gender representation in four major corpora of French broadcast speech. Because these corpora are widely used within the speech processing community, they are primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous natural language processing (NLP) applications, we study the impact of gender imbalance in TV and radio broadcasts on the performance of an ASR system. Our analysis shows that women are under-represented in our data in terms of both speakers and speech turns. We introduce the notion of speaker role to refine the analysis and find that women are even scarcer within the Anchor category, which corresponds to prominent speakers. The disparity in available data for the two genders causes performance to decrease on women's speech. However, this global trend can be counterbalanced for speakers who are accustomed to speaking in the media, when a sufficient amount of data is available. Comment: Accepted to ACM Workshop AI4T

    A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos

    Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications, including voice assistants and transcription services. However, it has been observed that state-of-the-art ASR systems, which deliver impressive benchmark results, struggle with speakers of certain regions or demographics due to variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours, consisting of ~9.8K technical lectures in English along with their transcripts, delivered by instructors representing various parts of Indian demography. The dataset is sourced from the very popular NPTEL MOOC platform. We use the curated dataset to measure the existing disparity in YouTube Automatic Captions and OpenAI Whisper model performance across the diverse demographic traits of speakers in India. While disparity exists by gender, native region, age, and speech rate of speakers, disparity based on caste is non-existent. We also observe statistically significant disparity across the disciplines of the lectures. These results indicate the need for more inclusive and robust ASR systems and more representative datasets for evaluating disparity in them.
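    The disparity measured in studies like this one is typically expressed as word error rate (WER): the word-level edit distance between a reference transcript and the ASR hypothesis, normalized by reference length, averaged per demographic group. A minimal sketch of that computation follows; the group labels and sentence pairs are hypothetical, chosen only to illustrate the per-group comparison.

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over word tokens via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical (reference, hypothesis) pairs grouped by speaker demographic:
samples = {
    "group_a": [("the lecture covers signal processing",
                 "the lecture covers signal processing")],
    "group_b": [("the lecture covers signal processing",
                 "the lecture cover single processing")],
}
for group, pairs in samples.items():
    avg = sum(wer(r, h) for r, h in pairs) / len(pairs)
    print(group, round(avg, 2))  # group_a 0.0, group_b 0.4
```

    A gap between the per-group averages (here 0.0 vs. 0.4) is the kind of disparity the audit quantifies at scale.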

    Performance Disparities Between Accents in Automatic Speech Recognition

    Automatic speech recognition (ASR) services are ubiquitous, transforming speech into text for systems like Amazon's Alexa, Google's Assistant, and Microsoft's Cortana. However, researchers have identified biases in ASR performance between particular English language accents by racial group and by nationality. In this paper, we expand this discussion both qualitatively, by relating it to historical precedent, and quantitatively, through a large-scale audit. Standardization of language and the use of language to maintain global and political power have played an important role in history, which we recount to show the parallels in the ways ASR services act on English language speakers today. Then, using a large and global dataset of speech from The Speech Accent Archive, which includes over 2,700 speakers of English born in 171 different countries, we perform an international audit of some of the most popular English ASR services. We show that performance disparities exist as a function of whether or not a speaker's first language is English and, even when controlling for multiple linguistic covariates, that these disparities have a statistically significant relationship to the political alignment of the speaker's birth country with respect to the United States' geopolitical power.

    Assessing whether artificial intelligence is an enabler or an inhibitor of sustainability at indicator level

    Since the early phase of the artificial-intelligence (AI) era, expectations towards AI have been high, with experts believing that AI paves the way for managing and handling various global challenges. However, the significant enabling and inhibiting influence of AI on sustainable development needs to be assessed carefully, given that the technology diffuses rapidly and affects millions of people worldwide on a day-to-day basis. To address this challenge, a panel discussion was organized by the KTH Royal Institute of Technology, the AI Sustainability Center, and the Massachusetts Institute of Technology (MIT), gathering a wide range of AI experts. This paper summarizes the insights from the panel discussion around the following themes: the role of AI in achieving the Sustainable Development Goals (SDGs); AI for a prosperous 21st century; transparency, automated decision-making processes, and personal profiling; and measuring the relevance of Digitalization and Artificial Intelligence (D&AI) at the indicator level of the SDGs. The research-backed panel discussion was dedicated to recognizing and prioritizing the agenda for addressing pressing research gaps for academic research, funding bodies, professionals, and industry, with an emphasis on the transportation sector. A common conclusion across these themes was the need to go beyond developing AI in sectoral silos, so as to understand the impacts AI might have across societal, environmental, and economic outcomes. The recordings of the panel discussion can be found at: https://www.kth.se/en/2.18487/evenemang/the-role-of-ai-in-achieving-the-sdgs-enabler-or-inhibitor-1.1001364?date=2020-08-20&length=1&orglength=185&orgdate=2020-06-30 (short link: https://bit.ly/2Kap1tE). © 2021.
    The authors acknowledge the KTH Sustainability Office and the KTH Digitalization Platform for the funding that enabled the organization of this panel discussion. SG acknowledges the funding provided by the German Federal Ministry of Education and Research (BMBF) for the project "digitainable". SDL acknowledges support through the Spanish Government.

    Language variation, automatic speech recognition and algorithmic bias

    In this thesis, I situate the impacts of automatic speech recognition systems in relation to sociolinguistic theory (in particular drawing on concepts of language variation, language ideology and language policy) and contemporary debates in AI ethics (especially regarding algorithmic bias and fairness). In recent years, automatic speech recognition systems, alongside other language technologies, have been adopted by a growing number of users and have been embedded in an increasing number of algorithmic systems. This expansion into new application domains and language varieties can be understood as an expansion into new sociolinguistic contexts. In this thesis, I am interested in how automatic speech recognition tools interact with this sociolinguistic context, and how they affect speakers, speech communities and their language varieties. Focussing on commercial automatic speech recognition systems for British Englishes, I first explore the extent and consequences of performance differences of these systems for different user groups depending on their linguistic background. When situating this predictive bias within the wider sociolinguistic context, it becomes apparent that these systems reproduce and potentially entrench existing linguistic discrimination and could therefore cause direct and indirect harms to already marginalised speaker groups. To understand the benefits and potentials of automatic transcription tools, I highlight two case studies: transcribing sociolinguistic data in English and transcribing personal voice messages in isiXhosa. The central role of the sociolinguistic context in developing these tools is emphasised in this comparison. Design choices, such as the choice of training data, are particularly consequential because they interact with existing processes of language standardisation. 
    To better understand the impacts of these choices, and the role of the developers who make them, I draw on theory from language policy research and critical data studies. These conceptual frameworks are intended to help practitioners and researchers anticipate and mitigate predictive bias and other potential harms of speech technologies. Beyond looking at individual choices, I also investigate the discourses about language variation and linguistic diversity deployed in the context of language technologies. These discourses, put forward by researchers, developers and commercial providers, not only have a direct effect on the wider sociolinguistic context, but also highlight how this context (e.g., existing beliefs about language(s)) affects technology development. Finally, I explore ways of building better automatic speech recognition tools, focussing in particular on well-documented, naturalistic and diverse benchmark datasets. However, inclusive datasets are not necessarily a panacea, as they still raise important questions about the nature of linguistic data and language variation (especially in relation to identity), and may not mitigate or prevent all potential harms of automatic speech recognition systems as embedded in larger algorithmic systems and sociolinguistic contexts.

    Private and Reliable Neural Network Inference

    Reliable neural networks (NNs) provide important inference-time reliability guarantees such as fairness and robustness. Complementarily, privacy-preserving NN inference protects the privacy of client data. So far these two emerging areas have been largely disconnected, yet their combination will be increasingly important. In this work, we present the first system which enables privacy-preserving inference on reliable NNs. Our key idea is to design efficient fully homomorphic encryption (FHE) counterparts for the core algorithmic building blocks of randomized smoothing, a state-of-the-art technique for obtaining reliable models. The lack of required control flow in FHE makes this a demanding task, as naïve solutions lead to unacceptable runtime. We employ these building blocks to enable privacy-preserving NN inference with robustness and fairness guarantees in a system called Phoenix. Experimentally, we demonstrate that Phoenix achieves its goals without incurring prohibitive latencies. To our knowledge, this is the first work which bridges the areas of client data privacy and reliability guarantees for NNs. Comment: In ACM Conference on Computer and Communications Security (CCS 2022)
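    The core building block the abstract refers to, randomized smoothing, classifies many Gaussian-noised copies of an input and takes a majority vote; Phoenix's contribution is carrying out these steps under FHE. A minimal plaintext sketch of the voting step follows; the toy one-dimensional base classifier is an assumption for illustration only, standing in for an actual NN.

```python
import random

def base_classifier(x: float) -> int:
    # Toy 1-D "model": class 1 if the input exceeds a threshold.
    return 1 if x > 0.0 else 0

def smoothed_predict(x: float, sigma: float = 0.5,
                     n: int = 1000, seed: int = 0) -> int:
    """Majority vote of the base classifier over n Gaussian-noised copies of x."""
    rng = random.Random(seed)
    votes = [0, 0]
    for _ in range(n):
        votes[base_classifier(x + rng.gauss(0.0, sigma))] += 1
    return 0 if votes[0] > votes[1] else 1

# A point well inside a class keeps its label under smoothing:
print(smoothed_predict(1.0))    # prints 1
print(smoothed_predict(-1.0))   # prints 0
```

    The vote margin is what certification procedures turn into a robustness radius; evaluating the noisy forward passes and the argmax-style vote without data-dependent control flow is precisely what makes the FHE setting demanding.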