1,612 research outputs found
Voice as a design material : sociophonetic inspired design strategies in Human-Computer Interaction
While there is a renewed interest in voice user interfaces (VUI) in HCI, little attention has been paid to the design of VUI voice output beyond intelligibility and naturalness. We draw on the field of sociophonetics - the study of the social factors that influence the production and perception of speech - to highlight how current VUIs are based on a limited and homogenised set of voice outputs. We argue that current systems do not adequately consider the diversity of people’s speech, how that diversity represents sociocultural identities, and how voices have the potential to shape user perceptions and experiences. Ultimately, as other technological developments have influenced the ideologies of language, the voice outputs of VUIs will influence the ideologies of speech. Based on our argument, we pose three design strategies for VUI voice output design - individualisation, context awareness, and diversification - to motivate new ways of conceptualising and designing these technologies
Stereotypical nationality representations in HRI: perspectives from international young adults
People often form immediate expectations about other people, or groups of people, based on visual appearance and characteristics of their voice and speech. These stereotypes, often inaccurate or overgeneralized, may translate to robots that carry human-like qualities. This study aims to explore whether nationality-based preconceptions regarding appearance and accents can be found in people’s perception of a virtual and a physical social robot. In an online survey with 80 subjects evaluating different first-language-influenced accents of English and nationality-influenced human-like faces for a virtual robot, we find that accents, in particular, lead to preconceptions about perceived competence and likeability that correspond to previous findings in social science research. In a physical interaction study with 74 participants, we then studied whether the perception of competence and likeability is similar after interacting with a robot portraying one of four different nationality representations from the online survey. We find that preconceptions about national stereotypes that appeared in the online survey vanish or are overshadowed by factors related to general interaction quality. We do, however, find some effects of the robot’s stereotypical alignment with the subject group, with Swedish subjects (the majority group in this study) rating the Swedish-accented robot as less competent than the international group did, but, on the other hand, recalling more facts from the Swedish robot’s presentation than the international group did. In an extension in which the physical robot was replaced by a virtual robot interacting in the same scenario online, we found the same result, namely that preconceptions are of less importance after actual interactions, hence demonstrating that the differences in the ratings of the robot between the online survey and the interaction are not due to the interaction medium.
We hence conclude that attitudes towards stereotypical national representations in HRI have a weak effect, at least for the user group included in this study (primarily educated young students in an international setting)
Language variation, automatic speech recognition and algorithmic bias
In this thesis, I situate the impacts of automatic speech recognition systems in relation to sociolinguistic theory (in particular drawing on concepts of language variation, language ideology and language policy) and contemporary debates in AI ethics (especially regarding algorithmic bias and fairness). In recent years, automatic speech recognition systems, alongside other language technologies, have been adopted by a growing number of users and have been embedded in an increasing number of algorithmic systems. This expansion into new application domains and language varieties can be understood as an expansion into new sociolinguistic contexts. In this thesis, I am interested in how automatic speech recognition tools interact with this sociolinguistic context, and how they affect speakers, speech communities and their language varieties.
Focussing on commercial automatic speech recognition systems for British Englishes, I first explore the extent and consequences of performance differences of these systems for different user groups depending on their linguistic background. When situating this predictive bias within the wider sociolinguistic context, it becomes apparent that these systems reproduce and potentially entrench existing linguistic discrimination and could therefore cause direct and indirect harms to already marginalised speaker groups. To understand the benefits and potentials of automatic transcription tools, I highlight two case studies: transcribing sociolinguistic data in English and transcribing personal voice messages in isiXhosa. The central role of the sociolinguistic context in developing these tools is emphasised in this comparison. Design choices, such as the choice of training data, are particularly consequential because they interact with existing processes of language standardisation. To better understand the impacts of these choices, and the role of the developers making them, I draw on theory from language policy research and critical data studies. These conceptual frameworks are intended to help practitioners and researchers in anticipating and mitigating predictive bias and other potential harms of speech technologies. Beyond looking at individual choices, I also investigate the discourses about language variation and linguistic diversity deployed in the context of language technologies. These discourses put forward by researchers, developers and commercial providers not only have a direct effect on the wider sociolinguistic context, but they also highlight how this context (e.g., existing beliefs about language(s)) affects technology development. Finally, I explore ways of building better automatic speech recognition tools, focussing in particular on well-documented, naturalistic and diverse benchmark datasets. However, inclusive datasets are not necessarily a panacea, as they still raise important questions about the nature of linguistic data and language variation (especially in relation to identity), and may not mitigate or prevent all potential harms of automatic speech recognition systems as embedded in larger algorithmic systems and sociolinguistic contexts
The Challenge of Spoken Language Systems: Research Directions for the Nineties
A spoken language system combines speech recognition, natural language processing and human interface technology. It functions by recognizing the person's words, interpreting the sequence of words to obtain a meaning in terms of the application, and providing an appropriate response back to the user. Potential applications of spoken language systems range from simple tasks, such as retrieving information from an existing database (traffic reports, airline schedules), to interactive problem solving tasks involving complex planning and reasoning (travel planning, traffic routing), to support for multilingual interactions. We examine eight key areas in which basic research is needed to produce spoken language systems: (1) robust speech recognition; (2) automatic training and adaptation; (3) spontaneous speech; (4) dialogue models; (5) natural language response generation; (6) speech synthesis and speech generation; (7) multilingual systems; and (8) interactive multimodal systems. In each area, we identify key research challenges, the infrastructure needed to support research, and the expected benefits. We conclude by reviewing the need for multidisciplinary research, for development of shared corpora and related resources, for computational support and for rapid communication among researchers. The successful development of this technology will increase accessibility of computers to a wide range of users, will facilitate multinational communication and trade, and will create new research specialties and jobs in this rapidly expanding area
User adaptations to system implementation in a mining company in Laos - A case study of organisational change
The purpose of this case study was to assess post-project implementation acceptance by users of new IS/IT systems in a mining company in Laos. The report investigated how the new system changed organisational working cultures and what avoidance or acceptance factors appeared. It also looked at how the newly implemented systems contributed to changes in business processes and working procedures within Lane Xang Mineral Limited Company (LXML), which is a Lao subsidiary of a mining company from Australia.
The change implementation was a strategic business integration of MMG, a Chinese-owned global mining company headquartered in Melbourne that operated several mining subsidiaries in Australia, Africa, Latin America, and Laos. In 2013, LXML went through a major change implementation in terms of IS/IT systems, consisting of upgraded computing facilities, I.T. services outsourcing, communication systems, and the introduction of a new Enterprise Resource Planning (ERP) system. Those changes inevitably brought about change in the company’s business processes and working procedures. As a result, LXML’s way of working shifted from the conventional paper-based system to a more systematic and electronic approach. Following the change, the organisation as well as its staff were faced with cultural issues and mismatched business processes.
To gain an understanding of the factors that impacted the IS/IT implementation within Lane Xang Mineral Limited, this paper applied two analytical frameworks to the study of user acceptance and organisational cultural differences. Data gathering was conducted through an online survey and semi-structured online interviews with staff at different levels within the organisation. The findings were then divided into enablers of and barriers to users’ adaptation to the new systems implementation at the individual and organisational levels. The findings were also compared deductively with the analytical frameworks to verify their influencing categories.
This paper is organised in three main sections. The first section introduces the case background and describes the issues from the case study. The second section justifies the significance of the issues identified and of the conceptual frames that were applied in the study. The third section is the analysis section, which explains the data collection methodologies and the analytical details. The findings of the study are also presented in this section.
At the end of the paper, the study is concluded by giving recommendations as a guide to I.T. Managers at the MMG headquarters in Australia and the LXML office in Laos, on transnational I.T. implementation within MMG. The recommendations could be taken as a guide for any other organisation (not only limited to the mining industry) to explore in order to plan for an effective I.T. implementation within their firms in the future