579 research outputs found
Proceedings of the VIIth GSCP International Conference
The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose as its main theme Speech and Corpora. The wide international origin of the 235 authors from 21 countries and 95 institutions led to papers on many different languages. The 89 papers of this volume reflect the themes of the conference: spoken corpora compilation and annotation, with the technological connected fields; the relation between prosody and pragmatics; speech pathologies; and different papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to sound and video linked to papers (when downloaded)
Design and Architecture of an Ontology-driven Dialogue System for HPV Vaccine Counseling
Speech and conversational technologies are increasingly being used by consumers, with the inevitability that one day they will be integrated in health care. Where this technology could be of service is in patient-provider communication, specifically for communicating the risks and benefits of vaccines. Human papillomavirus (HPV) vaccine, in particular, is a vaccine that inoculates individuals from certain HPV viruses responsible for adulthood cancers - cervical, head and neck cancers, etc. My research focuses on the architecture and development of speech-enabled conversational agent that relies on series of consumer-centric health ontologies and the technology that utilizes these ontologies. Ontologies are computable artifacts that encode and structure domain knowledge that can be utilized by machines to provide high level capabilities, such as reasoning and sharing information. I will focus the agentās impact on the HPV vaccine domain to observe if users would respond favorably towards conversational agents and the possible impact of the agent on their beliefs of the HPV vaccine. The approach of this study involves a multi-tier structure. The first tier is the domain knowledge base, the second is the application interaction design tier, and the third is the feasibility assessment of the participants. The research in this study proposes the following questions: Can ontologies support the system architecture for a spoken conversational agent for HPV vaccine counseling? How would prospective usersā perception towards an agent and towards the HPV vaccine be impacted after using conversational agent for HPV vaccine education?
The outcome of this study is a comprehensive assessment of a system architecture of a conversational agent for patient-centric HPV vaccine counseling. Each layer of the agent architecture is regulated through domain and application ontologies, and supported by the various ontology-driven software components that I developed to compose the agent architecture. Also discussed in this work, I present preliminary evidence of high usability of the agent and improvement of the usersā health beliefs toward the HPV vaccine. All in all, I introduce a comprehensive and feasible model for the design and development of an open-sourced, ontology-driven conversational agent for any health consumer domain, and corroborate the viability of a conversational agent as a health intervention tool
Advances in Human-Robot Interaction
Rapid advances in the field of robotics have made it possible to use robots not just in industrial automation but also in entertainment, rehabilitation, and home service. Since robots will likely affect many aspects of human existence, fundamental questions of human-robot interaction must be formulated and, if at all possible, resolved. Some of these questions are addressed in this collection of papers by leading HRI researchers
Recommended from our members
Cross-generational linguistic variation in the Canberra Vietnamese heritage language community: A corpus-centred investigation
This dissertation investigates cross-generational linguistic differences in the Canberra Vietnamese bilingual community, with a particular focus on Vietnamese as the heritage language. Specifically, it documents the vernacular and considers key aspects of this data from different theoretical perspectives. Its main contribution is an insight into a rarely studied heritage language variety in a contact community that has never been examined.
The dissertation consists of five core chapters, organised into two parts. In the first part (Chapters 2ā3), I describe how I documented the vernacular and created the Canberra Vietnamese English Corpus (CanVEC), an original corpus compiled specifically for this study that is also the first to be freely available for research purposes. The corpus consists of over ten hours of spontaneous speech produced by 45 Vietnamese-English bilingual speakers across two generations living in Canberra. In the second part of the study (Chapters 4ā6), I put the corpus to use and investigate aspects of the cross-generational differences in Vietnamese as the heritage language in this community.
In particular, I first probe the Vietnamese heritage language via its participation in the code-switching discourse (Chapter 4). In doing so, I focus on the applicability of the Matrix Language Framework (MLF) (Myers-Scotton, 1993, 2002) and its associated Matrix Language (ML) Turnover Hypothesis (Myers-Scotton, 1998) to the code-switching data in CanVEC. Since support for this prominent model has mainly come from language pairs that have different clausal word order or vastly different inventories of inflectional morphology, Vietnamese-English as a pair in which both languages are SVO and essentially isolating offers a tantalising testing ground for its application. Results show that the universal claims of this model do not hold so straight-forwardly. CanVEC data challenges several assumptions of the MLF, with the model ultimately only being able to account for around half of the CanVEC code-switching data. I further demonstrate that even when the ML is putatively identifiable and a cross-generational ML āturnoverā is quantitatively observed, the predictions do not reflect the direction of structural influence that we see in CanVEC. The MLF approach therefore sheds only limited light on cross-generational language shift and variation in this community.
Given that null elements emerge as a distinct area of difficulty in Chapter 4, I take this aspect as the focal point for the next part of the investigation (Chapter 5), where I use the variationist approach (Labov, 1972 et seq.) to explore three cases where null and overt realisation alternates in Vietnamese: subjects, objects, and copulas. In doing so, I move away from the bilingual portion of CanVEC to examine the monolingual heritage Vietnamese subset directly. Results show that Vietnamese null subjects vary significantly across generations, while null objects and copulas remain stable in terms of use. As speakers also overwhelmingly prefer overt forms over null forms (ā¼70:30) across all the three of the variables of interest, I appeal to the generative interface-oriented approach (Sorace & Filiaci, 2006 et seq.) to next examine the distribution of overt subjects, objects, copulas (Chapter 6). These results converge with what was found for null forms: cross-generational effects were observed for pronominal subjects, but not pronominal objects and copulas. This finding also supports the importance of a distinction drawn in previous works between internal (syntax-semantics) and external (syntax-discourse/pragmatics) interface phenomena, with the latter being seemingly more susceptible to change.
Ultimately, this dissertation highlights the empirical and theoretical value of studying rarely considered contact varieties, while deploying an integrated approach that acknowledges the multi-faceted complexity of the contact communities where these varieties are spoken.Cambridge Trust International Scholarshi
Developing Sparse Representations for Anchor-Based Voice Conversion
Voice conversion is the task of transforming speech from one speaker to sound as if it was produced by another speaker, changing the identity while retaining the linguistic content. There are many methods for performing voice conversion, but oftentimes these methods have onerous training requirements or fail in instances where one speaker has a nonnative accent. To address these issues, this dissertation presents and evaluates a novel āanchor-basedā representation of speech that separates speaker content from speaker identity by modeling how speakers form English phonemes.
We call the proposed method Sparse, Anchor-Based Representation of Speech (SABR), and explore methods for optimizing the parameters of this model in native-to-native and native-to-nonnative voice conversion contexts. We begin the dissertation by demonstrating how sparse coding in combination with a compact, phoneme-based dictionary can be used to separate speaker identity from content in objective and subjective tests. The formulation of the representation then presents several research questions. First, we propose a method for improving the synthesis quality by using the sparse coding residual in combination with a frequency warping algorithm to convert the residual from the source to target speakerās space, and add it to the target speakerās estimated spectrum. Experimentally, we find that synthesis quality is significantly improved via this transform. Second, we propose and evaluate two methods for selecting and optimizing SABR anchors in native-to-native and native-to-nonnative voice conversion. We find that synthesis quality is significantly improved by the proposed methods, especially in native-to- nonnative voice conversion over baseline algorithms. In a detailed analysis of the algorithms, we find they focus on phonemes that are difficult for nonnative speakers of English or naturally have multiple acoustic states. Following this, we examine methods for adding in temporal constraints to SABR via the Fused Lasso. The proposed method significantly reduces the inter-frame variance in the sparse codes over other methods that incorporate temporal features into sparse coding representations.
Finally, in a case study, we examine the use of the SABR methods and optimizations in the context of a computer aided pronunciation training system for building āGolden Speakersā, or ideal models for nonnative speakers of a second language to learn correct pronunciation. Under the hypothesis that the optimal āGolden Speakerā was the learnerās voice, synthesized with a native accent, we used SABR to build voice models for nonnative speakers and evaluated the resulting synthesis in terms of quality, identity, and accentedness. We found that even when deployed in the field, the SABR method generated synthesis with low accentedness and similar acoustic identity to the target speaker, validating the use of the method for building āgolden speakersā
Proceeding The 3rd International Seminar on Linguistics (ISOL-3): Language and Social Change
It is undeniable that, like human being, language also changes. The lexicon once used in a language may no longer be used in the next few years. In contrast, a lexicon that did not exist before appeared and was widely used in the next period. The pronunciation of a word may change from time to time.
It is undeniable that, like human being, language also changes. The lexicon once used in a language may no longer be used in the next few years. In contrast, a lexicon that did not exist before appeared and was widely used in the next period. The pronunciation of a word may change from time to time.
Social change in a society is triggered by various factors. In Indonesia, reform is one of the causes of change in various aspects of social life, including government, politics, economy and culture. All these changes are recorded by or reflected in language. 
Proceeding The 3rd International Seminar on Linguistics (ISOL-3): Language and Social Change
It is undeniable that, like a human being, language also changes. The lexicon once used in a language may no longer be used in the next few years. In contrast, a lexicon that did not exist before appeared and was widely used in the next period. The pronunciation of a word may change from time to time.
Social change in a society is triggered by various factors. In Indonesia, reform is one of the causes of change in various aspects of social life, including government, politics, economy, and culture. All these changes are recorded by or reflected in language
Language variation, automatic speech recognition and algorithmic bias
In this thesis, I situate the impacts of automatic speech recognition systems in relation to sociolinguistic theory (in particular drawing on concepts of language variation, language ideology
and language policy) and contemporary debates in AI ethics (especially regarding algorithmic
bias and fairness). In recent years, automatic speech recognition systems, alongside other
language technologies, have been adopted by a growing number of users and have been embedded in an increasing number of algorithmic systems. This expansion into new application
domains and language varieties can be understood as an expansion into new sociolinguistic
contexts. In this thesis, I am interested in how automatic speech recognition tools interact
with this sociolinguistic context, and how they affect speakers, speech communities and their
language varieties.
Focussing on commercial automatic speech recognition systems for British Englishes, I first
explore the extent and consequences of performance differences of these systems for different user groups depending on their linguistic background. When situating this predictive bias
within the wider sociolinguistic context, it becomes apparent that these systems reproduce and
potentially entrench existing linguistic discrimination and could therefore cause direct and indirect harms to already marginalised speaker groups. To understand the benefits and potentials
of automatic transcription tools, I highlight two case studies: transcribing sociolinguistic data
in English and transcribing personal voice messages in isiXhosa. The central role of the sociolinguistic context in developing these tools is emphasised in this comparison. Design choices,
such as the choice of training data, are particularly consequential because they interact with existing processes of language standardisation. To understand the impacts of these choices, and
the role of the developers making them better, I draw on theory from language policy research
and critical data studies. These conceptual frameworks are intended to help practitioners and
researchers in anticipating and mitigating predictive bias and other potential harms of speech
technologies. Beyond looking at individual choices, I also investigate the discourses about language variation and linguistic diversity deployed in the context of language technologies. These
discourses put forward by researchers, developers and commercial providers not only have a
direct effect on the wider sociolinguistic context, but they also highlight how this context (e.g.,
existing beliefs about language(s)) affects technology development. Finally, I explore ways of
building better automatic speech recognition tools, focussing in particular on well-documented,
naturalistic and diverse benchmark datasets. However, inclusive datasets are not necessarily
a panacea, as they still raise important questions about the nature of linguistic data and language variation (especially in relation to identity), and may not mitigate or prevent all potential
harms of automatic speech recognition systems as embedded in larger algorithmic systems
and sociolinguistic contexts
- ā¦