15 research outputs found

    Extractive Text-Based Summarization of Arabic videos: Issues, Approaches and Evaluations

    Get PDF
    In this paper, we present and evaluate a method for extractive text-based summarization of Arabic videos. The algorithm is proposed within the scope of the AMIS project, which aims to help users understand videos given in a foreign language (Arabic). To this end, the project proposes several strategies to translate and summarize the videos. One of them consists in transcribing the Arabic videos, summarizing the transcriptions, and translating the summary. In this paper we describe the video corpus collected from YouTube, and present and evaluate the transcription-summarization part of this strategy. We also present the Automatic Speech Recognition (ASR) system used to transcribe the videos, and show how we adapted this system to the Algerian dialect. We then describe how we automatically segment the sequence of words provided by the ASR system into sentences, and how we summarize the resulting sequence of sentences. We evaluate our approach both objectively and subjectively. Results show that the ASR system performs well in terms of Word Error Rate on MSA, but needs to be adapted to deal with Algerian dialect data. The subjective evaluation shows the same behaviour as the ASR evaluation: transcriptions of videos containing dialectal data were scored better than those of videos containing only MSA data. However, summaries based on transcriptions are not rated as well, even when the transcriptions themselves are rated highly. Finally, the study shows that features such as the lengths of transcriptions and summaries, together with the subjective score of transcriptions, explain only 31% of the subjective score of summaries.
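    The objective evaluation above rests on Word Error Rate: the word-level edit distance between a reference transcript and the ASR hypothesis, normalised by the reference length. A minimal self-contained sketch of that metric (not the AMIS project's own scoring code):

```python
def word_error_rate(reference, hypothesis):
    """WER: word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution / match
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat", "the cat sat down"))  # 1 insertion / 3 reference words ≈ 0.33
```

    The normalisation by reference length means WER can exceed 1.0 when the hypothesis is much longer than the reference.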

    A comparison of mobile search interfaces for isiXhosa speakers

    Get PDF
    Search interfaces have long been targeted at resource-rich languages such as English. Compared to such languages, there has been little effort to support African (Bantu) languages in search interfaces, particularly the isiXhosa language. However, with the growing use of mobile phones in developing countries, these interfaces can now be adapted to the languages of these settings to support information access on the Web. This study proposes mobile search interfaces that support isiXhosa speakers in searching for information on the Web using isiXhosa as a discovery language. isiXhosa is a low-resourced African (Bantu) language spoken in resource-constrained environments in South Africa by over eight million people, yet no search interface has been specifically targeted at supporting its speakers. Two mobile search interfaces, one text based and one voice based, were developed in an Android application. Their design was based on feedback from four native isiXhosa speakers in a design focus group, and on guidelines from the literature. Using the developed interfaces, an experiment was conducted with 34 native isiXhosa-speaking students at the University of Cape Town, South Africa, to investigate which interface could better support isiXhosa speakers searching for information on the Web using mobile phones. Quantitative data were collected from application log files. User feedback was then obtained using the standard Software Usability Measurement Inventory (SUMI) instrument, and both interfaces were confirmed as usable. Contrary to expectations, users preferred the text interface overall and on most SUMI subscales. This may reflect greater familiarity with text search interfaces, or the relative scarcity of voice interfaces in African (Bantu) languages. Where users are not literate, the voice interface may be the only option, so the fact that it was deemed usable is an important independent finding. Search in African (Bantu) language collections remains a largely unexplored field, and more work is needed on the interfaces as the algorithms and collections are developed in parallel.

    The EEE corpus: socio-affective "glue" cues in elderly-robot interactions in a Smart Home with the EmOz platform

    No full text
    The aim of this preliminary feasibility study is to offer a first look at interactions, in a Smart Home prototype, between elderly people and a companion robot whose only vector of communication is a set of socio-affective language primitives. The paper focuses in particular on the methodology and the scenario designed to collect a spontaneous corpus of human-robot interactions. Through a Wizard of Oz platform (EmOz), developed specifically for this purpose, a robot is introduced as an intermediary between the technological environment and elderly participants, who give vocal commands to the robot to control the Smart Home. The robot's vocal productions increase progressively through added prosodic levels: (1) no speech, (2) pure prosodic mouth noises assumed to be the tools of the "glue", (3) lexical items with assumed "glue" prosody, and (4) imitations of the subject's commands with assumed "glue" prosody. The elderly subjects' speech behaviours confirm the hypothesis that the socio-affective "glue" effect increases across the prosodic levels, especially for socially isolated people. The corpus is still being recorded, with the aim of collecting data from socially isolated elderly people in real need.


    Morfessor 2.0: Python Implementation and Extensions for Morfessor Baseline

    Get PDF
    Morfessor is a family of probabilistic machine learning methods that find morphological segmentations for the words of a natural language, based solely on raw text data. Since the release of the public implementations of the Morfessor Baseline and Categories-MAP methods in 2005, they have become popular automatic tools for processing morphologically complex languages in applications such as speech recognition and machine translation. This report describes a new implementation of the Morfessor Baseline method. The new version not only removes the main restrictions of the previous software, but also includes recent methodological extensions such as semi-supervised learning, which can make use of small amounts of manually segmented words. Experimental results for the various features of the implementation are reported for English and Finnish segmentation tasks.
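    At the core of Morfessor Baseline is a unigram morph model: a word's segmentation is the split that minimises the total cost (negative log probability) of its morphs, found by a Viterbi-style dynamic program. A minimal self-contained sketch of that search over a fixed morph lexicon (the morph counts are invented for illustration; this is not the Morfessor 2.0 API):

```python
import math

def viterbi_segment(word, morph_counts):
    """Split `word` into morphs minimising total cost under a unigram
    morph model, where cost(m) = -log p(m). Illustrative sketch only."""
    total = sum(morph_counts.values())
    cost = {m: -math.log(c / total) for m, c in morph_counts.items()}
    n = len(word)
    best = [0.0] + [math.inf] * n  # best[i] = min cost of segmenting word[:i]
    back = [0] * (n + 1)           # backpointer to the previous split position
    for i in range(1, n + 1):
        for j in range(i):
            morph = word[j:i]
            if morph in cost and best[j] + cost[morph] < best[i]:
                best[i] = best[j] + cost[morph]
                back[i] = j
    # recover the segmentation from the backpointers
    # (if no full segmentation exists, this falls back to the whole word)
    morphs, i = [], n
    while i > 0:
        morphs.append(word[back[i]:i])
        i = back[i]
    return morphs[::-1]

counts = {"un": 5, "supervis": 3, "ed": 10, "learn": 4, "ing": 12}
print(viterbi_segment("unsupervised", counts))  # ['un', 'supervis', 'ed']
```

    The full method additionally learns the morph lexicon itself by balancing this corpus cost against a lexicon cost, in the spirit of minimum description length.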

    Language variation, automatic speech recognition and algorithmic bias

    Get PDF
    In this thesis, I situate the impacts of automatic speech recognition systems in relation to sociolinguistic theory (drawing in particular on concepts of language variation, language ideology and language policy) and contemporary debates in AI ethics (especially regarding algorithmic bias and fairness). In recent years, automatic speech recognition systems, alongside other language technologies, have been adopted by a growing number of users and embedded in an increasing number of algorithmic systems. This expansion into new application domains and language varieties can be understood as an expansion into new sociolinguistic contexts. I am interested in how automatic speech recognition tools interact with this sociolinguistic context, and how they affect speakers, speech communities and their language varieties. Focussing on commercial automatic speech recognition systems for British Englishes, I first explore the extent and consequences of these systems' performance differences across user groups with different linguistic backgrounds. When this predictive bias is situated within the wider sociolinguistic context, it becomes apparent that these systems reproduce, and potentially entrench, existing linguistic discrimination, and could therefore cause direct and indirect harms to already marginalised speaker groups. To understand the benefits and potential of automatic transcription tools, I highlight two case studies: transcribing sociolinguistic data in English and transcribing personal voice messages in isiXhosa. This comparison emphasises the central role of the sociolinguistic context in developing such tools. Design choices, such as the choice of training data, are particularly consequential because they interact with existing processes of language standardisation. To better understand the impacts of these choices, and the role of the developers who make them, I draw on theory from language policy research and critical data studies. These conceptual frameworks are intended to help practitioners and researchers anticipate and mitigate predictive bias and other potential harms of speech technologies. Beyond individual choices, I also investigate the discourses about language variation and linguistic diversity deployed in the context of language technologies. These discourses, put forward by researchers, developers and commercial providers, not only have a direct effect on the wider sociolinguistic context, but also highlight how this context (e.g., existing beliefs about language(s)) affects technology development. Finally, I explore ways of building better automatic speech recognition tools, focussing in particular on well-documented, naturalistic and diverse benchmark datasets. Inclusive datasets are not necessarily a panacea, however: they still raise important questions about the nature of linguistic data and language variation (especially in relation to identity), and may not mitigate or prevent all potential harms of automatic speech recognition systems as embedded in larger algorithmic systems and sociolinguistic contexts.
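    The performance differences at issue can be quantified very simply, for instance as the gap between mean per-group error rates. A minimal sketch with invented numbers (the group labels and scores are illustrative only, not results from the thesis):

```python
from statistics import mean

def wer_gap(scores_by_group):
    """Mean WER per speaker group and the max-min disparity between groups.
    `scores_by_group` maps a group label to per-utterance WER scores."""
    means = {group: mean(wers) for group, wers in scores_by_group.items()}
    return means, max(means.values()) - min(means.values())

# Illustrative numbers only: two hypothetical accent groups.
scores = {"variety A": [0.08, 0.12, 0.10], "variety B": [0.18, 0.22, 0.20]}
means, gap = wer_gap(scores)
print(means, gap)
```

    Summaries like this are only a starting point; the thesis argues that such numbers must be interpreted within the sociolinguistic context that produced them.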

    Rapid Generation of Pronunciation Dictionaries for new Domains and Languages

    Get PDF
    This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Depending on various conditions, solutions are proposed and developed, ranging from the straightforward scenario, in which the target language is present in written form on the Internet and the mapping between speech and written language is close, to the difficult scenario, in which no written form of the target language exists.
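    In the straightforward scenario, dictionary entries can be bootstrapped from orthography with grapheme-to-phoneme rules. A minimal sketch of greedy longest-match G2P conversion (the rule set and phone symbols are invented for illustration, not taken from the dissertation):

```python
def g2p(word, rules):
    """Greedy longest-match grapheme-to-phoneme conversion.
    `rules` maps grapheme strings to phone symbols (illustrative only)."""
    ordered = sorted(rules.items(), key=lambda kv: -len(kv[0]))  # longest first
    phones, i = [], 0
    while i < len(word):
        for grapheme, phone in ordered:
            if word.startswith(grapheme, i):
                phones.append(phone)
                i += len(grapheme)
                break
        else:
            phones.append(word[i])  # no rule applies: identity fallback
            i += 1
    return " ".join(phones)

# Hypothetical rules for a language with a close speech-writing mapping.
rules = {"sh": "S", "th": "T", "oo": "u:", "a": "ae"}
print(g2p("shoot", rules))  # S u: t
```

    For the difficult scenario without a written form, no such rule table exists, which is precisely why the dissertation develops alternative strategies.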