15 research outputs found

    Extractive Text-Based Summarization of Arabic videos: Issues, Approaches and Evaluations

    Get PDF
    In this paper, we present and evaluate a method for extractive text-based summarization of Arabic videos. The algorithm is proposed within the scope of the AMIS project, which aims to help users understand videos given in a foreign language (Arabic). To this end, the project proposes several strategies to translate and summarize the videos. One of them consists in transcribing the Arabic videos, summarizing the transcriptions, and translating the summary. In this paper we describe the video corpus collected from YouTube, and present and evaluate the transcription-summarization part of this strategy. We also present the Automatic Speech Recognition (ASR) system used to transcribe the videos, and show how we adapted this system to the Algerian dialect. We then describe how we automatically segment the sequence of words provided by the ASR system into sentences, and how we summarize the resulting sequence of sentences. We evaluate our approach both objectively and subjectively. Results show that the ASR system performs well in terms of Word Error Rate on MSA, but needs to be adapted to deal with Algerian dialect data. The subjective evaluation shows the same behaviour as the ASR evaluation: transcriptions of videos containing dialectal data were scored better than those of videos containing only MSA data. However, summaries based on transcriptions are not rated as well, even when the transcriptions themselves are rated highly. Finally, the study shows that features such as the lengths of transcriptions and summaries, together with the subjective score of transcriptions, explain only 31% of the subjective score of summaries.
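    The objective evaluation above rests on Word Error Rate: the word-level edit distance between a reference transcript and the ASR hypothesis, normalised by the reference length. A minimal self-contained sketch of that metric (not the AMIS project's own scoring code):

```python
def word_error_rate(reference, hypothesis):
    """WER: word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution / match
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat", "the cat sat down"))  # 1 insertion / 3 reference words ≈ 0.33
```

    The normalisation by reference length means WER can exceed 1.0 when the hypothesis is much longer than the reference.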

    A comparison of mobile search interfaces for isiXhosa speakers

    Get PDF
    Search interfaces have long been targeted at resource-rich languages such as English. Compared to such languages, there has been little effort to support African (Bantu) languages in search interfaces, particularly the isiXhosa language. However, with the growing use of mobile phones in developing countries, these interfaces can now be adapted to the languages of these settings to support information access on the Web. This study proposes mobile search interfaces that support isiXhosa speakers in searching for information on the Web using isiXhosa as a discovery language. isiXhosa is a low-resourced African (Bantu) language spoken in resource-constrained environments in South Africa by over eight million people, yet no search interface has been specifically targeted at supporting its speakers. Two mobile search interfaces, one text based and one voice based, were developed in an Android application. Their design was based on feedback from four native isiXhosa speakers in a design focus group, and on guidelines from the literature. Using the developed interfaces, an experiment was conducted with 34 native isiXhosa-speaking students at the University of Cape Town, South Africa, to investigate which interface could better support isiXhosa speakers searching for information on the Web using mobile phones. Quantitative data were collected from application log files. User feedback was then obtained using the standard Software Usability Measurement Inventory (SUMI) instrument, and both interfaces were confirmed as usable. Contrary to expectations, users preferred the text interface overall and on most SUMI subscales. This may reflect greater familiarity with text search interfaces, or the relative scarcity of voice interfaces in African (Bantu) languages. Where users are not literate, the voice interface may be the only option, so the fact that it was deemed usable is an important independent finding. Search in African (Bantu) language collections remains a largely unexplored field, and more work is needed on the interfaces as the algorithms and collections are developed in parallel.

    The EEE corpus: socio-affective "glue" cues in elderly-robot interactions in a Smart Home with the EmOz platform

    No full text
    The aim of this preliminary feasibility study is to offer a first look at interactions, in a Smart Home prototype, between elderly people and a companion robot whose only vector of communication is a set of socio-affective language primitives. The paper focuses in particular on the methodology and the scenario designed to collect a spontaneous corpus of human-robot interactions. Through a Wizard of Oz platform (EmOz), developed specifically for this purpose, a robot is introduced as an intermediary between the technological environment and elderly participants, who give vocal commands to the robot to control the Smart Home. The robot's vocal productions increase progressively through added prosodic levels: (1) no speech, (2) pure prosodic mouth noises assumed to be the tools of the "glue", (3) lexical items with assumed "glue" prosody, and (4) imitations of the subject's commands with assumed "glue" prosody. The elderly subjects' speech behaviours confirm the hypothesis that the socio-affective "glue" effect increases across the prosodic levels, especially for socially isolated people. The corpus is still being recorded, with the aim of collecting data from socially isolated elderly people in real need.


    Morfessor 2.0: Python Implementation and Extensions for Morfessor Baseline

    Get PDF
    Morfessor is a family of probabilistic machine learning methods that find morphological segmentations for the words of a natural language, based solely on raw text data. Since the release of the public implementations of the Morfessor Baseline and Categories-MAP methods in 2005, they have become popular automatic tools for processing morphologically complex languages in applications such as speech recognition and machine translation. This report describes a new implementation of the Morfessor Baseline method. The new version not only removes the main restrictions of the previous software, but also includes recent methodological extensions such as semi-supervised learning, which can make use of small amounts of manually segmented words. Experimental results for the various features of the implementation are reported for English and Finnish segmentation tasks.
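    At the core of Morfessor Baseline is a unigram morph model: a word's segmentation is the split that minimises the total cost (negative log probability) of its morphs, found by a Viterbi-style dynamic program. A minimal self-contained sketch of that search over a fixed morph lexicon (the morph counts are invented for illustration; this is not the Morfessor 2.0 API):

```python
import math

def viterbi_segment(word, morph_counts):
    """Split `word` into morphs minimising total cost under a unigram
    morph model, where cost(m) = -log p(m). Illustrative sketch only."""
    total = sum(morph_counts.values())
    cost = {m: -math.log(c / total) for m, c in morph_counts.items()}
    n = len(word)
    best = [0.0] + [math.inf] * n  # best[i] = min cost of segmenting word[:i]
    back = [0] * (n + 1)           # backpointer to the previous split position
    for i in range(1, n + 1):
        for j in range(i):
            morph = word[j:i]
            if morph in cost and best[j] + cost[morph] < best[i]:
                best[i] = best[j] + cost[morph]
                back[i] = j
    # recover the segmentation from the backpointers
    # (if no full segmentation exists, this falls back to the whole word)
    morphs, i = [], n
    while i > 0:
        morphs.append(word[back[i]:i])
        i = back[i]
    return morphs[::-1]

counts = {"un": 5, "supervis": 3, "ed": 10, "learn": 4, "ing": 12}
print(viterbi_segment("unsupervised", counts))  # ['un', 'supervis', 'ed']
```

    The full method additionally learns the morph lexicon itself by balancing this corpus cost against a lexicon cost, in the spirit of minimum description length.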

    Language variation, automatic speech recognition and algorithmic bias

    Get PDF
    In this thesis, I situate the impacts of automatic speech recognition systems in relation to sociolinguistic theory (drawing in particular on concepts of language variation, language ideology and language policy) and contemporary debates in AI ethics (especially regarding algorithmic bias and fairness). In recent years, automatic speech recognition systems, alongside other language technologies, have been adopted by a growing number of users and embedded in an increasing number of algorithmic systems. This expansion into new application domains and language varieties can be understood as an expansion into new sociolinguistic contexts. I am interested in how automatic speech recognition tools interact with this sociolinguistic context, and how they affect speakers, speech communities and their language varieties. Focussing on commercial automatic speech recognition systems for British Englishes, I first explore the extent and consequences of these systems' performance differences across user groups with different linguistic backgrounds. When this predictive bias is situated within the wider sociolinguistic context, it becomes apparent that these systems reproduce, and potentially entrench, existing linguistic discrimination, and could therefore cause direct and indirect harms to already marginalised speaker groups. To understand the benefits and potential of automatic transcription tools, I highlight two case studies: transcribing sociolinguistic data in English and transcribing personal voice messages in isiXhosa. This comparison emphasises the central role of the sociolinguistic context in developing such tools. Design choices, such as the choice of training data, are particularly consequential because they interact with existing processes of language standardisation. To better understand the impacts of these choices, and the role of the developers who make them, I draw on theory from language policy research and critical data studies. These conceptual frameworks are intended to help practitioners and researchers anticipate and mitigate predictive bias and other potential harms of speech technologies. Beyond individual choices, I also investigate the discourses about language variation and linguistic diversity deployed in the context of language technologies. These discourses, put forward by researchers, developers and commercial providers, not only have a direct effect on the wider sociolinguistic context, but also highlight how this context (e.g., existing beliefs about language(s)) affects technology development. Finally, I explore ways of building better automatic speech recognition tools, focussing in particular on well-documented, naturalistic and diverse benchmark datasets. Inclusive datasets are not necessarily a panacea, however: they still raise important questions about the nature of linguistic data and language variation (especially in relation to identity), and may not mitigate or prevent all potential harms of automatic speech recognition systems as embedded in larger algorithmic systems and sociolinguistic contexts.
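    The performance differences at issue can be quantified very simply, for instance as the gap between mean per-group error rates. A minimal sketch with invented numbers (the group labels and scores are illustrative only, not results from the thesis):

```python
from statistics import mean

def wer_gap(scores_by_group):
    """Mean WER per speaker group and the max-min disparity between groups.
    `scores_by_group` maps a group label to per-utterance WER scores."""
    means = {group: mean(wers) for group, wers in scores_by_group.items()}
    return means, max(means.values()) - min(means.values())

# Illustrative numbers only: two hypothetical accent groups.
scores = {"variety A": [0.08, 0.12, 0.10], "variety B": [0.18, 0.22, 0.20]}
means, gap = wer_gap(scores)
print(means, gap)
```

    Summaries like this are only a starting point; the thesis argues that such numbers must be interpreted within the sociolinguistic context that produced them.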

    Rapid Generation of Pronunciation Dictionaries for new Domains and Languages

    Get PDF
    This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Depending on various conditions, solutions are proposed and developed, ranging from the straightforward scenario, in which the target language is present in written form on the Internet and the mapping between speech and written language is close, to the difficult scenario, in which no written form of the target language exists.
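    In the straightforward scenario, dictionary entries can be bootstrapped from orthography with grapheme-to-phoneme rules. A minimal sketch of greedy longest-match G2P conversion (the rule set and phone symbols are invented for illustration, not taken from the dissertation):

```python
def g2p(word, rules):
    """Greedy longest-match grapheme-to-phoneme conversion.
    `rules` maps grapheme strings to phone symbols (illustrative only)."""
    ordered = sorted(rules.items(), key=lambda kv: -len(kv[0]))  # longest first
    phones, i = [], 0
    while i < len(word):
        for grapheme, phone in ordered:
            if word.startswith(grapheme, i):
                phones.append(phone)
                i += len(grapheme)
                break
        else:
            phones.append(word[i])  # no rule applies: identity fallback
            i += 1
    return " ".join(phones)

# Hypothetical rules for a language with a close speech-writing mapping.
rules = {"sh": "S", "th": "T", "oo": "u:", "a": "ae"}
print(g2p("shoot", rules))  # S u: t
```

    For the difficult scenario without a written form, no such rule table exists, which is precisely why the dissertation develops alternative strategies.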