Extractive Text-Based Summarization of Arabic videos: Issues, Approaches and Evaluations

Abidi, K.; Fohr, Dominique; González-Gallardo, C.; Jouvet, Denis; Langlois, D.; Mella, Odile; Menacer, M.; Sadat, F.; Smaïli, Kamel; Torres-Moreno, J.

Publication date: 16 October 2019
Publisher: Springer Science and Business Media LLC

Abstract

In this paper, we present and evaluate a method for extractive text-based summarization of Arabic videos. The algorithm is proposed within the scope of the AMIS project, which aims to help a user understand videos given in a foreign language (Arabic). To that end, the project proposes several strategies to translate and summarize the videos. One of them consists of transcribing the Arabic videos, summarizing the transcriptions, and translating the summary. In this paper, we describe the video corpus that was collected from YouTube, and we present and evaluate the transcription-summarization part of this strategy. We also present the Automatic Speech Recognition (ASR) system used to transcribe the videos and show how we adapted this system to the Algerian dialect. We then describe how we automatically segment the sequence of words provided by the ASR system into sentences, and how we summarize the resulting sequence of sentences. We evaluate our approach both objectively and subjectively. Results show that the ASR system performs well in terms of Word Error Rate on Modern Standard Arabic (MSA), but needs to be adapted to deal with Algerian dialect data. The subjective evaluation shows the same behaviour as the ASR: transcriptions for videos containing dialectal data were better scored than those for videos containing only MSA data. However, summaries based on transcriptions are not as well rated, even when transcriptions are better rated. Finally, the study shows that features such as the lengths of transcriptions and summaries and the subjective score of transcriptions explain only 31% of the subjective score of summaries.
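The abstract reports the objective ASR results in terms of Word Error Rate (WER). As a reminder of what that metric measures, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the scoring tool used by the authors, and real evaluations on Arabic usually apply text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Assumption: words are separated by whitespace and compared exactly,
    with no normalization of diacritics or orthographic variants.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

With this definition, a hypothesis that substitutes one word and drops another out of a four-word reference has a WER of 2/4. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.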

Available Versions

Archive Ouverte en Sciences de l'Information et de la Communication

oai:HAL:hal-02314238v1

Last time updated on 21/10/2019

INRIA a CCSD electronic archive server

oai:HAL:hal-02314238v1

Last time updated on 29/10/2019

Crossref

Last time updated on 10/08/2021