131 research outputs found

Amharic Speech Recognition for Speech Translation

Author: Besacier Laurent
Melese Michael
Meshesha Million
Publication venue: HAL CCSD
Publication date: 01/07/2016

The state-of-the-art speech translation can be seen as a cascade of Automatic Speech Recognition, Statistical Machine Translation and Text-To-Speech synthesis. In this study an attempt is made to experiment on Amharic speech recognition for Amharic-English speech translation in the tourism domain. Since there is no Amharic speech corpus, we developed a read-speech corpus of 7.43 hours in the tourism domain. The Amharic speech corpus was recorded under a normal working environment after translating the standard Basic Traveler Expression Corpus (BTEC). In our ASR experiments, phoneme and syllable units are used for acoustic models, while morpheme and word units are used for language models. Encouraging ASR results are achieved using morpheme-based language models and phoneme-based acoustic models, with recognition accuracies of 89.1%, 80.9%, 80.6%, and 49.3% at the character, morph, word and sentence levels respectively. We are now working towards designing Amharic-English speech translation through cascading components under different error correction algorithms.

Hal - Université Grenoble Alpes

The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation

Author: Ayele Abinew Ali
Belay Tadesse Destaw
Gelbukh Alexander
Haile Silesh Bogale
Kolesnikova Olga
Sidorov Grigori
Tonja Atnafu Lambebo
Yimam Seid Muhie
Publication date: 27/10/2022

Machine translation (MT) is one of the main tasks in natural language processing whose objective is to translate texts automatically from one natural language to another. Nowadays, using deep neural networks for MT tasks has received great attention. These networks require lots of data to learn abstract representations of the input and store them in continuous vectors. This paper presents the first relatively large-scale Amharic-English parallel sentence dataset. Using these compiled data, we build bi-directional Amharic-English translation models by fine-tuning the existing Facebook M2M100 pre-trained model, achieving a BLEU score of 37.79 in Amharic-English and 32.74 in English-Amharic translation. Additionally, we explore the effects of Amharic homophone normalization on the machine translation task. The results show that the normalization of Amharic homophone characters increases the performance of Amharic-English machine translation in both directions.

arXiv.org e-Print Archive

Extended Parallel Corpus for Amharic-English Machine Translation

Author: Bati Tesfaye Bayu
Gezmu Andargachew Mekonnen
Nürnberger Andreas
Publication date: 25/06/2021

This paper describes the acquisition, preprocessing, segmentation, and alignment of an Amharic-English parallel corpus. It will be useful for machine translation of an under-resourced language, Amharic. The corpus is larger than previously compiled corpora; it is released for research purposes. We trained neural machine translation and phrase-based statistical machine translation models using the corpus. In the automatic evaluation, neural machine translation models outperform phrase-based statistical machine translation models.

Comment: Accepted to 2nd AfricanNLP workshop at EACL 202

arXiv.org e-Print Archive

Classifying Amharic News Text Using Self-Organizing Maps

Author: Eyassu Samuel
Gambäck Björn
Publication date: 01/01/2005

The paper addresses the use of artificial neural networks for classification of Amharic news items. Amharic is the language for countrywide communication in Ethiopia and has its own writing system containing extensive systematic redundancy. It is quite dialectally diversified and probably representative of the languages of a continent that so far has received little attention within the language processing field. The experiments investigated document clustering around user queries using Self-Organizing Maps, an unsupervised learning neural network strategy. The best ANN model showed a precision of 60.0% when trying to cluster unseen data, and a precision of 69.5% when trying to classify it.

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Proceedings of the EACL 2009 Workshop on Language Technologies for African Languages

Author: De Pauw Guy
de Schryver Gilles-Maurice
Levin Lori
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2009

Ghent University Academic Bibliography

AfLaT 2010: proceedings of the second workshop on African language technology (AfLaT 2010)

Author: De Pauw Guy
de Schryver Gilles-Maurice
Groenewald Handré
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2010

Ghent University Academic Bibliography

The Development of Oromo Writing System

Author: Degeneh Bijiga Teferi
Publication date: 01/11/2015

The development and use of languages for official, educational, religious and other purposes have been a major political issue in many developing multilingual countries. A number of these countries, including China and India, have recognised the issues and developed language policies that have provided some ethnic groups with the right to develop their languages and cultures by using writing systems based on scripts suitable for these purposes. On the other hand, other countries, such as Ethiopia (a multilingual African state), had for a long time preferred a policy of one language and one script, in the belief that this would help the assimilation of various ethnic groups and create a homogenous population with one language and culture. Rather than realizing that aim, the policy became a significant source of conflict and of demands for political independence among disfavoured groups. This thesis addresses the development of a writing system for Oromo, a language spoken by approximately 40 percent of the total population of Ethiopia, which remained officially unwritten until the early 1990s. It begins by reviewing the early history of Oromo writing and discusses the Ethiopian language policies, analysing materials written in various scripts and certain writers starting from the 19th century. The adoption of Roman script for Oromo writing and the debates that followed are explored, with an examination of some phonological aspects of the Oromo language and the implications of representing them using the Roman alphabet. This thesis argues that the Oromo language has thrived during the past few years, having implemented a Roman-based alphabetical script. There have been and continue to be, however, internal and external challenges confronting the development of the Oromo writing system which need to be carefully considered and addressed by stakeholders, primarily by the Oromo people and the Ethiopian government, in order for the Oromo language to establish itself as a fully codified language in the modern nation-state.

Kent Academic Repository