1,437 research outputs found

    Noisy-parallel and comparable corpora filtering methodology for the extraction of bi-lingual equivalent data at sentence level

    Get PDF
    Text alignment and text quality are critical to the accuracy of Machine Translation (MT) systems, some NLP tools, and any other text processing tasks requiring bilingual data. This research proposes a language independent bi-sentence filtering approach based on Polish (not a position-sensitive language) to English experiments. This cleaning approach was developed on the TED Talks corpus and also initially tested on the Wikipedia comparable corpus, but it can be used for any text domain or language pair. The proposed approach implements various heuristics for sentence comparison. Some of them leverage synonyms and semantic and structural analysis of text as additional information. Minimization of data loss was ensured. An improvement in MT system score with text processed using the tool is discussed.Comment: arXiv admin note: text overlap with arXiv:1509.09093, arXiv:1509.0888

    Enhancing Bi-directional English-Tigrigna Machine Translation Using Hybrid Approach

    Get PDF
    Machine Translation (MT) is an application area of NLP where automatic systems are used to translate text or speech from one language to another while preserving the meaning of the source language. Although there exists a large volume of literature in automatic machine translation of documents in many languages, the translation between English and Tigrigna is less explored. Therefore, we proposed the hybrid approach to address the challenges of applying syntactic reordering rules which align and capture the structural arrangement of words in the source sentence to become more like the target sentences. Two language models were developed- one for English and another for Tigrigna and about 12,000 parallel sentences in four domains and 32,000 bilingual dictionaries were collected for our experiment. The parallel collected corpus was split randomly to 10,800 sentences for training set and 1,200 sentences for testing. Moses open source statistical machine translation system has been used for the experiment to train, tune and decode. The parallel corpus was aligned using the Giza++ toolkit and SRILM was used for building the language model. Three main experiments were conducted using statistical approach, hybrid approach and post-processing technique. According to our experimental result showed good translation output as high as 32.64 BLEU points Google translator and the hybrid approach was found most promising for English-Tigrigna bi-directional translation

    Accessible options for deaf people in e-Learning platforms: technology solutions for sign language translation

    Get PDF
    AbstractThis paper presents a study on potential technology solutions for enhancing the communication process for deaf people on e-learning platforms through translation of Sign Language (SL). Considering SL in its global scope as a spatial-visual language not limited to gestures or hand/forearm movement, but also to other non-dexterity markers such as facial expressions, it is necessary to ascertain whether the existing technology solutions can be effective options for the SL integration on e-learning platforms. Thus, we aim to present a list of potential technology options for the recognition, translation and presentation of SL (and potential problems) through the analysis of assistive technologies, methods and techniques, and ultimately to contribute for the development of the state of the art and ensure digital inclusion of the deaf people in e-learning platforms. The analysis show that some interesting technology solutions are under research and development to be available for digital platforms in general, but yet some critical challenges must solved and an effective integration of these technologies in e-learning platforms in particular is still missing
    corecore