5,268 research outputs found

    Machine transliteration of proper names between English and Persian

    Machine transliteration is the process of automatically transforming a word from a source language to a target language while preserving pronunciation. The transliterated words in the target language are called out-of-dictionary, or sometimes out-of-vocabulary, meaning that they have been borrowed from other languages with a change of script. When a whole text is being translated, for example, proper nouns and technical terms are subject to transliteration. Machine translation, and other applications that make use of this technology, such as cross-lingual information retrieval and cross-language question answering, must deal with the problem of transliteration. Since proper nouns and technical terms, which need phonetic translation, are part of most text documents, transliteration is an important problem to study. We explore the problem of English to Persian and Persian to English transliteration using methods that work on the graphemes of the source word. One major problem in handling Persian text is its lack of written short vowels. When transliterating Persian words to English, we need to develop a method of inserting vowels to make them pronounceable. Many different approaches using n-grams are explored and compared in this thesis, and we propose language-specific transliteration methods that improve transliteration accuracy. Our novel approaches use consonant-vowel sequences and show significant improvements over baseline systems. We also develop a new alignment algorithm and examine novel techniques to combine systems, approaches that improve the effectiveness of the systems. We also investigate the properties of bilingual corpora that affect transliteration accuracy. Our experiments suggest that the origin of the source words has a strong effect on the performance of transliteration systems. From a careful analysis of the corpus construction process, we conclude that at least five human transliterators are needed to construct a representative bilingual corpus for training and testing transliteration systems.
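    The abstract does not say how its consonant-vowel sequences are built. As a rough illustration only, the sketch below maps a Latin-script word to a C/V pattern. The function name `cv_pattern`, the vowel set, and the choice to treat "y" as a consonant are all assumptions, not the thesis's method.

```python
# Hypothetical illustration: reduce a Latin-script word to its
# consonant-vowel (C/V) pattern. The vowel inventory is an assumption.
VOWELS = set("aeiou")


def cv_pattern(word: str, collapse: bool = False) -> str:
    """Return the C/V pattern of `word`, ignoring non-letters.

    With collapse=True, runs of the same symbol are merged, so
    consonant clusters and vowel clusters each count once.
    """
    pattern = "".join(
        "V" if ch in VOWELS else "C"
        for ch in word.lower()
        if ch.isalpha()
    )
    if collapse:
        merged = []
        for symbol in pattern:
            # Append only when the symbol differs from the previous one.
            if not merged or merged[-1] != symbol:
                merged.append(symbol)
        pattern = "".join(merged)
    return pattern


print(cv_pattern("Shiraz"))                 # CCVCVC
print(cv_pattern("Shiraz", collapse=True))  # CVCVC
```

    A pattern like this could serve as a coarse context feature next to character n-grams. Whether the thesis uses it that way is not stated in the abstract.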

    Input Scheme for Hindi Using Phonetic Mapping

    Written communication on computers requires knowing how to enter text in the desired language. Most people do not type in any language other than English, which creates a barrier. To resolve this issue, we have developed a scheme for entering Hindi text using phonetic mapping. Using this scheme, we generate intermediate code strings and match them with the pronunciations of the input text. Our system shows significant success over other available input systems.

    Moses-based official baseline for NEWS 2016

    Transliteration is the phonetic translation between two different languages. Many works approach transliteration using machine translation methods. This paper describes the official baseline system for the NEWS 2016 workshop shared task. The baseline is a standard phrase-based machine translation system built with Moses. Its results fall between the best and worst results from last year’s workshops, providing a good starting point for this year's participants. Postprint (published version)

    Examining the Problems and Inconsistencies in the interpolation of English Transliterated names of Persian Language Researchers in Citation Databases

    English transliterated names of researchers who write in non-Roman scripts have been indexed in citation databases in various ways and do not follow a specific rule. For this reason, a search does not retrieve all the works of a given author. This problem is also widespread in the transliteration of the names of Persian-language researchers. This study examined the problems and inconsistencies in the interpolation of the English transliterated names of 1301 faculty members of SBMU [1] indexed in the Scopus and ISC [2] citation databases. The results showed that 193 (15%) faculty members had no scientific output indexed in either database, while 1108 (85%) had papers indexed in at least one of the two databases. Of these, 357 (32.2%) registered their names in more than 2 forms, 413 (37.3%) in 2 forms, and only 338 (30.5%) in one form. Therefore, almost 70% of the indexed faculty members have not registered their names in a single form. The proposed solution is to compile a name authority list based on the frequency of each written form in valid databases. [1] Shahid Beheshti University of Medical Sciences [2] Islamic World Science Citation Center
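    The reported percentages can be checked from the counts given in the abstract. The short sketch below assumes that the three name-form percentages are taken over the 1108 indexed faculty members rather than all 1301; the variable names and the `pct` helper are illustrative.

```python
# Counts as stated in the abstract.
total_faculty = 1301
not_indexed = 193
indexed = 1108
name_forms = {"more than 2 forms": 357, "2 forms": 413, "1 form": 338}


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The two top-level groups partition the faculty, and the three
# name-form groups partition the indexed faculty.
assert not_indexed + indexed == total_faculty
assert sum(name_forms.values()) == indexed

print(pct(not_indexed, total_faculty))  # 14.8 (reported as 15%)
print(pct(indexed, total_faculty))      # 85.2 (reported as 85%)
for label, count in name_forms.items():
    print(label, pct(count, indexed))   # 32.2, 37.3, 30.5

# Share of indexed faculty with more than one name form.
multiple_forms = indexed - name_forms["1 form"]
print(pct(multiple_forms, indexed))     # 69.5, i.e. "almost 70%"
```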