17 research outputs found

    A Method for Proper Noun Extraction in Kurdish

    This paper suggests a method for proper noun identification in Kurdish texts. Kurdish proper nouns are not capitalized, and they also assume other part-of-speech roles, which leads to broad ambiguity that Kurdish proper noun recognition applications must address. Kurdish is also among the less-resourced languages. We developed an application based on an architecture that includes a number of name lists, a set of rules, and a set of processes for recognizing Kurdish person names. This can help advance the study of Information Retrieval (IR) in Kurdish and can also be used in Kurdish machine translation. We conducted several experiments, which showed that the precision of the method is more than 95%, the recall is between 40% and 80%, and the F-measure ranges from close to 60% to more than 80%. The low recall arose because our name lists were not exhaustive enough to cover the vast majority of Kurdish names.

    A Neural Approach to Language Variety Translation

    In this paper we present the first neural-based machine translation system trained to translate between standard national varieties of the same language. We take the pair Brazilian-European Portuguese as an example and compare the performance of this method to a phrase-based statistical machine translation system. We report a performance improvement of 0.9 BLEU points when translating from European to Brazilian Portuguese and 0.2 BLEU points when translating in the opposite direction. We also carried out a human evaluation experiment with native speakers of Brazilian Portuguese, which indicates that humans prefer the output produced by the neural-based system to that of the statistical system. Comment: Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial

    Kurdish Optical Character Recognition

    Currently, no offline tool is available for Optical Character Recognition (OCR) in Kurdish. Kurdish is spoken in different dialects and uses several scripts for writing. The Persian/Arabic script is widely used among these dialects. It is written from Right to Left (RTL), is cursive, and uses unique diacritics. These features, particularly the last two, affect the segmentation stage in developing a Kurdish OCR. In this article, we introduce an enhanced character-segmentation-based method that addresses these characteristics. We applied the method to text-only images and tested the Kurdish OCR using documents of different fonts, font sizes, and image resolutions. The results of the experiments showed that the character recognition accuracy of the proposed method was 90.82% on average.

    Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

    Peer reviewed

    When Children Chat with Machine Translated Text: Problems, Possibilities, Potential

    Two cross-lingual (Nepalese and English) letter exchanges took place between school children from Nepal and England, using Digipal, an Android chat application. Digipal uses Google Translate to enable children to read and reply in their native language. In two studies we analysed the errors made and the effect of those errors on children’s understanding and on the flow of conversation. We found that errors of input negatively affected translation, although this effect can be reduced through initial grammar cleaning. We highlight features of children’s text that cause errors in translation, whilst showing how children worked with and around these errors. Errors sometimes added humour and contributed to continuing the conversations.

    Findings of the 2019 Conference on Machine Translation (WMT19)

    This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.

    Arabic and contact-induced change

    This volume offers a synthesis of current expertise on contact-induced change in Arabic and its neighbours, with thirty chapters written by many of the leading experts on this topic. Its purpose is to showcase the current state of knowledge regarding the diverse outcomes of contacts between Arabic and other languages, in a format that is both accessible and useful to Arabists, historical linguists, and students of language contact.
