3 research outputs found

    Quality Assessment in Crowdsourced Indigenous Language Transcription

    The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the indigenous people of Southern Africa. The notebooks, in particular, contain stories that encode the language, culture and beliefs of these people, handwritten in now-extinct languages with a specialised notation system. Previous attempts to convert the approximately 20,000 pages of text to a machine-readable form used machine learning algorithms but, due to the complexity of the text, the recognition accuracy was low. In this paper, a crowdsourcing method is proposed to transcribe the manuscripts, in which non-expert volunteers transcribe pages of the notebooks using an online tool. Experiments were conducted to determine the quality and consistency of the transcriptions. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for |Xam text and 95% for English text. When the |Xam text transcriptions produced by the volunteers are compared with a gold standard, the volunteers achieve an average accuracy of 64.75%, which exceeds that of previous work. Finally, the degree of transcription agreement correlates with the degree of transcription accuracy. This suggests that the quality of unseen data can be assessed based on the degree of agreement among transcribers.
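
    The two quantities named in the abstract, agreement among transcribers and accuracy against a gold standard, can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors used, so the character-level ratio from Python's standard `difflib` module is an assumption here, and the function names `similarity`, `mean_pairwise_agreement` and `mean_accuracy` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed metric, not the paper's)."""
    return SequenceMatcher(None, a, b).ratio()


def mean_pairwise_agreement(transcriptions: list[str]) -> float:
    """Average similarity over every pair of transcriptions of the same page."""
    pairs = list(combinations(transcriptions, 2))
    if not pairs:
        raise ValueError("need at least two transcriptions")
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def mean_accuracy(transcriptions: list[str], gold: str) -> float:
    """Average similarity of each transcription to a gold-standard text."""
    if not transcriptions:
        raise ValueError("need at least one transcription")
    return sum(similarity(t, gold) for t in transcriptions) / len(transcriptions)
```

    Under these assumptions, agreement can be computed for any page with two or more transcriptions, whereas accuracy needs a gold-standard text. This is why a correlation between the two would allow agreement to stand in as a quality estimate for pages that have no gold standard.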

    Xamobile: Usability Evaluation of Text Input Methods on Mobile Devices for Historical African Languages

    Customized text input editors on mobile devices for languages with no standard language models, such as some African languages, are vital to allow text input tasks to be crowdsourced and thus enable quick and precise participation. We investigated four mobile input techniques for complex language scripts such as |Xam, collecting accuracy data from experiments with the Xwerty, T9, Pinyin script and hierarchical entry methods, together with usability data from the participants. Our usability testing results show that Xwerty methods offer substantial benefits to the majority of users in terms of speed for |Xam text entry and ease of use.