5 research outputs found

    Translating Handwritten Bushman Texts

    The Bleek and Lloyd Collection is a collection of artefacts documenting the life and language of the Bushman people of southern Africa in the 19th century. Included in this collection is a handwritten dictionary that contains English words and their corresponding |xam Bushman language translations. This dictionary allows for the manual translation of |xam words that appear in the notebooks of the Bleek and Lloyd Collection. Manual lookup, however, is not practical due to the size of the dictionary, which contains over 14,000 entries. To solve this problem, a content-based image retrieval system was built that allows a |xam word to be selected from a notebook and returns matching words from the dictionary. The system shows promise, with some search keys returning relevant results.

    Creating a Handwriting Recognition Corpus for Bushman Languages

    Handwriting recognition systems rely on the existence of a corpus for training recognition models and evaluating accuracy. Creating a handwriting recognition corpus for the Bushman languages of southern Africa is difficult due to the complexities of the script used to represent them and the fact that this script cannot be represented using Unicode. To solve this problem, a semi-automatic Web-based tool was developed to segment, capture and encode the Bushman text. A case study demonstrated how the tool could be used to create a Bushman handwriting corpus with few errors.

    An efficient management system for large digital object collections

    Includes abstract. Includes bibliographical references (leaves 87-91). Cultures evolve continuously, and it is therefore vital to track and record these changes and, most importantly of all, manage the resulting huge mass of data such as images, video clips, audio recordings and documents. This thesis examines the design of a Web-based solution, hereafter referred to as the Information Management System (IMS), to handle the efficient, accurate and secure management of a large number of objects.

    Learning to Read Bushman: Automatic Handwriting Recognition for Bushman Languages

    The Bleek and Lloyd Collection contains notebooks that document the tradition, language and culture of the Bushman people who lived in South Africa in the late 19th century. Transcriptions of these notebooks would allow for the provision of services such as text-based search and text-to-speech. However, these notebooks are currently only available in the form of digital scans and the manual creation of transcriptions is a costly and time-consuming process. Thus, automatic methods could serve as an alternative approach to creating transcriptions of the text in the notebooks. In order to evaluate the use of automatic methods, a corpus of Bushman texts and their associated transcriptions was created. The creation of this corpus involved: the development of a custom method for encoding the Bushman script, which contains complex diacritics; the creation of a tool for creating and transcribing the texts in the notebooks; and the running of a series of workshops in which the tool was used to create the corpus. The corpus was used to evaluate the use of various techniques for automatically transcribing the texts in the corpus in order to determine which approaches were best suited to the complex Bushman script. These techniques included the use of Support Vector Machines, Artificial Neural Networks and Hidden Markov Models as machine learning algorithms, which were coupled with different descriptive features. The effect of the texts used for training the machine learning algorithms was also investigated as well as the use of a statistical language model. It was found that, for Bushman word recognition, the use of a Support Vector Machine with Histograms of Oriented Gradient features resulted in the best performance and, for Bushman text line recognition, Marti & Bunke features resulted in the best performance when used with Hidden Markov Models. 
    The automatic transcription of the Bushman texts proved to be difficult and the performance of the different recognition systems was largely affected by the complexities of the Bushman script. It was also found that, besides influencing which techniques may be the most appropriate for automatic handwriting recognition, the texts used in an automatic handwriting recognition system also play a large role in determining whether or not automatic recognition should be attempted at all.

    Translating handwritten Bushman texts

    No full text