44 research outputs found

    Feasibility of Automatic Transcription of Neatly Rewritten Bushman Texts

    The purpose of this study is to investigate the feasibility of the research to be conducted for an MSc. The study is concerned with the automatic transcription of part of a handwritten |xam story, which contains a limited set of characters from the |xam Bushman language. The transcription is performed using a trained SVM model to classify the characters. The text to be transcribed is a neatly rewritten version of the first page of A Story of the Girl who made the Milky Way, which appears in one of Lucy Lloyd’s |xam notebooks. Two authors participated in this study with the purpose of evaluating the ability to transcribe the handwriting of multiple authors.

    The BOLD Project: The BOLD Translator

    The Lloyd and Bleek Collection contains over 14000 dictionary pages with both an English word and its Bushman language translation. The notebooks in the Lloyd and Bleek Collection contain Bushman stories where in many cases English translations do not exist or are not clear. It is natural to assume that people making use of the notebooks would like to use the dictionary to translate words which appear in the notebooks. This, however, is not practical simply due to the magnitude of the dictionary. A need therefore exists to build a tool for interaction between the dictionary pages and the notebooks to allow for translation. A content-based image retrieval (CBIR) system was built to do this and it was shown that it is possible to find the corresponding words in the dictionary by providing a single word from the notebooks as a search key. The system shows promising potential, with well-selected search keys returning relevant results.

    Learning to Read Bushman: Automatic Handwriting Recognition for Bushman Languages

    The Bleek and Lloyd Collection contains notebooks that document the tradition, language and culture of the Bushman people who lived in South Africa in the late 19th century. Transcriptions of these notebooks would allow for the provision of services such as text-based search and text-to-speech. However, these notebooks are currently only available in the form of digital scans and the manual creation of transcriptions is a costly and time-consuming process. Thus, automatic methods could serve as an alternative approach to creating transcriptions of the text in the notebooks. In order to evaluate the use of automatic methods, a corpus of Bushman texts and their associated transcriptions was created. The creation of this corpus involved: the development of a custom method for encoding the Bushman script, which contains complex diacritics; the creation of a tool for creating and transcribing the texts in the notebooks; and the running of a series of workshops in which the tool was used to create the corpus. The corpus was used to evaluate the use of various techniques for automatically transcribing the texts in the corpus in order to determine which approaches were best suited to the complex Bushman script. These techniques included the use of Support Vector Machines, Artificial Neural Networks and Hidden Markov Models as machine learning algorithms, which were coupled with different descriptive features. The effect of the texts used for training the machine learning algorithms was also investigated as well as the use of a statistical language model. It was found that, for Bushman word recognition, the use of a Support Vector Machine with Histograms of Oriented Gradient features resulted in the best performance and, for Bushman text line recognition, Marti & Bunke features resulted in the best performance when used with Hidden Markov Models. The automatic transcription of the Bushman texts proved to be difficult and the performance of the different recognition systems was largely affected by the complexities of the Bushman script. It was also found that, besides having an influence on determining which techniques may be the most appropriate for automatic handwriting recognition, the texts used in an automatic handwriting recognition system also play a large role in determining whether or not automatic recognition should be attempted at all.

    Using A Hidden Markov Model to Transcribe Handwritten Bushman Texts

    The Bushman texts in the Bleek and Lloyd Collection contain complex diacritics that make automatic transcription difficult. Transcriptions of these texts would allow for enhanced digital library services to be created for interacting with the collection. In this study, an investigation into automatic transcription of the Bushman texts was performed using a Hidden Markov Model, a popular method for text line recognition. The results show that while this technique may be well suited to well-constrained and understood scripts, its application to more complex scripts introduces a number of difficulties that need to be overcome.

    Creating a Handwriting Recognition Corpus for Bushman Languages

    Handwriting recognition systems rely on the existence of a corpus for training recognition models and evaluating accuracy. Creating a handwriting recognition corpus for the Bushman languages of southern Africa is difficult due to the complexities of the script used to represent them and the fact that this script cannot be represented using Unicode. To solve this problem, a semi-automatic Web-based tool was developed to segment, capture and encode the Bushman text. A case study demonstrated how the tool could be used to create a Bushman handwriting corpus with few errors

    Evaluating Large Image Support for DSpace

    Access to large images in digital libraries is desirable from a preservation perspective and may even be a requirement in some domains, such as cartography. However, providing access to large images often poses a problem as a result of the size of the images as well as the limited screen real estate for displaying the images. Even when these issues are addressed, there is a lack of evidence about how well tasks involving large images can be performed in a digital library. In investigating this, a survey was conducted in order to identify well-performing large image support tools and the best of these tools was integrated into DSpace. A user study was conducted in order to evaluate how well large images could be supported in a digital library and it was found that users were able to successfully and easily perform tasks related to large images.

    Learning to Read Bushman

    The notebooks in the Bleek and Lloyd collection contain handwritten stories that metaphorically encode the Bushman culture and are useful to researchers and scholars trying to understand Bushman language and culture. These notebooks, however, only exist as scanned images and therefore the stories they contain cannot be searched, indexed or compared. This research seeks to investigate how accurately the Bushman stories can be automatically converted from images to text, in a process known as transcription, and also to explore the various techniques for doing this. The expected contribution is a measurement of how accurately transcription can be automatically performed as well as a comparison of different techniques for doing this

    Evaluating Simple Repository Deposit for Open Educational Resources

    The rise of technologies and simpler software tools have been identified as drivers for the Open Educational Resources (OER) movement. However, content creators have been slow to adopt current OER solutions, as is shown by weak repository deposit rates and activities. To begin to address this, a desktop tool that simplifies the deposit process and that can be integrated into the content creation workflow was created. The goal of the tool was to support metadata that could accurately describe OERs and to ensure that, when deposited using the tool, OERs were represented correctly in the repository. Evaluation of the tool by users showed its potential in simplifying the repository deposit process, encouraging the creation of OERs and motivating content creators to share. Furthermore, users were also able to represent OERs using the adopted metadata standard and were satisfied with the way in which their OERs were represented in the repository

    Bonolo: A General Digital Library System for File-based Collections

    There is an ever-increasing amount of digital content being generated that needs to be well-organised, preserved and made accessible. The majority of generic repository software tools that currently exist are, arguably, overly complex, thus making collections difficult to manage and maintain in resource constrained environments. A possible solution to this problem would, in part, require designing digital library tools and services that are simple and easy to manage. This paper describes a digital library system that is based on a set of design decisions aimed at simplifying repository software architectures. The proposed system makes use of a hierarchical file-store for storage of digital objects. Evaluation of the system by means of a user experience study was conducted to investigate the usefulness of the system, its relative ease of use and what effect, if any, the architecture would have on the user experience. Experimental results showed that users found the system useful, effective and easy to use and that the architecture did not appear to negatively influence the user experience.

    An Online Meeting Tool for Low Bandwidth Environments

    Online meetings allow for remote conferencing and collaborative work among geographically dispersed participants and can save the time and expense that an ordinary face-to-face meeting would require. However, carrying real-time communication over the packet-switched Internet is a challenging task, especially in an African context, which is characterized by low bandwidth and unstable Internet connections. This paper presents and evaluates a tool that was designed to enhance the user experience for Web-based conferencing, given the constraints of Internet conditions typical of Africa. Approaches used to achieve this goal included: reprioritisation of multimedia streams, image differentiation, half-duplex communication mode and stream compression. It was found that less than 56 kbps of bandwidth was required in order to: transmit audio; use video to convey presence; share slides and screen; and support text-based chat and floor control. Furthermore, users were largely satisfied with the tool and felt that it created a good user experience.