431 research outputs found

    A System for High Quality Crowdsourced Indigenous Language Transcription

    In this article, a crowdsourcing method is proposed to transcribe manuscripts from the Bleek and Lloyd Collection, where non-expert volunteers transcribe pages of the handwritten text using an online tool. The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the indigenous people of Southern Africa. The notebooks, in particular, contain stories that encode the language, culture and beliefs of these people, handwritten in now-extinct languages with a specialised notation system. Previous attempts have been made to convert the approximately 20000 pages of text to a machine-readable form using machine learning algorithms but, due to the complexity of the text, the recognition accuracy was low. This article presents details of the system used to enable transcription by volunteers as well as results from experiments that were conducted to determine the quality and consistency of transcriptions. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for |Xam text and 95% for English text. When the |Xam text transcriptions produced by the volunteers are compared with a gold standard, the volunteers achieve an average accuracy of 64.75%, which exceeds that of previous work. Finally, the degree of transcription agreement correlates with the degree of transcription accuracy. This suggests that the quality of unseen data can be assessed based on the degree of agreement among transcribers.

    Quality Assessment in Crowdsourced Indigenous Language Transcription

    The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the indigenous people of Southern Africa. The notebooks, in particular, contain stories that encode the language, culture and beliefs of these people, handwritten in now-extinct languages with a specialised notation system. Previous attempts have been made to convert the approximately 20000 pages of text to a machine-readable form using machine learning algorithms but, due to the complexity of the text, the recognition accuracy was low. In this paper, a crowdsourcing method is proposed to transcribe the manuscripts, where non-expert volunteers transcribe pages of the notebooks using an online tool. Experiments were conducted to determine the quality and consistency of transcriptions. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for |Xam text and 95% for English text. When the |Xam text transcriptions produced by the volunteers are compared with a gold standard, the volunteers achieve an average accuracy of 64.75%, which exceeds that of previous work. Finally, the degree of transcription agreement correlates with the degree of transcription accuracy. This suggests that the quality of unseen data can be assessed based on the degree of agreement among transcribers.

    Citizen Science in Archaeology

    Citizen science, as a process of volunteer participation through crowdsourcing, facilitates the creation of mass data sets needed to address subtle and large-scale patterns in complex phenomena. Citizen science efforts in other field disciplines such as biology, geography, and astronomy indicate how new web-based interfaces can enhance and expand upon archaeologists' existing platforms of volunteer engagement such as field schools, community archaeology, site stewardship, and professional-avocational partnerships. Archaeological research can benefit from the citizen science paradigm in four ways: fieldwork that makes use of widely available technologies such as mobile applications for photography and data upload; searches of large satellite image collections for site identification and monitoring; crowdfunding; and crowdsourced computer entry of heritage data.

    Reflections on language documentation in India

    The last twenty years have seen efforts to support the study of minority and lesser-studied languages of India from varied stakeholders: these include the Indian government, international and Indian nonprofit organizations, indigenous and state-level cultural and language committees and institutes, and individuals with a passion to preserve and document their cultures and languages. Their efforts have led to mixed success due to conflicting ideologies, history, and resource availability (Annamalai 2003). Basing my observations on my research, personal experience and engagement with language documentation activities in the country, I provide an overview of the current state of language study and my hopes and efforts for the future of language documentation and description in India. National Foreign Language Resource Center

    Sourcing Success: Assessment Techniques of Digital Cultural Heritage Crowdsourcing Projects

    This study focuses on how libraries, archives, museums, and other cultural heritage institutions define and assess the success of online crowdsourcing projects. The research was conducted via a survey of twenty-two digital crowdsourcing projects ranging from transcription of digitized archival materials to wildlife documentation projects. The survey found that institutions had diverse reasons for undertaking crowdsourcing projects and monitored project success through multiple assessment measures dependent on project goals. Survey respondents reported greater satisfaction with their project outcomes when they had identified at least one measurable goal prior to starting the project. In general, survey respondents reported positive feelings about, and an interest in, future crowdsourcing projects as tools for description, community engagement, and user recruitment. Master of Science in Library Science

    Capitalizing on Collections and the Crowd: What needs to be considered before embarking on a digital crowdsourcing project?

    This study provides insights into the decision-making process archivists and librarians undergo when considering incorporating crowdsourcing elements into a digital project. The research was conducted via semi-structured interviews with eight American archivists and librarians who have conducted a crowdsourcing project. The study found that these professionals chose crowdsourcing as a tool for a variety of reasons and, while they discussed many challenges associated with crowdsourcing, they believed their projects a success. The study establishes four suggestions that will help library and archives managers be better prepared to manage and assess the viability of potentially incorporating crowdsourcing elements into their digital projects. Master of Science in Library Science