399 research outputs found
Quality Assessment in Crowdsourced Indigenous Language Transcription
The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the indigenous people of Southern Africa. The notebooks, in particular, contain stories that encode the language, culture and beliefs of these people, handwritten in now-extinct languages with a specialised notation system. Previous attempts have been made to convert the approximately 20000 pages of text to a machine-readable form using machine learning algorithms but, due to the complexity of the text, the recognition accuracy was low. In this paper, a crowdsourcing method is proposed to transcribe the manuscripts, where non-expert volunteers transcribe pages of the notebooks using an online tool. Experiments were conducted to determine the quality and consistency of transcriptions. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for |Xam text and 95% for English text. When the |Xam text transcriptions produced by the volunteers are compared with a gold standard, the volunteers achieve an average accuracy of 64.75%, which exceeded that in previous work. Finally, the degree of transcription agreement correlates with the degree of transcription accuracy. This suggests that the quality of unseen data can be assessed based on the degree of agreement among transcribers.
A System for High Quality Crowdsourced Indigenous Language Transcription
In this article, a crowdsourcing method is proposed to transcribe manuscripts from the Bleek and Lloyd Collection, where non-expert volunteers transcribe pages of the handwritten text using an online tool. The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the indigenous people of Southern Africa. The notebooks, in particular, contain stories that encode the language, culture and beliefs of these people, handwritten in now-extinct languages with a specialised notation system. Previous attempts have been made to convert the approximately 20000 pages of text to a machine-readable form using machine learning algorithms but, due to the complexity of the text, the recognition accuracy was low. This article presents details of the system used to enable transcription by volunteers as well as results from experiments that were conducted to determine the quality and consistency of transcriptions. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for |Xam text and 95% for English text. When the |Xam text transcriptions produced by the volunteers are compared with a gold standard, the volunteers achieve an average accuracy of 64.75%, which exceeded that in previous work. Finally, the degree of transcription agreement correlates with the degree of transcription accuracy. This suggests that the quality of unseen data can be assessed based on the degree of agreement among transcribers.
Citizen Science in Archaeology
Citizen science, as a process of volunteer participation through crowdsourcing, facilitates the creation of mass data sets needed to address subtle and large-scale patterns in complex phenomena. Citizen science efforts in other field disciplines such as biology, geography, and astronomy indicate how new web-based interfaces can enhance and expand upon archaeologists' existing platforms of volunteer engagement such as field schools, community archaeology, site stewardship, and professional-avocational partnerships. Archaeological research can benefit from the citizen science paradigm in four ways: fieldwork that makes use of widely available technologies such as mobile applications for photography and data upload; searches of large satellite image collections for site identification and monitoring; crowdfunding; and crowdsourced computer entry of heritage data
Sourcing Success: Assessment Techniques of Digital Cultural Heritage Crowdsourcing Projects
This study focuses on how libraries, archives, museums, and other cultural heritage institutions define and assess the success of online crowdsourcing projects. The research was conducted via a survey of twenty-two digital crowdsourcing projects ranging from transcription of digitized archival materials to wildlife documentation projects. The survey found that institutions had diverse reasons for undertaking crowdsourcing projects and monitored project success through multiple assessment measures dependent on project goals. Survey respondents reported greater satisfaction with their project outcomes when they had identified at least one measurable goal prior to starting the project. In general, survey respondents reported positive feelings about, and an interest in, future crowdsourcing projects as tools for description, community engagement, and user recruitment. Master of Science in Library Science
Xamobile: Usability Evaluation of Text Input Methods on Mobile Devices for Historical African Languages
Customized text input editors on mobile devices for languages with no standard language models, such as some African languages, are vital to allow text input tasks to be crowdsourced and thus enable quick and precise participation. We investigated 4 different mobile input techniques for complex language scripts like |Xam and collected accuracy data from experiments with the Xwerty, T9, Pinyin script and hierarchical entry methods for mobile devices and also usability data from the participants. Our results on usability testing show that Xwerty methods offer substantial benefits to the majority of users in terms of speed for |Xam text entry and ease of use
Geoinformatics in Citizen Science
The book features contributions that report original research in the theoretical, technological, and social aspects of geoinformation methods, as applied to supporting citizen science. Specifically, the book focuses on the technological aspects of the field and their application toward the recruitment of volunteers and the collection, management, and analysis of geotagged information to support volunteer involvement in scientific projects. Internationally renowned research groups share research in three areas: First, the key methods of geoinformatics within citizen science initiatives to support scientists in discovering new knowledge in specific application domains or in performing relevant activities, such as reliable geodata filtering, management, analysis, synthesis, sharing, and visualization; second, the critical aspects of citizen science initiatives that call for emerging or novel approaches of geoinformatics to acquire and handle geoinformation; and third, novel geoinformatics research that could serve in support of citizen science
Citizen Science: Reducing Risk and Building Resilience to Natural Hazards
Natural hazards are becoming increasingly frequent within the context of climate change—making reducing risk and building resilience against these hazards more crucial than ever. An emerging shift has been noted from broad-scale, top-down risk and resilience assessments toward more participatory, community-based, bottom-up approaches. Arguably, non-scientist local stakeholders have always played an important role in risk knowledge management and resilience building. Rapidly developing information and communication technologies such as the Internet, smartphones, and social media have already demonstrated their sizeable potential to make knowledge creation more multidirectional, decentralized, diverse, and inclusive (Paul et al., 2018). Combined with technologies for robust and low-cost sensor networks, various citizen science approaches have emerged recently (e.g., Haklay, 2012; Paul et al., 2018) as a promising direction in the provision of extensive, real-time information for risk management (as well as improving data provision in data-scarce regions). It can serve as a means of educating and empowering communities and stakeholders that are bypassed by more traditional knowledge generation processes.
This Research Topic compiles 13 contributions that interrogate the manifold ways in which citizen science has been interpreted to reduce risk against hazards that are (i) water-related (i.e., floods, hurricanes, drought, landslides); (ii) deep-earth-related (i.e., earthquakes and volcanoes); and (iii) responding to global environmental change such as sea-level rise. We have sought to analyse the particular failures and successes of natural hazards-related citizen science projects: the objective is to obtain a clearer understanding of “best practice” in a citizen science context
Subtitling for a global audience. Handling the translation of culture-specific items in TEDx talks
[EN] TED.com is a platform to share ideas through influential talks in video format on topics that range from science and technology to business that engages volunteers from all over the world to help transcribe, subtitle and translate their scripts in more than 100 languages. The justification to engage volunteer transcribers is that transcribed talks can reach a wider audience because they are accessible for hearing impaired individuals, can be indexed in search engines and can achieve TED's mission of spreading ideas by making transcripts available for translation through TED's Open Translation Project.
Therefore, talk transcribers play a crucial role in the overall translation workflow and dissemination process, as they are responsible for transcribing the contents and foundations of what will later be translated into different languages. The objective of this paper is to analyse a corpus of talks originally delivered in different variants of Spanish to identify the most common strategies used by volunteer transcribers to handle local or idiomatic expressions and culturally biased items to reach the maximum audience possible and facilitate translation.
Candel-Mora, MÁ.; González Pastor, DM. (2017). Subtitling for a global audience. Handling the translation of culture-specific items in TEDx talks. FORUM. Revue internationale d'interprétation et de traduction. International Journal of Interpretation and Translation. 15(2):288-304. doi:10.1075/forum.15.2.07can
Revisiting Linking Early Geospatial Documents with Recogito
Recogito is a web-based environment for collaborative semantic annotation. It is open source software, and provides support for working with either text or image documents, including those served via the IIIF protocol. The tool was originally designed for geographic annotation, i.e. the transcription, marking up and geo-resolving of maps and geographical texts (such as itineraries and travel reports) in the context of historical scholarship, e.g. to map or extract data from a source, or to prepare a digital edition. Over time, however, Recogito's feature set has grown to provide more general annotation functionality, broadening the scope for further potential application areas. Following up on an earlier article we published in e-Perimetron in 2015, in which we first introduced Recogito, this article looks back on the past four years of use and development. We present how Recogito has technologically evolved; how it has been applied in practice in different projects and for different purposes; and how a vibrant user community has sprung up around it that is shaping its further development. The paper also looks forward to some planned next steps, and sets out our future vision for Recogito's long-term development and sustainability.