Biodiversity informatics - entomological data processing, analysis and visualization

Author: Pontes Leonor Fernanda Venceslau Azeredo
Publication date: 01/01/2019

Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2019. This work focuses on the digitization, processing and analysis of data from natural history collections using biodiversity informatics tools. Data from the insect collections of the Museu Nacional de História Natural e da Ciência (MNHNC) and the Instituto de Investigação Científica Tropical (IICT), Universidade de Lisboa, were used. In 2014, a dataset with 30 535 records from the MNHNC insect collection was published on the Global Biodiversity Information Facility (GBIF). Since then, new records have been digitized and new data added, such as new taxonomic identifications. Currently, the MNHNC insect collection catalogue includes 39 139 validated records, corresponding to about 98% of the total and covering 79 885 specimens. So that this updated dataset could be published, data-cleaning tools were applied to detect and correct errors, and records were georeferenced so that the data can be located on a map from their coordinates. For data cleaning and homogenization, all fields were cleaned and formatted according to the Darwin Core metadata standard. This process included checking taxonomic identifications to detect synonyms and spelling errors, changing the format of dates and times, and applying a controlled vocabulary to the remaining fields. In parallel, tools were tested to speed up digitization at two different stages: transcription and georeferencing of data from specimen labels.
Five georeferencing tools offering application programming interfaces (APIs), which can be used to georeference records automatically from locality names, were tested: Google Maps, MapQuest, GeoNames, OpenStreetMap and GEOLocate. Of these, Google Maps produced the best results, with 57.6% of results within 1000 m of the correct location. A citizen science project was also developed and tested on the Zooniverse platform, with two workflows: one for transcribing data from photographs of labelled specimens, aimed at the general public, and one for taxonomic identification of specimens from photographs, aimed at specialists in the taxonomy of the respective group. The first workflow resulted in the successful transcription of data for all 130 specimens made available. The second resulted in the identification of the 61 specimens that had no previous identification and the verification of the identifications of the remaining 69 specimens. It is therefore concluded that citizen science projects are a good way to accelerate the digitization project, provided that adequate error verification and correction methods are implemented. To cover all steps of the digitization process more completely, the horsefly (Diptera: Tabanidae) collections of the IICT and the MNHNC were selected. This group is of special interest because it includes important vectors of disease transmission to humans and livestock and has a wide worldwide distribution. The IICT horsefly collection is particularly important because most of it was compiled and studied by J. A. Travassos Santos Dias, a specialist in this group who published extensive works based on the collection's specimens. Both collections include type specimens of species described by Travassos Santos Dias and other authors.
Despite their importance, the information associated with the specimens in the IICT/MNHNC collections had not yet been digitized. In this work, all specimens were photographed and their data transcribed, resulting in a dataset of 1 666 specimens. Records were georeferenced whenever possible. The specimens were collected between 1899 and 2018, mostly in Portugal, but also in São Tomé and Príncipe, Guinea-Bissau, Mozambique, Spain and other countries. To better visualize the geographic distribution of the specimens, distribution maps were created in R for the best-represented species in the collections. Publishing this dataset on GBIF will be an asset for studying the distribution of this group, given its broad geographic and temporal coverage and the fact that most specimens (85.1%) are identified to species or subspecies.
This work is based on data records associated with the insect collections from the Museu Nacional de História Natural e da Ciência (MNHNC) and Instituto de Investigação Científica Tropical (IICT), Universidade de Lisboa. In 2014 a dataset with 30 535 records was published in the Global Biodiversity Information Facility (GBIF). Since then data has been improved and new records acquired. Currently, the collection catalogue includes 39 139 validated records, corresponding to 79 885 specimens, with much more to be added from collections donated by private collectors or unprocessed samples. The data for these specimens was cleaned, formatted and geocoded and published on the GBIF. During this work, different APIs were tested to allow automated geocoding of sampling locations. Google Maps achieved the best results, with 57.6% of results within 1000 m of the correct location. A citizen science project was developed and tested to accelerate the digitization process, including two workflows with different objectives.
One was focused on the transcription of specimen label data, which resulted in the data for 130 specimens being successfully transcribed. The other was focused on the taxonomic identification of specimens from photographs, directed to specialists in the respective group’s taxonomy, which resulted in 61 new identifications and the verification of identifications for the remaining 69 specimens. The MNHNC and IICT collections include collections of horseflies (Order Diptera, Family Tabanidae), which are of particular importance due to their size and the completeness of the associated data. Horseflies are widely distributed worldwide and are important vectors in the transmission of diseases to humans and cattle. The IICT collection includes a sub-collection which was compiled and studied by J. A. Travassos Santos Dias, a prominent specialist in this group. The specimens in these collections were photographed, all the associated data were transcribed, taxonomic identifications were verified and records were geocoded, resulting in a dataset of 1666 specimens. These specimens were collected between 1899 and 2018, mainly in Portugal, but also in São Tomé and Príncipe, Guinea-Bissau, Mozambique, Spain and other countries. To better understand the distribution of this group, distribution maps were made for the best-represented species in the collections.
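
The abstract reports geocoding accuracy as the share of results within 1000 m of the correct location. The Python sketch below shows one way such a figure could be computed; it is an illustration only, not the thesis's code. It assumes coordinates in decimal degrees and a spherical Earth (haversine formula), and the names `haversine_m` and `share_within` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def share_within(pairs, threshold_m=1000.0):
    """Fraction of (geocoded, reference) coordinate pairs at most threshold_m apart.

    Each pair is ((lat, lon), (lat, lon)); the first point is the geocoder's
    answer and the second is the manually verified location.
    """
    if not pairs:
        raise ValueError("no coordinate pairs supplied")
    hits = sum(1 for geocoded, reference in pairs
               if haversine_m(*geocoded, *reference) <= threshold_m)
    return hits / len(pairs)
```

Running `share_within` once per geocoding service over the same set of manually verified localities would give directly comparable percentages for each service.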

Universidade de Lisboa: Repositório.UL

Making digital history: The impact of digitality on public participation and scholarly practices in historical research

Author: Ridge Mia
Publication date: 29/06/2016

This thesis investigates two key questions: firstly, how do two broad groups - academic, family and local historians, and the public - evaluate, use, and contribute to digital history resources? And consequently, what impact have digital technologies had on public participation and scholarly practices in historical research? It analyses the impact of design on participant experiences and the reception of digital historiography, demonstrating the value of methods drawn from human-computer interaction, including heuristic evaluation, trace ethnography and semi-structured interviews. This thesis also investigates the relationship between heritage crowdsourcing projects (which ask the public to help with meaningful, inherently rewarding tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge) and the development of historical skills and interests. It situates crowdsourcing and citizen history within the broader field of participatory digital history and then focuses on the impact of digitality on the research practices of faculty and community historians. Chapter 1 provides an overview of over 400 digital history projects aimed at engaging the public or collecting, creating or enhancing records about historical materials for scholarly and general audiences. Chapter 2 discusses design factors that may influence the success of crowdsourcing projects. Following this, Chapter 3 explores the ways in which some crowdsourcing projects encourage deeper engagement with history or science, and the role of communities of practice in citizen history. Chapter 4 shifts our focus from public participation to scholarly practices in historical research, presenting the results of interviews conducted with 29 faculty and community historians.
Finally, the Conclusion draws together the threads that link public participation and scholarly practices, teasing out the ways in which the practices of discovering, gathering, creating and sharing historical materials and knowledge have been affected by digital methods, tools and resources.

Open Research Online (The Open University)

Crowding the library : how and why libraries are using crowdsourcing to engage the public

Author: Sauve Jean-Sébastien
Severson Sarah
Publication venue: 'University of Guelph'
Publication date: 15/07/2019

Over the past 10 years, there has been a noticeable increase in crowdsourcing projects in cultural heritage institutions, where digital technologies are being used to open up their collections and encourage the public to engage with them in a very direct way. Libraries, archives and museums have long had a history and mandate of outreach and public engagement, but crowdsourcing marks a move towards a more participatory and inclusive model of engagement. If a library wants to start a crowdsourcing project, what does it need to know? This article is written from a Canadian university library perspective with the goal of helping the reader engage with the current crowdsourcing landscape. The article's contribution includes a literature review and a survey of popular projects and platforms, followed by a case study of a crowdsourcing pilot completed at the McGill Library. The article pulls these two threads of theory and practice together with a discussion of some of the best practices learned through the literature and real-life experience, giving the reader practical tools to help a library evaluate whether crowdsourcing is right for it, and how to get a desired project off the ground.

Dépôt Institutionnel Numérique

Crowdsourcing Natural History Archives: Tools for Extracting Transcriptions and Data

Author: De Veer Joseph
Mika Katherine A.
Rinaldo Constance
Publication venue: 'The University of Kansas'
Publication date: 01/11/2017

This paper is a survey of the landscape of current, successful, and innovative platforms for extracting full text transcriptions and structured data using crowdsourcing as a tool. Archival manuscript items are the key case studies reviewed to develop a set of tools for the Biodiversity Heritage Library for use in enhancing access to and extracting information from items in scientific and natural history collections.

Directory of Open Access Journals

The University of Kansas: Journals@KU

Biodiversity Informatics

The implications of handwritten text recognition for accessing the past at scale

Author: Gooding Paul
Nockels Joe
Terras Melissa
Publication date: 18/04/2024

Before Handwritten Text Recognition (HTR), manuscripts were costly to convert to machine-processable text for research and analysis. With HTR now achieving high levels of accuracy, we ask what near-future behaviour, interaction, experience, values and infrastructures may occur when HTR is applied to historical documents? When combined with mass-digitisation of GLAM (galleries, libraries, archives and museums) content, how will HTR’s application, use, and affordances generate new knowledge of the past, and affect our information environment? This paper’s findings emerge from a literature review surveying current understanding of the impact of HTR, to explore emerging issues over the coming decade. We aim to deconstruct the simplistic narrative that the speed, efficiency, and scale of HTR will “transform scholarship in the archives” (Muehlberger et al., 2019: 955), providing a more nuanced consideration of its application, possibilities, and opportunities. In doing so, our recommendations will assist researchers, data and platform providers, memory institutions and data scientists to understand how the results of HTR interact with the wider information environment. We find that HTR supports the creation of accurate transcriptions from historical manuscripts, and the enhancement of existing datasets. HTR facilitates access to a greater range of materials, including endangered languages, enabling a new focus on personal and private materials (diaries, letters), increasing access to historical voices not usually incorporated into the historical record, and increasing the scale and heterogeneity of available material. The production of general training models leads to a virtuous digitisation circle where similar datasets are easier – and therefore more likely – to be produced. This leads to the requirement for processes that will facilitate the storage, and discoverability of HTR generated content, and for memory institutions to rethink search and access to collections.
Challenges include HTR’s dependency on digitisation, its relation to archival history and omission, and the entrenchment of bias in data sources. The paper details several near future issues, including: the potential of HTR for the basis of automated metadata extraction; the integration of advanced Artificial Intelligence (AI) processes (including Large Language Models (LLMs) and generative AI) into HTR systems; legal and moral issues such as copyright, privacy and data ethics which are challenged by the use of HTR; how individual contributions to shared HTR models can be credited; and the environmental costs of HTR infrastructure. We identify the need for greater collaboration between communities including historians, information scientists, and data scientists to navigate these issues, and for further skills support to allow non-specialist audiences to make the most of HTR. Data literacy will become increasingly important, as will building frameworks to establish data sharing, data consent, and reuse principles, particularly in building open repositories to share models and datasets. Finally, we suggest that an understanding of how HTR is changing the information environment is a crucial aspect of future technological development.

Edinburgh Research Explorer

Meteorological data rescue: citizen science lessons learned from Southern Weather Discovery

Author: Allan Rob
Compo Gilbert P.
Hawkins Ed
Judd Emily
Lorrey Andrew M.
Mackay Stuart
Pearce Petra R.
Quesnel Patrick
Rawhat Sudhir
Slivinski Laura
Wilkinson Clive
Wilkinson Sally
Woolley John-Mark
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022

Daily weather reconstructions (called "reanalyses") can help improve our understanding of meteorology and long-term climate changes. Adding undigitized historical weather observations to the datasets that underpin reanalyses is desirable; however, the time available to capture those data from a range of archives is usually limited. Southern Weather Discovery is a citizen science data rescue project that recovered tabulated handwritten meteorological observations from ship log books and land-based stations spanning New Zealand, the Southern Ocean, and Antarctica. We describe the Zooniverse-hosted Southern Weather Discovery campaign, highlight promotion tactics, and report the replicate keying levels needed to obtain 100% complete transcribed datasets with minimal type 1 and type 2 transcription errors. Rescued weather observations can augment optical character recognition (OCR) text recognition libraries. Closer links between citizen science data rescue and OCR-based scientific data capture will accelerate weather reconstruction improvements, which can be harnessed to mitigate impacts on communities and infrastructure from weather extremes.
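
Replicate keying means that several volunteers transcribe the same value independently and the copies are then reconciled. The abstract does not say how the project reconciled them, so the Python sketch below uses a plain majority vote as a stand-in technique; the function name `consensus` and the `min_agreement` parameter are hypothetical.

```python
from collections import Counter


def consensus(transcriptions, min_agreement=2):
    """Reconcile replicate transcriptions of one field by majority vote.

    Returns the most frequent non-empty value when at least `min_agreement`
    volunteers keyed it and no other value is equally frequent; otherwise
    returns None so the field can be flagged for manual review.
    """
    cleaned = [t.strip() for t in transcriptions if t and t.strip()]
    if not cleaned:
        return None
    ranked = Counter(cleaned).most_common(2)
    top_value, top_count = ranked[0]
    if top_count < min_agreement:
        return None  # too few volunteers agree
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie between two candidate values
    return top_value
```

Raising `min_agreement` trades completeness for accuracy: fewer fields reach consensus automatically, but those that do are less likely to carry a keying error.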

Central Archive at the University of Reading

CU Scholar Institutional Repository

PubMed Central

Draft: Crowdsourcing in cultural heritage: a practical guide to designing and running successful projects

Author: Mia Ridge
Publication venue: 'Modern Language Association'
Publication date: 01/01/2021

Have you ever wanted to recruit hundreds of members of the public to assist with the task of making cultural heritage collections findable online? Or to connect with passionate volunteers who'll share their discoveries with you? Crowdsourcing in cultural heritage is a broad term for projects that ask the public to help with tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge. As participants receive no financial reward, the activities and/or goals should be inherently rewarding for those volunteering their time. This definition is partly descriptive and partly proscriptive, and this chapter is largely concerned with explaining/describing how to meet the standards it implies. [A draft (not quite pre-print) version of my chapter for the Routledge International Handbook of Research Methods in Digital Humanities, edited by Kristen Schuster and Stuart Dunn, 2021. ISBN 9781138363021.]

Humanities Commons

Methods of Building Sustainable Digital Communities and Co-Productivity from Crowdsourcing in the GLAM Sector

Author: Johnstone Andrew
Publication date: 01/06/2023

King's Research Portal

Building data into knowledge: Identifying challenges and their solutions in biodiversity informatics

Author: Hill Andrew William
Publication venue: University of Colorado Boulder
Publication date: 01/01/2012

Biologists are in a race to document biodiversity in the face of ailing ecosystems and species decline. The drive to create knowledge to support effective documentation, measurement, and conservation of biodiversity has led the community to quickly research and develop methods to organize and connect biodiversity data across providers and throughout the world. Biodiversity data came online through distributed and disconnected databases but through time has been shaped into a biodiversity network that now represents nearly 500 million biodiversity records. The ability to access these data has brought exciting new research and new challenges. In this thesis I discuss my work to solve some of those challenges and build innovative approaches and tools for biodiversity informatics. I start by documenting tools that help improve the quality and fitness for use of data. Then I present two tools for visualizing and analyzing data in a phylogenetic and conservation context. More importantly, I discuss how designing these tools to operate within a greater knowledge creation framework can make the work of documenting patterns and processes in biodiversity faster and more resilient to future changes and improved information. At the heart of that discussion is the idea that the outputs of the tools themselves should be published and directly linked back to the original data and forward to any future analyses. The outputs should also document all models, parameters, and heuristics used to arrive at the reported outcome. In this way, both the data and our research on that data can be woven into a connected fabric of knowledge and information that links biodiversity and the digital data stored in our databases. Finally, I discuss the possibility we have for expanding our biodiversity data and improving the research we can do with it through the use of citizen science. The data available today is still deficient.
Natural history collections hold a wealth of data that has not yet been digitized, but as a community we lack the resources to unlock that data quickly without a novel solution. Citizen science offers us the ability to quickly generate historical biodiversity data from natural history collections. We present a novel platform for engaging citizen scientists and developing a shared, community-driven platform to harness the potential of citizen science.

CU Scholar Institutional Repository