The Netherlands Biodiversity Data Services and the R package nbaR: Automated workflows for biodiversity data analysis

Abstract

The value of data present in natural history collections for research in biodiversity, ecology and evolution cannot be overstated. Naturalis Biodiversity Center of the Netherlands, home to one of the largest natural history collections in the world, launched a large-scale digitisation project resulting in the registration of more than 38 million specimen objects, many of them annotated with descriptive metadata, such as geographic coordinates or multimedia content. Other resources hosted at Naturalis include species occurrence records and comprehensive taxonomic checklists, such as the Catalogue of Life. As our institution strongly believes in the Open Science paradigm, we seek to make our data available to the global biodiversity research community and thereby to enhance data analysis workflows such as (i) the modelling of present, past and future species distributions using specimen occurrence data, (ii) time calibration of (molecular) phylogenies using dated specimen occurrences, (iii) taxonomic name resolution, or (iv) image data mining. To this end, we developed the Netherlands Biodiversity Data Services [1], which provide centralized access to biodiversity data via state-of-the-art, open-access interfaces and a mechanism to assign persistent identifiers to all records. Data are retrieved from heterogeneous sources and harmonized into a document store that complies with international data standards such as ABCD (Access to Biological Collection Data [2]). Built on the Elasticsearch engine, our infrastructure offers complex query options, near real-time queries, and the scalability needed to accommodate anticipated data growth. With a focus on availability and accessibility, the services were designed as a versatile, low-level REST API so that our data can be used in a broad variety of applications and services. For programmatic access to our data services, we developed client libraries for several programming languages.
Here we present the R package ‘nbaR’ [3], a client especially targeted at an audience of biodiversity researchers. The R programming language has found wide acceptance in this field over the past years, and our package provides a convenient means of connecting our data resources to existing tools for statistical modelling and analysis. The abstraction layer introduced by the client lets the user formulate even complex queries in a convenient manner, thereby lowering the access threshold to our data services. We will demonstrate the potential and benefits of the services and the R client by integrating nbaR with state-of-the-art packages for species distribution modelling and time calibration of phylogenetic trees into a single analysis workflow.

1. Netherlands Biodiversity Data services – User documentation. http://docs.biodiversitydata.nl (accessed 17 May 2018).
2. Access to Biological Collections Data task group. 2007. Access to Biological Collection Data (ABCD), Version 2.06. Biodiversity Information Standards (TDWG). http://www.tdwg.org/standards/115 (accessed 17 May 2018).
3. nbaR GitHub repository. https://github.com/naturalis/nbaR (accessed 17 May 2018).
