The value of data present in natural history collections for research in biodiversity, ecology and evolution
cannot be overstated. Naturalis Biodiversity Center of the Netherlands, home to one of the largest natural
history collections in the world, launched a large-scale digitisation project resulting in the registration of more
than 38 million specimen objects, many of them annotated with descriptive metadata, such as geographic
coordinates or multimedia content. Other resources hosted at Naturalis include species occurrence records
and comprehensive taxonomic checklists, such as the Catalogue of Life. As our institution strongly believes
in the Open Science paradigm, we seek to make our data available to the global biodiversity research
community to support and enhance data analysis workflows such as (i) the modelling of present, past and future
species distributions using specimen occurrence data, (ii) time calibration of (molecular) phylogenies using
dated specimen occurrences, (iii) taxonomic name resolution and (iv) image data mining. To this end, we
developed the Netherlands Biodiversity Data services [1], providing centralized access to biodiversity data
via state-of-the-art, open-access interfaces, together with a mechanism to assign persistent identifiers to all records.
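Programmatic access over an open REST interface usually comes down to composing HTTP GET requests whose query parameters carry the search filters. The minimal Python sketch below builds such a URL using only the standard library. It makes several assumptions: the base URL, the `<type>/query` path and the filter names are hypothetical placeholders, not the documented endpoints of the Netherlands Biodiversity Data services (see [1] for those).

```python
from urllib.parse import urlencode, urljoin


def build_query_url(base_url: str, document_type: str, filters: dict) -> str:
    """Compose a GET URL for a hypothetical '<base>/<document_type>/query' endpoint.

    Filters are sorted by name so that the same search always yields the same
    URL, which keeps requests reproducible and easy to cache.
    """
    # A trailing slash on the base makes urljoin append the path instead of
    # replacing the last path segment.
    endpoint = urljoin(base_url.rstrip("/") + "/", f"{document_type}/query")
    # urlencode percent-encodes values; spaces become '+'.
    query = urlencode(sorted(filters.items()))
    return f"{endpoint}?{query}" if query else endpoint


# Placeholder host and hypothetical field names, for illustration only.
url = build_query_url(
    "https://api.example.org/v2",
    "specimen",
    {"genus": "Larus", "country": "Netherlands"},
)
print(url)
```

Sorting the filters is a design choice for this sketch: two users issuing the same search get byte-identical URLs. For the example above, the printed URL is `https://api.example.org/v2/specimen/query?country=Netherlands&genus=Larus`.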
Data are retrieved from heterogeneous sources and harmonized into a document store that complies with
international data standards such as ABCD (Access to Biological Collection Data [2]). Employing the
Elasticsearch engine, our infrastructure features complex query options, near real-time search, and the
scalability needed to accommodate anticipated data growth. Focusing on availability and accessibility, the services were
designed as a versatile, low-level REST API to allow the use of our data in a broad variety of applications
and services. For programmatic access to our data services, we developed client libraries for several
programming languages. Here we present the R package ‘nbaR’ [3], a client aimed specifically at
biodiversity researchers. The R programming language has found wide acceptance in this field
in recent years, and our package provides convenient means to connect our data resources to existing
tools for statistical modelling and analysis. The client's abstraction layer lets users formulate even
complex queries conveniently, thereby lowering the barrier to accessing our data services. We will
demonstrate the potential and benefits of the services and the R client by integrating nbaR with
state-of-the-art packages for species distribution modelling and time calibration of phylogenetic trees into a
single analysis workflow.
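nbaR itself is an R package, and its actual functions are documented in [3]. Purely to illustrate what a query abstraction layer of this kind does, the Python sketch below wraps query construction in a small fluent builder. Users chain readable `where` and `limit` calls, and the builder validates operators and serializes a JSON query document. The class name, the operator set and the JSON layout are all hypothetical. They are not the query format of the Netherlands Biodiversity Data services and not the nbaR API.

```python
import json
from dataclasses import dataclass, field

# Hypothetical operator vocabulary for this sketch only.
ALLOWED_OPERATORS = {"EQUALS", "NOT_EQUALS", "GT", "LT", "CONTAINS"}


@dataclass
class QueryBuilder:
    """Fluent builder that hides the JSON query syntax from the user."""

    conditions: list = field(default_factory=list)
    size: int = 10

    def where(self, field_name: str, operator: str, value) -> "QueryBuilder":
        """Add one condition; operators are case-insensitive and validated."""
        op = operator.upper()
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        self.conditions.append({"field": field_name, "operator": op, "value": value})
        return self  # returning self enables method chaining

    def limit(self, n: int) -> "QueryBuilder":
        """Set the maximum number of records to return."""
        if n < 1:
            raise ValueError("limit must be a positive integer")
        self.size = n
        return self

    def to_json(self) -> str:
        """Serialize the query; all conditions are combined with logical AND."""
        spec = {"conditions": self.conditions, "logic": "AND", "size": self.size}
        return json.dumps(spec, sort_keys=True)


# Hypothetical field names: specimens of the genus Larus collected after 1900.
payload = (
    QueryBuilder()
    .where("genus", "equals", "Larus")
    .where("year", "gt", 1900)
    .limit(50)
    .to_json()
)
print(payload)
```

Each condition stays on its own readable step. Validating operators on the client side also surfaces mistakes before any request is sent.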
1. Netherlands Biodiversity Data services – User documentation. http://docs.biodiversitydata.nl (accessed 17 May
2018).
2. Access to Biological Collections Data task group. 2007. Access to Biological Collection Data (ABCD), Version 2.06.
Biodiversity Information Standards (TDWG) http://www.tdwg.org/standards/115 (accessed 17 May 2018).
3. nbaR GitHub repository. https://github.com/naturalis/nbaR (accessed 17 May 2018).