CORE
🇺🇦
make metadata, not war
Services
Services overview
Explore all CORE services
Access to raw data
API
Dataset
FastSync
Content discovery
Recommender
Discovery
OAI identifiers
OAI Resolver
Managing content
Dashboard
Bespoke contracts
Consultancy services
Support us
Support us
Membership
Sponsorship
Community governance
Advisory Board
Board of supporters
Research network
About
About us
Our mission
Team
Blog
FAQs
Contact us
CoordinateCleaner: Standardized cleaning of occurrence records from biological collection databases
Authors
Andermann T
Antonelli A
+12 more
Ariza M
Azevedo J
Duarte Ritter C
Edler D
Farooq H
Herdean A
Scharn R
Silvestro D
Svantesson S
Wengström N
Zizka A
Zizka V
Publication date
4 June 2020
Publisher
'Wiley'
Doi
Cite
Abstract
© 2019 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society. Species occurrence records from online databases are an indispensable resource in ecological, biogeographical and palaeontological research. However, issues with data quality, especially incorrect geo-referencing or dating, can diminish their usefulness. Manual cleaning is time-consuming, error prone, difficult to reproduce and limited to known geographical areas and taxonomic groups, making it impractical for datasets with thousands or millions of records. Here, we present CoordinateCleaner, an r-package to scan datasets of species occurrence records for geo-referencing and dating imprecisions and data entry errors in a standardized and reproducible way. CoordinateCleaner is tailored to problems common in biological and palaeontological databases and can handle datasets with millions of records. The software includes (a) functions to flag potentially problematic coordinate records based on geographical gazetteers, (b) a global database of 9,691 geo-referenced biodiversity institutions to identify records that are likely from horticulture or captivity, (c) novel algorithms to identify datasets with rasterized data, conversion errors and strong decimal rounding and (d) spatio-temporal tests for fossils. We describe the individual functions available in CoordinateCleaner and demonstrate them on more than 90 million occurrences of flowering plants from the Global Biodiversity Information Facility (GBIF) and 19,000 fossil occurrences from the Palaeobiology Database (PBDB). We find that in GBIF more than 3.4 million records (3.7%) are potentially problematic and that 179 of the tested contributing datasets (18.5%) might be biased by rasterized coordinates. In PBDB, 1205 records (6.3%) are potentially problematic. All cleaning functions and the biodiversity institution database are open-source and available within the CoordinateCleaner r-package
Similar works
Full text
Available Versions
OPUS - University of Technology Sydney
See this paper in CORE
Go to the repository landing page
Download from data provider
oai:opus.lib.uts.edu.au:10453/...
Last time updated on 20/06/2020