Location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data
Background: Increasing the quantity and quality of data is a key goal of biodiversity informatics, leading to increased fitness for use in scientific research and beyond. This goal is impeded by a legacy of geographic locality descriptions associated with biodiversity records that are often heterogeneous and not in a map-ready format. The biodiversity informatics community has developed best practices and tools that provide the means to do retrospective georeferencing (e.g., the BioGeomancer toolkit), a process that converts heterogeneous descriptions into geographic coordinates and a measurement of spatial uncertainty. Even with these methods and tools, data publishers are faced with the immensely time-consuming task of vetting georeferenced localities. Furthermore, it is likely that overlap in georeferencing effort is occurring across data publishers. Solutions are needed that help publishers more effectively georeference their records, verify their quality, and eliminate the duplication of effort across publishers.

Results: We have developed a tool called BioGeoBIF, which incorporates the high-throughput and standardized georeferencing methods of BioGeomancer into a beginning-to-end workflow. Custodians who publish their data to the Global Biodiversity Information Facility (GBIF) can use this system to improve the quantity and quality of their georeferences. BioGeoBIF harvests records directly from the publishers' access points, georeferences the records using the BioGeomancer web service, and makes results available to data managers for inclusion at the source. Using a web-based, password-protected group management system for each data publisher, we leave data ownership, management, and vetting responsibilities with the managers and collaborators of each data set. We also minimize the georeferencing task by combining and storing unique textual localities from all registered data access points and dynamically linking that information to the password-protected record information for each publisher.

Conclusion: We have developed one of the first examples of services that can help create higher-quality data for publishers, mediated through the Global Biodiversity Information Facility and its data portal. This service is one step towards solving many problems of data quality in the growing field of biodiversity informatics. We envision future improvements to our service that include faster return of results and the inclusion of more georeferencing engines.
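The harvest-georeference-return loop the abstract describes can be pictured with a minimal sketch, assuming a generic HTTP georeferencing service. The endpoint URL and function names below are illustrative placeholders, not the actual BioGeoBIF or BioGeomancer interfaces.

```python
import requests

# Hypothetical georeferencing endpoint; the real BioGeomancer/BioGeoBIF
# service interfaces are not reproduced here.
GEOREF_SERVICE = "https://example.org/georeference"

def fetch_records(access_point_url):
    """Harvest occurrence records from a publisher's access point (illustrative)."""
    response = requests.get(access_point_url, timeout=30)
    response.raise_for_status()
    return response.json()["records"]

def georeference_locality(locality_text):
    """Send one textual locality to the service; return coordinates plus uncertainty."""
    response = requests.post(GEOREF_SERVICE, json={"locality": locality_text}, timeout=60)
    response.raise_for_status()
    return response.json()  # assumed shape: {"lat": ..., "lon": ..., "uncertainty_m": ...}

def georeference_publisher(access_point_url, locality_cache):
    """Georeference a publisher's records, reusing results for repeated localities."""
    results = []
    for record in fetch_records(access_point_url):
        locality = record["locality"].strip()
        if locality not in locality_cache:          # each unique string is resolved once
            locality_cache[locality] = georeference_locality(locality)
        results.append({**record, **locality_cache[locality]})
    return results
```

Sharing the cache of unique locality strings across registered access points is what removes the duplicated effort the abstract mentions; vetting of the returned coordinates stays with each data manager.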
Image-based Digitisation of Entomology Collections: Leveraging volunteers to increase digitisation capacity
In 2010, the Australian Museum commenced a project to explore and develop ways for engaging volunteers to increase the rate of digitising natural history collections. The focus was on methods for image-based digitising of dry pinned entomology collections. With support from the Atlas of Living Australia, the Australian Museum developed a team of volunteers, training materials, and processes and procedures.

Project officers were employed to coordinate the volunteer workforce. Digitising workstations were established with the aim of minimising cost whilst maximising productivity and ease of use. Database management and curation of material before digitisation were two areas that required considerably more effort than anticipated.

Productivity of the workstations varied depending on the species group being digitised. Fragile groups took longer, and because digitising rates vary among the volunteers, the average rate for digitising pinned entomological specimens (cicadas, leafhoppers, moths, beetles, flies) varied between 15 and 20 specimens per workstation per hour, which compares with a direct data entry rate of 18 per hour from previous trials.

Four specimen workstations were operated four days a week, five hours a day, by a team of over 40 volunteers. Over five months, 16,000 specimens and their labels were imaged and entered as short records into the museum's collection management database.
Inferring the age and environmental characteristics of fossil sites using citizen science.
Not all fossil sites preserve microfossils that can be extracted using acid digestion, which may leave knowledge gaps regarding a site's age or environmental characteristics. Here we report on a citizen science approach that was developed to identify microfossils in situ on the surface of sedimentary rocks. Samples were collected from McGraths Flat, a recently discovered Miocene rainforest lake deposit located in central New South Wales, Australia. Composed entirely of iron oxyhydroxide, McGraths Flat rocks cannot be processed using typical microfossil extraction protocols (e.g., acid digestion). Instead, scanning electron microscopy (SEM) was used to automatically acquire 25,200 high-resolution images from the surface of three McGraths Flat samples, covering a total area of 1.85 cm². The images were published on the citizen science portal DigiVol, through which 271 citizen scientists helped to identify 300 pollen and spores. The microfossil information gained in this study is biostratigraphically relevant and can be used to constrain the environmental characteristics of McGraths Flat. Our findings suggest that automated image acquisition coupled with evaluation by citizen scientists is an effective method of determining the age and environmental characteristics of fossiliferous rocks that cannot be investigated using traditional methods such as acid digestion.
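A rough back-of-envelope on the reported imaging figures, ignoring the image overlap noted in the flow-chart caption further below, gives the average area captured per SEM image; the calculation assumes only the two totals stated in the abstract.

```python
# Back-of-envelope estimate of the average field of view per SEM image,
# using the reported totals and ignoring overlap between adjacent images.
total_area_cm2 = 1.85
n_images = 25_200

area_per_image_um2 = total_area_cm2 * 1e8 / n_images   # 1 cm^2 = 1e8 um^2
side_um = area_per_image_um2 ** 0.5                     # side of an equivalent square tile

print(f"~{area_per_image_um2:,.0f} um^2 per image (~{side_um:.0f} um x {side_um:.0f} um)")
# -> roughly 7,300 um^2, i.e. a field of view on the order of 85 um x 85 um
```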
Advancing the productivity of science with citizen science and artificial intelligence
Citizen science is a form of scientific inquiry in which members of the public engage in scientific investigations, often in collaboration with, or under the direction of, professional scientists and scientific institutions. It supports scientific research and applied sciences through a wide range of activities and across diverse topics. Thanks to advances in communication and computing technologies, the public can participate collaboratively in new ways in citizen science projects. For example, participants submit observations and samples about the environment via eBird, iNaturalist or the EchidnaCSI project, among other platforms. They also engage online by transcribing historical documents or classifying photographs, audio and video via platforms such as DigiVol or Zooniverse. In other cases, participants collaboratively solve mathematical problems via the Polymath Project, or play online games via Foldit to inform medical research. The public disseminates project outcomes as well.
Can Biodiversity Data Scientists Document Volunteer and Professional Collaborations and Contributions in the Biodiversity Data Enterprise?
The collection, archiving and use of biodiversity data depend on a network of pipelines herein called the Biodiversity Data Enterprise (BDE), best understood globally through the work of the Global Biodiversity Information Facility (GBIF). Efforts to sustain and grow the BDE require information about the data pipeline and the infrastructure that supports it. A host of metrics from GBIF, including institutional participation (member countries, institutional contributors, data publishers), biodiversity coverage (occurrence records, species, geographic extent, data sets) and data usage (records downloaded, published papers using the data) (Miller 2021), document the rapid growth and successes of the BDE (GBIF Secretariat 2022). Heberling et al. (2021) make a convincing case that the data integration process is working.

The Biodiversity Information Standards (TDWG) Basis of Record term provides information about the underlying infrastructure. It categorizes the kinds of processes*1 that teams undertake to capture biodiversity information, and GBIF quantifies their contributions*2 (Table 1). Currently 83.4% of observations come from human observations, of which 63% are of birds. Museum preserved specimens account for 9.5% of records. In both cases, a combination of volunteers (who make observations, collect specimens, digitize specimens, transcribe specimen labels) and professionals work together to make records available.

To better understand how the BDE is working, we suggest that it would be of value to know the number of contributions and contributors and their hours of engagement for each data set. This can help the community address questions such as, "How many volunteers do we need to document birds in a given area?" or "How much professional support is required to run a camera trap network?" For example, millions of observations were made by tens of thousands of observers in two recent BioBlitz events: one called Big Day, focusing on birds, sponsored by the Cornell Laboratory of Ornithology, and the other called the City Nature Challenge, addressing all taxa, sponsored jointly by the California Academy of Sciences and the Natural History Museums of Los Angeles County (Table 2). In our presentation we will suggest approaches to deriving metrics that could be used to document the collaborations and contributions of volunteers and staff, using examples from both Human Observation (eBird, iNaturalist) and Preserved Specimen (DigiVol, Notes from Nature) record types. The goal of the exercise is to start a conversation about how such metrics can further the development of the BDE.
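The Basis of Record breakdown quoted above can in principle be re-derived from the public GBIF occurrence index. The sketch below assumes the GBIF API v1 occurrence-search endpoint with a basisOfRecord facet; the exact response layout used here should be checked against the API documentation, and the percentages will drift as the index grows.

```python
import requests

# Query the public GBIF occurrence index for counts per Basis of Record.
# Endpoint and parameters follow the GBIF API v1 occurrence search; the response
# layout used below (count, facets -> counts) is an assumption to verify.
resp = requests.get(
    "https://api.gbif.org/v1/occurrence/search",
    params={"limit": 0, "facet": "basisOfRecord"},
    timeout=60,
)
resp.raise_for_status()
payload = resp.json()

total = payload["count"]
facet = next(f for f in payload["facets"] if f["field"] == "BASIS_OF_RECORD")
for bucket in facet["counts"]:
    share = 100 * bucket["count"] / total
    print(f"{bucket['name']:<22} {bucket['count']:>14,}  {share:5.1f}%")
```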
Breakdown of total number of images transcribed by volunteers and expert reviewed.
Total number of images in which volunteers reached agreement or did not reach agreement on the questionnaire template. Total number of images that needed expert review, as well as images verified to contain pollen/spores and final pollen/spore counts.
Flow chart outlining the identification and verification steps involved in the analysis of SEM images.
Microfossil identification by citizen scientists (volunteers) reduced the SEM image dataset from 25,200 to 4,192 images. Experts reviewed the 4,192 images and verified microfossils in 448 images. Accounting for image overlap, the final pollen and spore count in these verified images was 383, of which 300 specimens were identifiable. Key for superscript lettering: a: an 'agreement' was reached by a minimum of three volunteers on questions set out in the questionnaire template; this category was then subdivided into images indicated as having 'no microfossil/s' and images indicated as having 'microfossil/s'. b: these images were undisputed (i.e., not disputed by a fourth volunteer). c: these images were disputed by an additional (fourth) volunteer and had to be reviewed by an expert. d: these images were identified by three volunteers as containing other microfossils that are not pollen or spores. e: if 'no agreement' was reached by a minimum of three volunteers on questions set out in the questionnaire template, these images were automatically marked by the system to be reviewed.
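A minimal sketch of the triage rule described in this caption, under assumed data structures rather than the actual DigiVol implementation: an image is settled only when at least three volunteers give identical questionnaire answers and no later volunteer disputes them; disputed or unresolved images are routed to expert review.

```python
from collections import Counter

MIN_AGREEMENT = 3  # minimum number of volunteers giving identical answers

def triage_image(classifications):
    """Classify one image from its volunteer questionnaire answers.

    `classifications` is a list of per-volunteer answer tuples, e.g.
    (occurrence, count, position, name, focus). Returns 'no_microfossil'
    (agreed, excluded), 'microfossil' (agreed, forwarded for expert
    verification) or 'expert_review' (disputed or no agreement reached).
    """
    tally = Counter(classifications)
    answers, votes = tally.most_common(1)[0]
    if votes < MIN_AGREEMENT:
        return "expert_review"        # no agreement -> flagged automatically (key e)
    if len(tally) > 1:
        return "expert_review"        # agreement disputed by a later volunteer (key c)
    return "microfossil" if answers[0] == "present" else "no_microfossil"

# Example: three identical, undisputed answers indicating a microfossil is present.
votes = [("present", 1, "middle", "pollen", "in_focus")] * 3
print(triage_image(votes))            # -> microfossil
```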
Reasons for no agreement on images.
Venn diagram illustrating the number of times that questions related to specimen count, occurrence, position, name, and image focus on the questionnaire template led to 'no agreement' among volunteer citizen scientists. The analysis is based on 275 images that were expert verified as containing pollen or spores (see Fig 3 for details). The top reasons that resulted in 'no agreement' included a combination of questions (count, occurrence, position, and name = 78 images), followed by identification of the specimen (75 images). Occurrence: whether or not a microfossil is present in an image; count: number of pollen or spores in an image; position: placement of the microfossil at the edge or in the middle of the image; name: selected from a range of listed specimens; focus: image is in focus or out of focus.
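The Venn-diagram counts amount to tallying, per image, which questionnaire questions failed to reach agreement. The sketch below assumes a simple per-image answer layout (not the DigiVol export format) and the same minimum of three matching answers.

```python
from collections import Counter

QUESTIONS = ("occurrence", "count", "position", "name", "focus")
MIN_AGREEMENT = 3

def unresolved_questions(answers_per_question):
    """Return the set of questions on which volunteers failed to agree for one image.

    `answers_per_question` maps each question to that image's list of volunteer
    answers (an assumed layout for illustration).
    """
    return frozenset(
        q for q in QUESTIONS
        if Counter(answers_per_question[q]).most_common(1)[0][1] < MIN_AGREEMENT
    )

# Tally how often each combination of unresolved questions occurs across images,
# which is the information summarised in the Venn diagram.
images = [
    {"occurrence": ["yes", "yes", "no"], "count": [1, 2, 1],
     "position": ["middle"] * 3, "name": ["pollen", "spore", "pollen"],
     "focus": ["in_focus"] * 3},
]
combinations = Counter(unresolved_questions(image) for image in images)
for combo, n_images in combinations.most_common():
    print(sorted(combo), n_images)    # e.g. ['count', 'name', 'occurrence'] 1
```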