
    Recognizability bias in citizen science photographs

    Citizen science and automated collection methods increasingly depend on image recognition to provide the amounts of observational data that research and management need. Recognition models, meanwhile, also require large amounts of data from these sources, creating a feedback loop between the methods and tools. Species that are harder to recognize, both for humans and machine learning algorithms, are likely to be under-reported, and thus be less prevalent in the training data. As a result, the feedback loop may hamper training mostly for species that already pose the greatest challenge. In this study, we trained recognition models for various taxa, and found evidence for a 'recognizability bias', where species that are more readily identified by humans and recognition models alike are more prevalent in the available image data. This pattern is present across multiple taxa, and does not appear to relate to differences in picture quality, biological traits or data collection metrics other than recognizability. This has implications for the expected performance of future models trained with more data, including such challenging species.
    Keywords: citizen science, image recognition, machine learning, recognizability, artificial intelligence, environmental science, ecology, conservation and global change biology
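    As a rough illustration of the reported relationship, the sketch below checks for a rank correlation between per-species recognizability and the number of available images. All names and data here are invented for illustration; this is not the study's analysis code.

```python
# Sketch: do species that are easier to recognize also have more images?
# Hypothetical inputs; illustrative only.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_species = 200
# Hypothetical per-species recognizability (e.g., validation accuracy)
recognizability = rng.uniform(0.3, 0.99, n_species)
# Hypothetical image counts, loosely coupled to recognizability
image_counts = np.exp(3 + 4 * recognizability + rng.normal(0, 0.5, n_species)).round()

rho, p = spearmanr(recognizability, image_counts)
print(f"Spearman rho = {rho:.2f}, p = {p:.1e}")
```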

    COOD: Combined out-of-distribution detection using multiple measures for anomaly & novel class detection in large-scale hierarchical classification

    High-performing out-of-distribution (OOD) detection, for both anomalies and novel classes, is an important prerequisite for the practical use of classification models. In this paper, we focus on the species recognition task in images, which involves large databases, a large number of fine-grained hierarchical classes, severe class imbalance, and varying image quality. We propose a framework for combining individual OOD measures into one combined OOD (COOD) measure using a supervised model. The individual measures are several existing state-of-the-art measures and several novel OOD measures developed with novel class detection and hierarchical class structure in mind. COOD was extensively evaluated on three large-scale (500k+ images) biodiversity datasets in the context of anomaly and novel class detection. We show that COOD outperforms individual OOD measures, including state-of-the-art ones, by a large margin in terms of TPR@1% FPR in the majority of experiments, e.g., improving the detection of ImageNet images (OOD) from 54.3% to 85.4% for the iNaturalist 2018 dataset. SHAP (feature contribution) analysis shows that different individual OOD measures are essential for different tasks, indicating that multiple OOD measures and their combinations are needed to generalize. Additionally, we show that explicitly considering ID images that are incorrectly classified for the original (species) recognition task is important for constructing high-performing OOD detection methods and for practical applicability. The framework can easily be extended or adapted to other tasks and media modalities.
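    The combining step can be pictured as training a small supervised model on per-image OOD scores. The sketch below uses logistic regression over three invented measures; it is a minimal stand-in under those assumptions, not the paper's actual COOD implementation.

```python
# Sketch: combine several per-image OOD scores into one supervised
# "combined OOD" score (toy data; not the paper's method or features).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Hypothetical individual measures (e.g., max softmax, energy, a hierarchy-aware score)
scores_id = rng.normal([2.0, -1.0, 0.5], 1.0, size=(n, 3))   # in-distribution images
scores_ood = rng.normal([0.5, 0.5, -0.5], 1.0, size=(n, 3))  # out-of-distribution images

X = np.vstack([scores_id, scores_ood])
y = np.r_[np.zeros(n), np.ones(n)]  # 1 = OOD

combiner = LogisticRegression().fit(X, y)
cood_score = combiner.predict_proba(X)[:, 1]  # one combined OOD score per image
```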

    Deep Learning from Label Proportions for Emphysema Quantification

    We propose an end-to-end deep learning method that learns to estimate emphysema extent from proportions of the diseased tissue. These proportions were visually estimated by experts using a standard grading system, in which grades correspond to intervals (label example: 1-5% of diseased tissue). The proposed architecture encodes the knowledge that the labels represent a volumetric proportion. A custom loss is designed to learn with intervals. Thus, during training, our network learns to segment the diseased tissue such that its proportions fit the ground truth intervals. Our architecture and loss combined improve the performance substantially (8% ICC) compared to a more conventional regression network. We outperform traditional lung densitometry and two recently published methods for emphysema quantification by a large margin (at least 7% AUC and 15% ICC), and achieve near-human-level performance. Moreover, our method generates emphysema segmentations that predict the spatial distribution of emphysema at human level.
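    One way to read the custom interval loss is as a penalty that is zero whenever the predicted diseased-tissue proportion falls inside the labeled interval and grows linearly outside it. The PyTorch sketch below implements that reading with invented tensor shapes; it is an interpretation, not the paper's published code.

```python
# Sketch of an interval-based proportion loss for learning from label proportions.
import torch

def interval_loss(seg_logits, lo, hi):
    """seg_logits: (B, 1, D, H, W) voxel logits; lo/hi: (B,) proportion bounds."""
    # Predicted diseased-tissue proportion per scan, from the soft segmentation.
    p = torch.sigmoid(seg_logits).mean(dim=(1, 2, 3, 4))
    # Zero loss inside [lo, hi]; linear penalty outside the interval.
    return (torch.relu(lo - p) + torch.relu(p - hi)).mean()

logits = torch.randn(2, 1, 8, 32, 32, requires_grad=True)
loss = interval_loss(logits, torch.tensor([0.01, 0.05]), torch.tensor([0.05, 0.25]))
loss.backward()  # gradients push proportions toward the labeled intervals
```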

    Adversarial Attack Vulnerability of Medical Image Analysis Systems: Unexplored Factors

    Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be vulnerable to adversarial attacks due to strong financial incentives and the associated technological infrastructure. In this paper, we study previously unexplored factors affecting the adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology, and pathology. We focus on adversarial black-box settings, in which the attacker does not have full access to the target model and usually uses another model, commonly referred to as a surrogate model, to craft adversarial examples. We consider this to be the most realistic scenario for MedIA systems. Firstly, we study the effect of weight initialization (ImageNet vs. random) on the transferability of adversarial attacks from the surrogate model to the target model. Secondly, we study the influence of differences in development data between target and surrogate models. We further study the interaction of weight initialization and data differences with differences in model architecture. All experiments were done with a perturbation degree tuned to ensure maximal transferability at minimal visual perceptibility of the attacks. Our experiments show that pre-training may dramatically increase the transferability of adversarial examples, even when the target and surrogate architectures are different: the larger the performance gain from pre-training, the larger the transferability. Differences in the development data between target and surrogate models considerably decrease the performance of the attack; this decrease is further amplified by differences in model architecture. We believe these factors should be considered when developing security-critical MedIA systems planned to be deployed in clinical practice.
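    The black-box transfer setting can be sketched as follows: craft perturbations on a surrogate model (here with FGSM, one common choice) and measure how often they change a separate target model's predictions. The toy models, data, and perturbation budget below are illustrative only.

```python
# Sketch of a black-box transfer attack: FGSM on a surrogate, evaluated on a target.
import torch
import torch.nn as nn

def make_model():
    # Toy stand-in for a MedIA classifier.
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 2))

surrogate, target = make_model(), make_model()
x = torch.rand(8, 3, 32, 32, requires_grad=True)
y = torch.randint(0, 2, (8,))

# Craft adversarial examples on the surrogate only.
nn.functional.cross_entropy(surrogate(x), y).backward()
eps = 2 / 255  # small budget, keeping the perturbation barely perceptible
x_adv = (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Transferability: how often the *target* model's prediction flips.
flipped = (target(x_adv).argmax(1) != target(x).argmax(1)).float().mean()
print(f"target predictions changed on {flipped:.0%} of images")
```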

    Supporting citizen scientists with automatic species identification using deep learning image recognition models

    Volunteers, researchers and citizen scientists are important contributors to observation and monitoring databases. Their contributions thus become part of a global digital data pool that forms the basis for important and powerful tools for conservation, research, education and policy. With the data contributed by citizen scientists also come concerns about data completeness and quality. For data generated by citizen scientists, taxonomic bias effects, where certain species (groups) are underrepresented in observations, are even stronger than for professionally collected data. Identification tools that help citizen scientists access more difficult, underrepresented groups can help to close this gap.
    We are exploring the possibilities of using artificial intelligence for automatic species identification as a tool to support the registration of field observations. Our aim is to offer nature enthusiasts the possibility of automatically identifying species based on photos they have taken as part of an observation. Furthermore, by allowing them to register these identifications as part of the observation, we aim to enhance the completeness and quality of the observation database. We will demonstrate the use of automatic species recognition as part of the process of observation registration, using a recognition model that is based on deep learning techniques. We investigated automatic species recognition using deep learning models trained with observation data from the popular website Observation.org (https://observation.org/), where data quality is ensured by a review process of all observations by experts. Using the pictures and corresponding validated metadata from their database, models were developed covering several species groups. These techniques were based on earlier work that culminated in ObsIdentify, a free offline mobile app for identifying species based on pictures taken in the field. The models are also made available as an API web service, which allows identification by submitting a photo over standard HTTP, essentially like uploading it through a webpage. This web service was implemented in the observation entry workflows of Observation.org. By providing an automatically generated taxonomic identification with each image, we expect to stimulate existing citizen scientists to generate a larger number of, and more biodiverse, observations. We also hope to motivate new citizen scientists to start contributing.
    Additionally, we investigated the use of image recognition for the identification of species in the photo other than the primary subject, for example the identification of the host plant in photos of insects. The Observation.org database contains many such photos, which are associated with a single species observation while other species present in the photo remain unidentified. Combining object detection, to detect individual specimens, with species recognition models opens up the possibility of automatically identifying and counting these species, enhancing the quality of the observations. In the presentation we will present the initial results of this application of deep learning technology, and discuss the possibilities and challenges.
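    The web-service workflow described above amounts to an HTTP upload of a photo. A minimal sketch follows; the endpoint path and response fields are hypothetical, so the actual API documentation should be consulted.

```python
# Sketch: identify a species by uploading a photo to a recognition web service.
# The endpoint path and JSON fields below are hypothetical.
import requests

with open("observation_photo.jpg", "rb") as f:
    resp = requests.post(
        "https://identify.biodiversityanalysis.nl/v1/identify",  # hypothetical path
        files={"image": f},
        timeout=30,
    )
resp.raise_for_status()
for pred in resp.json().get("predictions", []):  # hypothetical response shape
    print(pred["species"], pred["probability"])
```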

    Species Distribution Modelling Using Deep Learning

    Species distribution modelling, or ecological niche modelling, is a collection of techniques for the construction of correlative models based on the combination of species occurrences and GIS data. Using such models, a variety of research questions in biodiversity science can be investigated, among which are the assessment of habitat suitability around the globe (e.g. in the case of invasive species), the response of species to alternative climatic regimes (e.g. by forecasting climate change scenarios, or by hindcasting into palaeoclimates), and the overlap of species in niche space. The algorithms used for the construction of such models include maximum entropy, neural networks, and random forests. Recent advances both in computing power and in algorithm development raise the possibility that deep learning techniques will provide valuable additions to these existing approaches. Here, we present our recent findings in the development of workflows to apply deep learning to species distribution modelling, and discuss the prospects for the large-scale application of deep learning in web service infrastructures to analyze the growing corpus of species occurrence data in biodiversity information facilities.
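    A correlative model of this kind can be pictured as a classifier from environmental predictors to presence/absence. The sketch below trains a small neural network on synthetic stand-ins for GIS layers (e.g., bioclim variables); it is illustrative, not one of the workflows presented.

```python
# Sketch: a species distribution model as a small neural network.
# Synthetic data; predictors stand in for GIS layers at occurrence locations.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 19))        # e.g., 19 bioclim variables per location
logit = 1.5 * X[:, 0] - 2.0 * X[:, 3]  # synthetic niche response
y = rng.random(2000) < 1 / (1 + np.exp(-logit))  # presence/absence labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
sdm = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500).fit(X_tr, y_tr)
print("held-out accuracy:", sdm.score(X_te, y_te))
# sdm.predict_proba over a grid of environmental values would give a suitability map.
```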

    Using Deep Learning in Collection Management to Reduce the Taxonomist’s Workload

    The completeness and quality of the information in natural history museum collections are essential to support its use, for example in collection management. Currently, the accuracy of the taxonomic information largely depends on expert-provided metadata, such as species identifications. At present, an increase in the use of digitization techniques coincides with a dwindling number of taxonomic specialists, creating a growing backlog in specimen identifications. We are investigating the role of artificial intelligence for automatic species identification in supporting collection management. When identifying collection specimens, common species predominate, so experts spend a large amount of time on a relatively easy, repetitive task. Therefore, one of our aims is to use human expertise where it is most needed, for complex tasks, and use properly validated computational methods for repetitive, less difficult identifications. To this end, we demonstrate the use of automatic species identification in digitization workflows, using deep learning based image recognition. We investigated potential gains in the identification process of a large digitization project of papered Lepidoptera (>500,000 specimens). In this ongoing project, volunteers unpack, register and photograph the unmounted butterflies and repack them sustainably, still unmounted. Using only the individual images made by volunteers, taxonomic experts identify the specimens. Considering that the speed of digitization currently exceeds that of identification, a growing backlog of yet-to-be-identified specimens has formed, limiting the speed of publication of this biodiversity information. The test case for image recognition concerns specimens of the families Papilionidae and Lycaenidae, mostly collected in Indonesia. By allowing the volunteers to provide an automatically generated identification with each image, we enable the taxonomic specialists to quickly validate the more easily identifiable specimens. This reduces their workload, allows them to focus on the more demanding specimens and increases the rate of specimen identification. We demonstrate how to combine computer and human decisions to ensure both high data quality standards and a reduction of expert time.
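    The division of labor described here can be pictured as confidence-based triage: high-confidence automatic identifications go to quick expert validation, the rest to full expert review. The threshold and record fields in the sketch below are illustrative, not taken from the project.

```python
# Sketch: route specimens by the confidence of the automatic identification.
def triage(predictions, threshold=0.95):
    quick_validate, expert_queue = [], []
    for record in predictions:  # e.g., {"image": ..., "species": ..., "confidence": ...}
        if record["confidence"] >= threshold:
            quick_validate.append(record)   # expert only confirms or rejects
        else:
            expert_queue.append(record)     # full expert identification
    return quick_validate, expert_queue

preds = [
    {"image": "img_001.jpg", "species": "Papilio memnon", "confidence": 0.98},
    {"image": "img_002.jpg", "species": "Jamides celeno", "confidence": 0.61},
]
easy, hard = triage(preds)
print(len(easy), "for quick validation;", len(hard), "for full review")
```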

    Smart Insect Cameras

    Recent studies have shown a worrying decline in the quantity and diversity of insects at a number of locations in Europe (Hallmann et al. 2017) and elsewhere (Lister and Garcia 2018). Although the downward trend that these studies show is clear, they are limited to certain insect groups and geographical locations. Most available studies (see overview in Sánchez-Bayo and Wyckhuys 2019) were performed in nature reserves, leaving rural and urban areas largely understudied. Most studies are based on the long-term collaborative efforts of entomologists and volunteers performing labor-intensive repeat measurements, inherently limiting the number of locations that can be monitored. We propose a monitoring network for insects in the Netherlands, consisting of a large number of smart insect cameras spread across nature, rural, and urban areas. The aim of the network is to provide continuous, low-labor monitoring of different insect groups. In addition, we aimed to develop the cameras at a relatively low price point, so that they can be installed at a large number of locations and encourage participation by citizen science enthusiasts.
    The cameras are made smart with image processing: image enhancement, insect detection and species identification are performed using deep-learning-based algorithms. The cameras take pictures of a screen, measuring ca. 30×40 cm, every 10 seconds, capturing insects that have landed on the screen (Fig. 1). Several screen setups were evaluated. Vertical screens were used to attract flying insects, with different screen colors, and with lighting at night to attract night-flying insects such as moths. In addition, two horizontal screen orientations were used: (1) to emulate pan traps and attract several pollinator species (bees and hoverflies), and (2) to capture ground-based insects and arthropods such as beetles and spiders.
    Time sequences of images were analyzed semi-automatically, in the following way. First, single insects were outlined and cropped with bounding boxes in every captured image. Then the cropped single insects in every image were preliminarily identified, using previously developed deep-learning-based automatic species identification software, the Nature Identification API (https://identify.biodiversityanalysis.nl). In the next step, single insects were linked between consecutive images using a tracking algorithm that uses screen position and the preliminary identifications. This step yields, for every individual insect, a linked series of outlines and preliminary identifications. The preliminary identifications for an individual insect can differ between captured images and were therefore combined into one identification using a fusing algorithm. The result is a series of tracks of individual insects with species identifications, which can subsequently be translated into an estimate of the counts of insects per species or species complex.
    Here we show the first set of results, acquired during the spring and summer of 2019. We will discuss practical experiences with setting up cameras in the field, including the effectiveness of the different setups. We will also show the effectiveness of automatic species identification on the type of images that were acquired (see attached figure) and discuss to what extent individual species can be identified reliably. Finally, we will discuss the ecological information that can be extracted from the smart insect cameras.
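    The fusing step can be pictured as combining per-frame class probabilities along one track into a single identification. The sketch below simply averages them; this is one plausible reading of the step, not necessarily the algorithm used.

```python
# Sketch: fuse differing per-frame identifications of one tracked insect.
import numpy as np

def fuse_track(frame_probs, class_names):
    """frame_probs: (n_frames, n_classes) per-frame softmax outputs."""
    mean_probs = frame_probs.mean(axis=0)  # one distribution for the whole track
    return class_names[int(mean_probs.argmax())]

classes = ["Episyrphus balteatus", "Apis mellifera", "Bombus terrestris"]
track = np.array([[0.6, 0.3, 0.1],
                  [0.4, 0.5, 0.1],   # per-frame identifications disagree here
                  [0.7, 0.2, 0.1]])
print(fuse_track(track, classes))    # single fused identification for the track
```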

    Machine Learning Model for Identifying Dutch/Belgian Biodiversity

    The potential of citizen scientists to contribute information about species occurrences and other biodiversity questions is large, because of the ubiquitous presence of organisms and the approachable nature of the subject. Online platforms that collect observations of species from the public have existed for several years now. They have seen rapid growth recently, partly due to the widespread availability of mobile phones. These online platforms, and many scientific studies as well, suffer from a taxonomic bias: the effect that certain species groups are overrepresented in the data (Troudet et al. 2017). One of the reasons for this bias is that the accurate identification of species, by non-experts and experts alike, has been limited by the large number of species that exist. Even in the geographically limited area of the Netherlands and Belgium, the number of species that are regularly observed runs into the thousands, making it difficult or impossible for an individual to identify them all. Recent advances in species identification powered by deep learning, based on images (Norouzzadeh et al. 2018), suggest a large potential for a new set of digital tools that can help the public (and experts) to identify species automatically.
    The online observation platform Observation.org has collected over 93 million occurrences in the Netherlands and Belgium over the last 15 years. About 20% of these occurrences are supported by photographs, giving a rich database of 17 million photographs covering all major species groups (e.g., birds, mammals, plants, insects, fungi). Most of the observations with photos were validated by human experts at Observation.org, creating a unique database suitable for machine learning. We have developed a deep learning-based species identification model using this database, covering 13,767 species, 1,530 species groups, 734 subspecies and 117 hybrids. The model is made available to the public through a web service (https://identify.biodiversityanalysis.nl) and through a set of mobile apps (ObsIdentify).
    In this talk we will discuss our technical approach for dealing with the large number of species in a deep learning model. We will evaluate the results in terms of performance for different species groups and what this could mean for addressing part of the taxonomic bias. We will also consider the limitations of (image-based) automated species identification and identify avenues to further improve it. We will illustrate how the web service and mobile apps are applied to support citizen scientists and the observation validation workflows at Observation.org. Finally, we will examine the potential of these methods to provide large-scale automated analysis of biodiversity data.
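    Evaluating performance for different species groups can be as simple as grouping per-image correctness by group; groups that lag behind point to where the taxonomic bias bites. The records and field names in the sketch below are toy examples.

```python
# Sketch: per-species-group accuracy from per-image evaluation records.
import pandas as pd

results = pd.DataFrame({
    "group":   ["birds", "birds", "plants", "insects", "insects", "fungi"],
    "correct": [True,    True,    True,     False,     True,      False],
})
per_group = results.groupby("group")["correct"].mean().sort_values()
print(per_group)  # lowest-accuracy groups are candidates for more training data
```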