27 research outputs found

    Whombat: An open-source annotation tool for machine learning development in bioacoustics

    1. Automated analysis of bioacoustic recordings using machine learning (ML) methods has the potential to greatly scale biodiversity monitoring efforts. The use of ML for high-stakes applications, such as conservation research, demands a data-centric approach with a focus on utilizing carefully annotated and curated evaluation and training data that is relevant and representative. Creating annotated datasets of sound recordings presents a number of challenges, such as managing large collections of recordings with associated metadata, developing flexible annotation tools that can accommodate the diverse range of vocalization profiles of different organisms, and addressing the scarcity of expert annotators. 2. We present Whombat, a user-friendly, browser-based interface for managing audio recordings and annotation projects, with several visualization, exploration, and annotation tools. It enables users to quickly annotate, review, and share annotations, as well as visualize and evaluate a set of machine learning predictions on a dataset. The tool facilitates an iterative workflow where user annotations and machine learning predictions feed back to enhance model performance and annotation quality. 3. We demonstrate the flexibility of Whombat by showcasing two distinct use cases: a project aimed at enhancing automated UK bat call identification at the Bat Conservation Trust (BCT), and a collaborative effort among the USDA Forest Service and Oregon State University researchers exploring bioacoustic applications and extending automated avian classification models in the Pacific Northwest, USA. 4. Whombat is a flexible tool that can effectively address the challenges of annotation for bioacoustic research. It can be used for individual and collaborative work, hosted on a shared server or accessed remotely, or run on a personal computer without the need for coding skills. Comment: 17 pages, 2 figures, 2 tables, to be submitted to Methods in Ecology and Evolution

    Accounting for spatial autocorrelation and environment are important to derive robust bat population trends from citizen science data

    Monitoring wildlife populations is essential if global targets to reverse biodiversity declines are to be met. Recent analysis of data from the UK’s long-term National Bat Monitoring Programme (NBMP) suggests stable or increasing population trends for many bat species, and these statistics help inform progress towards national biodiversity targets. However, although based on robust citizen science survey designs, it is unknown how sensitive these trends are to spatial and environmental biases. Here we use Bayesian hierarchical modelling with integrated nested Laplace approximation (INLA), to examine the impact of these types of biases on the population trends using relative occupancy of four species monitored by the NBMP Field Survey in Great Britain (GB): Pipistrellus pipistrellus, P. pygmaeus, Nyctalus noctula and Eptesicus serotinus. Where possible, we also disaggregated trends to national levels using the best model per species to determine if national differences in trends remain once sampling biases are accounted for. Although we found evidence of spatial clustering in the NBMP Field Survey locations, the previously reported GB-wide population trends are broadly robust to spatial autocorrelation. In most species, accounting for spatial autocorrelation and species-environment relationships improved model fit. The nationally disaggregated models highlighted that GB-wide trends mask differences between England and Scotland, consistent with previous analysis of these data, as well as illustrating large gaps in survey effort, especially in Wales. We suggest that although bat population trends were found to be broadly robust to sampling biases present in these data, small differences could propagate over time and this impact is likely to be more severe in less structured citizen science data. 
Therefore, ensuring trends are robust to sampling biases present in citizen science datasets is critical to effective monitoring of progress towards biodiversity targets, managing populations sustainably, and ultimately a reversal of global declines

    Coalescing disparate data sources for the geospatial prediction of mosquito abundance, using Brazil as a motivating case study

    One of the barriers to performing geospatial surveillance of mosquito occupancy or infestation anywhere in the world is the paucity of primary entomologic survey data geolocated at a residential property level and matched to important risk factor information (e.g., anthropogenic, environmental, and climate) that enables the spatial risk prediction of mosquito occupancy or infestation. Such data are invaluable pieces of information for academics, policy makers, and public health program managers operating in low-resource settings in Africa, Latin America, and Southeast Asia, where mosquitoes are typically endemic. The reality is that such data remain elusive in these low-resource settings and, where available, high-quality data that include both individual and spatial characteristics to inform the geospatial description and risk patterning of infestation remain rare. There are many online sources of open-source spatial data that are reliable and can be used to address such data paucity in this context. Therefore, the aims of this article are threefold: (1) to highlight where these reliable open-source data can be acquired and how they can be used as risk factors for making spatial predictions for mosquito occupancy in general; (2) to use Brazil as a case study to demonstrate how these datasets can be combined to predict the presence of arboviruses through the use of ecological niche modeling using the maximum entropy algorithm; and (3) to discuss the benefits of using bespoke applications beyond these open-source online data sources, demonstrating how they can be the new “gold-standard” approach for gathering primary entomologic survey data. The scope of this article was mainly limited to a Brazilian context because it builds on an existing partnership with academics and stakeholders from environmental surveillance agencies in the states of Pernambuco and Paraiba.
The analysis presented in this article was also limited to a specific mosquito species, i.e., Aedes aegypti, due to its endemic status in Brazil

    Data integration for large-scale models of species distributions

    With the expansion in the quantity and types of biodiversity data being collected, there is a need to find ways to combine these different sources to provide cohesive summaries of species’ potential and realized distributions in space and time. Recently, model-based data integration has emerged as a means to achieve this by combining datasets in ways that retain the strengths of each. We describe a flexible approach to data integration using point process models, which provide a convenient way to translate across ecological currencies. We highlight recent examples of large-scale ecological models based on data integration and outline the conceptual and technical challenges and opportunities that arise

    Shiga toxin-producing Escherichia coli clonal complex 32, including serotype O145:H28, in the UK and Ireland

    Introduction. Shiga toxin-producing Escherichia coli (STEC) O157:H7 has been the most clinically significant STEC serotype in the UK for over four decades. Over the last 10 years we have observed a decrease in STEC O157:H7 and an increase in non-O157 STEC serotypes, such as O145:H28. Gap Statement. Little is known about the microbiology and epidemiology of STEC belonging to CC32 (including O145:H28) in the UK. The aim of this study was to integrate genomic data with patient information to gain a better understanding of the virulence, disease severity, epidemic risk assessment and population structure of this clinically significant clonal complex. Methodology. Isolates of E. coli belonging to CC32 (n=309) in the archives of public health agencies in the UK and Ireland were whole-genome-sequenced, virulence-profiled and integrated with enhanced surveillance questionnaire (ESQ) data, including exposures and disease severity. Results. Overall, diagnoses of STEC belonging to CC32 (290/309, 94 %) in the UK have increased every year since 2014. Most cases were female (61 %), and the highest proportion of cases belonged to the 0–4 age group (53/211, 25 %). The frequency of symptoms of diarrhoea (92 %), abdominal pain (84 %), blood in stool (71 %) and nausea (51 %) was similar to that reported in cases of STEC O157:H7, although cases of STEC CC32 were more frequently admitted to hospital (STEC CC32 48 % vs O157:H7 34 %) and/or developed haemolytic uraemic syndrome (HUS) (STEC CC32 9 % vs O157:H7 4 %). The majority of STEC isolates (268/290, 92 %) had the stx2a/eae virulence gene combination, most commonly associated with progression to STEC HUS. There was evidence of person-to-person transmission and small, temporally related, geographically dispersed outbreaks, characteristic of foodborne outbreaks linked to nationally distributed products. Conclusion.
We recommend more widespread use of polymerase chain reaction (PCR) for the detection of all STEC serogroups, the development of consistent strategies for the follow-up testing of PCR-positive faecal specimens, the implementation of more comprehensive and standardized collection of epidemiological data, and routine sharing of sequencing data between public health agencies worldwide

    An Evaluation of the OpenWeatherMap API versus INMET Using Weather Data from Two Brazilian Cities: Recife and Campina Grande

    Certain weather conditions are inadvertently related to increased population of various mosquitoes. In order to predict the burden of mosquito populations in the Global South, it is imperative to integrate weather-related risk factors into such predictive models. There are a lot of online open-source weather platforms that provide historical, current and future weather forecasts which can be utilised for general predictions, and these electronic sources serve as an alternate option for weather data when physical weather stations are inaccessible (or inactive). Before using data from such online sources, it is important to assess the accuracy against some baseline measure. In this paper, we therefore evaluated the accuracy and suitability of weather forecasts of two parameters, namely temperature and humidity, from the OpenWeatherMap API (an online weather platform) and compared them with actual measurements collected from the Brazilian weather stations (INMET). The evaluation was focused on two Brazilian cities, namely, Recife and Campina Grande. The intention is to prepare an early warning model which will harness data from OpenWeatherMap API for mosquito prediction

    Temporal and Spatiotemporal Arboviruses Forecasting by Machine Learning: A Systematic Review

    Arboviruses are a group of diseases that are transmitted by an arthropod vector. They are part of the Neglected Tropical Diseases, which pose several public health challenges for countries around the world. The arboviruses' dynamics are governed by a combination of climatic, environmental, and human mobility factors. Arboviruses prediction models can be a support tool for decision-making by public health agents. In this study, we propose a systematic literature review to identify arboviruses prediction models, as well as models for their transmitter vector dynamics. To carry out this review, we searched reputable scientific bases such as IEEE Xplore, PubMed, Science Direct, Springer Link, and Scopus. We searched for studies published between the years 2015 and 2020, using a search string. A total of 429 articles were returned; however, after filtering by exclusion and inclusion criteria, 139 were included. Through this systematic review, it was possible to identify the challenges present in the construction of arboviruses prediction models, as well as the existing gap in the construction of spatiotemporal models

    Epidemiology and genomic analysis of Shiga toxin-producing Escherichia coli clonal complex 165 in the UK

    Introduction. Shiga toxin-producing Escherichia coli (STEC) is a zoonotic, foodborne gastrointestinal pathogen that has the potential to cause severe clinical outcomes, including haemolytic uraemic syndrome (HUS). STEC-HUS is the leading cause of renal failure in children and can be fatal. Over the last decade, STEC clonal complex 165 (CC165) has emerged as a cause of STEC-HUS. Gap Statement. There is a need to understand the pathogenicity and prevalence of this emerging STEC clonal complex in the UK, to facilitate early diagnosis, improve clinical management, and prevent and control outbreaks. Aim. The aim of this study was to characterize CC165 through identification of virulence factors (VFs) and antimicrobial resistance (AMR) determinants in the genome and to integrate the genome data with the available epidemiological data to better understand the incidence and pathogenicity of this clonal complex in the UK. Methodology. All isolates belonging to CC165 in the archives at the UK public health agencies were sequenced and serotyped, and the virulence gene and AMR profiles were derived from the genome using PHE bioinformatics pipelines and the Centre for Genomic Epidemiology virulence database. Results. There were 48 CC165 isolates, of which 43 were STEC, four were enteropathogenic E. coli (EPEC) and one E. coli. STEC serotypes were predominately O80:H2 (n=28), and other serotypes included O45:H2 (n=9), O55:H9 (n=4), O132:H2 (n=1) and O180:H2 (n=1). All but one STEC isolate had Shiga toxin (stx) subtype stx2a or stx2d and 47/48 isolates had the eae gene encoding intimin involved in the intimate attachment of the bacteria to the human gut mucosa. We detected extra-intestinal virulence genes including those associated with iron acquisition (iro) and serum resistance (iss), indicating that this pathogen has the potential to translocate to extra-intestinal sites. 
Unlike other STEC clonal complexes, a high proportion of isolates (93%, 40/43) were multidrug-resistant, including resistance to aminoglycosides, beta-lactams, chloramphenicol, sulphonamides, tetracyclines and trimethoprim. Conclusion. The clinical significance of this clonal complex should not be underestimated. Exhibiting high levels of AMR and a combination of STEC and extra-intestinal pathogenic E. coli (ExPEC) virulence profiles, this clonal complex is an emerging threat to public health

    Comparing genotyping algorithms for Illumina's Infinium whole-genome SNP BeadChips

    The Brassica napus 60K Illumina Infinium™ SNP array has had huge international uptake in the rapeseed community due to the revolutionary speed of acquisition and ease of analysis of this high-throughput genotyping data, particularly when coupled with the newly available reference genome sequence. However, further utilization of this valuable resource can be optimized by better understanding the promises and pitfalls of SNP arrays. We outline how best to analyze Brassica SNP marker array data for diverse applications, including linkage and association mapping, genetic diversity and genomic introgression studies. We present data on which SNPs are locus-specific in winter, semi-winter and spring B. napus germplasm pools, rather than amplifying both an A-genome and a C-genome locus or multiple loci. Common issues that arise when analyzing array data will be discussed, particularly those unique to SNP markers, and how to deal with these in practical Brassica breeding applications