3,836 research outputs found

    Applications of Natural Language Processing in Biodiversity Science

    Centuries of biological knowledge are contained in the massive body of scientific literature, written for human readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science. A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades, but require special development for application in the biological realm due to the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none have been applied life-wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript briefly discusses the key steps in applying information extraction tools to enhance biodiversity science.

    Unsupervised Machine Learning and Data Mining Procedures Reveal Short Term, Climate Driven Patterns Linking Physico-Chemical Features and Zooplankton Diversity in Small Ponds

    Machine Learning (ML) is an increasingly accessible discipline in computer science that develops dynamic algorithms capable of data-driven decisions and whose use in ecology is growing. Fuzzy sets are suitable descriptors of ecological communities as compared to other standard algorithms and allow the description of decisions that include elements of uncertainty and vagueness. However, fuzzy sets are scarcely applied in ecology. In this work, an unsupervised machine learning algorithm (fuzzy c-means) and association rule mining were applied to assess the factors influencing the assemblage composition and distribution patterns of 12 zooplankton taxa in 24 shallow ponds in northern Italy. The fuzzy c-means algorithm was implemented to classify the ponds in terms of the taxa they support, and to identify the influence of chemical and physical environmental features on the assemblage patterns. Data retrieved during 2014 and 2015 were compared, taking into account that 2014 late spring and summer air temperatures were much lower than historical records, whereas 2015 mean monthly air temperatures were much warmer than historical averages. In both years, fuzzy c-means shows a strong clustering of ponds in two groups, contrasting sites characterized by different physico-chemical and biological features. Climatic anomalies, affecting the temperature regime, together with the main water supply to shallow ponds (e.g., surface runoff vs. groundwater) represent disturbance factors producing large interannual differences in the chemistry, biology and short-term dynamics of small aquatic ecosystems. Unsupervised machine learning algorithms and fuzzy sets may help in catching such apparently erratic differences.
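
    As an illustration of the fuzzy c-means step mentioned in this abstract, the following is a minimal sketch in plain Python, not the authors' implementation. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the Euclidean distance, the random initialisation and the stopping tolerance are all assumptions made for the example; the abstract does not state them.

```python
import random


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Cluster numeric tuples; return (centers, memberships).

    memberships[i][k] is the degree (0..1) to which point i belongs to
    cluster k; each row sums to 1. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Cluster centers are means weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Memberships are recomputed from relative distances to the centers.
        shift = 0.0
        power = 2.0 / (m - 1.0)
        for i in range(n):
            dist = [
                max(sum((points[i][d] - c[d]) ** 2 for d in range(dim)) ** 0.5,
                    1e-12)  # guard against division by zero
                for c in centers
            ]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** power
                          for j in range(n_clusters))
                for k in range(n_clusters)
            ]
            shift = max(shift,
                        max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if shift < tol:  # memberships have stopped changing
            break

    return centers, u
```

    Unlike a hard clustering, each pond (here, each point) keeps a graded membership in every group, which is what lets fuzzy sets express the uncertainty and vagueness the abstract refers to.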

    Machine learning for ecosystem services

    Recent developments in machine learning have expanded data-driven modelling (DDM) capabilities, allowing artificial intelligence to infer the behaviour of a system by computing and exploiting correlations between observed variables within it. Machine learning algorithms may enable the use of increasingly available ‘big data’ and assist in applying ecosystem service models across scales, analysing and predicting the flows of these services to disaggregated beneficiaries. We use the Weka and ARIES software to produce two examples of DDM: firewood use in South Africa and biodiversity value in Sicily, respectively. Our South African example demonstrates that DDM (64–91% accuracy) can identify the areas where firewood use is within the top quartile with accuracy comparable to that of conventional modelling techniques (54–77% accuracy). The Sicilian example highlights how DDM can be made more accessible to decision makers, who show both capacity and willingness to engage with uncertainty information. Uncertainty estimates, produced as part of the DDM process, allow decision makers to determine what level of uncertainty is acceptable to them and to use their own expertise for potentially contentious decisions. We conclude that DDM has a clear role to play when modelling ecosystem services, helping produce interdisciplinary models and holistic solutions to complex socio-ecological issues.

    Proteomic fingerprinting facilitates biodiversity assessments in understudied ecosystems: A case study on integrated taxonomy of deep sea copepods

    Accurate and reliable biodiversity estimates of marine zooplankton are a prerequisite to understand how changes in diversity can affect whole ecosystems. Species identification in the deep sea is significantly impeded by high numbers of new species and decreasing numbers of taxonomic experts, hampering any assessment of biodiversity. We used morphological, genetic, and proteomic characteristics of calanoid copepod specimens from the abyssal South Atlantic in parallel to test whether proteomic fingerprinting can accelerate biodiversity estimation. We cross-validated the respective molecular discrimination methods with morphological identifications to establish COI and proteomic reference libraries, as they are a prerequisite to assign taxonomic information to the identified molecular species clusters. Due to the high number of new species, only 37% of the individuals could be assigned to species or genus level morphologically. COI sequencing was successful for 70% of the specimens analysed, while proteomic fingerprinting was successful for all specimens examined. Predicted species richness based on morphological and molecular methods was 42 morphospecies, 56 molecular operational taxonomic units (MOTUs) and 79 proteomic operational taxonomic units (POTUs), respectively. Species diversity was predicted based on proteomic profiles using hierarchical cluster analysis followed by application of the variance ratio criterion for identification of species clusters. It was comparable to species diversity calculated based on COI sequence distances. Less than 7% of specimens were misidentified by proteomic profiles when compared with COI-derived MOTUs, indicating that unsupervised machine learning using solely proteomic data could be used for quickly assessing species diversity.
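
    The combination of hierarchical clustering and the variance ratio criterion described in this abstract can be sketched as follows. This is a toy illustration, not the authors' pipeline: the average linkage, the Euclidean distance, and the helper names (`agglomerate`, `variance_ratio`, `best_cluster_count`) are assumptions, since the abstract does not specify them. The variance ratio criterion (also known as the Calinski-Harabasz index) divides the between-cluster dispersion per degree of freedom by the within-cluster dispersion per degree of freedom; the cluster count with the highest ratio is taken as the estimate.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(rows):
    """Column-wise mean of a list of equal-length sequences."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def agglomerate(points, k):
    """Naive average-linkage agglomerative clustering down to k clusters.

    Returns one integer label per point. Cubic-time; for illustration only.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(sq_dist(points[i], points[j]) ** 0.5
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def variance_ratio(points, labels):
    """Variance ratio criterion: (B / (k - 1)) / (W / (n - k)).

    Assumes 2 <= k <= n - 1 and a non-zero within-cluster dispersion.
    """
    n, k = len(points), len(set(labels))
    overall = centroid(points)
    between = within = 0.0
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    return (between / (k - 1)) / (within / (n - k))


def best_cluster_count(points, k_min=2, k_max=None):
    """Return (k with the highest variance ratio, {k: score})."""
    k_max = k_max or len(points) - 1
    scores = {k: variance_ratio(points, agglomerate(points, k))
              for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores
```

    In practice the input rows would be proteomic profiles (for example, peak intensities per specimen) rather than 2-D points, and a library implementation would replace the naive clustering loop.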

    Identifiers in e-Science platforms for the ecological sciences

    In the emerging Web of Data, publishing stable and unique identifiers promises great potential in using the web as a common platform to discover and enrich data in the ecological sciences. With our collaborative e-Science platform “BEFdata”, we generated and published unique identifiers for the data repository of the Biodiversity – Ecosystem Functioning Research Unit of the German Research Foundation (BEF-China; DFG: FOR 891). We linked part of the identifiers to two external data providers, thus creating a virtual common platform including several ecological repositories. We used the Global Biodiversity Information Facility (GBIF) as well as the International Plant Names Index (IPNI) to enrich the data from our own field observations. We conclude by discussing other potential providers of identifiers for the ecological research domain. We demonstrate the ease of making use of existing decentralized and unsupervised identifiers for a data repository, which opens new avenues to collaborative data discovery for learning, teaching, and research in ecology.

    High-Resolution Vertical Habitat Mapping of a Deep-Sea Cliff offshore Greenland

    Recent advances in deep-sea exploration with underwater vehicles have led to the discovery of vertical environments inhabited by a diverse sessile fauna. However, despite their ecological importance, vertical habitats remain poorly characterized by conventional downward-looking survey techniques. Here we present a high-resolution 3-dimensional habitat map of a vertical cliff hosting a suspension-feeding community at the flank of an underwater glacial trough in the Greenland waters of the Labrador Sea. Using a forward-looking set-up on a Remotely Operated Vehicle (ROV), a high-resolution multibeam echosounder was used to map out the topography of the deep-sea terrain, including, for the first time, the backscatter intensity. Navigational accuracy was improved through a combination of the USBL and the DVL navigation of the ROV. Multi-scale terrain descriptors were derived and assigned to the 3D point cloud of the terrain. Following an unsupervised habitat mapping approach, the application of K-means clustering revealed four potential habitat types, driven by geomorphology, backscatter and fine-scale features. Using groundtruthing seabed images, the ecological significance of the four habitat clusters was assessed in order to evaluate the benefit of unsupervised habitat mapping for further fine-scale ecological studies of vertical environments. This study demonstrates the importance of a priori knowledge of the terrain around habitats that are rarely explored for ecological investigations. It also emphasizes the importance of remote characterization of habitat distribution for assessing the representativeness of benthic faunal studies often constrained by time-limited sampling activities. This case study further identifies current limitations (e.g., navigation accuracy, irregular terrain acquisition difficulties) that can potentially limit the use of deep-sea terrain models for fine-scale investigations.
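
    The unsupervised habitat mapping step in this abstract, K-means clustering of per-point terrain descriptors, can be sketched as below. This is a minimal plain-Python illustration under stated assumptions, not the study's workflow: the function names `standardise` and `k_means`, the standardisation of descriptors before clustering, the random choice of initial centers and the iteration cap are all assumptions, as the abstract gives no such detail.

```python
import random


def standardise(rows):
    """Scale each column to zero mean and unit variance.

    Assumed preprocessing so that descriptors in different units
    (e.g. slope vs. backscatter) contribute comparably to distances.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by 0.
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows]


def k_means(rows, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    # Initial centers are k distinct rows chosen at random.
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [-1] * len(rows)
    for _ in range(n_iter):
        # Assign every row to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2
                                  for a, b in zip(row, centers[j])))
            for row in rows
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Move each center to the mean of its members; keep it if empty.
        for j in range(k):
            members = [row for row, l in zip(rows, labels) if l == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers
```

    For a habitat map, each row would hold the terrain descriptors of one point in the 3D point cloud, `k` would be set to the number of candidate habitat types (four in the study), and the resulting labels would then be checked against groundtruthing images, as the abstract describes.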