    Structuring Information from Plant Morphological Descriptions using Open Information Extraction

    Taxonomic literature keeps records of the planet's biodiversity and gives access to the knowledge needed for research and sustainable management. The number of publications generated is quite large: the corpus of biodiversity literature includes tens of millions of figures and taxonomic treatments. Unfortunately, most taxonomic descriptions come from scientific publications in text format. With more than 61 million digitized pages in the Biodiversity Heritage Library (BHL), only 467,265 taxonomic treatments are available in the Biodiversity Literature Repository. Obtaining highly structured text from digitized text has been shown to be complex and very expensive (Cui et al. 2021). The scientific community has described over 1.2 million species, but studies suggest that 86% of existing species on Earth and 91% of species in the ocean still await description (Mora et al. 2011). The published descriptions synthesize observations made by taxonomists over centuries of research and include detailed morphological aspects (i.e., shape and structure) of species that are useful for identifying specimens, improving information search mechanisms, analyzing data on species with particular characteristics, and comparing species descriptions. To take full advantage of this information and to work towards integrating it with repositories of biodiversity knowledge, the biodiversity informatics community first needs to convert plain text into a machine-processable format. More precisely, there is a need to identify the names of structures and substructures and the characters that describe them (Fig. 1). Open information extraction (OIE) is a research area of Natural Language Processing (NLP) that aims to automatically extract structured, machine-readable representations of data available in unstructured text; the result is usually handled as n-ary propositions, for instance, triples of the form (Shen et al. 2022). OIE is continuously evolving with advancements in NLP and machine learning techniques. The state of the art in OIE involves the use of neural approaches, pre-trained language models, and the integration of dependency parsing and semantic role labeling. Neural solutions mainly formulate OIE as a sequence tagging problem or a sequence generation problem. Ongoing research focuses on improving extraction accuracy; handling complex linguistic phenomena, for instance, addressing challenges like coreference resolution; and making information extraction more open, because most existing neural solutions work on English texts (Zhou et al. 2022). The main objective of this project is to evaluate and compare the results of automatic data extraction from plant morphological descriptions using pre-trained language models (PLMs) and a language model trained on data from plant morphological descriptions written in Spanish. The research data for this study were sourced from the species records database of the National Biodiversity Institute of Costa Rica (INBio). Specifically, the project focused on selecting records of morphological descriptions of plant species written in Spanish. The system processes the morphological descriptions using a workflow that includes phases such as data selection and pre-processing, feature extraction, PLM testing, local language model training, and testing and evaluation of results. Fig. 2 shows the general workflow used in this research. Pre-processing and annotation: Descriptions were standardized by removing special characters like double and single quotes, replacing abbreviations, tokenizing text, and applying other transformations. Some records of the dataset were annotated with the ground-truth structured information in the form of triples that were extracted from each paragraph. Additionally, structured data from the project carried out by Mora and Araya (2018) were included in the dataset. Feature extraction: Token vectorization was done using word embeddings produced directly by the language models. Test PLM: The evaluation of the PLMs used the zero-shot approach and involved applying the models to the test dataset, extracting information, and comparing it to the annotated ground truth. Local language model training: The annotated data were split into 80% training data and 20% test data. Using the training data, a language model based on the Transformer architecture was trained. Evaluate results: Evaluation metrics such as precision, recall, and F1 (a measure of the model's accuracy) were calculated by comparing the extracted information with the ground truth. The results were analyzed to understand the models' performance, identify strengths and weaknesses, and gain insights into their ability to extract accurate and relevant information. Based on the analysis, the evaluation process iteratively improved the models' results. The main contributions of this project are: (1) a Transformer-based language model to extract information from morphological descriptions of plants written in Spanish, available on the project website; (2) a corpus of morphological descriptions of plants, written in Spanish, labeled for information extraction, and made available on the project website; and (3) the results of the project, the first of its kind applied to morphological descriptions of plants written in Spanish, published on the project website.
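
    The evaluation step described in this abstract (precision, recall, and F1 computed by comparing extracted triples with annotated ground truth) can be illustrated with a minimal sketch. The abstract does not say how a match was decided, so exact matching after lowercasing and trimming is an assumption here, and the function names, the (structure, character, value) triple layout, and the Spanish sample triples are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical triple layout: (structure, character, value).
Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Lowercase and trim each element so trivial differences do not count as errors."""
    return tuple(part.strip().lower() for part in triple)


def precision_recall_f1(predicted: Iterable[Triple],
                        gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Score extracted triples against ground truth using exact set matching (an assumption)."""
    pred_set: Set[Triple] = {normalize(t) for t in predicted}
    gold_set: Set[Triple] = {normalize(t) for t in gold}
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented sample data, for illustration only.
gold = [
    ("hoja", "forma", "elíptica"),
    ("hoja", "longitud", "5-10 cm"),
    ("flor", "color", "blanca"),
]
predicted = [
    ("Hoja", "forma", "elíptica"),   # matches after normalization
    ("flor", "color", "amarilla"),   # wrong value, counts as a false positive
]

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

    Exact matching is the strictest choice; a partial-match or token-overlap criterion would score the same output more leniently, so reported figures are comparable only under the same criterion.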

    Simple identification tools in FishBase

    Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters like fin ray meristics. Soon pictures and drawings were added as a further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computer-aided strategy.

    Plant conservation in Mediterranean-type ecosystems

    This study has been made possible by the long experience gained in the many research projects awarded to the authors, especially the projects 'Assessment, Monitoring and Applied Scientific Research for Ecological Restoration of Gypsum Mining Concessions (Majadas Viejas and Marylen) and Spreading of Results (ECORESGYP)', sponsored by the company EXPLOTACIONES RiO DE AGUAS S.L. (TORRALBA GROUP); 'Provision of services, monitoring and evaluation of the environmental restoration of the mining concessions Los Yesares, Maria Morales and El Cigarron', sponsored by the company Saint Gobain Placo Iberica S.A.; and 'CEIJ-009 Integrated study of coastal sands vegetation (AREVEG II)', sponsored by CEI.MAR. We are very grateful to the three reviewers for their comments and suggestions, which have been very helpful in improving the manuscript. The present paper is an overview of the state of the art in plant conservation in Mediterranean-type Ecosystems (MTEs), highlighting current studies and neglected topics. A review of the literature dealing with this issue and a general analysis of the results were performed, delving into relevant plant conservation biology topics. The main topics considered were: 1) reproductive biology and genetic conservation, 2) threat factors and effects of global change, and 3) evaluation of conservation status and protected areas selection. This study illustrates differences in the number of documents published in northern countries of the Mediterranean Basin relative to southern and eastern countries and compared with other MTEs. It also highlights the paramount importance of public organizations as funding entities. Additionally, it points to a decrease in traditional subject categories related to plant conservation and an increase in multidisciplinary conservation research and novel methodologies (e.g., phylogenomics, SDM). 
To overcome existing biases among the different MTE regions, integrating actions at a transnational level would be necessary, with standard conservation policies and strategies. Moreover, research should be supported with greater participation and funding from private entities, with a clear focus on specific conservation proposals. In contrast, certain weaknesses were detected, some related to the limited information available about threatened plant species and the scarce use of the available data from genetic conservation research in management plans. Consequently, the authors consider that future conservation efforts should aim to improve the knowledge of threatened MTEs’ flora and implement a manual of good practices, which would make use of the available research information to put forward more direct proposals for management and conservation.

    Knowledge-based Biomedical Data Science 2019

    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity. Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 tables.

    Traditional ecological knowledge in the Peruvian Andes : practice, synergies, and sustainability

    This thesis presents a theoretical discussion on the role of Traditional Ecological Knowledge (TEK) in livelihood activities and resilience strategies of the Indigenous peoples of the Peruvian Andes and the possibility of creating synergies with Western science. Using two case studies, from the Potato Park in Pisaq and the Chalakuy Maize Park in Lares, Cusco Region, it reviews how this ancestral knowledge is converted into practice by its holders to cultivate and protect the potato and maize varieties of the Andean highlands. The Quechua values of community, reciprocity, complementarity and solidarity are also considered, as they play an important role in the governance structures and the redistributive mechanisms of the parks. The study then examines how the collaboration with civil society and science practitioners has sparked innovation, improved the resilience of these communities to climate change and established the parks as Biocultural Heritage Territories for the protection of the Andean biodiversity. The analysis of the case studies demonstrates that TEK is a living, highly adaptable and valid source of information and practices of ecosystem management and climate-change adaptation for its holders. It may, however, be unsuitable to solve global sustainability problems due to its local and context-specific nature. The thesis concludes that TEK can, however, offer much-needed reflections on how to reconsider the anthropocentric view of Western science and capitalism, and rediscover a long-lost connection with our roots and a renewed respect for the natural world.

    Towards a transformative governance of the Amazon

    The crises of the Anthropocene can neither be confronted incrementally nor through short-term, reductionist strategies. As the risk of severe, irreversible socioecological damage increases, transformative change towards achieving long-term sustainability becomes ever more pressing. Against this backdrop, we explore how transformative governance can help strengthen ecosystem resilience, empower vulnerable communities and ensure sustainable development in the Amazon. The article starts by briefly reviewing the concept of transformative governance, arguing that it provides an adequate framework for thinking about and responding to the challenges of the Anthropocene. It then looks at how extant governance practices are destroying and fragmenting the Amazon, eroding the resilience of regional ecosystems. It proceeds by investigating how the Andes-Amazon-Atlantic Corridor, a transnational project aligned with the normative commitments and operational principles of transformative governance, aimed at protecting, restoring and building socioecological connectivity in the region, can offer an alternative pathway for Amazonian development in the new geological epoch.