Structuring Information from Plant Morphological Descriptions using Open Information Extraction
Taxonomic literature records the planet's biodiversity and provides the knowledge needed for research and sustainable management. The volume of publications is large: the corpus of biodiversity literature includes tens of millions of figures and taxonomic treatments. Unfortunately, most taxonomic descriptions exist only as plain text in scientific publications. Of the more than 61 million digitized pages in the Biodiversity Heritage Library (BHL), only 467,265 taxonomic treatments are available in the Biodiversity Literature Repository. Converting digitized text into highly structured text has been shown to be complex and very expensive (Cui et al. 2021). The scientific community has described over 1.2 million species, but studies suggest that 86% of existing species on Earth and 91% of species in the ocean still await description (Mora et al. 2011). Published descriptions synthesize observations made by taxonomists over centuries of research and include detailed morphological aspects (i.e., shape and structure) of species. These descriptions are useful for identifying specimens, improving information search mechanisms, analyzing data on species with particular characteristics, and comparing species descriptions.
To take full advantage of this information and to integrate it with repositories of biodiversity knowledge, the biodiversity informatics community first needs to convert plain text into a machine-processable format. More precisely, structure and substructure names must be identified, together with the characters that describe them (Fig. 1).
Open information extraction (OIE) is a research area of Natural Language Processing (NLP) that aims to automatically extract structured, machine-readable representations of data from unstructured text. The result is usually handled as n-ary propositions, for instance, triples of the form (subject, relation, object) (Shen et al. 2022).
OIE continues to evolve with advances in NLP and machine learning. The state of the art involves neural approaches, pre-trained language models, and the integration of dependency parsing and semantic role labeling. Neural solutions mainly formulate OIE as either a sequence tagging problem or a sequence generation problem. Ongoing research focuses on three areas: improving extraction accuracy, handling complex linguistic phenomena such as coreference resolution, and extending OIE to more languages, because most existing neural solutions work only on English text (Zhou et al. 2022).
The main objective of this project is to evaluate and compare automatic data extraction from plant morphological descriptions using two kinds of models: pre-trained language models (PLMs) and a language model trained on plant morphological descriptions written in Spanish.
The research data for this study were sourced from the species records database of the National Biodiversity Institute of Costa Rica (INBio). Specifically, the project selected records of morphological descriptions of plant species written in Spanish.
The system processes the morphological descriptions with a workflow comprising the following phases: data selection and pre-processing, feature extraction, PLM testing, local language model training, and testing and evaluation of results. Fig. 2 shows the general workflow used in this research.
Pre-processing and annotation: Descriptions were standardized by removing special characters such as double and single quotes, replacing abbreviations, tokenizing the text, and applying other transformations. Some records of the dataset were annotated with ground-truth structured information in the form of triples extracted from each paragraph.
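The pre-processing and annotation steps can be sketched in Python. This is a minimal illustration under stated assumptions and is not the project's pipeline. The abbreviation table, the regex tokenizer, the `Triple` field names (structure, character, value), and the sample description are all hypothetical, because the source does not list the exact transformations or the triple schema.

```python
import re
from typing import NamedTuple

# Hypothetical abbreviation table (assumption): the project's actual
# replacements are not listed in the source.
ABBREVIATIONS = {"ca.": "cerca de", "diám.": "diámetro"}

# Straight and typographic double/single quotes to be removed.
QUOTES = re.compile("[\"'\u201c\u201d\u2018\u2019]")


class Triple(NamedTuple):
    """One ground-truth proposition; these field names are an assumption."""
    structure: str  # a plant structure, e.g. "hojas" (leaves)
    character: str  # the character being described
    value: str      # the state or measurement of that character


def preprocess(text: str) -> list[str]:
    """Standardize a Spanish description and split it into tokens."""
    text = QUOTES.sub("", text)
    for abbr, full in ABBREVIATIONS.items():
        # \b keeps "ca." from matching the end of words such as "blanca."
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    # Words (Unicode-aware, so accented letters stay intact) or single
    # punctuation marks; whitespace is skipped.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = preprocess("Hojas 'alternas', de ca. 5 cm de largo.")
# Hypothetical ground-truth annotation of the same sentence.
gold = [
    Triple("hojas", "disposición", "alternas"),
    Triple("hojas", "largo", "cerca de 5 cm"),
]
print(tokens)
```

Running it prints `['Hojas', 'alternas', ',', 'de', 'cerca', 'de', '5', 'cm', 'de', 'largo', '.']`. The abbreviations are expanded before tokenizing so that their trailing periods are not split off as sentence punctuation.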
Structured data from the project carried out by Mora and Araya (2018) were also included in the dataset.
Feature extraction: Tokens were vectorized using the word embeddings of the language models themselves.
Test PLM: The PLMs were evaluated with a zero-shot approach. Each model was applied to the test dataset, and the information it extracted was compared with the annotated ground truth.
Local language model training: The annotated data were split into 80% training data and 20% test data, and a language model based on the Transformer architecture was trained on the training data.
Evaluate results: Evaluation metrics such as precision, recall, and F1 (the harmonic mean of precision and recall) were calculated by comparing the extracted information with the ground truth. The results were analyzed to understand the models' performance, identify their strengths and weaknesses, and gain insight into their ability to extract accurate and relevant information. Based on this analysis, the models were improved iteratively.
The main contributions of this project are: (1) a Transformer-based language model for extracting information from morphological descriptions of plants written in Spanish, available on the project website; (2) a corpus of morphological descriptions of plants written in Spanish, labeled for information extraction and made available on the project website; and (3) the results of the project, the first of its kind applied to morphological descriptions of plants written in Spanish, published on the project website.
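The 80/20 split and the precision, recall, and F1 evaluation described above can be illustrated with a small Python sketch. It assumes that an extracted triple counts as correct only when it exactly matches a ground-truth triple after trimming and lowercasing. The source does not state the matching criterion, and the function names and sample triples are hypothetical. Precision is the fraction of extracted triples that are correct, recall is the fraction of ground-truth triples that were extracted, and F1 is their harmonic mean.

```python
import random


def split_80_20(records, seed=0):
    """Shuffle reproducibly and return (train, test) with an 80/20 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def normalize(triple):
    """Assumed normalization: trim whitespace and lowercase each element."""
    return tuple(part.strip().lower() for part in triple)


def triple_prf(predicted, gold):
    """Precision, recall, and F1 over exact matches of normalized triples."""
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in gold}
    tp = len(pred & ref)  # extracted triples that also appear in the ground truth
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data.
gold = [("hojas", "disposición", "alternas"),
        ("hojas", "largo", "5 cm"),
        ("flor", "color", "blanca")]
predicted = [("Hojas", "disposición", "alternas"),
             ("flor", "color", "blanca"),
             ("flor", "forma", "tubular")]
print(triple_prf(predicted, gold))
```

Two of the three predicted triples match the ground truth, so precision, recall, and F1 all come out at about 0.67. Exact matching is strict. A real evaluation might instead give partial credit for overlapping spans, which this sketch does not attempt.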
Simple identification tools in FishBase
Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools used the relational model and characters such as fin ray meristics. Soon pictures and drawings were added as further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages and suggests various options for further development. It explores the possibility of a holistic and integrated computer-aided strategy.
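A computerized dichotomous key can be modeled as a binary tree. Restricting the possible species by country can then be seen as pruning that tree down to a checklist. The sketch below is a hypothetical Python illustration and not FishBase's implementation. The `Couplet` class, the `identify` and `prune` functions, the fin-ray questions, and the placeholder species names are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set, Union


@dataclass
class Couplet:
    """One step of a dichotomous key: a yes/no question with two branches."""
    question: str
    yes: "Node"
    no: "Node"


# A node is either another couplet or a leaf holding a species name.
Node = Union[Couplet, str]


def identify(node: Node, answer: Callable[[str], bool]) -> str:
    """Walk the key, asking a callback (standing in for the user) at each couplet."""
    while isinstance(node, Couplet):
        node = node.yes if answer(node.question) else node.no
    return node


def prune(node: Node, allowed: Set[str]) -> Optional[Node]:
    """Restrict the key to allowed species (e.g. those recorded in a country).

    Couplets that no longer separate any remaining species are dropped,
    so the user is asked only questions that still matter.
    """
    if isinstance(node, str):
        return node if node in allowed else None
    yes = prune(node.yes, allowed)
    no = prune(node.no, allowed)
    if yes is None:
        return no
    if no is None:
        return yes
    return Couplet(node.question, yes, no)


# Hypothetical key built on meristic characters.
key = Couplet("Dorsal fin with more than 10 spines?",
              Couplet("Pelvic fins present?", "Species A", "Species B"),
              "Species C")
answers = {"Dorsal fin with more than 10 spines?": True,
           "Pelvic fins present?": False}
print(identify(key, answers.__getitem__))

# Restrict to a hypothetical country checklist.
local = prune(key, {"Species A", "Species C"})
print(identify(local, answers.__getitem__))
```

The full key leads to `Species B`. After pruning to a checklist without that species, the pelvic-fin couplet disappears, and the same answers lead to `Species A` after a single question. That is the practical benefit of restricting a key by area before using it.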
The Biodiversity Heritage Library: sharing biodiversity with the world
Ten major natural history museum libraries, botanical libraries, and research institutions in the United Kingdom and the United States joined in 2005 to develop a strategy and operational plan to digitize the published literature of biodiversity held in their respective collections and to make that literature available for open access and responsible use as a part of a global ‘biodiversity commons.’ Headquartered at the Smithsonian Institution Libraries, the Biodiversity Heritage Library (BHL) is one of the cornerstones of the Encyclopedia of Life, a global effort to document all 1.8 million named species of animals, plants, and other forms of life on earth. This paper provides an overview of the BHL and its potential impact on biodiversity research, describes the BHL portal and its innovative search services, and provides a case study of the process from one of the members: the Museum of Comparative Zoology at Harvard University.
Plant conservation in Mediterranean-type ecosystems
This study has been made possible by the long-term experience gained in the many research projects awarded to the authors, especially the projects 'Assessment, Monitoring and Applied Scientific Research for Ecological Restoration of Gypsum Mining Concessions (Majadas Viejas and Marylen) and Spreading of Results (ECORESGYP)', sponsored by the company EXPLOTACIONES RÍO DE AGUAS S.L. (TORRALBA GROUP); 'Provision of services, monitoring and evaluation of the environmental restoration of the mining concessions Los Yesares, Maria Morales and El Cigarron', sponsored by the company Saint Gobain Placo Iberica S.A.; and 'CEIJ-009 Integrated study of coastal sands vegetation (AREVEG II)', sponsored by CEI.MAR. We are very grateful to the three reviewers for their comments and suggestions, which have been very helpful in improving the manuscript.
The present paper is an overview of the state of the art in plant conservation in Mediterranean-type Ecosystems
(MTEs), highlighting current studies and neglected topics. A review of the literature dealing with this issue and a general
analysis of the results were performed, delving into relevant plant conservation biology topics. The main topics considered
were: 1) reproductive biology and genetic conservation, 2) threat factors and effects of global change, and 3) evaluation of
conservation status and protected areas selection. This study illustrates differences in the number of documents published in
northern countries of the Mediterranean Basin compared with southern and eastern countries and with other MTEs. It
also highlights the paramount importance of public organizations as funding entities. Additionally, it points to a decrease in
traditional subject categories related to plant conservation and increased multidisciplinary conservation research and novel
methodologies (e.g., phylogenomics, SDM). To overcome existing biases among the different MTE regions, integrating actions
at a transnational level would be necessary, with standard conservation policies and strategies. Moreover, research should be
supported by greater participation and funding from private entities, with a clear focus on specific conservation
proposals. However, certain weaknesses were also detected, some related to the limited information available about threatened
plant species and the scarce use of the available data from genetic conservation research in management plans. Consequently,
the authors consider that future conservation efforts should be directed toward improving knowledge of the threatened flora of MTEs
and implementing a manual of good practices, which would make use of the available research information to put forward more
direct proposals for management and conservation.
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Traditional Chinese Medicine and biodiversity.
Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 tables
Traditional ecological knowledge in the Peruvian Andes: practice, synergies, and sustainability
This thesis presents a theoretical discussion on the role of Traditional Ecological Knowledge (TEK) in livelihood activities and resilience strategies of the Indigenous peoples of the Peruvian Andes and the possibility of creating synergies with Western science. Using two case studies, from the Potato Park in Pisaq and the Chalakuy Maize Park in Lares, Cusco Region, it reviews how this ancestral knowledge is converted into practice by its holders to cultivate and protect the potato and maize varieties of the Andean highlands. The Quechua values of community, reciprocity, complementarity and solidarity are also considered, as they play an important role in the governance structures and the redistributive mechanisms of the parks. The study then examines how the collaboration with civil society and science practitioners has sparked innovation, improved the resilience of these communities to climate change and established the parks as Biocultural Heritage Territories for the protection of the Andean biodiversity. The analysis of the case studies demonstrates that TEK is a living, highly adaptable and valid source of information and practices of ecosystem management and climate-change adaptation for its holders. It may, however, be unsuitable to solve global sustainability problems due to its local and context-specific nature. The thesis concludes that TEK can nevertheless offer much-needed reflections on how to reconsider the anthropocentric view of Western science and capitalism, and rediscover a long-lost connection with our roots and a renewed respect for the natural world.
Towards a transformative governance of the Amazon
The crises of the Anthropocene can neither be confronted incrementally nor through short-term, reductionist strategies. As the risk of severe, irreversible socioecological damage increases, transformative change towards achieving long-term sustainability becomes ever-pressing. Against this backdrop, we explore how transformative governance can help strengthen ecosystem resilience, empower vulnerable communities and ensure sustainable development in the Amazon. The article starts by briefly reviewing the concept of transformative governance, arguing that it provides an adequate framework for thinking about and responding to the challenges of the Anthropocene. It then looks at how extant governance practices are destroying and fragmenting the Amazon, eroding the resilience of regional ecosystems. It proceeds by investigating how the Andes-Amazon-Atlantic Corridor, a transnational project aligned with the normative commitments and operational principles of transformative governance, aimed at protecting, restoring and building socioecological connectivity in the region, can offer an alternative pathway for Amazonian development in the new geological epoch.