5 research outputs found

    A model of provenance applied to biodiversity datasets

    The Web has become one of the main sources of biodiversity information. An increasing number of biodiversity research institutions add new specimens and their related information to their biological collections and make this information available on the Web. However, the mechanisms currently available provide insufficient provenance for biodiversity information. In this paper, we propose a new biodiversity provenance model extending the W3C PROV Data Model. Biodiversity data is mapped to terms from relevant ontologies, such as Dublin Core and GeoSPARQL, stored in triple stores and queried using SPARQL endpoints. Additionally, we provide a use case using our provenance model to enrich collection data.

    Improving biodiversity data retrieval through semantic search and ontologies

    Due to the increased amount of available biodiversity data, many biodiversity research institutions are now making their databases openly available on the web. Researchers in the field use these databases to extract new knowledge and to share their own discoveries. However, when these researchers need to find relevant information in the data, they still rely on the traditional search approach based on text matching, which is not well suited to these large amounts of heterogeneous biodiversity data and leads to search results with low precision and recall. We present a new architecture that tackles this problem using a semantic search system for biodiversity data. Semantic search aims to improve search accuracy by using ontologies to understand user objectives and the contextual meaning of terms used in the search to generate more relevant results. Biodiversity data is mapped to terms from relevant ontologies, such as Darwin Core, DBpedia, Ontobio and Catalogue of Life, stored using semantic web formats and queried using semantic web tools (such as triple stores). A prototype semantic search tool was successfully implemented and evaluated by users from the National Research Institute for the Amazon (INPA). Our results show that the semantic search approach has better precision (28% improvement) and recall (25% improvement) than keyword-based search when applied to a large set of representative biodiversity data (206,000 records) from INPA and the Emílio Goeldi Museum in Pará (MPEG). We also show that, because the biodiversity data is now in semantic web format and mapped to ontology terms, it is easy to enhance it with information from other sources; an example using deforestation data (from the National Institute of Space Research - INPE) to enrich collection data is shown. © 2014 IEEE

    Semantic search architecture for retrieving information in biodiversity repositories

    The amount of biological data available electronically is increasing at a rapid rate; for instance, over 16,500 specimens are available today in the National Institute for Amazonian Research (INPA) collections. However, this data is not semantically categorized and stored, and thus is difficult to search. To tackle this problem, we present a semantic search architecture, implemented using state-of-the-art semantic web tools, and test it on a set of representative biodiversity data from INPA. This paper describes how the mapping mechanism is designed so that the semantic search can find information based on ontologies. We show a series of SPARQL queries and explain how the mapping mechanism works. Our experiments, using a prototype of the proposed architecture, showed that the prototype had better precision and recall than traditional keyword-based search engines.

    SWI: A Semantic Web Interactive Gazetteer to support Linked Open Data

    Current implementations of gazetteers, geographic directories that associate place names with geographic coordinates, cannot use semantics to answer complex queries (most gazetteers are just thesauri of place names), use domain ontologies for place name disambiguation, make their data sets available in the Semantic Web or support the use of Volunteered Geographic Information (VGI). A new generation of gazetteers has to tackle these problems. In this paper, we present a new architecture for gazetteers that uses VGI and Semantic Web tools, such as ontologies and Linked Open Data, to overcome these limitations. We also present a gazetteer, the Semantic Web Interactive Gazetteer (SWI), implemented using this architecture, and show that it can be used to add missing geographic coordinates to biodiversity records. In our tests, we use this gazetteer to correct geographic data from a large sample (around 142,000 occurrence records of Amazonian specimens) from SpeciesLink, a large repository of biodiversity collection records from Brazil. The tests showed that the SWI Gazetteer was able to add geographic coordinates to around 30,000 records, increasing the records with coordinates from 30.29% to 57.5% of the total number of records in the sample (representing an increase of 90%). © 2015 Elsevier B.V. All rights reserved