Enabling research data management beyond data heterogeneity
A primary goal of a research infrastructure for data management should be to
enable efficient data discovery and integration of heterogeneous data. The
German Federation for Biological Data (GFBio) was envisioned with this goal in
mind. The basic component that enables such interoperability, and that serves
as a backbone for the platform, is the GFBio Terminology Service (GFBio TS).
It acts as a semantic platform for accessing, developing, and reasoning over
terminological resources within the biological and environmental domain. A
RESTful API gives uniform access to these terminological resources regardless
of their degree of complexity and of whether they are stored internally or
accessed externally through their web services. Additionally, a set of widgets
with a built-in API connection is made available for easy integration into
applications and web interfaces. Based on the requirements of the GFBio
partners, we describe the added value provided by the GFBio Terminology
Service in practical scenarios, but also the challenges we still face. We
conclude by describing our current activities and future developments.
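The uniform-access idea described in the abstract can be sketched as a thin adapter layer: every terminology, whether stored internally or proxied from an external web service, exposes the same search interface. The class names and the federated_search helper below are illustrative assumptions, not the actual GFBio TS API.

```python
from abc import ABC, abstractmethod

class TerminologySource(ABC):
    """Uniform interface over heterogeneous terminological resources."""

    @abstractmethod
    def search(self, query: str) -> list[str]:
        """Return labels of concepts matching the query."""

class InternalTerminology(TerminologySource):
    """A terminology stored locally, e.g. imported into the service's own store."""

    def __init__(self, concepts: dict[str, str]):
        self._concepts = concepts  # URI -> label

    def search(self, query: str) -> list[str]:
        q = query.lower()
        return [label for label in self._concepts.values() if q in label.lower()]

class ExternalTerminology(TerminologySource):
    """A terminology reached through a remote web service (stubbed here)."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable standing in for an HTTP client

    def search(self, query: str) -> list[str]:
        return self._fetch(query)

def federated_search(sources: list[TerminologySource], query: str) -> list[str]:
    """One entry point, regardless of where each terminology lives."""
    results: list[str] = []
    for source in sources:
        results.extend(source.search(query))
    return results
```

A caller never needs to know which backend answered; that is the sense in which the API serves terminologies "in a uniform way regardless of their degree of complexity".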
Aligning Observable Property Terminologies using the I-ADOPT framework
During its lifetime, the RDA WG InteroperAble Descriptions of Observable Property Terminologies (I-ADOPT) developed a semantic framework to represent scientific variables and give a detailed account of what has been measured or otherwise observed. The framework breaks complex variable descriptions down into essential atomic components, e.g., which quality or quantity kind of which object or phenomenon kind is the subject of the measurement or observation. While the ecological domain served as a starting point, we took other domains into account as part of the development. As a result, the I-ADOPT model is a generic framework for describing observational properties. The recommendations of the I-ADOPT WG have been published along with several outputs, including the I-ADOPT ontology itself and a collection of terminologies to be used as atomic components.
The I-ADOPT WG is now in maintenance mode, but work is continuing on testing and supporting real-life implementation scenarios. Multiple terminology providers and data repositories have started aligning their variables with the I-ADOPT framework. This results in a growing number of I-ADOPT-compliant variable descriptions from different stakeholders. We are now ready to test whether one of the main goals of the I-ADOPT WG has actually been achieved: does the I-ADOPT framework enable semantic interoperability of variable descriptions across datasets annotated with different controlled vocabularies?
This poster highlights how the I-ADOPT model has been applied to existing terminologies of observational variables, providing detailed semantic context information. We present current efforts to exploit these details while aligning terminologies of different origins. We want to encourage other terminology providers and domains to explore the I-ADOPT framework in order to grow a network of interoperable terminologies for observational variables.
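The decomposition into atomic components can be sketched as follows. This is a minimal illustration modeled loosely on the published I-ADOPT component classes (Property, ObjectOfInterest, Matrix); real implementations use ontology URIs for each component rather than plain strings, and the simplified field names here are assumptions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Variable:
    """A variable description decomposed into I-ADOPT-style atomic components."""
    property: str            # what is measured, e.g. "concentration"
    object_of_interest: str  # what it is measured of, e.g. "nitrate"
    matrix: str = ""         # medium the object is in, e.g. "sea water"
    constraints: frozenset = field(default_factory=frozenset)

def aligned(a: Variable, b: Variable) -> bool:
    """Two descriptions from different vocabularies are interoperable when
    their atomic components match, even if their full labels differ."""
    return (a.property == b.property
            and a.object_of_interest == b.object_of_interest
            and a.matrix == b.matrix)
```

Two providers might label the same variable "nitrate concentration in seawater" and "seawater NO3 conc.", yet decompose it into identical components; component-level comparison is what makes the alignment possible.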
The EDIT Platform for Cybertaxonomy - an integrated software environment for biodiversity research data management
The Platform for Cybertaxonomy [1], developed as part of the EU Network of Excellence EDIT (European Distributed Institute of Taxonomy), is an open-source software framework covering the full breadth of the taxonomic workflow, from fieldwork to publication [2]. It provides a number of tools for full, customized access to taxonomic data, editing and management, and collaborative team work.
At the core of the platform is the Common Data Model [3], offering a comprehensive information model covering all relevant data domains: names and classifications, descriptive data (morphological and molecular), media, geographic information, literature, specimens, persons, and external resources [4]. The model adheres to community standards developed by the Biodiversity Information Standards organization TDWG [5]. Apart from its role as a software suite supporting the taxonomic workflow, the platform is a powerful information broker for a broad range of taxonomic data, providing solid and open interfaces, including a Java programmer's library and a CDM REST Service Layer.
In the context of the DFG-funded "Additivity" project ("Achieving additivity of structured taxonomic character data by persistently linking them to preserved individual specimens", DFG project number 310530378), we are developing components for capturing and processing formal descriptions of specimens as well as algorithms for aggregating data from individual specimens in order to compute species-level descriptions [6]. Well-defined and agreed descriptive vocabularies referring to structures, characters and character states are instrumental in ensuring the consistency and comparability of measurements. This will be addressed with a new EDIT Platform module for specifying vocabularies based on existing ontologies for descriptive data. To ensure that these vocabularies can be re-used in different contexts, we are planning an interface to the Terminology Service developed by the German Federation for Biological Data (GFBio) [7]. The Terminology Service provides a semantic standards aware and harmonised access point for distributed or locally stored ontologies required for biodiversity research data management, archiving and publication processes [8].
The interface will work with a new OWL export function of the CDM library, which provides EDIT Platform vocabularies in a format that can be read by the import module of the Terminology Service. In addition, the EDIT Platform will be equipped with the ability to import semantic concepts from the Terminology Service via its API while keeping a persistent link to the original concept. With an active pipeline between the EDIT Platform and the GFBio Terminology Service, terminologies originating from the taxonomic research process can be re-used in different research contexts as well as for the semantic annotation and integration of existing research data processed by the GFBio archiving and data publication infrastructure.
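The persistent-link idea behind the export/import pipeline can be sketched as follows. This is an assumption-laden illustration, not the actual CDM OWL export: the term structure, the example URIs, and the choice of skos:exactMatch for the link are all hypothetical, and the Turtle fragment omits prefix declarations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VocabularyTerm:
    """A local descriptive-vocabulary term with a persistent link to its origin."""
    local_uri: str
    label: str
    source_uri: str  # concept in the terminology service this term came from

def to_owl_turtle(term: VocabularyTerm) -> str:
    """Serialize the term as a minimal OWL class fragment in Turtle, keeping
    the persistent link as a skos:exactMatch so the origin stays traceable."""
    return (
        f"<{term.local_uri}> a owl:Class ;\n"
        f'    rdfs:label "{term.label}" ;\n'
        f"    skos:exactMatch <{term.source_uri}> .\n"
    )
```

Keeping the source URI in the serialized output is what lets a vocabulary travel between the two systems without losing its identity.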
KEYWORDS: taxonomic computing, descriptive data, terminology, inference
REFERENCES:
1. EDIT Platform for Cybertaxonomy. http://www.cybertaxonomy.org (accessed 17 May 2018).
2. Ciardelli, P., Kelbert, P., Kohlbecker, A., Hoffmann, N., Güntsch, A. & Berendsohn, W. G., 2009. The EDIT Platform for Cybertaxonomy and the Taxonomic Workflow: Selected Components, in: Fischer, S., Maehle, E., Reischuk, R. (Eds.): INFORMATIK 2009 – Im Focus das Leben. GI-Edition: Lecture Notes in Informatics (LNI) – Proceedings 154. Köllen Verlag, Bonn, pp. 625-638.
3. Müller, A., Berendsohn, W. G., Kohlbecker, A., Güntsch, A., Plitzner, P. & Luther, K., 2017. A Comprehensive and Standards-Aware Common Data Model (CDM) for Taxonomic Research. Proceedings of TDWG 1: e20367. https://doi.org/10.3897/tdwgproceedings.1.20367.
4. EDIT Common Data Model. https://dev.e-taxonomy.eu/redmine/projects/edit/wiki/CommonDataModel (accessed 17 May 2018).
5. Biodiversity Information Standards TDWG. http://www.tdwg.org/ (accessed 17 May 2018).
6. Henning T., Plitzner P., Güntsch A., Berendsohn W. G., Müller A. & Kilian N., 2018. Building compatible and dynamic character matrices – Current and future use of specimen-based character data. Bot. Lett. https://doi.org/10.1080/23818107.2018.1452791.
7. Diepenbroek, M., Glöckner, F., Grobe, P., Güntsch, A., Huber, R., König-Ries, B., Kostadinov, I., Nieschulze, J., Seeger, B., Tolksdorf, R. & Triebel, D., 2014. Towards an Integrated Biodiversity and Ecological Research Data Management and Archiving Platform: The German Federation for the Curation of Biological Data (GFBio), in: Plödereder, E., Grunske, L., Schneider, E., Ull, D. (Eds.): Informatik 2014 – Big Data Komplexität meistern. GI-Edition: Lecture Notes in Informatics (LNI) – Proceedings 232. Köllen Verlag, Bonn, pp. 1711-1724.
8. Karam, N., Müller-Birn, C., Gleisberg, M., Fichtmüller, D., Tolksdorf, R., & Güntsch, A., 2016. A Terminology Service Supporting Semantic Annotation, Integration, Discovery and Analysis of Interdisciplinary Research Data. Datenbank-Spektrum, 16(3), 195–205. https://doi.org/10.1007/s13222-016-0231-8
Results of the Ontology Alignment Evaluation Initiative 2020
The Ontology Alignment Evaluation Initiative (OAEI) aims at comparing ontology matching systems on precisely defined test cases. These test cases can be based on ontologies of different levels of complexity and use different evaluation modalities (e.g., blind evaluation, open evaluation, or consensus). The OAEI 2020 campaign offered 12 tracks with 36 test cases, and was attended by 19 participants. This paper is an overall presentation of that campaign.
Results of the Ontology Alignment Evaluation Initiative 2021
The Ontology Alignment Evaluation Initiative (OAEI) aims at comparing ontology matching systems on precisely defined test cases. These test cases can be based on ontologies of different levels of complexity and use different evaluation modalities (e.g., blind evaluation, open evaluation, or consensus). The OAEI 2021 campaign offered 13 tracks and was attended by 21 participants. This paper is an overall presentation of that campaign.
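OAEI tracks typically score a system's alignment against a reference alignment using precision, recall, and F-measure. A minimal sketch of that standard evaluation, under the simplifying assumption that correspondences are reduced to (source, target) pairs, ignoring relation type and confidence:

```python
def evaluate_alignment(system: set[tuple[str, str]],
                       reference: set[tuple[str, str]]) -> dict[str, float]:
    """Precision, recall, and F1 of a system alignment vs. a reference."""
    true_positives = len(system & reference)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Blind evaluation simply means the reference set is withheld from participants; the arithmetic is the same.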
Semantic information retrieval (an approach based on description logics)
This thesis proposes a semantic information retrieval solution based on description logics. Needs expressed as a natural-language description are transformed into a terminology, which is then compared with the terminologies describing the documents in the considered collection. This processing is called terminology matching. It consists of comparing the elements of two terminologies in order to derive the semantic correspondences between them. The principle is based on computing the discordance between the elements of the query and those of the documents. Our application setting requires taking into account semantic links drawn from a lexical ontology when comparing elements, so the different processing steps must incorporate this ontological information. This need led us to establish two approaches requiring extensions of the subsumption and difference algorithms on which the matching relies.
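The discordance idea can be sketched in a toy form, under strong simplifying assumptions: each terminology is reduced to a set of atomic concept names, and a synonym table stands in for the lexical ontology. The thesis's actual DL-based computation is far richer; this only illustrates the principle of penalizing query concepts with no (synonym-aware) counterpart.

```python
def discordance(query: set[str], document: set[str],
                synonyms: dict[str, set[str]]) -> float:
    """Fraction of query concepts not covered by the document, counting a
    concept as covered if it or one of its lexical synonyms appears in the
    document's terminology."""
    if not query:
        return 0.0
    uncovered = 0
    for concept in query:
        candidates = {concept} | synonyms.get(concept, set())
        if not candidates & document:
            uncovered += 1
    return uncovered / len(query)
```

A discordance of 0.0 means every query concept finds a counterpart; 1.0 means none does.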
The Virtual Tele-TASK Professor—Semantic Search in Recorded Lectures
This paper describes our e-librarian service that understands students' complete questions in natural language and retrieves very few but pertinent learning objects, i.e., short multimedia documents. The system is based on three key components: the formal representation of a domain ontology, a mechanism to automatically identify learning objects in a knowledge source, and a semantic search engine that yields only pertinent results for freely formulated questions in natural language. We report on experiments about students' acceptance of entering complete questions instead of only keywords, and about the benefits of such a virtual personal teacher in an educational environment.
Be a Collaborator and a Competitor in Crowdsourcing System
Crowdsourcing is emerging as a powerful paradigm for solving a wide range of tedious and complex problems in various enterprise applications. It raises the issue of finding previously unknown collaborative and competitive groups of solvers. The formation of a collaborative team should provide the best solution and treat that solution as a trade secret, avoiding data leaks between competing teams because of the reward attached to the outsourced task. In this paper, we propose a data-leak-aware crowdsourcing system called SocialCrowd. We introduce a clustering algorithm that uses social relationships between crowd workers to discover all possible teams while avoiding inter-team data leakage.
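The paper's SocialCrowd algorithm is not reproduced here; the following is only a toy sketch of the underlying intuition, assuming the social network is an undirected graph and any social tie between members of different teams is a potential leak channel. Forming teams as connected components guarantees that no tie ever crosses a team boundary.

```python
from collections import defaultdict

def leak_free_teams(workers: list[str],
                    ties: list[tuple[str, str]]) -> list[set[str]]:
    """Partition workers into teams such that no social tie connects two
    different teams: each connected component of the social graph becomes
    one team, and isolated workers form singleton teams."""
    graph = defaultdict(set)
    for a, b in ties:
        graph[a].add(b)
        graph[b].add(a)
    seen: set[str] = set()
    teams: list[set[str]] = []
    for w in workers:
        if w in seen:
            continue
        # traverse the worker's whole social component
        team, frontier = set(), [w]
        while frontier:
            current = frontier.pop()
            if current in team:
                continue
            team.add(current)
            frontier.extend(graph[current] - team)
        seen |= team
        teams.append(team)
    return teams
```

A real system would further split components into collaborating sub-teams while scoring leak risk; this sketch only shows the graph-partitioning core of the idea.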
Comparing natural language documents: a DL-based approach
We propose a method to compare two natural language texts semantically. The process is realized in two steps: the first translates the texts into description logics terminologies; the second computes the difference between the terminologies obtained. We show how the best covering problem can be used to compute the difference between two terminologies and propose a method to calculate this difference.
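The best-cover computation itself is not given in the abstract; the following is a toy sketch under strong simplifications, where each terminology is a set of atomic concepts, a greedy set cover selects document concepts that cover the query, and the difference is whatever remains uncovered. The greedy strategy and the flat concept representation are assumptions for illustration only.

```python
def best_cover_difference(query: set[str],
                          document_defs: dict[str, set[str]]) -> set[str]:
    """Greedy best cover: repeatedly pick the document concept whose
    definition covers the most still-uncovered query atoms; the difference
    is the set of query atoms no document concept can cover."""
    remaining = set(query)
    while remaining:
        best = max(document_defs.values(),
                   key=lambda atoms: len(atoms & remaining),
                   default=set())
        gain = best & remaining
        if not gain:
            break  # nothing left in the document covers the rest
        remaining -= gain
    return remaining
```

An empty difference means the document's terminology fully covers the query's; the leftover atoms quantify the semantic gap between the two texts.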