77 research outputs found

    Bridging End Users' Terms and AGROVOC Concept Server Vocabularies

    AGROVOC is a multilingual structured thesaurus for the agricultural domain. It has already been mapped to several vocabularies, for example AGROVOC-CAT, AGROVOC-NALT, and AGROVOC-SWD. Although these vocabularies already contain a good portion of non-preferred terms, the terms were collected under the literary warrant and institutional warrant principles, which means they were drawn from documents and publications rather than from users' queries. It is still very common for end users to use different terms to express the same concept. In light of the above discussion, we need to bridge these vocabularies and the users' terms.

    Aligning Controlled vocabularies for enabling semantic matching in a distributed knowledge management system

    The underlying idea of the Semantic Web is that web content should be expressed not only in natural language but also in a language that can be unambiguously understood, interpreted and used by software agents, thus permitting them to find, share and integrate information more easily. The central notion of the Semantic Web's syntax is the ontology: a shared vocabulary providing a taxonomy of concepts, objects and the relationships between them, which describes a particular domain of knowledge. A vocabulary stores words, synonyms, word sense definitions (i.e. glosses), and relations between word senses and concepts; such a vocabulary is generally referred to as a Controlled Vocabulary (CV) if the selection of terms is done by domain specialists. A facet is a distinct, dimensional feature of a concept or term that allows a taxonomy, ontology or CV to be viewed or ordered in multiple ways rather than in a single way. A facet is clearly defined, mutually exclusive, and composed of collectively exhaustive properties or characteristics of a domain. For example, a collection of rice might be represented using a name facet, a place facet, etc. This thesis presents a methodology for producing mappings between Controlled Vocabularies, based on a technique called "Hidden Semantic Matching". The word "Hidden" indicates that the technique does not rely on any externally provided background knowledge. The only knowledge exploited comes from the "semantic context" of the CVs being matched. We build a facet for each concept of these CVs, considering more general concepts (broader terms), less general concepts (narrower terms) and related concepts (related terms). Together these form a concept facet (CF), which is then used to boost the matching process.
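To make the concept-facet idea concrete, here is a minimal Python sketch. It is not the thesis's actual algorithm: the abstract does not say how facets are compared, so the Jaccard overlap, the `facet_weight` parameter, and all names below (`Concept`, `match_score`) are assumptions chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A controlled-vocabulary concept with its semantic context (hypothetical model)."""
    label: str
    broader: set[str] = field(default_factory=set)
    narrower: set[str] = field(default_factory=set)
    related: set[str] = field(default_factory=set)

    def facet(self) -> set[str]:
        # Concept facet: union of broader, narrower and related terms, case-normalised.
        return {t.lower() for t in self.broader | self.narrower | self.related}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two term sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_score(c1: Concept, c2: Concept, facet_weight: float = 0.5) -> float:
    # Assumed scoring: exact label match blended with facet overlap.
    label_sim = 1.0 if c1.label.lower() == c2.label.lower() else 0.0
    facet_sim = jaccard(c1.facet(), c2.facet())
    return (1 - facet_weight) * label_sim + facet_weight * facet_sim


# Two vocabularies both contain "rice"; a third uses the same label for something else.
rice_a = Concept("rice", broader={"cereals"}, narrower={"brown rice"}, related={"paddy"})
rice_b = Concept("Rice", broader={"Cereals"}, narrower={"brown rice", "wild rice"})
rice_other = Concept("Rice", broader={"universities"})

print(match_score(rice_a, rice_b))      # facets overlap, so the score rises above 0.5
print(match_score(rice_a, rice_other))  # same label, disjoint facets
```

In this sketch only the vocabularies' own broader, narrower and related terms are used, which mirrors the "no external background knowledge" constraint; two concepts that share a label but sit in different contexts receive a lower score than two that share both.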

    The Simple Knowledge Organization System (SKOS): a situation report for the HIVE Project

    HIVE (Helping Interdisciplinary Vocabularies Engineering) is a project funded by the IMLS (Institute of Museum and Library Services) and, indirectly, within Dryad; both projects are collaborations between the Metadata Research Center and the National Evolutionary Synthesis Center (NESCent) in Durham, North Carolina. HIVE aims to solve this problem through a proposal for automatic metadata generation that allows the dynamic integration of specific controlled vocabularies. To support vocabulary integration, SKOS (Simple Knowledge Organisation System) was selected: a World Wide Web Consortium (W3C) standard for representing knowledge organization systems or vocabularies, such as thesauri, classification schemes, subject heading systems and taxonomies, within the framework of the Semantic Web. This report presents an exhaustive analysis of the current state of SKOS application. The study includes a detailed review of the scientific literature and web resources on the model, and a selection of projects, initiatives, tools, key research groups and any other information that may be relevant to achieving the objectives of the HIVE project. It also analyses the importance of SKOS for achieving semantic interoperability and offers a set of recommendations for the members of the HIVE project.

    AgroPortal: a vocabulary and ontology repository for agronomy

    Many vocabularies and ontologies are produced to represent and annotate agronomic data. However, those ontologies are spread out, in different formats, of different sizes, with different structures and from overlapping domains. Therefore, there is a need for a common platform to receive and host them, align them, and enable their use in agro-informatics applications. By reusing the National Center for Biomedical Ontology (NCBO) BioPortal technology, we have designed AgroPortal, an ontology repository for the agronomy domain. The AgroPortal project reuses the biomedical domain's semantic tools and insights to serve agronomy, but also food, plant, and biodiversity sciences. We offer a portal that features ontology hosting, search, versioning, visualization, comment, and recommendation; enables semantic annotation; stores and exploits ontology alignments; and enables interoperation with the semantic web. AgroPortal specifically satisfies requirements of the agronomy community in terms of ontology formats (e.g., SKOS vocabularies and trait dictionaries) and supported features (offering detailed metadata and advanced annotation capabilities). In this paper, we present our platform's content and features, including the additions to the original technology, as well as preliminary outputs of five driving agronomic use cases that participated in the design and orientation of the project to anchor it in the community. By building on the experience and existing technology acquired from the biomedical domain, we can present in AgroPortal a robust and feature-rich repository of great value for the agronomic domain.

    Content Enrichment of Digital Libraries: Methods, Technologies and Implementations

    Parallel to the establishment of the concept of a "digital library", there have been rapid developments in the fields of semantic technologies, information retrieval and artificial intelligence. The idea is to make use of these three fields to crosslink bibliographic data, i.e., library content, and to enrich it "intelligently" with additional, especially non-library, information. By linking the contents of a library, it is possible to offer users access to semantically similar contents of different digital libraries. For instance, a list of semantically similar publications from completely different subject areas and from different digital libraries can be made accessible. In addition, the user is able to see a wider author profile, enriched with information such as biographical details, name alternatives, images, job titles, institute affiliations, etc. This information comes from a wide variety of sources, most of which are not library sources. In order to make such scenarios a reality, this dissertation follows two approaches. The first approach is about crosslinking digital library content in order to offer semantically similar publications based on additional information for a publication. Hence, this approach uses publication-related metadata as a basis. The aligned terms between linked open data repositories/thesauri are treated as an important starting point, taking narrower, broader and related concepts into account through semantic data models such as SKOS. Information retrieval methods are applied to identify publications with high semantic similarity. For this purpose, vector space models and "word embedding" approaches are applied and analyzed comparatively. The analyses are performed in digital libraries with different thematic focuses (e.g. economy and agriculture). Using machine learning techniques, metadata is enriched, e.g. with synonyms for content keywords, in order to further improve similarity calculations. To ensure quality, the proposed approaches are analyzed comparatively with different metadata sets, which are assessed by experts. Through the combination of different information retrieval methods, the quality of the results can be further improved. This is especially true when user interactions offer possibilities for adjusting the search properties. In the second approach pursued in this dissertation, author-related data are harvested in order to generate a comprehensive author profile for a digital library. For this purpose, non-library sources, such as linked data repositories (e.g. WIKIDATA), and library sources, such as authority data, are used. If such different sources are used, the disambiguation of author names via already existing persistent identifiers becomes necessary. To this end, we offer an algorithmic approach to disambiguating authors, which makes use of authority data such as the Virtual International Authority File (VIAF). With respect to computer science, the methodological value of this dissertation lies in the combination of semantic technologies with methods of information retrieval and artificial intelligence to increase the interoperability between digital libraries and between libraries and non-library sources. By positioning this dissertation as an application-oriented contribution to improving interoperability, two major contributions are made in the context of digital libraries: (1) The retrieval of information from different digital libraries can be made possible via a single access point. (2) Existing information about authors is collected from different sources and aggregated into one author profile.

    A decadal view of biodiversity informatics: challenges and priorities

    Biodiversity informatics plays a central enabling role in the research community's efforts to address scientific conservation and sustainability issues. Great strides have been made in the past decade in establishing a framework for sharing data, where taxonomy and systematics have been perceived as the most prominent disciplines involved. To some extent this is inevitable, given the use of species names as the pivot around which information is organised. To address the urgent questions around conservation, land use, environmental change, sustainability, food security and ecosystem services that are facing governments worldwide, we need to understand how the ecosystem works. So, we need a systems approach to understanding biodiversity that moves significantly beyond taxonomy and species observations. Such an approach needs to look at the whole system to address species interactions, both with their environment and with other species. It is clear that some barriers to progress are sociological: essentially, persuading people to use the technological solutions that are already available. This is best addressed by developing more effective systems that deliver immediate benefit to the user, hiding the majority of the technology behind simple user interfaces. An infrastructure should be a space in which activities take place and, as such, should be effectively invisible. This community consultation paper positions the role of biodiversity informatics for the next decade, presenting the actions needed to link the various biodiversity infrastructures invisibly and to facilitate understanding that can support both business and policy-makers. The community considers the goal in biodiversity informatics to be full integration of the biodiversity research community, including citizen science, through a commonly shared, sustainable e-infrastructure across all sub-disciplines that reliably serves science and society alike.

    Semantics-Aware Indexing of Geospatial Resources Based on Multilingual Thesauri: Methodology and Preliminary Results

    The discovery functionality implemented by geoportals is primarily based on the syntactic matching of users' search patterns against descriptive metadata, such as title, abstract, or keywords. As a consequence, the retrieval process is often hampered by linguistic issues related to multilingualism, semantic heterogeneity (synonymy, homonymy, etc.), and terminology mismatch in general. We propose a novel criterion for associating resources with language-neutral identifiers, thus enabling multilingual access to datasets and services as well as query expansion and refinement. The methodology has been successfully applied to the ISO-compliant metadata records aggregated by the INSPIRE Geoportal and is driving semantics-aware extensions of the discovery functionalities of the latter.
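The effect of indexing by language-neutral identifiers can be illustrated with a small Python sketch. The abstract does not describe the paper's association criterion, so this is a generic illustration only: the toy thesaurus, the concept IDs (`c:001`, `c:002`) and the function names are all hypothetical.

```python
# Hypothetical toy thesaurus: language-neutral concept IDs mapped to labels per language.
THESAURUS = {
    "c:001": {"en": ["river", "stream"], "it": ["fiume"], "de": ["Fluss"]},
    "c:002": {"en": ["land use"], "it": ["uso del suolo"], "de": ["Landnutzung"]},
}


def build_label_index(thesaurus):
    """Map every label, in every language, to the concept IDs it denotes."""
    index = {}
    for concept_id, labels_by_lang in thesaurus.items():
        for terms in labels_by_lang.values():
            for term in terms:
                index.setdefault(term.casefold(), set()).add(concept_id)
    return index


def annotate(keywords, label_index):
    """Replace free-text metadata keywords with language-neutral concept IDs."""
    ids = set()
    for keyword in keywords:
        ids |= label_index.get(keyword.casefold(), set())
    return ids


def search(query, resources, label_index):
    """Return resources whose concept IDs intersect those of the query term."""
    wanted = label_index.get(query.casefold(), set())
    return [name for name, ids in resources.items() if ids & wanted]


label_index = build_label_index(THESAURUS)
resources = {
    "Carta dei fiumi": annotate(["fiume"], label_index),
    "Landnutzung 2010": annotate(["Landnutzung"], label_index),
}

# An English synonym retrieves a record that only carries an Italian keyword.
print(search("stream", resources, label_index))
```

Because both the resources and the query are resolved to concept IDs before comparison, synonyms and translations of the same concept match each other without any string similarity between them; plain syntactic matching on "stream" would find neither record.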