258 research outputs found

    AGROVOC: The linked data concept hub for food and agriculture

    Get PDF
    Newly acquired, aggregated and shared data are essential for innovation in food and agriculture to improve the discoverability of research. Since the early 1980′s, the Food and Agriculture Organization of the United Nations (FAO) has coordinated AGROVOC, a valuable tool for data to be classified homogeneously, facilitating interoperability and reuse. AGROVOC is a multilingual and controlled vocabulary designed to cover concepts and terminology under FAO's areas of interest. It is the largest Linked Open Data set about agriculture available for public use and its highest impact is through facilitating the access and visibility of data across domains and languages. This chapter has the aim of describing the current status of one of the most popular thesaurus in all FAO’s areas of interest, and how it has become the Linked Data Concept Hub for food and agriculture, through new procedures put in plac

    Comparing human and automatic thesaurus mapping approaches in the agricultural domain

    Get PDF
    Knowledge organization systems (KOS), like thesauri and other controlled vocabularies, are used to provide subject access to information systems across the web. Due to the heterogeneity of these systems, mapping between vocabularies becomes crucial for retrieving relevant information. However, mapping thesauri is a laborious task, and thus big efforts are being made to automate the mapping process. This paper examines two mapping approaches involving the agricultural thesaurus AGROVOC, one machine-created and one human created. We are addressing the basic question "What are the pros and cons of human and automatic mapping and how can they complement each other?" By pointing out the difficulties in specific cases or groups of cases and grouping the sample into simple and difficult types of mappings, we show the limitations of current automatic methods and come up with some basic recommendations on what approach to use when.Comment: 10 pages, Int'l Conf. on Dublin Core and Metadata Applications 200

    Foundation, Implementation and Evaluation of the MorphoSaurus System: Subword Indexing, Lexical Learning and Word Sense Disambiguation for Medical Cross-Language Information Retrieval

    Get PDF
    Im medizinischen Alltag, zu welchem viel Dokumentations- und Recherchearbeit gehört, ist mittlerweile der überwiegende Teil textuell kodierter Information elektronisch verfügbar. Hiermit kommt der Entwicklung leistungsfähiger Methoden zur effizienten Recherche eine vorrangige Bedeutung zu. Bewertet man die Nützlichkeit gängiger Textretrievalsysteme aus dem Blickwinkel der medizinischen Fachsprache, dann mangelt es ihnen an morphologischer Funktionalität (Flexion, Derivation und Komposition), lexikalisch-semantischer Funktionalität und der Fähigkeit zu einer sprachübergreifenden Analyse großer Dokumentenbestände. In der vorliegenden Promotionsschrift werden die theoretischen Grundlagen des MorphoSaurus-Systems (ein Akronym für Morphem-Thesaurus) behandelt. Dessen methodischer Kern stellt ein um Morpheme der medizinischen Fach- und Laiensprache gruppierter Thesaurus dar, dessen Einträge mittels semantischer Relationen sprachübergreifend verknüpft sind. Darauf aufbauend wird ein Verfahren vorgestellt, welches (komplexe) Wörter in Morpheme segmentiert, die durch sprachunabhängige, konzeptklassenartige Symbole ersetzt werden. Die resultierende Repräsentation ist die Basis für das sprachübergreifende, morphemorientierte Textretrieval. Neben der Kerntechnologie wird eine Methode zur automatischen Akquise von Lexikoneinträgen vorgestellt, wodurch bestehende Morphemlexika um weitere Sprachen ergänzt werden. Die Berücksichtigung sprachübergreifender Phänomene führt im Anschluss zu einem neuartigen Verfahren zur Auflösung von semantischen Ambiguitäten. Die Leistungsfähigkeit des morphemorientierten Textretrievals wird im Rahmen umfangreicher, standardisierter Evaluationen empirisch getestet und gängigen Herangehensweisen gegenübergestellt

    Fusion architectures for automatic subject indexing under concept drift:Analysis and empirical results on short texts

    Get PDF
    Indexing documents with controlled vocabularies enables a wealth of semantic applications for digital libraries. Due to the rapid growth of scientific publications, machine learning-based methods are required that assign subject descriptors automatically. While stability of generative processes behind the underlying data is often assumed tacitly, it is being violated in practice. Addressing this problem, this article studies explicit and implicit concept drift, that is, settings with new descriptor terms and new types of documents, respectively. First, the existence of concept drift in automatic subject indexing is discussed in detail and demonstrated by example. Subsequently, architectures for automatic indexing are analyzed in this regard, highlighting individual strengths and weaknesses. The results of the theoretical analysis justify research on fusion of different indexing approaches with special consideration on information sharing among descriptors. Experimental results on titles and author keywords in the domain of economics underline the relevance of the fusion methodology, especially under concept drift. Fusion approaches outperformed non-fusion strategies on the tested data sets, which comprised shifts in priors of descriptors as well as covariates. These findings can help researchers and practitioners in digital libraries to choose appropriate methods for automatic subject indexing, as is finally shown by a recent case study

    Content Enrichment of Digital Libraries: Methods, Technologies and Implementations

    Get PDF
    Parallel to the establishment of the concept of a "digital library", there have been rapid developments in the fields of semantic technologies, information retrieval and artificial intelligence. The idea is to use make use of these three fields to crosslink bibliographic data, i.e., library content, and to enrich it "intelligently" with additional, especially non-library, information. By linking the contents of a library, it is possible to offer users access to semantically similar contents of different digital libraries. For instance, a list of semantically similar publications from completely different subject areas and from different digital libraries can be made accessible. In addition, the user is able to see a wider profile about authors, enriched with information such as biographical details, name alternatives, images, job titles, institute affiliations, etc. This information comes from a wide variety of sources, most of which are not library sources. In order to make such scenarios a reality, this dissertation follows two approaches. The first approach is about crosslinking digital library content in order to offer semantically similar publications based on additional information for a publication. Hence, this approach uses publication-related metadata as a basis. The aligned terms between linked open data repositories/thesauri are considered as an important starting point by considering narrower, broader and related concepts through semantic data models such as SKOS. Information retrieval methods are applied to identify publications with high semantic similarity. For this purpose, approaches of vector space models and "word embedding" are applied and analyzed comparatively. The analyses are performed in digital libraries with different thematic focuses (e.g. economy and agriculture). Using machine learning techniques, metadata is enriched, e.g. with synonyms for content keywords, in order to further improve similarity calculations. To ensure quality, the proposed approaches will be analyzed comparatively with different metadata sets, which will be assessed by experts. Through the combination of different information retrieval methods, the quality of the results can be further improved. This is especially true when user interactions offer possibilities for adjusting the search properties. In the second approach, which this dissertation pursues, author-related data are harvested in order to generate a comprehensive author profile for a digital library. For this purpose, non-library sources, such as linked data repositories (e.g. WIKIDATA) and library sources, such as authority data, are used. If such different sources are used, the disambiguation of author names via the use of already existing persistent identifiers becomes necessary. To this end, we offer an algorithmic approach to disambiguate authors, which makes use of authority data such as the Virtual International Authority File (VIAF). Referring to computer sciences, the methodological value of this dissertation lies in the combination of semantic technologies with methods of information retrieval and artificial intelligence to increase the interoperability between digital libraries and between libraries with non-library sources. By positioning this dissertation as an application-oriented contribution to improve the interoperability, two major contributions are made in the context of digital libraries: (1) The retrieval of information from different Digital Libraries can be made possible via a single access. (2) Existing information about authors is collected from different sources and aggregated into one author profile.Parallel zur Etablierung des Konzepts einer „Digitalen Bibliothek“ gab es rasante Weiterentwicklungen in den Bereichen semantischer Technologien, Information Retrieval und künstliche Intelligenz. Die Idee ist es, mit ihrer Hilfe bibliographische Daten, also Inhalte von Bibliotheken, miteinander zu vernetzen und „intelligent“ mit zusätzlichen, insbesondere nicht-bibliothekarischen Informationen anzureichern. Durch die Verknüpfung von Inhalten einer Bibliothek wird es möglich, einen Zugang für Benutzer*innen anzubieten, über den semantisch ähnliche Inhalte unterschiedlicher Digitaler Bibliotheken zugänglich werden. Beispielsweise können hierüber ausgehend von einer bestimmten Publikation eine Liste semantisch ähnlicher Publikationen ggf. aus völlig unterschiedlichen Themenfeldern und aus verschiedenen digitalen Bibliotheken zugänglich gemacht werden. Darüber hinaus können sich Nutzer*innen ein breiteres Autoren-Profil anzeigen lassen, das mit Informationen wie biographischen Angaben, Namensalternativen, Bildern, Berufsbezeichnung, Instituts-Zugehörigkeiten usw. angereichert ist. Diese Informationen kommen aus unterschiedlichsten und in der Regel nicht-bibliothekarischen Quellen. Um derartige Szenarien Realität werden zu lassen, verfolgt diese Dissertation zwei Ansätze. Der erste Ansatz befasst sich mit der Vernetzung von Inhalten Digitaler Bibliotheken, um auf Basis zusätzlicher Informationen für eine Publikation semantisch ähnliche Publikationen anzubieten. Dieser Ansatz verwendet publikationsbezogene Metadaten als Grundlage. Die verknüpften Begriffe zwischen verlinkten offenen Datenrepositorien/Thesauri werden als wichtiger Angelpunkt betrachtet, indem Unterbegriffe, Oberbegriffe und verwandten Konzepte über semantische Datenmodelle, wie SKOS, berücksichtigt werden. Methoden des Information Retrieval werden angewandt, um v.a. Publikationen mit hoher semantischer Verwandtschaft zu identifizieren. Zu diesem Zweck werden Ansätze des Vektorraummodells und des „Word Embedding“ eingesetzt und vergleichend analysiert. Die Analysen werden in Digitalen Bibliotheken mit unterschiedlichen thematischen Schwerpunkten (z.B. Wirtschaft und Landwirtschaft) durchgeführt. Durch Techniken des maschinellen Lernens werden hierfür Metadaten angereichert, z.B. mit Synonymen für inhaltliche Schlagwörter, um so Ähnlichkeitsberechnungen weiter zu verbessern. Zur Sicherstellung der Qualität werden die beiden Ansätze mit verschiedenen Metadatensätzen vergleichend analysiert wobei die Beurteilung durch Expert*innen erfolgt. Durch die Verknüpfung verschiedener Methoden des Information Retrieval kann die Qualität der Ergebnisse weiter verbessert werden. Dies trifft insbesondere auch dann zu wenn Benutzerinteraktion Möglichkeiten zur Anpassung der Sucheigenschaften bieten. Im zweiten Ansatz, den diese Dissertation verfolgt, werden autorenbezogene Daten gesammelt, verbunden mit dem Ziel, ein umfassendes Autorenprofil für eine Digitale Bibliothek zu generieren. Für diesen Zweck kommen sowohl nicht-bibliothekarische Quellen, wie Linked Data-Repositorien (z.B. WIKIDATA) und als auch bibliothekarische Quellen, wie Normdatensysteme, zum Einsatz. Wenn solch unterschiedliche Quellen genutzt werden, wird die Disambiguierung von Autorennamen über die Nutzung bereits vorhandener persistenter Identifikatoren erforderlich. Hierfür bietet sich ein algorithmischer Ansatz für die Disambiguierung von Autoren an, der Normdaten, wie die des Virtual International Authority File (VIAF) nachnutzt. Mit Bezug zur Informatik liegt der methodische Wert dieser Dissertation in der Kombination von semantischen Technologien mit Verfahren des Information Retrievals und der künstlichen Intelligenz zur Erhöhung von Interoperabilität zwischen Digitalen Bibliotheken und zwischen Bibliotheken und nicht-bibliothekarischen Quellen. Mit der Positionierung dieser Dissertation als anwendungsorientierter Beitrag zur Verbesserung von Interoperabilität werden zwei wesentliche Beiträge im Kontext Digitaler Bibliotheken geleistet: (1) Die Recherche nach Informationen aus unterschiedlichen Digitalen Bibliotheken kann über einen Zugang ermöglicht werden. (2) Vorhandene Informationen über Autor*innen werden aus unterschiedlichsten Quellen eingesammelt und zu einem Autorenprofil aggregiert

    Domain Analysis, Discourse Community and Literary and Semantic Warrants: a study of the Brazilian rural landscape

    Get PDF
    The current study deals with the theoretical and methodological feasibility of the use of domain analysis and discourse community, by means of literary and semantic warrants, aiming at the identification of the rural landscape in its ontological dimension, having Knowledge Organization Systems (KOS) as a purpose. The rural landscape and its network of concepts within the field of Information Science (IS) are analyzed in relation to geographic knowledge representation and organization. The methodological procedures, subdivided into bibliographical research and documentary research, included the examination of the 1970 and 2006 editions of the Census of Agriculture, under the responsibility of the Brazilian Institute of Geography and Statistics, and of representation of the rural landscapes in bibliographic classification systems, thesauri, and ontologies. Results show conceptual and standardized landscape terminology by means of the conceptualization of the categories that represent it, resulting from the Geography domain and aggregated to information taken from the 1970 and 200 censuses of agriculture.Estudio que presenta la viabilidad teórico-metodológica del uso del análisis de dominio y comunidad discursiva, a través de garantías literaria y semántica, con el fin de conocer e identificar la dimensión ontológica del paisaje rural, objetivando los Sistemas de Organización del Conocimiento (SOCs). Se analiza el paisaje rural y su red de conceptos en el área de las Ciencias de la Información con relación a la organización y representación del conocimiento geográfico. Los procedimientos metodológicos se dividieron en investigación bibliográfica e investigación documental que examinó los Censos Agropecuarios de 1970 y 2006, bajo la responsabilidad del Instituto Brasileño de Geografía y Estadística (IBGE), así como también examinó la representación del paisaje en los sistemas de clasificación bibliográfica, tesauros y ontologías. Como resultado, se obtuvo la terminología conceptual y estandarizada del paisaje a través de la conceptualización de las categorías que lo representan, resultante del dominio de la Geografía y agregadas a las informaciones provenientes de los censos agropecuarios de 1970 y 2006.The current study deals with the theoretical and methodological feasibility of the use of domain analysis and discourse community, by means of literary and semantic warrants, aiming at the identification of the rural landscape in its ontological dimension, having Knowledge Organization Systems (KOS) as a purpose. The rural landscape and its network of concepts within the field of Library and Information Science (LIS) are analyzed in relation to geographic knowledge representation and organization. The methodological procedures, subdivided into bibliographical research and documentary research, included the examination of the 1970 and 2006 editions of the Brazilian Census of Agriculture, under the responsibility of the Brazilian Institute of Geography and Statistics, and of representation of the rural landscapes in bibliographic classification systems, thesauri, and ontologies. Results show conceptual and standardized landscape terminology by means of the conceptualization of the categories that represent it, resulting from the geography domain and aggregated to information taken from the 1970 and 2006 Brazilian censuses of agriculture

    Use of the FAO-UNESCO Learning Module on Digitisation and Digital Libraries

    Get PDF
    The FAO and UNESCO, two United Nations specialized agencies, have developed a computerized self-learning module on Digitization and Digital Libraries. This paper will summarize the objectives of this project as reported by the sponsors, describe the module, and present the author's experience of testing it in a course on digital libraries in the M.Sc. programme in Information Science at the University of Madras

    Thesauri and Semantic Web: Discussion of the Evolution of Thesauri toward their Integration With the Semantic Web

    Get PDF
    15 p.Thesauri are Knowledge Organization Systems (KOS), that arise from the consensus of wide communities. They have been in use for many years and are regularly updated. Whereas in the past thesauri were designed for information professionals for indexing and searching, today there is a demand for conceptual vocabularies that enable inferencing by machines. The development of the Semantic Web has brought a new opportunity for thesauri, but thesauri also face the challenge of proving that they add value to it. The evolution of thesauri toward their integration with the Semantic Web is examined. Elements and structures in the thesaurus standard, ISO 25964, and SKOS (Simple Knowledge Organization System), the Semantic Web standard for representing KOS, are reviewed and compared. Moreover, the integrity rules of thesauri are contrasted with the axioms of SKOS. How SKOS has been applied to represent some real thesauri is taken into account. Three thesauri are chosen for this aim: AGROVOC, EuroVoc and the UNESCO Thesaurus. Based on the results of this comparison and analysis, the benefits that Semantic Web technologies offer to thesauri, how thesauri can contribute to the Semantic Web, and the challenges that would help to improve their integration with the Semantic Web are discussed.S
    corecore