Comparing human and automatic thesaurus mapping approaches in the agricultural domain
Knowledge organization systems (KOS), like thesauri and other controlled
vocabularies, are used to provide subject access to information systems across
the web. Due to the heterogeneity of these systems, mapping between
vocabularies becomes crucial for retrieving relevant information. However,
mapping thesauri is a laborious task, and thus considerable effort is being made to
automate the mapping process. This paper examines two mapping approaches
involving the agricultural thesaurus AGROVOC, one machine-created and one
human-created. We address the basic question "What are the pros and cons of
human and automatic mapping and how can they complement each other?" By
pointing out the difficulties in specific cases or groups of cases and grouping
the sample into simple and difficult types of mappings, we show the limitations
of current automatic methods and come up with some basic recommendations on
what approach to use when.
Comment: 10 pages, Int'l Conf. on Dublin Core and Metadata Applications 200
Mining Domain-Specific Thesauri from Wikipedia: A case study
Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly-constructed manual counterparts
Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review
Since the Simple Knowledge Organization System (SKOS) specification and its
SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a
significant number of conventional knowledge organization systems (KOS)
(including thesauri, classification schemes, name authorities, and lists of
codes and terms, produced before the arrival of the ontology-wave) have made
their journeys to join the Semantic Web mainstream. This paper uses "LOD KOS"
as an umbrella term to refer to all of the value vocabularies and lightweight
ontologies within the Semantic Web framework. The paper provides an overview of
what the LOD KOS movement has brought to various communities and users. These
are not limited to the colonies of the value vocabulary constructors and
providers, nor the catalogers and indexers who have a long history of applying
the vocabularies to their products. The LOD dataset producers and LOD service
providers, the information architects and interface designers, and researchers
in sciences and humanities, are also direct beneficiaries of LOD KOS. The paper
examines a set of collected cases (experimental or in real applications)
and aims to find the usages of LOD KOS in order to share the practices and
ideas among communities and users. Through the viewpoints of a number of
different user groups, the functions of LOD KOS are examined from multiple
dimensions. This paper focuses on the LOD dataset producers, vocabulary
producers, and researchers (as end-users of KOS).
Comment: 31 pages, 12 figures, accepted paper in International Journal on Digital Librarie
Open Data Platform for Knowledge Access in Plant Health Domain : VESPA Mining
Important data are locked in older literature. It would be uneconomical to
produce these data again today, or to extract them without the help of text
mining technologies. Vespa is a text mining project that aims to extract
data on pest and crop interactions, to model and predict attacks on crops, and
to reduce the use of pesticides. A few earlier attempts have proposed access to
agricultural information. Another original aspect of our work is to parse documents with
a dependency of the document architecture
Analysis of equivalence mapping for terminology services
This paper assesses the range of equivalence or mapping types required to facilitate interoperability in the context of a distributed terminology server. A detailed set of mapping types was examined, with a view to determining their validity for characterizing relationships between mappings from selected terminologies (AAT, LCSH, MeSH, and UNESCO) to the Dewey Decimal Classification (DDC) scheme. It was hypothesized that the detailed set of 19 match types proposed by Chaplan in 1995 is unnecessary in this context and that it could be reduced to a less detailed, conceptually based set. Results from an extensive mapping exercise support the main hypothesis, and a generic suite of match types is proposed, although doubt remains over the current adequacy of the developing Simple Knowledge Organization System (SKOS) Core Mapping Vocabulary Specification (MVS) for inter-terminology mapping
AGROVOC: The linked data concept hub for food and agriculture
Newly acquired, aggregated and shared data are essential for innovation in food and agriculture to improve the discoverability of research. Since the early 1980s, the Food and Agriculture Organization of the United Nations (FAO) has coordinated AGROVOC, a valuable tool for classifying data homogeneously, facilitating interoperability and reuse. AGROVOC is a multilingual controlled vocabulary designed to cover concepts and terminology under FAO's areas of interest. It is the largest Linked Open Data set about agriculture available for public use, and its greatest impact comes from facilitating the access and visibility of data across domains and languages. This chapter aims to describe the current status of one of the most popular thesauri across FAO’s areas of interest, and how it has become the Linked Data Concept Hub for food and agriculture, through new procedures put in plac
Pruning-based identification of domain ontologies
We present a novel approach to extracting a domain ontology from large-scale thesauri. Concepts are identified as relevant to a domain based on their frequent occurrence in domain texts. The approach allows the ontology engineering process to be bootstrapped from given legacy thesauri and identifies an initial domain ontology that may easily be refined by experts at a later stage. We present a thorough evaluation of the results obtained in building a biosecurity ontology for the UN FAO AOS project
Terminology server for improved resource discovery: analysis of model and functions
This paper considers the potential to improve distributed information retrieval via a terminologies server. The restriction upon effective resource discovery caused by the use of disparate terminologies across services and collections is outlined, before considering a DDC spine based approach involving inter-scheme mapping as a possible solution. The developing HILT model is discussed alongside other existing models and alternative approaches to solving the terminologies problem. Results from the current HILT pilot are presented to illustrate functionality and suggestions are made for further research and development