8,168 research outputs found
Comparing human and automatic thesaurus mapping approaches in the agricultural domain
Knowledge organization systems (KOS), like thesauri and other controlled
vocabularies, are used to provide subject access to information systems across
the web. Due to the heterogeneity of these systems, mapping between
vocabularies becomes crucial for retrieving relevant information. However,
mapping thesauri is a laborious task, and thus big efforts are being made to
automate the mapping process. This paper examines two mapping approaches
involving the agricultural thesaurus AGROVOC, one machine-created and one human
created. We are addressing the basic question "What are the pros and cons of
human and automatic mapping and how can they complement each other?" By
pointing out the difficulties in specific cases or groups of cases and grouping
the sample into simple and difficult types of mappings, we show the limitations
of current automatic methods and come up with some basic recommendations on
what approach to use when.Comment: 10 pages, Int'l Conf. on Dublin Core and Metadata Applications 200
Challenges and issues in terminology mapping : a digital library perspective
In light of information retrieval problems caused by the use of different subject schemes, this paper provides an overview of the terminology problem within the digital library field. Various proposed solutions are outlined and issues within one approach - terminology mapping are highlighted.Desk-based review of existing research. Findings - Discusses benefits of the mapping approach, which include improved retrieval effectiveness for users and an opportunity to overcome problems associated with the use of multilingual schemes. Also describes various drawbacks such as the labour intensive nature and expense of such an approach, the different levels of granularity in existing schemes, and the high maintenance requirements due to scheme updates, and not least the nature of user terminology. General review of mapping techniques as a potential solution to the terminology problem
Cross-concordances: terminology mapping and its effectiveness for information retrieval
The German Federal Ministry for Education and Research funded a major
terminology mapping initiative, which found its conclusion in 2007. The task of
this terminology mapping initiative was to organize, create and manage
'cross-concordances' between controlled vocabularies (thesauri, classification
systems, subject heading lists) centred around the social sciences but quickly
extending to other subject areas. 64 crosswalks with more than 500,000
relations were established. In the final phase of the project, a major
evaluation effort to test and measure the effectiveness of the vocabulary
mappings in an information system environment was conducted. The paper reports
on the cross-concordance work and evaluation results.Comment: 19 pages, 4 figures, 11 tables, IFLA conference 200
Analysis of equivalence mapping for terminology services
This paper assesses the range of equivalence or mapping types required to facilitate interoperability in the context of a distributed terminology server. A detailed set of mapping types were examined, with a view to determining their validity for characterizing relationships between mappings from selected terminologies (AAT, LCSH, MeSH, and UNESCO) to the Dewey Decimal Classification (DDC) scheme. It was hypothesized that the detailed set of 19 match types proposed by Chaplan in 1995 is unnecessary in this context and that they could be reduced to a less detailed conceptually-based set. Results from an extensive mapping exercise support the main hypothesis and a generic suite of match types are proposed, although doubt remains over the current adequacy of the developing Simple Knowledge Organization System (SKOS) Core Mapping Vocabulary Specification (MVS) for inter-terminology mapping
Human-Level Performance on Word Analogy Questions by Latent Relational Analysis
This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus
Similarity of Semantic Relations
There are at least two kinds of similarity. Relational similarity is
correspondence between relations, in contrast with attributional similarity,
which is correspondence between attributes. When two words have a high
degree of attributional similarity, we call them synonyms. When two pairs
of words have a high degree of relational similarity, we say that their
relations are analogous. For example, the word pair mason:stone is analogous
to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA),
a method for measuring relational similarity. LRA has potential applications in many
areas, including information extraction, word sense disambiguation,
and information retrieval. Recently the Vector Space Model (VSM) of information
retrieval has been adapted to measuring relational similarity,
achieving a score of 47% on a collection of 374 college-level multiple-choice
word analogy questions. In the VSM approach, the relation between a pair of words is
characterized by a vector of frequencies of predefined patterns in a large corpus.
LRA extends the VSM approach in three ways: (1) the patterns are derived automatically
from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency
data, and (3) automatically generated synonyms are used to explore variations of the
word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the
average human score of 57%. On the related problem of classifying semantic relations, LRA
achieves similar gains over the VSM
Terminology server for improved resource discovery: analysis of model and functions
This paper considers the potential to improve distributed information retrieval via a terminologies server. The restriction upon effective resource discovery caused by the use of disparate terminologies across services and collections is outlined, before considering a DDC spine based approach involving inter-scheme mapping as a possible solution. The developing HILT model is discussed alongside other existing models and alternative approaches to solving the terminologies problem. Results from the current HILT pilot are presented to illustrate functionality and suggestions are made for further research and development
HILT : High-Level Thesaurus Project. Phase IV and Embedding Project Extension : Final Report
Ensuring that Higher Education (HE) and Further Education (FE) users of the JISC IE can find appropriate learning, research and information resources by subject search and browse in an environment where most national and institutional service providers - usually for very good local reasons - use different subject schemes to describe their resources is a major challenge facing the JISC domain (and, indeed, other domains beyond JISC). Encouraging the use of standard terminologies in some services (institutional repositories, for example) is a related challenge. Under the auspices of the HILT project, JISC has been investigating mechanisms to assist the community with this problem through a JISC Shared Infrastructure Service that would help optimise the value obtained from expenditure on content and services by facilitating subject-search-based resource sharing to benefit users in the learning and research communities. The project has been through a number of phases, with work from earlier phases reported, both in published work elsewhere, and in project reports (see the project website: http://hilt.cdlr.strath.ac.uk/). HILT Phase IV had two elements - the core project, whose focus was 'to research, investigate and develop pilot solutions for problems pertaining to cross-searching multi-subject scheme information environments, as well as providing a variety of other terminological searching aids', and a short extension to encompass the pilot embedding of routines to interact with HILT M2M services in the user interfaces of various information services serving the JISC community. Both elements contributed to the developments summarised in this report
- …