1,498 research outputs found

Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration

Author: Fujii Atsushi
Ishikawa Tetsuya
Publication date: 01/01/2001

Cross-language information retrieval (CLIR), where queries and documents are in different languages, has recently become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system that combines query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our system is highly dependent on the quality of the translation of technical terms. However, technical term translation is still problematic in that technical terms are often compound words, and thus new terms are progressively created by combining existing base words. In addition, Japanese often represents loanwords using its special phonograms. Consequently, existing dictionaries find it difficult to achieve sufficient coverage. To counter the first problem, we produce a Japanese/English dictionary for base words, and translate compound words on a word-by-word basis. We also use a probabilistic method to resolve translation ambiguity. For the second problem, we use a transliteration method, which maps words unlisted in the base word dictionary to their phonetic equivalents in the target language. We evaluate our system using a test collection for CLIR, and show that both the compound word translation and transliteration methods improve the system performance.
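
The word-by-word compound translation with probabilistic disambiguation described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not specify the probabilistic model, so a simple target-language bigram score is assumed here, and the dictionary entries, probabilities, and the names `BASE_DICT`, `BIGRAM_PROB`, `FLOOR`, and `translate_compound` are all hypothetical.

```python
from itertools import product

# Hypothetical base-word dictionary (romanised Japanese -> English candidates).
BASE_DICT = {
    "jouhou": ["information", "intelligence"],
    "kensaku": ["retrieval", "search"],
}

# Hypothetical target-language bigram probabilities (assumed, not from the paper).
BIGRAM_PROB = {
    ("information", "retrieval"): 0.6,
    ("information", "search"): 0.3,
    ("intelligence", "search"): 0.05,
}
FLOOR = 0.01  # score given to word pairs missing from BIGRAM_PROB


def translate_compound(base_words):
    """Translate each base word separately, then keep the candidate
    sequence whose adjacent word pairs have the highest combined score."""
    candidate_lists = [BASE_DICT[w] for w in base_words]
    best, best_score = None, -1.0
    for candidate in product(*candidate_lists):
        score = 1.0
        for left, right in zip(candidate, candidate[1:]):
            score *= BIGRAM_PROB.get((left, right), FLOOR)
        if score > best_score:
            best, best_score = candidate, score
    return " ".join(best)
```

Under these toy values, the compound made of "jouhou" and "kensaku" resolves to "information retrieval", because that pair has the highest assumed bigram probability among the four candidate combinations.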

arXiv.org e-Print Archive

CiteSeerX

Distributional Measures of Semantic Distance: A Survey

Author: Hirst Graeme
Mohammad Saif M.
Publication date: 01/01/2012

The ability to mimic human notions of semantic distance has widespread applications. Some measures rely only on raw text (distributional measures) and some rely on knowledge sources such as WordNet. Although extensive studies have been performed to compare WordNet-based measures with human judgment, the use of distributional measures as proxies to estimate semantic distance has received little attention. Even though they have traditionally performed poorly when compared to WordNet-based measures, they lay claim to certain uniquely attractive features, such as their applicability in resource-poor languages and their ability to mimic both semantic similarity and semantic relatedness. Therefore, this paper presents a detailed study of distributional measures. Particular attention is paid to fleshing out the strengths and limitations of both WordNet-based and distributional measures, and to how distributional measures of distance can be brought more in line with human notions of semantic distance. We conclude with a brief discussion of recent work on hybrid measures.
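
As a rough illustration of what a distributional measure computed from raw text can look like, the sketch below builds co-occurrence vectors from a toy corpus and compares words by cosine distance. This is one common choice, assumed here for illustration; the survey covers many measures and the abstract does not single this one out. The function names, the window size, and the corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count, for every word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return 1.0 - dot / norm if norm else 1.0


corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "stocks fell sharply today",
]
vecs = cooccurrence_vectors(corpus)
```

With this toy corpus, "cat" and "dog" share context words and so end up closer to each other than "cat" is to "stocks", which shares none.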

arXiv.org e-Print Archive

CiteSeerX

Thesaurus-assisted search term selection and query expansion: a review of user-centred studies

Author: Chowdhury G.
Revie C.W.
Shiri A.A.
Publication date: 01/01/2002

This paper provides a review of the literature related to the application of domain-specific thesauri in the search and retrieval process. Focusing on studies which adopt a user-centred approach, the review presents a survey of the methodologies and results from empirical studies undertaken on the use of thesauri as sources of term selection for query formulation and expansion during the search process. It summarises the ways in which domain-specific thesauri from different disciplines have been used by various types of users and how these tools aid users in the selection of search terms. The review consists of two main sections covering, firstly, studies on thesaurus-aided search term selection and, secondly, those dealing with query expansion using thesauri. Both sections are illustrated with case studies that have adopted a user-centred approach.
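
For readers unfamiliar with thesaurus-based query expansion, a minimal sketch is given below. It is not drawn from any of the reviewed studies: the thesaurus entry, the choice of which relations to follow, and the names `THESAURUS` and `expand_query` are hypothetical. The relation codes follow common thesaurus practice (UF = used for, BT = broader term, NT = narrower term, RT = related term).

```python
# Hypothetical domain thesaurus with one entry, for illustration only.
THESAURUS = {
    "heart attack": {
        "UF": ["myocardial infarction"],
        "BT": ["heart diseases"],
        "NT": [],
        "RT": ["cardiac arrest"],
    },
}


def expand_query(terms, relations=("UF", "NT")):
    """Add thesaurus terms linked by the chosen relations to each query term,
    keeping the original order and skipping duplicates."""
    expanded = []
    for term in terms:
        candidates = [term]
        entry = THESAURUS.get(term, {})
        for relation in relations:
            candidates.extend(entry.get(relation, []))
        for candidate in candidates:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

Following only UF and NT links by default is an assumption made here to keep expansion conservative; in an interactive setting the user would typically pick terms from the displayed relations, which is the behaviour the user-centred studies examine.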

University of Strathclyde Institutional Repository

A pilot investigation of Information Extraction in the semantic annotation of archaeological reports

Author: Tudhope Douglas
Vlachidis Andreas
Publication venue: Inderscience Publishers
Publication date: 01/01/2012

The paper discusses a prototype investigation of semantic annotation, a form of metadata assigning conceptual entities to textual instances, in the case of archaeological grey literature. The use of Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process, while the use of Knowledge Organization Systems (KOS) is explored for the association of semantic annotation with both ontological and terminological references. The annotation process follows a rule-based information extraction approach using the GATE NLP toolkit, together with the CIDOC CRM ontology, its CRM-EH archaeological extension and English Heritage thesauri and glossaries. Results are reported from an initial evaluation, which suggest that these information extraction techniques can be applied to archaeological grey literature reports. Further work is discussed drawing on the evaluation and consideration of the characteristics of the archaeology domain. Copyright © 2012 Inderscience Enterprises Ltd.

Crossref

University of South Wales Research Explorer

UWE Bristol Research Repository

UCL Discovery

Treatment of Semantic Heterogeneity in Information Retrieval

Author: Hellweg Heiko
Krause Jürgen
Mandl Thomas
Marx Jutta
Mutschke Peter
Müller Matthias N.O.
Strötgen Robert
Publication venue: Institut de Recherche Juridique de la Sorbonne (IRJS)
Publication date: 01/01/2001

"Nowadays, users of information services are faced with highly decentralised, heterogeneous document sources with different content analysis. Semantic heterogeneity occurs e.g. when resources using different systems for content description are searched using a single query system. This report describes several approaches of handling semantic heterogeneity used in projects of the German Social Science Information Centre." (author's abstract)

SSOAR - Social Science Open Access Repository

Selective Sampling for Example-based Word Sense Disambiguation

Author: Fujii Atsushi
Inui Kentaro
Tanaka Hozumi
Tokunaga Takenobu
Publication date: 01/01/1998

This paper proposes an efficient example sampling method for example-based word sense disambiguation systems. To construct a database of practical size, a considerable overhead for manual sense disambiguation (overhead for supervision) is required. In addition, the time complexity of searching a large-sized database poses a considerable problem (overhead for search). To counter these problems, our method selectively samples a smaller-sized effective subset from a given example set for use in word sense disambiguation. Our method is characterized by its reliance on the notion of training utility: the degree to which each example is informative for future example sampling when used for the training of the system. The system progressively collects examples by selecting those with the greatest utility. The paper reports the effectiveness of our method through experiments on about one thousand sentences. Compared to experiments with other example sampling methods, our method reduced both the overhead for supervision and the overhead for search, without degrading the performance of the system. Comment: 25 pages, 14 Postscript figures
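
The general idea of selecting the most informative examples for annotation can be sketched as below. Note that this swaps in plain uncertainty sampling (smallest margin between the two best sense scores) as a stand-in; it is not the paper's training-utility measure, which the abstract does not define in detail. The names `select_most_useful` and `TOY_SCORES`, the example sentences, and the scores are hypothetical.

```python
def select_most_useful(unlabeled, score_senses, k=1):
    """Return the k examples whose two best sense scores are closest,
    i.e. those the current system is least sure about."""

    def margin(example):
        scores = sorted(score_senses(example).values(), reverse=True)
        return scores[0] - scores[1] if len(scores) > 1 else scores[0]

    return sorted(unlabeled, key=margin)[:k]


# Hypothetical sense scores a current system might assign to unlabeled examples.
TOY_SCORES = {
    "bank of the river": {"finance": 0.10, "shore": 0.90},
    "bank raised rates": {"finance": 0.95, "shore": 0.05},
    "sat by the bank": {"finance": 0.45, "shore": 0.55},
}

chosen = select_most_useful(list(TOY_SCORES), TOY_SCORES.get, k=1)
```

Here the third sentence is chosen for manual annotation because its two sense scores are nearly tied. In a progressive scheme like the one the abstract describes, the chosen example would be labeled, added to the database, and the selection repeated.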

arXiv.org e-Print Archive

CiteSeerX

Benchmarking Ontologies: Bigger or Better?

Author: Lixia Yao
Anna Divoli
Ilya Mayzus
James A. Evans
Andrey Rzhetsky
Publication venue: Public Library of Science
Publication date: 01/01/2011

A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central