Huge automatically extracted training sets for multilingual Word Sense Disambiguation
We release to the community six large-scale sense-annotated datasets in multiple languages to pave the way for supervised multilingual Word Sense Disambiguation. Our datasets cover all the nouns in the English WordNet and their translations in other languages, for a total of millions of sense-tagged sentences. Experiments show that these corpora can be used effectively as training sets for supervised WSD systems, surpassing the state of the art for low-resourced languages and providing competitive results for English, where manually annotated training sets are available. The data is available at trainomatic.org
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Structural Stability of Lexical Semantic Spaces: Nouns in Chinese and French
Many studies in the neurosciences have dealt with the semantic processing of
words or categories, but few have looked into the semantic organization of the
lexicon considered as a system. The present study was designed to move
towards this goal, using both electrophysiological and corpus-based data, and
to compare two languages from different families: French and Mandarin Chinese.
We conducted an EEG-based semantic-decision experiment using 240 words from
eight categories (clothing, parts of a house, tools, vehicles,
fruits/vegetables, animals, body parts, and people) as the material. A
data-analysis method (correspondence analysis) commonly used in computational
linguistics was applied to the electrophysiological signals.
The present cross-language comparison indicated stability for the following
aspects of the languages' lexical semantic organizations: (1) the
living/nonliving distinction, which showed up as a main factor for both
languages; (2) greater dispersion of the living categories as compared to the
nonliving ones; (3) prototypicality of the "animals" category within the
living categories, and with respect to the living/nonliving distinction; and
(4) the existence of a person-centered reference gradient. Our
electrophysiological analysis indicated stability of the networks at play in
each of these processes. Stability was also observed in the data taken from
word usage in the languages (synonyms and associated words obtained from
textual corpora). Comment: 17 pages, 4 figures
Filling Knowledge Gaps in a Broad-Coverage Machine Translation System
Knowledge-based machine translation (KBMT) techniques yield high quality in
domains with detailed semantic models, limited vocabulary, and controlled input
grammar. Scaling up along these dimensions means acquiring large knowledge
resources. It also means behaving reasonably when definitive knowledge is not
yet available. This paper describes how we can fill various KBMT knowledge
gaps, often using robust statistical techniques. We describe quantitative and
qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT
system. Comment: 7 pages, compressed and uuencoded PostScript. To appear: IJCAI-9