2,566 research outputs found
New Perspectives in Sinographic Language Processing Through the Use of Character Structure
Chinese characters have a complex and hierarchical graphical structure
carrying both semantic and phonetic information. We use this structure to
enhance the text model and obtain better results in standard NLP operations.
First of all, to tackle the problem of graphical variation we define
allographic classes of characters. Next, the relation of inclusion of a
subcharacter in a characters, provides us with a directed graph of allographic
classes. We provide this graph with two weights: semanticity (semantic relation
between subcharacter and character) and phoneticity (phonetic relation) and
calculate "most semantic subcharacter paths" for each character. Finally,
adding the information contained in these paths to unigrams we claim to
increase the efficiency of text mining methods. We evaluate our method on a
text classification task on two corpora (Chinese and Japanese) of a total of 18
million characters and get an improvement of 3% on an already high baseline of
89.6% precision, obtained by a linear SVM classifier. Other possible
applications and perspectives of the system are discussed.Comment: 17 pages, 5 figures, presented at CICLing 201
Combining Multiple Methods for the Automatic Construction of Multilingual WordNets
This paper explores the automatic construction of a multilingual Lexical
Knowledge Base from preexisting lexical resources. First, a set of automatic
and complementary techniques for linking Spanish words collected from
monolingual and bilingual MRDs to English WordNet synsets are described.
Second, we show how resulting data provided by each method is then combined to
produce a preliminary version of a Spanish WordNet with an accuracy over 85%.
The application of these combinations results on an increment of the extracted
connexions of a 40% without losing accuracy. Both coarse-grained (class level)
and fine-grained (synset assignment level) confidence ratios are used and
evaluated. Finally, the results for the whole process are presented.Comment: 7 pages, 4 postscript figure
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems
A Logic-based Approach for Recognizing Textual Entailment Supported by Ontological Background Knowledge
We present the architecture and the evaluation of a new system for
recognizing textual entailment (RTE). In RTE we want to identify automatically
the type of a logical relation between two input texts. In particular, we are
interested in proving the existence of an entailment between them. We conceive
our system as a modular environment allowing for a high-coverage syntactic and
semantic text analysis combined with logical inference. For the syntactic and
semantic analysis we combine a deep semantic analysis with a shallow one
supported by statistical models in order to increase the quality and the
accuracy of results. For RTE we use logical inference of first-order employing
model-theoretic techniques and automated reasoning tools. The inference is
supported with problem-relevant background knowledge extracted automatically
and on demand from external sources like, e.g., WordNet, YAGO, and OpenCyc, or
other, more experimental sources with, e.g., manually defined presupposition
resolutions, or with axiomatized general and common sense knowledge. The
results show that fine-grained and consistent knowledge coming from diverse
sources is a necessary condition determining the correctness and traceability
of results.Comment: 25 pages, 10 figure
PRIME: A System for Multi-lingual Patent Retrieval
Given the growing number of patents filed in multiple countries, users are
interested in retrieving patents across languages. We propose a multi-lingual
patent retrieval system, which translates a user query into the target
language, searches a multilingual database for patents relevant to the query,
and improves the browsing efficiency by way of machine translation and
clustering. Our system also extracts new translations from patent families
consisting of comparable patents, to enhance the translation dictionary
- …