1,038 research outputs found
PRIME: A System for Multi-lingual Patent Retrieval
Given the growing number of patents filed in multiple countries, users are
interested in retrieving patents across languages. We propose a multi-lingual
patent retrieval system, which translates a user query into the target
language, searches a multilingual database for patents relevant to the query,
and improves the browsing efficiency by way of machine translation and
clustering. Our system also extracts new translations from patent families
consisting of comparable patents, to enhance the translation dictionary
A Logic-based Approach for Recognizing Textual Entailment Supported by Ontological Background Knowledge
We present the architecture and the evaluation of a new system for
recognizing textual entailment (RTE). In RTE we want to identify automatically
the type of a logical relation between two input texts. In particular, we are
interested in proving the existence of an entailment between them. We conceive
our system as a modular environment allowing for a high-coverage syntactic and
semantic text analysis combined with logical inference. For the syntactic and
semantic analysis we combine a deep semantic analysis with a shallow one
supported by statistical models in order to increase the quality and the
accuracy of results. For RTE we use logical inference of first-order employing
model-theoretic techniques and automated reasoning tools. The inference is
supported with problem-relevant background knowledge extracted automatically
and on demand from external sources like, e.g., WordNet, YAGO, and OpenCyc, or
other, more experimental sources with, e.g., manually defined presupposition
resolutions, or with axiomatized general and common sense knowledge. The
results show that fine-grained and consistent knowledge coming from diverse
sources is a necessary condition determining the correctness and traceability
of results.Comment: 25 pages, 10 figure
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
becomes a major application domain for IE
Proceedings of the COLING 2004 Post Conference Workshop on Multilingual Linguistic Ressources MLR2004
International audienceIn an ever expanding information society, most information systems are now facing the "multilingual challenge". Multilingual language resources play an essential role in modern information systems. Such resources need to provide information on many languages in a common framework and should be (re)usable in many applications (for automatic or human use). Many centres have been involved in national and international projects dedicated to building har- monised language resources and creating expertise in the maintenance and further development of standardised linguistic data. These resources include dictionaries, lexicons, thesauri, word-nets, and annotated corpora developed along the lines of best practices and recommendations. However, since the late 90's, most efforts in scaling up these resources remain the responsibility of the local authorities, usually, with very low funding (if any) and few opportunities for academic recognition of this work. Hence, it is not surprising that many of the resource holders and developers have become reluctant to give free access to the latest versions of their resources, and their actual status is therefore currently rather unclear. The goal of this workshop is to study problems involved in the development, management and reuse of lexical resources in a multilingual context. Moreover, this workshop provides a forum for reviewing the present state of language resources. The workshop is meant to bring to the international community qualitative and quantitative information about the most recent developments in the area of linguistic resources and their use in applications. The impressive number of submissions (38) to this workshop and in other workshops and conferences dedicated to similar topics proves that dealing with multilingual linguistic ressources has become a very hot problem in the Natural Language Processing community. To cope with the number of submissions, the workshop organising committee decided to accept 16 papers from 10 countries based on the reviewers' recommendations. Six of these papers will be presented in a poster session. The papers constitute a representative selection of current trends in research on Multilingual Language Resources, such as multilingual aligned corpora, bilingual and multilingual lexicons, and multilingual speech resources. The papers also represent a characteristic set of approaches to the development of multilingual language resources, such as automatic extraction of information from corpora, combination and re-use of existing resources, online collaborative development of multilingual lexicons, and use of the Web as a multilingual language resource. The development and management of multilingual language resources is a long-term activity in which collaboration among researchers is essential. We hope that this workshop will gather many researchers involved in such developments and will give them the opportunity to discuss, exchange, compare their approaches and strengthen their collaborations in the field. The organisation of this workshop would have been impossible without the hard work of the program committee who managed to provide accurate reviews on time, on a rather tight schedule. We would also like to thank the Coling 2004 organising committee that made this workshop possible. Finally, we hope that this workshop will yield fruitful results for all participants
AAA4LLL - Acquisition, Annotation, Augmentation for Lively Language Learning
In this paper we describe a method for enhancing the process of studying Japanese by a user-centered approach. This approach includes three parts: an innovative way of acquiring learning material from topic seeds, multifaceted sentence analysis to present sentence annotations, and the browser-integrated augmentation of perusing Wikipedia pages of special interest for the learner. This may result in new topic seeds to yield additional learning content, thus repeating the cycle
- …