2,844 research outputs found
PRIME: A System for Multi-lingual Patent Retrieval
Given the growing number of patents filed in multiple countries, users are
interested in retrieving patents across languages. We propose a multi-lingual
patent retrieval system, which translates a user query into the target
language, searches a multilingual database for patents relevant to the query,
and improves the browsing efficiency by way of machine translation and
clustering. Our system also extracts new translations from patent families
consisting of comparable patents, to enhance the translation dictionary
Extending, trimming and fusing WordNet for technical documents
This paper describes a tool for the automatic
extension and trimming of a multilingual
WordNet database for cross-lingual retrieval
and multilingual ontology building in
intranets and domain-specific document
collections. Hierarchies, built from
automatically extracted terms and combined
with the WordNet relations, are trimmed
with a disambiguation method based on the
document salience of the words in the
glosses. The disambiguation is tested in a
cross-lingual retrieval task, showing
considerable improvement (7%-11%). The
condensed hierarchies can be used as
browse-interfaces to the documents
complementary to retrieval
Cross-language Text Classification with Convolutional Neural Networks From Scratch
Cross language classification is an important task in multilingual learning, where documents in different languages often share the same set of categories. The main goal is to reduce the labeling cost of training classification model for each individual language. The novel approach by using Convolutional Neural Networks for multilingual language classification is proposed in this article. It learns representation of knowledge gained from languages. Moreover, current method works for new individual language, which was not used in training. The results of empirical study on large dataset of 21 languages demonstrate robustness and competitiveness of the presented approach
From media crossing to media mining
This paper reviews how the concept of Media Crossing has contributed to the advancement of the application domain of information access and explores directions for a future research agenda. These will include themes that could help to broaden the scope and to incorporate the concept of medium-crossing in a more general approach that not only uses combinations of medium-specific processing, but that also exploits more abstract medium-independent representations, partly based on the foundational work on statistical language models for information retrieval. Three examples of successful applications of media crossing will be presented, with a focus on the aspects that could be considered a first step towards a generalized form of media mining
Cross-lingual document retrieval categorisation and navigation based on distributed services
The widespread use of the Internet across countries has increased the need for access to document collections
that are often written in languages different from a user’s native language. In this paper we describe Clarity, a
Cross Language Information Retrieval (CLIR) system for English, Finnish, Swedish, Latvian and Lithuanian.
Clarity is a fully-fledged retrieval system that supports the user during the whole process of query formulation,
text retrieval and document browsing. We address four of the major aspects of Clarity: (i) the user-driven
methodology that formed the basis for the iterative design cycle and framework in the project, (ii) the system
architecture that was developed to support the interaction and coordination of Clarity’s distributed services, (iii)
the data resources and methods for query translation, and (iv) the support for Baltic languages. Clarity is an
example of a distributed CLIR system built with minimal translation resources and, to our knowledge, the only
such system that currently supports Baltic languages
- …