2,844 research outputs found

    PRIME: A System for Multi-lingual Patent Retrieval

    Full text link
    Given the growing number of patents filed in multiple countries, users are interested in retrieving patents across languages. We propose a multi-lingual patent retrieval system, which translates a user query into the target language, searches a multilingual database for patents relevant to the query, and improves the browsing efficiency by way of machine translation and clustering. Our system also extracts new translations from patent families consisting of comparable patents, to enhance the translation dictionary

    Extending, trimming and fusing WordNet for technical documents

    Get PDF
    This paper describes a tool for the automatic extension and trimming of a multilingual WordNet database for cross-lingual retrieval and multilingual ontology building in intranets and domain-specific document collections. Hierarchies, built from automatically extracted terms and combined with the WordNet relations, are trimmed with a disambiguation method based on the document salience of the words in the glosses. The disambiguation is tested in a cross-lingual retrieval task, showing considerable improvement (7%-11%). The condensed hierarchies can be used as browse-interfaces to the documents complementary to retrieval

    Cross-language Text Classification with Convolutional Neural Networks From Scratch

    Get PDF
    Cross language classification is an important task in multilingual learning, where documents in different languages often share the same set of categories. The main goal is to reduce the labeling cost of training classification model for each individual language. The novel approach by using Convolutional Neural Networks for multilingual language classification is proposed in this article. It learns representation of knowledge gained from languages. Moreover, current method works for new individual language, which was not used in training. The results of empirical study on large dataset of 21 languages demonstrate robustness and competitiveness of the presented approach

    From media crossing to media mining

    Get PDF
    This paper reviews how the concept of Media Crossing has contributed to the advancement of the application domain of information access and explores directions for a future research agenda. These will include themes that could help to broaden the scope and to incorporate the concept of medium-crossing in a more general approach that not only uses combinations of medium-specific processing, but that also exploits more abstract medium-independent representations, partly based on the foundational work on statistical language models for information retrieval. Three examples of successful applications of media crossing will be presented, with a focus on the aspects that could be considered a first step towards a generalized form of media mining

    Cross-lingual document retrieval categorisation and navigation based on distributed services

    Get PDF
    The widespread use of the Internet across countries has increased the need for access to document collections that are often written in languages different from a user’s native language. In this paper we describe Clarity, a Cross Language Information Retrieval (CLIR) system for English, Finnish, Swedish, Latvian and Lithuanian. Clarity is a fully-fledged retrieval system that supports the user during the whole process of query formulation, text retrieval and document browsing. We address four of the major aspects of Clarity: (i) the user-driven methodology that formed the basis for the iterative design cycle and framework in the project, (ii) the system architecture that was developed to support the interaction and coordination of Clarity’s distributed services, (iii) the data resources and methods for query translation, and (iv) the support for Baltic languages. Clarity is an example of a distributed CLIR system built with minimal translation resources and, to our knowledge, the only such system that currently supports Baltic languages
    corecore