127,382 research outputs found

    PLuTO: MT for online patent translation

    Get PDF
    PLuTO – Patent Language Translation Online – is a partially EU-funded commercialization project which specializes in the automatic retrieval and translation of patent documents. At the core of the PLuTO framework is a machine translation (MT) engine through which web-based translation services are offered. The fully integrated PLuTO architecture includes a translation engine coupling MT with translation memories (TM), and a patent search and retrieval engine. In this paper, we first describe the motivating factors behind the provision of such a service. Following this, we give an overview of the PLuTO framework as a whole, with particular emphasis on the MT components, and provide a real world use case scenario in which PLuTO MT services are exploited

    Information Access in a Multilingual World: Transitioning from Research to Real-World Applications

    Get PDF
    Multilingual Information Access (MLIA) is at a turning point wherein substantial real-world applications are being introduced after fifteen years of research into cross-language information retrieval, question answering, statistical machine translation and named entity recognition. Previous workshops on this topic have focused on research and small- scale applications. The focus of this workshop was on technology transfer from research to applications and on what future research needs to be done which facilitates MLIA in an increasingly connected multilingual world

    Real-Time Identification of Parallel Texts from Bilingual Newsfeed

    Get PDF
    Parallel texts are documents that present parallel translations. This paper describes a simple method that can be deployed on a real-time news feed to create an infinitely growing source of parallel texts in French and English. Our experiment was lead on the Canada Newswire news feed. Given some of its intrinsic properties, it was possible to deploy a relatively simple text matching techniques that rely on language independent cognates such numbers, capitalized words, punctuation and new lines characters. On three week of press releases, our system correctly identified the vast majority of parallel press release. It committed only minor errors on repeated news items

    An Investigation on Text-Based Cross-Language Picture Retrieval Effectiveness through the Analysis of User Queries

    Get PDF
    Purpose: This paper describes a study of the queries generated from a user experiment for cross-language information retrieval (CLIR) from a historic image archive. Italian speaking users generated 618 queries for a set of known-item search tasks. The queries generated by user’s interaction with the system have been analysed and the results used to suggest recommendations for the future development of cross-language retrieval systems for digital image libraries. Methodology: A controlled lab-based user study was carried out using a prototype Italian-English image retrieval system. Participants were asked to carry out searches for 16 images provided to them, a known-item search task. User’s interactions with the system were recorded and queries were analysed manually quantitatively and qualitatively. Findings: Results highlight the diversity in requests for similar visual content and the weaknesses of Machine Translation for query translation. Through the manual translation of queries we show the benefits of using high-quality translation resources. The results show the individual characteristics of user’s whilst performing known-item searches and the overlap obtained between query terms and structured image captions, highlighting the use of user’s search terms for objects within the foreground of an image. Limitations and Implications: This research looks in-depth into one case of interaction and one image repository. Despite this limitation, the discussed results are likely to be valid across other languages and image repository. Value: The growing quantity of digital visual material in digital libraries offers the potential to apply techniques from CLIR to provide cross-language information access services. However, to develop effective systems requires studying user’s search behaviours, particularly in digital image libraries. The value of this paper is in the provision of empirical evidence to support recommendations for effective cross-language image retrieval system design.</p
    corecore