10,486 research outputs found
Cross-lingual document retrieval categorisation and navigation based on distributed services
The widespread use of the Internet across countries has increased the need for access to document collections
that are often written in languages different from a user’s native language. In this paper we describe Clarity, a
Cross Language Information Retrieval (CLIR) system for English, Finnish, Swedish, Latvian and Lithuanian.
Clarity is a fully-fledged retrieval system that supports the user during the whole process of query formulation,
text retrieval and document browsing. We address four of the major aspects of Clarity: (i) the user-driven
methodology that formed the basis for the iterative design cycle and framework in the project, (ii) the system
architecture that was developed to support the interaction and coordination of Clarity’s distributed services, (iii)
the data resources and methods for query translation, and (iv) the support for Baltic languages. Clarity is an
example of a distributed CLIR system built with minimal translation resources and, to our knowledge, the only
such system that currently supports Baltic languages
Papillon Lexical Database Project: Monolingual Dictionaries and Interlingual Links
International audienceThis paper presents a new research and development project called Papillon. It started as a French-Japanese cooperation between laboratories GETA/CLIPS (Grenoble, France) and NII (Tokyo, Japan). Its goal is to build a multilingual lexical database and to extract from it digital bilingual dictionaries. The database is built with monolingual dictionaries, one for each language of the database, linked to an interlingual dictionary. The pivot architecture of the database is based on Gilles Sérasset's Ph.D. thesis. The structure of the monolingual dictionaries is based on the lexical work done by Igor Melc'uk and Alain Polguère. From the lexical database, it is planned to derive user customized bilingual dictionaries in multiple target formats. It will be possible to generate human usage dictionaries as well as specialized dictionaries for machine translation software. These dictionaries will be available under the terms of an open source license. This project, initiated by some computational linguists, aims at being useful and open to all those who are interested in Japanese and French. It is also opened to any other language. Moreover, the pivot architecture of the database will facilitate the addition of new languages and save translation efforts
User requirement elicitation for cross-language information retrieval
Who are the users of a cross-language retrieval system? Under what circumstances do they need to perform such multi-language searches? How will the task and the context
of use affect successful interaction with the system? Answers to these questions were explored in a user study performed as part of the design stages of Clarity, a EU
founded project on cross-language information retrieval. The findings resulted in a rethink of the planned user interface and a consequent expansion of the set of services
offered. This paper reports on the methodology and techniques used for the elicitation of user requirements as well as how these were in turn transformed into new design
solutions
462 Machine Translation Systems for Europe
We built 462 machine translation systems for all language pairs of the Acquis Communautaire corpus. We report and analyse the performance of these system, and compare them against pivot translation and a number of system combination methods (multi-pivot, multisource) that are possible due to the available systems.JRC.G.2-Global security and crisis managemen
Building lexical resources: towards programmable contributive platforms
International audienceLexical resources are very important in nowadays society, with the globalization and the increase of world communi- cation and exchanges. There are clearly identified needs, both for humans and machines. Nevertheless, very few efforts are actually done in this domain. Consequently, there is an important lack of freely available good quality resources, especially for under- resourced languages. Furthermore, the majority of existing bilin- gual dictionaries is built with one language as English. Therefore, if one wants to translate from one language (that is not English) to another, it uses English as a pivot. And even for English native speakers, it creates a lot of misunderstandings that can be critical in many situations. In order to create and extend freely available good quality rich lexical resources for under-resourced languages online with a community of voluntary contributors, Jibiki, an online generic platform for managing (lookup, editing, import, export) any kind of lexical resources encoded in XML, has been developed. This platform is successfully used in several dictionary construction projects. Concerning the data, a serious game has been launched in order to collect precious lexical information such as collocations that will be integrated later into dictionary entries. Work is now done on extending our platform in order to reuse the resulting resources and enriching them by synchronization with the other systems (language learners and translators environments, machine translation systems, etc.)
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
Training supervised video captioning model requires coupled video-caption
pairs. However, for many targeted languages, sufficient paired data are not
available. To this end, we introduce the unpaired video captioning task aiming
to train models without coupled video-caption pairs in target language. To
solve the task, a natural choice is to employ a two-step pipeline system: first
utilizing video-to-pivot captioning model to generate captions in pivot
language and then utilizing pivot-to-target translation model to translate the
pivot captions to the target language. However, in such a pipeline system, 1)
visual information cannot reach the translation model, generating visual
irrelevant target captions; 2) the errors in the generated pivot captions will
be propagated to the translation model, resulting in disfluent target captions.
To address these problems, we propose the Unpaired Video Captioning with Visual
Injection system (UVC-VI). UVC-VI first introduces the Visual Injection Module
(VIM), which aligns source visual and target language domains to inject the
source visual information into the target language domain. Meanwhile, VIM
directly connects the encoder of the video-to-pivot model and the decoder of
the pivot-to-target model, allowing end-to-end inference by completely skipping
the generation of pivot captions. To enhance the cross-modality injection of
the VIM, UVC-VI further introduces a pluggable video encoder, i.e., Multimodal
Collaborative Encoder (MCE). The experiments show that UVC-VI outperforms
pipeline systems and exceeds several supervised systems. Furthermore, equipping
existing supervised systems with our MCE can achieve 4% and 7% relative margins
on the CIDEr scores to current state-of-the-art models on the benchmark MSVD
and MSR-VTT datasets, respectively.Comment: Published at IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI
Ethics Recommendations for Crisis Translation Settings
This document is a summary public version of the Ethics Recommendations for Crisis
Translation Settings produced by some of the INTERACT project team. INTERACT is the
International Network in Crisis Translation, a project funded by the European Union’s Horizon
2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement
No. 734211. Further information about the project as a whole is available at:
https://sites.google.com/view/crisistranslation/hom
Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling
As a special machine translation task, dialect translation has two main
characteristics: 1) lack of parallel training corpus; and 2) possessing similar
grammar between two sides of the translation. In this paper, we investigate how
to exploit the commonality and diversity between dialects thus to build
unsupervised translation models merely accessing to monolingual data.
Specifically, we leverage pivot-private embedding, layer coordination, as well
as parameter sharing to sufficiently model commonality and diversity among
source and target, ranging from lexical, through syntactic, to semantic levels.
In order to examine the effectiveness of the proposed models, we collect 20
million monolingual corpus for each of Mandarin and Cantonese, which are
official language and the most widely used dialect in China. Experimental
results reveal that our methods outperform rule-based simplified and
traditional Chinese conversion and conventional unsupervised translation models
over 12 BLEU scores.Comment: AAAI 202
Mathematical practice, crowdsourcing, and social machines
The highest level of mathematics has traditionally been seen as a solitary
endeavour, to produce a proof for review and acceptance by research peers.
Mathematics is now at a remarkable inflexion point, with new technology
radically extending the power and limits of individuals. Crowdsourcing pulls
together diverse experts to solve problems; symbolic computation tackles huge
routine calculations; and computers check proofs too long and complicated for
humans to comprehend.
Mathematical practice is an emerging interdisciplinary field which draws on
philosophy and social science to understand how mathematics is produced. Online
mathematical activity provides a novel and rich source of data for empirical
investigation of mathematical practice - for example the community question
answering system {\it mathoverflow} contains around 40,000 mathematical
conversations, and {\it polymath} collaborations provide transcripts of the
process of discovering proofs. Our preliminary investigations have demonstrated
the importance of "soft" aspects such as analogy and creativity, alongside
deduction and proof, in the production of mathematics, and have given us new
ways to think about the roles of people and machines in creating new
mathematical knowledge. We discuss further investigation of these resources and
what it might reveal.
Crowdsourced mathematical activity is an example of a "social machine", a new
paradigm, identified by Berners-Lee, for viewing a combination of people and
computers as a single problem-solving entity, and the subject of major
international research endeavours. We outline a future research agenda for
mathematics social machines, a combination of people, computers, and
mathematical archives to create and apply mathematics, with the potential to
change the way people do mathematics, and to transform the reach, pace, and
impact of mathematics research.Comment: To appear, Springer LNCS, Proceedings of Conferences on Intelligent
Computer Mathematics, CICM 2013, July 2013 Bath, U
Which user interaction for cross-language information retrieval? Design issues and reflections
A novel and complex form of information access is cross-language information retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. The authors present three user evaluations undertaken during the iterative design of Clarity, a cross-language retrieval system for low-density languages, and shows how the user-interaction design evolved depending on the results of usability tests. The first test was instrumental to identify weaknesses in both functionalities and interface; the second was run to determine if query translation should be shown or not; the final was a global assessment and focused on user satisfaction criteria. Lessons were learned at every stage of the process leading to a much more informed view of what a cross-language retrieval system should offer to users
- …