14,558 research outputs found

Cross-document Cross-lingual Information Extraction and Tracking

Author: Ji Heng
Publication venue: DigitalCommons@URI
Publication date: 18/02/2010

Most current information extraction systems analyze documents in isolation. The net result is a set of disconnected, inaccurate and often redundant annotations, because events are repeated in many news stories. In this talk we will present a new task of cross-document cross-lingual information extraction and tracking and its evaluation metrics. From large collections of multi-lingual documents we identify important person entities which are frequently involved in events as ‘centroid entities’. Then we link the events involving the same centroid entity along a time line. We will also present a system performing this task and our current approaches to address the main research challenges. We will discuss how we can take advantage of redundancy to improve the accuracy of relation and event annotation, by means of:
- Cross-document event coreference resolution
- Event ranking by salience and novelty
- Event organization by participant, time, and place
- Name translation
- Knowledge discovery from Google Ngrams
- Domain adaptation techniques for applying information extraction to scientific literature
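The centroid-entity and timeline steps described in this abstract can be illustrated with a minimal sketch. It is not the system presented in the talk: the event records, the function names `centroid_entities` and `timeline`, and the rule that two reports with the same person, date, and event type describe the same event are all assumptions made for illustration. Real event coreference resolution is considerably harder than this exact-match rule.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical event mentions: (person, event_type, date, source_document).
events = [
    ("Person A", "meet", date(2009, 3, 2), "doc1"),
    ("Person B", "travel", date(2009, 3, 5), "doc2"),
    ("Person A", "travel", date(2009, 1, 15), "doc3"),
    ("Person A", "meet", date(2009, 3, 2), "doc4"),  # same event, second report
    ("Person C", "elect", date(2009, 2, 1), "doc5"),
]


def centroid_entities(events, top_k=1):
    """Return the top_k persons that take part in the most event mentions."""
    counts = Counter(person for person, _type, _when, _doc in events)
    return [person for person, _count in counts.most_common(top_k)]


def timeline(events, person):
    """Merge repeated reports of one event and order the events by date.

    Assumption: mentions sharing person, date, and type are coreferent.
    """
    merged = defaultdict(set)
    for p, event_type, when, doc in events:
        if p == person:
            merged[(when, event_type)].add(doc)
    ordered = sorted(merged.items(), key=lambda item: item[0])
    return [(when, event_type, sorted(docs)) for (when, event_type), docs in ordered]


for centroid in centroid_entities(events):
    print(centroid, timeline(events, centroid))
```

Merging the two reports of the meeting shows where redundancy helps: each supporting document is extra evidence for the same timeline entry instead of a duplicate annotation.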

DigitalCommons@URI

Using Cross-Lingual Explicit Semantic Analysis for Improving Ontology Translation

Author: Aggarwal Nitish
Asooja Kartik
Gracia Jorge
Gómez-Pérez A.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/12/2012

The Semantic Web aims to allow machines to make inferences using the explicit conceptualisations contained in ontologies. By pointing to ontologies, Semantic Web-based applications are able to inter-operate and share common information easily. Nevertheless, multilingual semantic applications are still rare, because most online ontologies are monolingual in English. To solve this issue, techniques for ontology localisation and translation are needed. However, traditional machine translation is difficult to apply to ontologies, because ontology labels tend to be quite short and linguistically different from free text. In this paper, we propose an approach to enhance machine translation of ontologies based on exploiting the well-structured concept descriptions contained in the ontology. In particular, our approach leverages the semantics contained in the ontology by using Cross Lingual Explicit Semantic Analysis (CLESA) for context-based disambiguation in phrase-based Statistical Machine Translation (SMT). The presented work is novel in the sense that, to the best of our knowledge, CLESA has not previously been applied in SMT.
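The idea of context-based disambiguation with cross-lingual explicit semantic analysis can be sketched on toy data. This is an illustration under stated assumptions, not the paper's implementation: the two-concept index `CONCEPTS`, the helper names, and the use of raw term counts in place of TF-IDF weights are all hypothetical, and the sketch simply picks a translation candidate where the paper integrates the signal into a phrase-based SMT system.

```python
import math

# Hypothetical concept index: each concept has a short description per language,
# standing in for cross-lingually linked encyclopedia articles.
CONCEPTS = {
    "Finance": {"en": "bank money loan account", "es": "banco dinero cuenta"},
    "River": {"en": "bank river water shore", "es": "orilla agua corriente"},
}


def esa_vector(text, lang):
    """Map text to one score per concept (raw term overlap, an assumption)."""
    tokens = text.lower().split()
    vector = []
    for concept in sorted(CONCEPTS):
        article = CONCEPTS[concept][lang].split()
        vector.append(sum(article.count(token) for token in tokens))
    return vector


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_translation(source_context, candidates):
    """Choose the Spanish candidate closest to the English context in concept space."""
    source_vector = esa_vector(source_context, "en")
    return max(candidates, key=lambda c: cosine(source_vector, esa_vector(c, "es")))


# The label "bank" together with context drawn from its ontology description.
print(pick_translation("bank money loan account", ["banco", "orilla"]))
print(pick_translation("bank river water", ["banco", "orilla"]))
```

Because both languages share the same concept axes, the vectors are directly comparable even though the texts have no words in common.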

CiteSeerX

Archivo Digital UPM

Zero-Shot Cross-Lingual Transfer with Meta Learning

Author: Augenstein Isabelle
Bekoulis Giannis
Bjerva Johannes
Nooralahzadeh Farhad
Publication date: 01/01/2020

Learning what to share between tasks has been a topic of great importance recently, as strategic sharing of knowledge has been shown to improve downstream task performance. This is particularly important for multilingual applications, as most languages in the world are under-resourced. Here, we consider the setting of training models on multiple different languages at the same time, when little or no data is available for languages other than English. We show that this challenging setup can be approached using meta-learning, where, in addition to training a source language model, another model learns to select which training instances are the most beneficial to the first. We experiment using standard supervised, zero-shot cross-lingual, as well as few-shot cross-lingual settings for different natural language understanding tasks (natural language inference, question answering). Our extensive experimental setup demonstrates the consistent effectiveness of meta-learning for a total of 15 languages. We improve upon the state-of-the-art for zero-shot and few-shot NLI (on MultiNLI and XNLI) and QA (on the MLQA dataset). A comprehensive error analysis indicates that the correlation of typological features between languages can partly explain when parameter sharing learned via meta-learning is beneficial. Comment: Accepted as a long paper at the EMNLP 2020 main conference.

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

VBN

Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!

Author: Daxenberger Johannes
Eger Steffen
Gurevych Iryna
Stab Christian
Publication date: 08/06/2018

Argumentation mining (AM) requires the identification of complex discourse structures and has lately been applied with success monolingually. In this work, we show that the existing resources are, however, not adequate for assessing cross-lingual AM, due to their heterogeneity or lack of complexity. We therefore create suitable parallel corpora by (human and machine) translating a popular AM dataset consisting of persuasive student essays into German, French, Spanish, and Chinese. We then compare (i) annotation projection and (ii) direct transfer strategies based on bilingual word embeddings for cross-lingual AM, finding that the former performs considerably better and almost eliminates the loss from cross-lingual transfer. Moreover, we find that annotation projection works equally well when using either costly human or cheap machine translations. Our code and data are available at http://github.com/UKPLab/coling2018-xling_argument_mining. Comment: Accepted at Coling 201
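Annotation projection, the better-performing strategy in this abstract, copies token-level labels from an annotated source sentence to its translation through word alignments. The following is a minimal sketch of that idea, not the authors' released code: the function name `project_labels`, the label set, the hand-written alignment, and the gap-filling rule for unaligned tokens are assumptions made for illustration. In practice the alignment would come from an automatic word aligner.

```python
def project_labels(source_labels, alignment, target_length, default="O"):
    """Project token labels onto a translation via (source_idx, target_idx) pairs."""
    target_labels = [default] * target_length
    aligned = set()
    for source_idx, target_idx in alignment:
        target_labels[target_idx] = source_labels[source_idx]
        aligned.add(target_idx)
    # Assumed heuristic: an unaligned token surrounded by two tokens that
    # carry the same label inherits that label.
    for i in range(1, target_length - 1):
        if i not in aligned and target_labels[i - 1] == target_labels[i + 1]:
            target_labels[i] = target_labels[i - 1]
    return target_labels


# English source with token-level argument component labels.
source_tokens = ["because", "it", "harms", "health"]
source_labels = ["O", "Premise", "Premise", "Premise"]

# German translation; the article "der" has no English counterpart.
target_tokens = ["weil", "es", "der", "Gesundheit", "schadet"]
alignment = [(0, 0), (1, 1), (2, 4), (3, 3)]  # hand-written for this example

projected = project_labels(source_labels, alignment, len(target_tokens))
print(list(zip(target_tokens, projected)))
```

The projected labels can then serve as training data for a tagger in the target language, which is why translation quality and alignment quality both matter for this strategy.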

arXiv.org e-Print Archive

TUbiblio

PRIME: A System for Multi-lingual Patent Retrieval

Author: Fujii Atsushi
Fukui Masatoshi
Higuchi Shigeto
Ishikawa Tetsuya
Publication date: 01/01/2001

Given the growing number of patents filed in multiple countries, users are interested in retrieving patents across languages. We propose a multi-lingual patent retrieval system, which translates a user query into the target language, searches a multilingual database for patents relevant to the query, and improves the browsing efficiency by way of machine translation and clustering. Our system also extracts new translations from patent families consisting of comparable patents, to enhance the translation dictionary.

arXiv.org e-Print Archive

CiteSeerX

Introduction to the special issue on cross-language algorithms and applications

Author: Bangalore Srinivas
Lambert Patrik
Montiel-Ponsoda Elena
Màrquez Lluís
Ruiz Costa-Jussà Marta
Publication date: 01/01/2016

With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC