Introduction to the special issue on cross-language algorithms and applications
With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of
Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special
issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment
analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing. Postprint (published version)
Zero-shot Neural Transfer for Cross-lingual Entity Linking
Cross-lingual entity linking maps an entity mention in a source language to
its corresponding entry in a structured knowledge base that is in a different
(target) language. While previous work relies heavily on bilingual lexical
resources to bridge the gap between the source and the target languages, these
resources are scarce or unavailable for many low-resource languages. To address
this problem, we investigate zero-shot cross-lingual entity linking, in which
we assume no bilingual lexical resources are available in the source
low-resource language. Specifically, we propose pivot-based entity linking,
which leverages information from a high-resource "pivot" language to train
character-level neural entity linking models that are transferred to the source
low-resource language in a zero-shot manner. With experiments on 9 low-resource
languages and transfer through a total of 54 languages, we show that our
proposed pivot-based framework improves entity linking accuracy by 17% (absolute)
on average over the baseline systems, for the zero-shot scenario. Further, we
also investigate the use of language-universal phonological representations,
which improves average accuracy (absolute) by 36% when transferring between
languages that use different scripts. Comment: To appear in AAAI 201
Cross-Lingual Cross-Media Content Linking: Annotations and Joint Representations
Dagstuhl Seminar 15201 was held on “Cross-Lingual Cross-Media Content Linking: Annotations and Joint Representations”. Researchers from around the world took part in the seminar and presented state-of-the-art and ongoing research related to the seminar topic. An executive summary of the seminar, abstracts of the participants' talks, and working group discussions are presented in the forthcoming sections.
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(in its broader acceptation) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains and compositionality. Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Research