Design Challenges in Low-resource Cross-lingual Entity Linking
Cross-lingual Entity Linking (XEL), the problem of grounding mentions of
entities in a foreign language text into an English knowledge base such as
Wikipedia, has received considerable research attention in recent years,
yielding a range of promising techniques. However, current techniques do not
rise to the challenges
introduced by text in low-resource languages (LRL) and, surprisingly, fail to
generalize to text not taken from Wikipedia, on which they are usually trained.
This paper provides a thorough analysis of low-resource XEL techniques,
focusing on the key step of identifying candidate English Wikipedia titles that
correspond to a given foreign language mention. Our analysis indicates that
current methods are limited by their reliance on Wikipedia's interlanguage
links and thus suffer when the foreign language's Wikipedia is small. We
conclude that the LRL setting requires the use of outside-Wikipedia
cross-lingual resources and present a simple yet effective zero-shot XEL
system, QuEL, that utilizes search engine query logs. In experiments on 25
languages, QuEL shows an average increase of 25% in gold candidate recall and
of 13% in end-to-end linking accuracy over state-of-the-art baselines.
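
As a rough illustration of the candidate-generation step this abstract centers on, the sketch below maps a foreign-language mention to ranked English Wikipedia titles using a click table derived from query logs. The example log entries, the click-count scoring, and all function names are illustrative assumptions for this sketch, not QuEL's actual pipeline.

```python
from collections import defaultdict

# Hypothetical query-log sample: each entry pairs a foreign-language query
# string with the English Wikipedia title users clicked, plus a click count.
# These entries are made up for illustration.
QUERY_LOG_CLICKS = [
    ("ભારત", "India", 950),        # Gujarati query -> clicked English title
    ("ભારત", "British_Raj", 40),
    ("ଭାରତ", "India", 310),        # Odia query
]

def build_candidate_map(click_log):
    """Aggregate click counts into a mention -> {title: score} table."""
    table = defaultdict(lambda: defaultdict(int))
    for query, title, clicks in click_log:
        table[query][title] += clicks
    return table

def generate_candidates(mention, table, k=5):
    """Return the top-k English Wikipedia titles for a foreign mention,
    ranked by aggregated click count (a stand-in scoring heuristic)."""
    scored = table.get(mention, {})
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]

if __name__ == "__main__":
    table = build_candidate_map(QUERY_LOG_CLICKS)
    print(generate_candidates("ભારત", table))
    # -> [('India', 950), ('British_Raj', 40)]
```

Because such a table is built from query logs rather than Wikipedia interlanguage links, its coverage does not depend on the size of the target language's Wikipedia, which is the limitation of prior methods that the paper identifies.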
Low-resource Languages: A Review of Past Work and Future Challenges
A current problem in NLP is managing and processing low-resource languages,
which lack useful training attributes such as supervised data or a sufficient
number of native speakers and experts. This review paper concisely summarizes
previous groundbreaking achievements made towards resolving this problem and
analyzes potential improvements in the context of the overall future research
direction.