90 research outputs found

Using BabelNet to improve OOV coverage in SMT

Author: Du Jinhua
Way Andy
Zydron Andrzej
Publication venue: European Language Resources Association
Publication date: 28/05/2016

Out-of-vocabulary words (OOVs) are a ubiquitous and difficult problem in statistical machine translation (SMT). This paper studies different strategies of using BabelNet to alleviate the negative impact brought about by OOVs. BabelNet is a multilingual encyclopedic dictionary and a semantic network, which not only includes lexicographic and encyclopedic terms, but connects concepts and named entities in a very large network of semantic relations. By taking advantage of the knowledge in BabelNet, three different methods – using direct training data, domain-adaptation techniques and the BabelNet API – are proposed in this paper to obtain translations for OOVs to improve system performance. Experimental results on English–Polish and English–Chinese language pairs show that domain adaptation can better utilize BabelNet knowledge and performs better than other methods. The results also demonstrate that BabelNet is a really useful tool for improving translation performance of SMT systems.

Irish Universities

DCU Online Research Access Service

Language resources and linked data: a practical perspective

Author: Baron Ciro
Dojchinovski Milan
Flati Tiziano
Gracia del Río Jorge
McCrae John P.
Vila Suero Daniel
Publication venue: E.T.S. de Ingenieros Informáticos (UPM)
Publication date: 01/01/2014

Recently, experts and practitioners in language resources have started recognizing the benefits of the linked data (LD) paradigm for the representation and exploitation of linguistic data on the Web. The adoption of the LD principles is leading to an emerging ecosystem of multilingual open resources that conform to the Linguistic Linked Open Data Cloud, in which datasets of linguistic data are interconnected and represented following common vocabularies, which facilitates linguistic information discovery, integration and access. In order to contribute to this initiative, this paper summarizes several key aspects of the representation of linguistic information as linked data from a practical perspective. The main goal of this document is to provide the basic ideas and tools for migrating language resources (lexicons, corpora, etc.) as LD on the Web and to develop some useful NLP tasks with them (e.g., word sense disambiguation). Such material was the basis of a tutorial imparted at the EKAW’14 conference, which is also reported in the paper.

Archivo Digital UPM

Mapping Natural Language Labels to Structured Web Resources

Author: Basile Valerio
Debora Nozza
Elena Cabrio
Fabien Gandon
Publication venue: CEUR
Publication date: 01/01/2018

Institutional Research Information System University of Turin

Mapping natural language labels to structured web resources

Author: Basile Valerio
Cabrio Elena
Gandon Fabien
Nozza Debora
Publication venue: (seleziona...)
Publication date: 01/01/2018

Archivio istituzionale della Ricerca - Bocconi

Auto-Illustration of Short French Texts

Author: Pawlowski Paula
Publication venue: HAL CCSD
Publication date: 21/09/2020

International audience

INRIA a CCSD electronic archive server

Cross-Lingual Link Discovery for Under-Resourced Languages

Author: Ahmadi Sina
Apostol Elena-Simona
Bosque-Gil Julia
Chiarcos Christian
Dojchinovski Milan
Gkirtzou Katerina
Gracia Jorge
Gromann Dagmar
Liebeskind Chaya
Rosner Michael
Serasset Gilles
Truica Ciprian-Octavian
Valūnaitė-Oleškevičienė Giedrė
Publication date: 01/01/2022

License: CC BY-NC 4.0

In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that, for languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.

Mykolas Romeris University Institutional Repository

Evaluating Multiple Caching Strategies for Semantic Network Applications

Author: French John Davies
Malis Steven Mark
Publication venue: Digital WPI
Publication date: 26/03/2015

Semantic networks are often used as a method of relating multiple pieces of data to each other. ConceptNet is a semantic network that contains information about words and how they relate to other words. ConceptNet and other semantic networks are often hosted remotely and accessed as a service, and data retrieval times can be long. This project examines multiple data caching strategies and their impact on the performance of two existing applications that make use of ConceptNet data. We found that the largest factor in whether or not caching improves the performance of semantic network applications is the access pattern of the particular application.

DigitalCommons@WPI