JRC-Names: Multilingual Entity Name variants and titles as Linked Data

Author: EHRMANN Maud
JACQUET GUILLAUME
STEINBERGER Ralf
Publication venue: 'IOS Press'
Publication date: 30/04/2015

Since 2004 the European Commission’s Joint Research Centre (JRC) has been analysing the online version of printed media in over twenty languages and has automatically recognised and compiled large amounts of named entities (persons and organisations) and their many name variants. The collected variants not only include standard spellings in various countries, languages and scripts, but also frequently found spelling mistakes or lesser-used name forms, all occurring in real-life text (e.g. Benjamin/Binyamin/Bibi/Benyamín/Biniamin/Беньямин/بنیامین Netanyahu/Netanjahu/Nétanyahou/Netahnyahu/Нетаньяху/نتنیاهو). This entity name variant data, known as JRC-Names, has been available for public download since 2011. In this article, we report on our efforts to render JRC-Names as Linked Data (LD), using the lexicon model for ontologies lemon. Besides adhering to Semantic Web standards, this new release goes beyond the initial one in that it includes titles found next to the names, as well as date ranges when the titles and the name variants were found. It also establishes links towards existing datasets, such as DBpedia and Talk-Of-Europe. As a multilingual linguistic linked dataset, JRC-Names can help bridge the gap between structured data and natural languages, thus supporting large-scale data integration, e.g. cross-lingual mapping, and web-based content processing, e.g. entity linking. JRC-Names is publicly available through the dataset catalogue of the European Union’s Open Data Portal.

JRC Publications Repository

A model for verbalising relations with roles in multiple languages

Author: A Bosca
B Davis
CM Keet
CN Li
I Androutsopoulos
J Byamugisha
J Leo
J McCrae
K Fine
K Kaljurand
N Bouayad-Agha
NN Mathonsi
P Buitelaar
PR Fillottrani
R Denaux
R Stevens
T Baldwin
T Halpin
T Kuhn
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2016

Natural language renderings of ontologies facilitate communication with domain experts. While for ontologies with terms in English this is fairly straightforward, it is problematic for grammatically richer languages due to conjugation of verbs, an article that may be dependent on the preposition, or a preposition that modifies the noun. There is no systematic way to deal with such 'complex' names of OWL object properties, or their verbalisation with existing language models for annotating ontologies. The modifications occur only when the object performs some role in a relation, so we propose a conceptual model that can handle this. This requires reconciling the standard view with relational expressions to a positionalist view, which is included in the model and in the formalisation of the mapping between the two. This eases verbalisation and it allows for a more precise representation of the knowledge, yet is still compatible with existing technologies. We have implemented it as a Protégé plugin and validated its adequacy with several languages that need it, such as German and isiZulu.

Crossref

UCT Computer Science Research Document Archive

JRC-Names: Multilingual Entity Name variants and titles as Linked Data

Author: Ehrmann Maud
Jacquet Guillaume
Steinberger Ralf
Publication venue: 'IOS Press'
Publication date: 20/05/2016

Since 2004 the European Commission's Joint Research Centre (JRC) has been analysing the online version of printed media in over twenty languages and has automatically recognised and compiled large amounts of named entities (persons and organisations) and their many name variants. The collected variants not only include standard spellings in various countries, languages and scripts, but also frequently found spelling mistakes or lesser-used name forms, all occurring in real-life text (e.g. Benjamin/Binyamin/Bibi/Benyamín/Biniamin/Беньямин/بنيامين Netanyahu/Netanjahu/Nétanyahou/Netahnyahu/Нетаньяху/نتنياهو). This entity name variant data, known as JRC-Names, has been available for public download since 2011. In this article, we report on our efforts to render JRC-Names as Linked Data (LD), using the lexicon model for ontologies lemon. Besides adhering to Semantic Web standards, this new release goes beyond the initial one in that it includes titles found next to the names, as well as date ranges when the titles and the name variants were found. It also establishes links towards existing datasets, such as DBpedia and Talk-Of-Europe. As a multilingual linguistic linked dataset, JRC-Names can help bridge the gap between structured data and natural languages, thus supporting large-scale data integration, e.g. cross-lingual mapping, and web-based content processing, e.g. entity linking. JRC-Names is publicly available through the dataset catalogue of the European Union's Open Data Portal.

Infoscience - École polytechnique fédérale de Lausanne

ToCT: A task ontology to manage complex templates

Author: Keet CM
Mahlaza Z
Publication date: 01/01/2021

Natural language interfaces are a well-known approach to grant non-experts access to semantic web technologies. A number of such systems use simple templates to achieve that for English and more elaborate solutions for other languages. They keep being designed from scratch in an ad hoc manner, since there is no shared conceptualisation of simple templates and there is no model that is formalised using a Semantic Web language to apply the techniques to itself. We aim to address this by proposing a general-purpose solution in the form of a novel model for templates, formalised as a task ontology in OWL, called ToCT. We used it to develop an ontology-driven text generator for isiZulu, a morphologically rich language, to test its capabilities. The generator verbalises the TBox of an ontology as validation questions. This evaluation showed that the task ontology is sufficiently expressive for the template design, which was subsequently verified in user evaluations whose participants judged the texts positively.

UCT Computer Science Research Document Archive