
    Multilingual resources for NLP in the Lexical Markup Framework (LMF)

    Optimizing the production, maintenance and extension of lexical resources is one of the crucial aspects impacting Natural Language Processing (NLP). A second aspect involves optimizing the process leading to their integration in applications. In this respect, we believe that a consensual specification of monolingual, bilingual and multilingual lexicons can be a useful aid for the various NLP actors. Within ISO, one purpose of the Lexical Markup Framework (LMF, ISO 24613) is to define a standard for lexicons that covers multilingual lexical data.
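
    A flavour of what such a standardised lexicon looks like can be given in a few lines of Python. This is a simplified sketch of the feat-based XML serialization commonly used with LMF, not the normative ISO 24613 schema, and the entry content is invented:

    import xml.etree.ElementTree as ET

    def make_entry(lemma, pos):
        # One LexicalEntry in the usual LMF XML style: information is
        # carried by <feat att="..." val="..."/> elements.
        entry = ET.Element("LexicalEntry")
        ET.SubElement(entry, "feat", att="partOfSpeech", val=pos)
        lemma_el = ET.SubElement(entry, "Lemma")
        ET.SubElement(lemma_el, "feat", att="writtenForm", val=lemma)
        return entry

    lexicon = ET.Element("Lexicon")
    ET.SubElement(lexicon, "feat", att="language", val="en")
    lexicon.append(make_entry("lexicon", "noun"))
    print(ET.tostring(lexicon, encoding="unicode"))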

    Computerization of African languages-French dictionaries

    This paper relates work done during the DiLAF project. It consists of converting five bilingual African language-French dictionaries, originally in Word format, into XML following the LMF model. The languages processed are Bambara, Hausa, Kanuri, Tamajaq and Songhai-zarma, all still under-resourced as far as Natural Language Processing tools are concerned. Once converted, the dictionaries are available online on the Jibiki platform for lookup and modification. The DiLAF project is first presented, followed by a description of each dictionary. The conversion methodology from .doc format to XML files is then presented, including a specific point on the usage of Unicode. Each step of the conversion into XML and LMF is then detailed. The last part presents the Jibiki lexical resource management platform used for the project.
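
    The core of such a conversion step can be sketched as follows. The tab-separated input format, the feat attribute names and the example article are all invented for illustration; the actual DiLAF pipeline is documented in the paper.

    import unicodedata
    import xml.etree.ElementTree as ET

    def convert_article(line):
        # Hypothetical export of one dictionary article:
        # "headword<TAB>part of speech<TAB>French equivalent".
        line = unicodedata.normalize("NFC", line)  # settle on one Unicode form
        headword, pos, french = line.split("\t")
        entry = ET.Element("LexicalEntry")
        lemma = ET.SubElement(entry, "Lemma")
        ET.SubElement(lemma, "feat", att="writtenForm", val=headword)
        ET.SubElement(entry, "feat", att="partOfSpeech", val=pos)
        sense = ET.SubElement(entry, "Sense")
        ET.SubElement(sense, "feat", att="equivalent", val=french)
        return entry

    article = convert_article("kalma\tnoun\tmot")
    print(ET.tostring(article, encoding="unicode"))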

    LMF Reloaded

    Lexical Markup Framework (LMF), or ISO 24613 [1], is a de jure standard that provides a framework for modelling and encoding lexical information in retrodigitised print dictionaries and NLP lexical databases. An in-depth review is currently underway within the standardisation subcommittee ISO-TC37/SC4/WG4 to find a more modular, flexible and durable follow-up to the original LMF standard published in 2008. In this paper we present some of the major improvements which have so far been implemented in the new version of LMF.

    Lexically-based Ontologies and Ontologically Based Lexicons

    This paper deals with the relations between ontologies and lexicons. We study the role of these two components and their evolution in recent years in the field of Computational Linguistics. We then survey the current lines of research at ILC-CNR which tackle this topic. They involve (i) the reuse of already existing Lexical Resources to derive formal ontologies, (ii) the conversion and combination of terminologies into rich, formal Lexical Resources, and (iii) the use of formal ontologies as the backbone of multilingual Lexical Resources.
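
    Research line (i) can be illustrated with a toy sketch in which lexical hyponymy links are reread as candidate subclass axioms of a derived ontology. The data and the triple notation are invented; deriving a genuinely formal ontology involves far more curation.

    # Toy hyponymy links standing in for an existing lexical resource.
    hyponym_of = {"dog": "animal", "cat": "animal", "animal": "organism"}

    def to_subclass_axioms(links):
        # Read each lexical hyponymy link as a candidate subclass axiom.
        return ["%s rdfs:subClassOf %s" % (child, parent)
                for child, parent in links.items()]

    for axiom in to_subclass_axioms(hyponym_of):
        print(axiom)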

    Named Entity WordNet

    This paper presents the automatic extension of Princeton WordNet with Named Entities (NEs). This new resource is called Named Entity WordNet. Our method maps the noun is-a hierarchy of WordNet to Wikipedia categories, identifies the NEs present in the latter and extracts different information from them, such as written variants and definitions. This information is inserted into a NE repository. A module that converts from this generic repository to the WordNet-specific format has been developed. The paper explores different aspects of our methodology, such as the treatment of polysemous terms, the identification of hyponyms within the Wikipedia categorization system, the identification of Wikipedia articles which are NEs, and the design of a NE repository compliant with the LMF ISO standard. So far, this procedure enriches WordNet with 310,742 NEs and 381,043 "instance of" relations.
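
    The extraction of "instance of" relations can be pictured with a minimal sketch. The category names, article titles and synset identifiers below are invented stand-ins; the real system works over the full Wikipedia category graph and the WordNet noun hierarchy.

    # Mapping from Wikipedia categories to WordNet synsets (toy data).
    category_to_synset = {"Rivers of France": "river.n.01",
                          "German physicists": "physicist.n.01"}
    article_categories = {"Loire": ["Rivers of France"],
                          "Max Planck": ["German physicists"]}

    def instance_of_relations(articles, mapping):
        # Each article judged to be a named entity yields one "instance of"
        # link to the synset its category is mapped to.
        for title, categories in articles.items():
            for category in categories:
                if category in mapping:
                    yield (title, "instance of", mapping[category])

    print(list(instance_of_relations(article_categories, category_to_synset)))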

    A Web-based Architecture for Interoperability of Lexical Resources

    In this paper we present a Web Service Architecture for managing high-level interoperability of Language Resources (LRs) by means of a Service Oriented Architecture (SOA) and the use of ISO standards such as ISO LMF. We propose a layered architecture which separates the management of legacy resources (data collection) from data aggregation (workflow) and data access (user requests). We provide a case study to demonstrate how the proposed architecture manages data exchange among different lexical services in a coherent way, and we show how the use of a lexical standard becomes of primary importance when a protocol of interoperability is defined.
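
    The three layers can be caricatured in plain Python. The class names and the dictionary-based lexicons are invented; in the architecture described above, each layer would be an actual web service exchanging standardised (e.g. LMF) data.

    class DataCollection:
        # Legacy-resource layer: wraps one lexical resource behind a
        # uniform lookup interface.
        def __init__(self, lexicon):
            self.lexicon = lexicon

        def lookup(self, lemma):
            return self.lexicon.get(lemma, [])

    class Workflow:
        # Aggregation layer: merges the answers of several services.
        def __init__(self, services):
            self.services = services

        def lookup(self, lemma):
            return [hit for s in self.services for hit in s.lookup(lemma)]

    class UserRequests:
        # Access layer: the entry point a client application calls.
        def __init__(self, workflow):
            self.workflow = workflow

        def query(self, lemma):
            return {"lemma": lemma, "senses": self.workflow.lookup(lemma)}

    wf = Workflow([DataCollection({"bank": ["financial institution"]}),
                   DataCollection({"bank": ["sloping land beside water"]})])
    print(UserRequests(wf).query("bank"))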

    Interoperability Framework: The FLaReNet action plan proposal

    Standards are fundamental to the exchange, preservation, maintenance and integration of data and language resources, and are an essential basis of any language resource infrastructure. This paper promotes an Interoperability Framework as a dynamic environment of standards and guidelines, also intended to support the provision of language-(web)service interoperability. In the past two decades, the need to define common practices and formats for linguistic resources has been increasingly recognized and pursued. Today open, collaborative, shared data is at the core of a sound language strategy, and standardisation is actively on the move. This paper first describes the current landscape of standards and presents the major barriers to their adoption; it then describes the scenarios that critically involve the use of standards and provide a strong motivation for their adoption; lastly, it proposes a series of actions and steps needed to operationalise standards and achieve full interoperability for Language Resources and Technologies.

    COVER: a linguistic resource combining common sense and lexicographic information

    Lexical resources are fundamental to many tasks that are central to present and prospective research in Text Mining, Information Retrieval and Natural Language Processing. In this article we introduce COVER, a novel lexical resource, along with COVERAGE, the algorithm devised to build it. To describe concepts, COVER proposes a compact vectorial representation that combines the lexicographic precision characterizing BabelNet with the rich common-sense knowledge featured in ConceptNet. We propose COVER as a reliable and mature resource that has been employed in tasks as diverse as conceptual categorization, keyword extraction and conceptual similarity. The experimental assessment is performed on the last task: we report and discuss the obtained results, pointing out future improvements. We conclude that COVER can be directly exploited to build applications as well as coupled with existing resources.
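
    A compact vectorial representation lends itself directly to similarity computation. The concept vectors below are toy data over made-up dimensions; COVER's actual vectors are built by COVERAGE from BabelNet and ConceptNet.

    from math import sqrt

    # Toy concept vectors: dimensions mix lexicographic and common-sense axes.
    cover = {"cat": {"IsA:animal": 1.0, "HasA:fur": 1.0},
             "dog": {"IsA:animal": 1.0, "CapableOf:bark": 1.0},
             "bank": {"IsA:institution": 1.0}}

    def cosine(u, v):
        # Standard cosine similarity over sparse dict-encoded vectors.
        dot = sum(u[k] * v.get(k, 0.0) for k in u)
        norm = sqrt(sum(x * x for x in u.values())) * \
               sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    print(cosine(cover["cat"], cover["dog"]))    # shared "IsA:animal" axis
    print(cosine(cover["cat"], cover["bank"]))   # no shared axes: 0.0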

    Languoid, Doculect, and Glossonym: Formalizing the Notion 'Language'

    It is perfectly reasonable for laypeople and non-linguist scholars to use names for languages without reflecting on the proper definition of the objects referred to by these names. Simply using a name like English or Witotoan suffices as an informal communicative designation for a particular language or a language group. However, for the linguistics community, which is by definition occupied with the details of languages and language variation, it is somewhat bizarre that no proper technical apparatus exists for talking about intricate differences of opinion about the precise sense of a name like English or Witotoan when it is used in academic discussion. We propose three interrelated concepts (LANGUOID, DOCULECT and GLOSSONYM) which provide a principled basis for discussing different points of view about key issues, such as whether two varieties should be associated with the same language, and which allow for a precise description of what exactly is being claimed by the use of a given genealogical or areal group name. The framework they provide should be especially useful to researchers who work on underdescribed languages, where basic issues of classification remain unresolved.
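
    One possible rendering of the three notions as data structures makes their interrelations concrete. The field choices and the example below are our own reading for illustration, not the authors' formalisation.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Doculect:
        # A linguistic variety as it is documented in one specific resource.
        name: str
        source: str  # the document in which the variety is described

    @dataclass
    class Languoid:
        # Any language-like object (dialect, language, or family), grouped
        # extensionally from doculects rather than defined by a name.
        doculects: tuple
        glossonyms: tuple  # names that have been used for this languoid

    d1 = Doculect("Standard English", "a reference grammar")
    d2 = Doculect("Scots", "a dialect survey")
    # Whether d1 and d2 belong to one languoid called "English" is exactly
    # the kind of claim the proposed framework makes explicit.
    english = Languoid(doculects=(d1, d2), glossonyms=("English",))
    print(english)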