Multilingual collocation extraction with a syntactic parser

Author: Seretan Violeta
Wehrli Eric
Publication date: 18/06/2018

An impressive amount of work has been devoted over the past few decades to collocation extraction. The state of the art shows that there is a sustained interest in the morphosyntactic preprocessing of texts in order to better identify candidate expressions; however, the treatment performed is, in most cases, limited (lemmatization, POS-tagging, or shallow parsing). This article presents a collocation extraction system based on the full parsing of source corpora, which supports four languages: English, French, Spanish, and Italian. The performance of the system is compared against that of the standard mobile-window method. The evaluation experiment investigates several levels of the significance lists, uses a fine-grained annotation schema, and covers all the languages supported. Consistent results were obtained for these languages: parsing, even if imperfect, leads to a significant improvement in the quality of results, in terms of collocational precision (between 16.4 and 29.7%, depending on the language; 20.1% overall), MWE precision (between 19.9 and 35.8%; 26.1% overall), and grammatical precision (between 47.3 and 67.4%; 55.6% overall). This positive result is of high importance, especially in view of the subsequent integration of extraction results into other NLP applications.

RERO DOC Digital Library

Using linguistic data for English and Spanish verb-noun combination identification

Author: Aduriz Itziar
Carroll John
Díaz de Ilarraza Arantza
Iñurrieta Uxoa
Labaka Gorka
Sarasola Kepa
Publication venue: International Committee on Computational Linguistics (ICCL)
Publication date: 13/12/2016

We present a linguistic analysis of a set of English and Spanish verb+noun combinations (VNCs), and a method to use this information to improve VNC identification. Firstly, a sample of frequent VNCs is analysed in depth and tagged along lexico-semantic and morphosyntactic dimensions, obtaining satisfactory inter-annotator agreement scores. Then, a VNC identification experiment is undertaken, where the analysed linguistic data is combined with chunking information and syntactic dependencies. A comparison between the results of the experiment and the results obtained by a basic detection method shows that VNC identification can be greatly improved by using linguistic information, as a large number of additional occurrences are detected with high precision.

Sussex Research Online

A MWE Acquisition and Lexicon Builder Web Service

Author: Frontini Francesca
Quochi Valeria
Rubino Francesco

This paper describes the development of a web-service tool for the automatic extraction of multi-word expression lexicons, which has been integrated in a distributed platform for the automatic creation of linguistic resources. The main purpose of the work described is thus to provide a (computationally "light") tool that produces a full lexical resource: multi-word terms/items with relevant and useful attached information that can be used for more complex processing tasks and applications (e.g. parsing, MT, IE, query expansion, etc.). The output of our tool is a MW lexicon formatted and encoded in XML according to the Lexical Mark-up Framework. The tool is already functional and available as a service. Evaluation experiments show that the tool's precision is about 80%.

PUblication MAnagement

Dynamic resonance and social reciprocity in language change: The case of Good morrow

Author: Culpeper Jonathan Vaughan
Di Cristofaro Matteo
Tantucci Vittorio
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

Entrenchment (i.e. Langacker, 1987) does not necessarily lead to predictable behaviour. This study aims at complementing the usage-based model of language change by operationalising the role of dialogic creativity as a mechanism that can be in competition with conventionalization and grammaticalization. We provide a distinctive collexeme analysis (i.e. Hilpert, 2006) focussing on the constructionalization of the dialogic pair [A: good morrow B – B: (good) morrow (A)] from the 15th up to the 18th century. After reaching the highest degree of entrenchment and automatisation, the dialogic pair will show an increasing tendency to be creatively re-modelled with ad-hoc meanings during online exchanges by means of dynamic resonance (Du Bois, 2014) and non-reciprocal behaviour. We define this creative process of large-scale alteration as entrenchment inhibition. From our data it will emerge that entrenchment inhibition is triggered by spontaneous attempts of producing a creative ‘surplus’ over the expected social reciprocity (Gouldner, 1960) of conventionalized exchanges. This tendency will be shown to be driven by marked attempts of polite and impolite behaviour.

Crossref

Lancaster E-Prints

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

D6.1: Technologies and Tools for Lexical Acquisition

Author: Abrate Matteo
Bacciu Clara
Bel Nuria
Caselli Tommaso
Gavrilidou Maria
Korhonen Anna
Monachini Monica
Padró Muntsa
Poibeau Thierry
Prokopidis Prokopis
Quochi Valeria
Revilla Eva
Rimell Laura
Tesconi Maurizio

This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated in the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs), for both nouns and verbs, and Multi-Word Expressions (MWEs).

PUblication MAnagement

Multilingual collocation extraction with a syntactic parser

Author: Seretan Violeta
Wehrli Eric
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Archive ouverte UNIGE