    Multilingual collocation extraction with a syntactic parser

    An impressive amount of work has been devoted over the past few decades to collocation extraction. The state of the art shows a sustained interest in the morphosyntactic preprocessing of texts in order to better identify candidate expressions; however, the treatment performed is in most cases limited (lemmatization, POS tagging, or shallow parsing). This article presents a collocation extraction system based on the full parsing of source corpora, which supports four languages: English, French, Spanish, and Italian. The performance of the system is compared against that of the standard mobile-window method. The evaluation experiment investigates several levels of the significance lists, uses a fine-grained annotation schema, and covers all the languages supported. Consistent results were obtained across these languages: parsing, even if imperfect, leads to a significant improvement in the quality of results in terms of collocational precision (between 16.4 and 29.7%, depending on the language; 20.1% overall), MWE precision (between 19.9 and 35.8%; 26.1% overall), and grammatical precision (between 47.3 and 67.4%; 55.6% overall). This positive result is highly important, especially in view of the subsequent integration of extraction results into other NLP applications.
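    The contrast between the mobile-window baseline and parser-based candidate generation can be sketched as follows. This is a minimal illustration, not the system described in the article: it assumes the parser output is already available as (head, relation, dependent) triples, uses hypothetical relation labels `obj` and `amod`, and ranks pairs with a signed log-likelihood ratio, a common association measure (the abstract does not name the measure behind its significance lists).

```python
import math
from collections import Counter


def window_pairs(tokens, size=5):
    """Mobile-window baseline: pair each token with each of the next `size` tokens."""
    pairs = []
    for i, head in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + size, len(tokens))):
            pairs.append((head, tokens[j]))
    return pairs


def syntactic_pairs(triples, relations=("obj", "amod")):
    """Parser-based candidates: keep (head, dependent) pairs linked by a selected relation.

    The relation labels are hypothetical; a real parser defines its own tag set.
    """
    return [(head, dep) for head, rel, dep in triples if rel in relations]


def signed_llr(pair_counts):
    """Dunning's G^2 per pair, negated when the pair occurs less often than expected."""
    n = sum(pair_counts.values())
    first, second = Counter(), Counter()
    for (a, b), c in pair_counts.items():
        first[a] += c
        second[b] += c
    scores = {}
    for (a, b), o11 in pair_counts.items():
        # 2x2 contingency table cells: (a, b), (a, not b), (not a, b), (not a, not b)
        o12 = first[a] - o11
        o21 = second[b] - o11
        o22 = n - o11 - o12 - o21
        r1, c1 = first[a], second[b]
        expected = [r1 * c1 / n, r1 * (n - c1) / n,
                    (n - r1) * c1 / n, (n - r1) * (n - c1) / n]
        observed = [o11, o12, o21, o22]
        # Empty cells contribute 0 (the limit of o * log(o / e) as o -> 0)
        g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
        scores[(a, b)] = g2 if o11 >= expected[0] else -g2
    return scores


if __name__ == "__main__":
    tokens = "they pay close attention to the report".split()
    triples = [("pay", "nsubj", "they"), ("pay", "obj", "attention"),
               ("attention", "amod", "close")]
    print(len(window_pairs(tokens, size=2)))  # 11 candidates, most of them noise
    print(syntactic_pairs(triples))           # [('pay', 'attention'), ('attention', 'close')]

    # Toy pair frequencies, invented for illustration only
    counts = Counter({("make", "decision"): 8, ("pay", "attention"): 10,
                      ("pay", "bill"): 2, ("make", "attention"): 1})
    scores = signed_llr(counts)
    print(sorted(scores, key=scores.get, reverse=True))
```

    The window method proposes every nearby pair, so noise such as ("close", "to") enters the candidate list; the parser-based filter keeps only syntactically related pairs before scoring. Signing G^2 matters because the raw statistic is also high for pairs that repel each other, like the toy ("make", "attention").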

    A MWE Acquisition and Lexicon Builder Web Service

    This paper describes the development of a web-service tool for the automatic extraction of multi-word expression (MWE) lexicons, which has been integrated into a distributed platform for the automatic creation of linguistic resources. The main purpose of the work is thus to provide a (computationally "light") tool that produces a full lexical resource: multi-word terms/items with relevant and useful attached information that can be used for more complex processing tasks and applications (e.g. parsing, MT, IE, query expansion). The output of the tool is a multi-word lexicon formatted and encoded in XML according to the Lexical Markup Framework (LMF). The tool is already functional and available as a service. Evaluation experiments show that its precision is about 80%.
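    To make the output format concrete, here is a sketch of how an MWE entry might be serialized in an LMF-style XML lexicon. It is a simplified assumption, not the tool's actual schema: it uses the usual LMF element names (`LexicalResource`, `Lexicon`, `LexicalEntry`, `Lemma`, `ListOfComponents`, `Component`, `feat`), while the function names, the `frequency` feature, and the example entry are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_feat(parent, att, val):
    """LMF-style attribute-value pair: <feat att="..." val="..."/>."""
    return ET.SubElement(parent, "feat", att=att, val=str(val))


def build_mwe_lexicon(entries, language="en"):
    """Serialize (lemma, part_of_speech, components, frequency) tuples as simplified LMF-style XML."""
    resource = ET.Element("LexicalResource")
    lexicon = ET.SubElement(resource, "Lexicon")
    add_feat(lexicon, "language", language)
    for i, (lemma, pos, components, freq) in enumerate(entries, start=1):
        entry = ET.SubElement(lexicon, "LexicalEntry", id=f"mwe_{i}")
        add_feat(entry, "partOfSpeech", pos)
        add_feat(entry, "frequency", freq)  # hypothetical example of "attached information"
        lemma_el = ET.SubElement(entry, "Lemma")
        add_feat(lemma_el, "writtenForm", lemma)
        # Simplification: each component carries its written form instead of
        # referencing another LexicalEntry id, as a full LMF lexicon would.
        components_el = ET.SubElement(entry, "ListOfComponents")
        for word in components:
            add_feat(ET.SubElement(components_el, "Component"), "writtenForm", word)
    return ET.tostring(resource, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical entry and frequency, for illustration only
    print(build_mwe_lexicon([("pay attention", "verb", ["pay", "attention"], 42)]))
```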

    Dynamic resonance and social reciprocity in language change: The case of Good morrow

    Entrenchment (cf. Langacker, 1987) does not necessarily lead to predictable behaviour. This study aims to complement the usage-based model of language change by operationalising the role of dialogic creativity as a mechanism that can compete with conventionalization and grammaticalization. We provide a distinctive collexeme analysis (cf. Hilpert, 2006) focussing on the constructionalization of the dialogic pair [A: good morrow B; B: (good) morrow (A)] from the 15th to the 18th century. After reaching the highest degree of entrenchment and automatisation, the dialogic pair will be shown to display an increasing tendency to be creatively remodelled with ad hoc meanings during online exchanges by means of dynamic resonance (Du Bois, 2014) and non-reciprocal behaviour. We define this creative process of large-scale alteration as entrenchment inhibition. Our data will show that entrenchment inhibition is triggered by spontaneous attempts at producing a creative ‘surplus’ over the expected social reciprocity (Gouldner, 1960) of conventionalized exchanges. This tendency will be shown to be driven by marked attempts at polite and impolite behaviour.
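    Distinctive collexeme analysis typically compares, for each lexeme that can fill a slot, its frequency in two competing constructions (or, in diachronic applications, two periods) with a one-tailed Fisher exact test, reporting -log10(p) as collostruction strength. The sketch below is a minimal standard-library version under that assumption; the function names and toy counts are hypothetical and are neither the study's data nor its exact procedure.

```python
import math


def fisher_right_tail(a, b, c, d):
    """One-tailed Fisher exact p-value P(X >= a) for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of all tables with the same margins and cell >= a
    tail = sum(math.comb(row1, k) * math.comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / math.comb(n, col1)


def distinctive_collexemes(counts1, counts2):
    """Signed collostruction strength: > 0 prefers construction 1, < 0 prefers construction 2."""
    total1, total2 = sum(counts1.values()), sum(counts2.values())
    strengths = {}
    for lexeme in set(counts1) | set(counts2):
        a, b = counts1.get(lexeme, 0), counts2.get(lexeme, 0)
        c, d = total1 - a, total2 - b  # all other lexemes in each construction
        expected_a = (a + b) * total1 / (total1 + total2)
        if a >= expected_a:
            strengths[lexeme] = -math.log10(fisher_right_tail(a, b, c, d))
        else:
            # Swap the columns to test attraction to construction 2 instead
            strengths[lexeme] = math.log10(fisher_right_tail(b, a, d, c))
    return strengths


if __name__ == "__main__":
    # Made-up filler counts for two hypothetical periods (not data from the study)
    period1 = {"sir": 30, "friend": 5, "master": 10}
    period2 = {"sir": 5, "friend": 25, "master": 10}
    ranked = sorted(distinctive_collexemes(period1, period2).items(), key=lambda x: -x[1])
    for lexeme, strength in ranked:
        print(f"{lexeme:8s} {strength:+.2f}")
```

    The exact test is used rather than a chi-square approximation because slot-filler counts in historical corpora are often small, which is where the approximation is least reliable.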

    D6.1: Technologies and Tools for Lexical Acquisition

    This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated into the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs), for both nouns and verbs, and Multi-Word Expressions (MWEs).