Collocations as an Index for Distinguishing Text Genres
This study sets out to test how effective collocations are as an index for distinguishing text genres. It has the further twofold aim of exploring variation in Italian with computational methods and of testing a new association measure for the study of collocations. Four types of collocation (verb-noun, noun-adjective, noun-noun and noun-preposition-noun) were analysed in six text genres, three written (literary texts, academic texts and school compositions) and three spoken (conversations, speeches and film dialogues). The frequency of collocations across genres shows that each text type has specific preferences for specific collocation types; however, neither frequency alone nor the written/spoken distinction alone accounts for this distribution in a coherent model. For this purpose, the statistical measure of lexical gravity appears to be more effective, as we will try to show.
This paper aims to incorporate collocations as an index to distinguish text genres: our main hypothesis is that collocations, like other linguistic features, are potentially suitable for identifying genres. This is therefore mostly an exploratory study, aimed at verifying this hypothesis and at taking a deeper look into register variation across different genres in Italian with computational and statistical methods. Furthermore, in a broader perspective, this study might make significant contributions to other fields, such as automatic genre identification [Santini 2004], the measurement of text cohesion [Louwerse et al. 2004] or text readability, where the detection of collocations as a marker of genre can increase the accuracy of computational tools devoted to these tasks.
Using linguistic data for English and Spanish verb-noun combination identification
We present a linguistic analysis of a set of English and Spanish verb+noun combinations (VNCs), and a method that uses this information to improve VNC identification. First, a sample of frequent VNCs is analysed in depth and tagged along lexico-semantic and morphosyntactic dimensions, obtaining satisfactory inter-annotator agreement scores. Then, a VNC identification experiment is undertaken, in which the analysed linguistic data is combined with chunking information and syntactic dependencies. A comparison between the results of the experiment and those obtained by a basic detection method shows that VNC identification can be greatly improved by using linguistic information, as a large number of additional occurrences are detected with high precision.
Automatic Acquisition of Knowledge About Multiword Predicates
PACLIC 19 / Taipei, Taiwan / December 1-3, 200
Multiword expressions at length and in depth
The annual workshop on multiword expressions has taken place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and new exciting results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expression modelling and processing, and will be a point of reference for future work.
D6.1: Technologies and Tools for Lexical Acquisition
This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated in the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization Frames (SCFs), Selectional Preferences (SPs) and Lexical-semantic Classes (LCs), each for both nouns and verbs, and Multi-Word Expressions (MWEs).