English and Hungarian Questions in Computational Linguistics
In this paper, we use corpus-based analyses to present the characteristics of Hungarian and English questions, paying special attention to questions that occur in social media. We also point out how questions can be exploited in computational linguistics: on the one hand in identifying multiword expressions, and on the other in improving the automatic answer options offered for yes/no questions.
The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Multiword expressions (MWEs) are known as a "pain in the neck" for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one's heart or to turn off, have rarely been modelled. This is notably due to their syntactic variability, which hinders treating them as "words with spaces". We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.