35 research outputs found
Electronic dictionary of English verb conjugation: its structure, contents and parametrization
The article addresses the main issues in creating an electronic dictionary of English verb conjugation using the theory of lexicographic systems (L-systems). In particular, it defines the formal structure of the English verb and characterizes its quasi-stems and quasi-flexions. An L-system model of the dictionary is built, and the lexicographic structures to be made accessible to the user are determined. Lexicographic parameters for representing the verb's inflectional paradigm are specified. The educational and research tasks that the dictionary under development can address are outlined.
Dictionaries for language processing. Readability and organization of information
What makes a dictionary exploitable in Natural Language Processing (NLP)? We examine two requirements, readability of information and general architecture, and we focus on the human tasks involving NLP dictionaries: construction, updating, checking and correction. We illustrate our points with real cases from projects of morpho-syntactic or syntactic-semantic dictionaries.
Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation
Existing approaches to automatic VerbNet-style verb classification are
heavily dependent on feature engineering and therefore limited to languages
with mature NLP pipelines. In this work, we propose a novel cross-lingual
transfer method for inducing VerbNets for multiple languages. To the best of
our knowledge, this is the first study which demonstrates how the architectures
for learning word embeddings can be applied to this challenging
syntactic-semantic task. Our method uses cross-lingual translation pairs to tie
each of the six target languages into a bilingual vector space with English,
jointly specialising the representations to encode the relational information
from English VerbNet. A standard clustering algorithm is then run on top of the
VerbNet-specialised representations, using vector dimensions as features for
learning verb classes. Our results show that the proposed cross-lingual
transfer approach sets new state-of-the-art verb classification performance
across all six target languages explored in this work. Comment: EMNLP 2017 (long paper)
Improvement of VerbNet-like resources by frame typing
Verbenet is a French lexicon developed by "translation" of its English counterpart, VerbNet (Kipper-Schuler, 2005), and by treatment of the specificities of French syntax (Pradet et al., 2014; Danlos et al., 2016). One difficulty encountered in its development springs from the fact that the list of (potentially numerous) frames has no internal organization. This paper proposes a type system for frames that shows whether two frames are variants of a given alternation. Frame typing facilitates coherence checking of the resource in a "virtuous circle". We present the principles underlying a program we developed and used to automatically type frames in Verbenet. We also show that our system is portable to other languages.
A Dataset for Movie Description
Descriptive video service (DVS) provides linguistic descriptions of movies
and allows visually impaired people to follow a movie along with their peers.
Such descriptions are by design mainly visual and thus naturally form an
interesting data source for computer vision and computational linguistics. In
this work we propose a novel dataset which contains transcribed DVS, which is
temporally aligned to full length HD movies. In addition we also collected the
aligned movie scripts which have been used in prior work and compare the two
different sources of descriptions. In total the Movie Description dataset
contains a parallel corpus of over 54,000 sentences and video snippets from 72
HD movies. We characterize the dataset by benchmarking different approaches for
generating video descriptions. Comparing DVS to scripts, we find that DVS is
far more visual and describes precisely what is shown rather than what should
happen according to the scripts created prior to movie production.
Italian VerbNet: A Construction based Approach to Italian Verb Classification
This paper proposes a new method for Italian verb classification (and a preliminary example of the resulting classes) inspired by Levin (1993) and VerbNet (Kipper-Schuler, 2005), yet partially independent from these resources; we achieved this result by integrating Levin's and VerbNet's models of classification with other theoretical frameworks and resources. The classification is rooted in the constructionist framework (Goldberg, 1995; 2006) and is distribution-based. It is also semantically characterized by a link to FrameNet's semantic frames to represent the event expressed by a class. However, the new Italian classes maintain the hierarchical "tree" structure and monotonic nature of VerbNet's classes and, where possible, the original names (e.g. Verbs of Killing, Verbs of Putting, etc.). We therefore propose here a taxonomy compatible with VerbNet but at the same time adapted to Italian syntax and semantics. It also addresses a number of problems intrinsic to the original classifications, such as the role of argument alternations, here regarded simply as epiphenomena, consistent with the constructionist approach.
An analysis of subcategorization frames in English-language medical patent texts
We examined the subcategorization frames of verbs and nouns occurring in English-language patent texts on medical subjects. Based on their frequency of occurrence, we compiled a subcategorization frame lexicon specific to medical patent texts, which can be used in building syntactic and semantic parsers for texts on similar subjects.