Platform for Full-Syntax Grammar Development Using Meta-grammar Constructs
PACLIC 20 / Wuhan, China / 1-3 November, 200
Vers la création d'un Verbnet du français
VerbNet is an English lexical resource for verbs that has proven useful for NLP thanks to its broad coverage and coherent classification. No such resource exists for French, despite some (mostly automatic and unsupervised) attempts. We show how to semi-automatically adapt VerbNet using two existing lexical resources, LVF (Les Verbes Français) and LG (Lexique-Grammaire). Keywords: VerbNet, subcategorization frames, semantic roles
Intégration de VerbNet dans un réalisateur profond
The goal of natural language generation (NLG) is to produce understandable natural-language
text from non-linguistic data. Generation essentially consists of two tasks: first, determining
the content of a message to communicate, and then selecting the words and syntactic
constructions that will convey that message. The second task is called linguistic realization.
To generate text that is as natural as possible, an NLG system needs rich lexical resources.
For maximum flexibility in realization, we need access to the combinatorial properties of the
lexical units of a given language. Because verbs are at the core of each utterance and usually
control its structure, their properties should be encoded in order to generate text that
exploits the full richness of a language. Moreover, the combinatorial properties of verbs are
unpredictable, which is why they must be stored in a dictionary.
This thesis describes the integration of VerbNet, a rich dictionary of English verbs and
their syntactic behaviors, into a deep realizer, GenDR. To carry out this implementation, we
used the Python programming language to extract VerbNet's data and adapt it to GenDR, a deep
realizer based on Meaning-Text Theory. We thus integrated 274 syntactic frames and 6,393
English verbs into GenDR.
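The extraction step mentioned in this abstract can be illustrated with a minimal sketch. It assumes VerbNet-style XML in which a class lists its member verbs and its frames; the embedded sample is hand-written and simplified, and the output dictionary is a hypothetical target format, since the abstract does not describe GenDR's actual dictionary layout or the authors' script.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the style of a VerbNet class file.
# Assumption: real VerbNet files carry more attributes and nested elements.
SAMPLE_CLASS = """
<VNCLASS ID="give-13.1">
  <MEMBERS>
    <MEMBER name="give"/>
    <MEMBER name="lend"/>
  </MEMBERS>
  <FRAMES>
    <FRAME>
      <DESCRIPTION primary="NP V NP PP.recipient"/>
    </FRAME>
    <FRAME>
      <DESCRIPTION primary="NP V NP-Dative NP"/>
    </FRAME>
  </FRAMES>
</VNCLASS>
"""


def extract_class(xml_text):
    """Return (class_id, verbs, frames) for one VerbNet-style class."""
    root = ET.fromstring(xml_text)
    class_id = root.get("ID")
    # Member verbs of the class.
    verbs = [m.get("name") for m in root.findall("./MEMBERS/MEMBER")]
    # Primary syntactic description of each frame.
    frames = [d.get("primary") for d in root.findall("./FRAMES/FRAME/DESCRIPTION")]
    return class_id, verbs, frames


def to_lexicon(xml_text):
    """Map each verb to its class and frames (hypothetical target format)."""
    class_id, verbs, frames = extract_class(xml_text)
    return {verb: {"class": class_id, "frames": list(frames)} for verb in verbs}


if __name__ == "__main__":
    for verb, entry in to_lexicon(SAMPLE_CLASS).items():
        print(verb, entry["class"], entry["frames"])
```

Each verb inherits every frame of its class here; a real conversion would also need to handle subclasses and thematic roles, which this sketch leaves out.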
Inducing Stereotypical Character Roles from Plot Structure
If we are to understand stories, we must understand characters: characters are central to every narrative and drive the action forward. Critically, many stories (especially cultural ones) employ stereotypical character roles for different purposes, including efficiently communicating bundles of default characteristics and associations and easing understanding of those characters' roles in the overall narrative. These roles include ideas such as hero, villain, or victim, as well as culturally specific roles such as the donor (in Russian tales) or the trickster (in Native American tales). My thesis aims to learn these roles automatically, inducing them from data using a clustering technique.
The first step in learning character roles, however, is to identify which coreference chains correspond to characters, which narratologists define as animate entities that drive the plot forward. The first part of my work has focused on this character identification problem, and specifically on animacy detection. Prior work treated animacy as a word-level property, and researchers developed statistical models to classify words as either animate or inanimate. I argue that this formulation of the problem is ill-posed and present a new hybrid approach for classifying the animacy of coreference chains that achieved state-of-the-art performance.
The next step of my work is to develop an approach for identifying characters, followed by a new unsupervised clustering approach for learning stereotypical roles. My character identification system consists of two stages: first, I detect animate chains among the coreference chains using my existing animacy detector; second, I apply a supervised machine learning model that identifies which of those chains qualify as characters. I proposed a narratologically grounded definition of character and built a supervised machine learning model with a small set of features that achieved state-of-the-art performance.
In the last step, I implemented a clustering approach that uses plot and thematic information to cluster characters into archetypes. This work resulted in a new approach to understanding the structure of stories, advancing the state of the art in story understanding.