    Generating Coreferential Descriptions from a Structured Model of the Context

    Peer-reviewed conference paper with proceedings; international audience. This paper shows, on the basis of a corpus study, how a model of the context should be structured for the generation of coreferring descriptions in French. We show that this way of structuring the context can help generate more paraphrases, as well as a particular kind of referring expression used to add information about the referent.

    A Corpus Analysis of the Uses of the Demonstrative (Une analyse des emplois du démonstratif en corpus)

    Peer-reviewed conference paper with proceedings; national audience. This article proposes a new classification of the uses of demonstratives, applies this classification in a corpus analysis, and presents the results of that analysis. The proposed classification builds on those found in the literature, extended to allow the generation of demonstrative noun phrases. The corpus analysis shows in particular that the "reclassifying" nature of the demonstrative allows it to perform two functions (an anaphoric function and a function as a carrier of new information), and that there are various ways of realizing these functions.

    Coreferential Definite and Demonstrative Descriptions in French: A Corpus Study for Text Generation

    Peer-reviewed conference paper with proceedings; international audience. This paper presents a new classification of the uses of definite and demonstrative descriptions, its application in a corpus analysis, and the results of this analysis. The proposed classification is based on the existing literature and extended to support the generation of definite and demonstrative NPs. The corpus analysis shows in particular that subsequent mentions of a referent can perform two functions (repeating given information and/or introducing new information). The comparison between definite and demonstrative determiners yields preliminary data for generation algorithms.

    Annotating Anaphoric and Bridging Relations with MMAX

    We present a tool for the annotation of anaphoric and bridging relations in a corpus of written texts. Based on the differences as well as the similarities between these phenomena, we define an annotation scheme. We then implement the scheme within an annotation tool and demonstrate its use.

    'Healthy' Coreference: Applying Coreference Resolution to the Health Education Domain

    This thesis investigates coreference and its resolution within the domain of health education. Coreference is the relationship between two linguistic expressions that refer to the same real-world entity, and resolution involves identifying this relationship among sets of referring expressions. The coreference resolution task is considered among the most difficult of problems in Artificial Intelligence; in some cases, resolution is impossible even for humans. For example, "she" in the sentence "Lynn called Jennifer while she was on vacation" is genuinely ambiguous: the vacationer could be either Lynn or Jennifer. There are three primary motivations for this thesis. The first is that health education has never before been studied in this context. So far, the vast majority of coreference research has focused on news. Secondly, achieving domain-independent resolution is unlikely without understanding the extent to which coreference varies across different genres. Finally, coreference pervades language and is an essential part of coherent discourse. Its effective use is a key component of easy-to-understand health education materials, where readability is paramount. No suitable corpus of health education materials existed, so our first step was to create one. The comprehensive analysis of this corpus, which required manual annotation of coreference, confirmed our hypothesis that the coreference used in health education differs substantially from that in previously studied domains. This analysis was then used to shape the design of a knowledge-lean algorithm for resolving coreference. This algorithm performed surprisingly well on this corpus, e.g., successfully resolving over 85% of all pronouns when evaluated on unseen data. Despite the importance of coreferentially annotated corpora, only a handful are known to exist, likely because of the difficulty and cost of reliably annotating coreference. 
The paucity of genres represented in these existing annotated corpora creates an implicit bias in domain-independent coreference resolution. To address these issues, we plan to make our health education corpus available to the wider research community, in the hope of encouraging a broader focus in the future.
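    The abstract defines resolution as identifying the coreference relation among sets of referring expressions. One common way to represent such output, sketched here under the assumption that pairwise "these two mentions corefer" decisions are already available, is to merge the links into chains with a union-find structure. This is a generic illustration, not the knowledge-lean algorithm from the thesis; the function name build_chains and the string mention IDs are hypothetical.

```python
from collections import defaultdict


def build_chains(mentions, links):
    """Group mentions into coreference chains from pairwise links.

    Hypothetical helper: `mentions` is a list of hashable mention IDs,
    `links` a list of (m1, m2) pairs judged to be coreferent.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Union the two chains joined by each coreference link.
    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect mentions by chain root, preserving input order.
    chains = defaultdict(list)
    for m in mentions:
        chains[find(m)].append(m)
    return list(chains.values())


# Assuming a resolver picked the reading of the abstract's example
# in which "she" is Lynn:
print(build_chains(["Lynn", "Jennifer", "she"], [("Lynn", "she")]))
# [['Lynn', 'she'], ['Jennifer']]
```

    Because union is transitive, links a-b and b-c place a, b and c in one chain even though a and c were never compared directly.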

    Creating ontology-based metadata by annotation for the semantic web

    Discourse-level Relations For Opinion Analysis

    Opinion analysis deals with subjective phenomena such as judgments, evaluations, feelings, emotions, beliefs and stances. The availability of public opinion on the Internet and in face-to-face conversations, coupled with the need to understand and mine it for end applications, has motivated a great deal of research in this field in recent years. Researchers have explored a wide array of knowledge resources for opinion analysis, from words and phrases to syntactic dependencies and semantic relations. In this thesis, we investigate a discourse-level treatment of opinion analysis. To realize this discourse-level analysis, we propose a new linguistic representational scheme designed to support interdependent interpretations of opinions in the discourse. We adapt and extend an existing subjectivity annotation scheme to capture discourse-level relations in a multi-party meeting corpus. Inter-annotator agreement studies show that trained human annotators can recognize the elements of our linguistic scheme. Empirically, we test the impact of our discourse-level relations on fine-grained polarity classification. In this process, we also explore two different global inference models for incorporating discourse-based information to augment word-based information. Our results show that the discourse-level relations can augment and improve upon word-based methods for fine-grained opinion polarity classification. Further, we explore linguistically motivated features and a global inference paradigm for learning the discourse-level relations from the annotated data. We employ the ideas from our linguistic scheme to recognize stances in dual-sided debates from the product and political domains. For product debates, we use web mining and rules to learn and employ elements of our discourse-level relations in an unsupervised fashion.
For political debates, on the other hand, we take a more exploratory, supervised approach and encode the building blocks of our discourse-level relations as features for stance classification. Our results show that the ideas behind the discourse-level relations can be learned and employed effectively to improve overall stance recognition in product debates.

    Extended nominal coreference and bridging anaphora (an approach to annotation of Czech data in the Prague Dependency Treebank)

    This dissertation presents one possible model for processing extended textual coreference and bridging anaphora in a large text corpus, which we then use to annotate these relations in texts of the Prague Dependency Treebank (PDT). Drawing, on the one hand, on the literature on the theory of reference, discourse and some findings of theoretical linguistics and, on the other hand, on existing annotation methodologies, we created a detailed classification of textual coreferential relations and types of bridging anaphora. Within textual coreference, we distinguish two types of textual coreferential relations: coreferential relations between noun phrases with specific reference, and coreferential relations between noun phrases with non-specific, primarily generic, reference. We determined six types of relations for bridging anaphora: PART, between part and whole; SUBSET, between a set and a subset or element of the set; FUNCT, between an entity and a unique function on that entity; CONTRAST, between semantic and contextual opposites; ANAF, anaphoric reference between non-coreferential entities; and REST, for other cases of bridging anaphora. One of the goals of the research was to create a system of theoretical principles that must be followed when annotating coreferential relations and bridging anaphora. Within this system, we introduced, for example, the principle...

    Institute of Czech Language and Theory of Communication, Faculty of Arts
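    The classification above is essentially a small typed label set, so it can be written down as a data structure. The sketch below mirrors the two coreference types and the six bridging labels (PART, SUBSET, FUNCT, CONTRAST, ANAF, REST) named in the abstract; the Link record, the node IDs and the is_bridging helper are hypothetical and are not the PDT's actual annotation format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class CorefType(Enum):
    """The two kinds of textual coreference distinguished in the scheme."""
    SPECIFIC = "specific"          # NPs with specific reference
    NON_SPECIFIC = "non_specific"  # NPs with non-specific, mostly generic reference


class BridgingType(Enum):
    """The six bridging-anaphora relation types."""
    PART = "part"          # part and whole
    SUBSET = "subset"      # set and subset or element of the set
    FUNCT = "funct"        # entity and a unique function on that entity
    CONTRAST = "contrast"  # semantic and contextual opposites
    ANAF = "anaf"          # anaphoric reference between non-coreferential entities
    REST = "rest"          # other cases of bridging anaphora


@dataclass(frozen=True)
class Link:
    """Hypothetical annotation record linking an anaphor to its antecedent."""
    anaphor_id: str
    antecedent_id: str
    relation: Union[CorefType, BridgingType]


def is_bridging(link: Link) -> bool:
    """True if the link is a bridging relation rather than coreference."""
    return isinstance(link.relation, BridgingType)


# Hypothetical example: "the engine" (node n12) is a PART of "a car" (node n7).
engine_link = Link("n12", "n7", BridgingType.PART)
print(is_bridging(engine_link))  # True
```

    Keeping coreference and bridging in separate enums makes the basic distinction of the scheme explicit in the type of the relation field.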

    On the Semantics and Reference of the Temporal-Anaphoric Pronominal Adverb danach (Zur Semantik und Referenz des temporalanaphorischen Pronominaladverbs danach)

    By virtue of its anaphoric element, the pronominal adverb danach ("after that") establishes a relational connection between two referents (conceptual units) mentioned in the text. Its anaphoric element (da-) refers to a referent previously described in the text by an antecedent (a textual unit), and its relational element (-nach) links this previously mentioned referent temporally to the referent of the sentence into which danach is syntactically integrated (hereafter the cotext sentence). (1) Wir schauten Top Gun, diesen Tom-Cruise-Film über amerikanische Kampfpiloten. Danach flogen meine Freunde mit ihren hölzernen Segelgleitern steile Kurven, sie wollten sein wie Tom Cruise. ("We watched Top Gun, that Tom Cruise film about American fighter pilots. Afterwards my friends flew steep turns with their wooden gliders; they wanted to be like Tom Cruise.") The aim of this work is to examine as many different forms of temporal-anaphoric danach references as possible, in order to capture the anaphoric resolution process of danach in all its complexity within two cognitive models of text comprehension. In the five-level model, the interpretation of anaphoric danach references is represented as a process operating at different levels of interpretation and is understood as an interplay of linguistic and conceptual information. In the competition model, the anaphoric resolution of danach is presented as an interaction of competing textual and conceptual resolution factors and rules. This model is based on corpus analyses and thus provides a differentiated and complete picture of the interpretation of relational anaphors such as danach.