1,276 research outputs found

    Primary and secondary discourse connectives: definitions and lexicons

    Get PDF
    Starting from the perspective that discourse structure arises from the presence of coherence relations, we provide a map of linguistic discourse structuring devices (DRDs), and focus on those for written text. We propose to structure these items by differentiating between primary and secondary connectives on the one hand, and free connecting phrases on the other. For the former, we propose that their behavior can be described by lexicons, and we show one concrete proposal that by now has been applied to three languages, with others being added in ongoing work. The lexical representations can be useful both for humans (theoretical investigations, transfer to other languages) and for machines (automatic discourse parsing and generation)

    ISO-based annotated multilingual parallel corpus for discourse markers

    Get PDF
    Discourse markers carry information about the discourse structure and organization, and also signal local dependencies or epistemological stance of speaker. They provide instructions on how to interpret the discourse, and their study is paramount to understand the mechanism underlying discourse organization. This paper presents a new language resource, an ISO-based annotated multilingual parallel corpus for discourse markers. The corpus comprises nine languages, Bulgarian, Lithuanian, German, European Portuguese, Hebrew, Romanian, Polish, and Macedonian, with English as a pivot language. In order to represent the meaning of the discourse markers, we propose an annotation scheme of discourse relations from ISO 24617-8 with a plug-in to ISO 24617-2 for communicative functions. We describe an experiment in which we applied the annotation scheme to assess its validity. The results reveal that, although some extensions are required to cover all the multilingual data, it provides a proper representation of discourse markers value. Additionally, we report some relevant contrastive phenomena concerning discourse markers interpretation and role in discourse. This first step will allow us to develop deep learning methods to identify and extract discourse relations and communicative functions, and to represent that information as Linguistic Linked Open Data (LLOD)

    Proceedings

    Get PDF
    Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010. Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), 98 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15893

    Connective-Lex: A Web-Based Multilingual Lexical Resource for Connectives

    Get PDF
    In this paper, we present a tangible outcome of the TextLink network: a joint online database project displaying and linking existing and newly-created lexicons of discourse connectives in multiple languages. We discuss the definition and demarcation of the class of connectives that should be included in such a resource, and present the syntactic, semantic/pragmatic, and lexicographic information we collected. Further, the technical implementation of the database and the search functionality are presented. We discuss how the multilingual integration of several connective lexicons provides added value for linguistic researchers and other users interested in connectives, by allowing crosslinguistic comparison and a direct linking between discourse relational devices in different languages. Finally, we provide pointers for possible future extensions both in breadth (i.e., by adding lexicons for additional languages) and depth (by extending the information provided for each connective item and by strengthening the crosslinguistic links).Nous présentons dans cet article un résultat tangible du réseau TextLink : un projet conjoint de base de données en ligne, qui montre et relie des lexiques, aussi bien existants que créés récemment, de connecteurs discursifs dans plusieurs langues. Nous commençons par considérer la définition et la délimitation de la classe des connecteurs qui devraient être inclus dans une telle ressource, et nous présentons l’information syntaxique, sémantico-pragmatique et lexicographique que nous avons recueillie. D’autre part, l’implémentation technique de cette base de données et les fonctionnalités de recherche qu’elle permet sont aussi décrites. Nous discutons de quelle manière l’intégration multilingue de plusieurs lexiques de connecteurs apporte une valeur ajoutée aux chercheurs en linguistique et aux autres utilisateurs qui s’intéressent aux connecteurs, en permettant de comparer plusieurs langues et de relier directement les connecteurs dans différentes langues. Pour finir, nous donnons des indications quant à une possible extension future en termes d’ampleur (par exemple, en ajoutant des lexiques pour de nouvelles langues) et de profondeur (en augmentant l’information qui est donnée pour chaque connecteur et en renforçant les liens entre lexiques)

    Primary and secondary discourse connectives: definitions and lexicons

    Get PDF
    Starting from the perspective that discourse structure arises from the presence of coherence relations, we provide a map of linguistic discourse structuring devices (DRDs), and focus on those for written text. We propose to structure these items by differentiating between primary and secondary connectives on the one hand, and free connecting phrases on the other. For the former, we propose that their behavior can be described by lexicons, and we show one concrete proposal that by now has been applied to three languages, with others being added in ongoing work. The lexical representations can be useful both for humans (theoretical investigations, transfer to other languages) and for machines (automatic discourse parsing and generation)

    Inducing discourse marker inventories from lexical knowledge graphs

    Get PDF
    Discourse marker inventories are important tools for the development of both discourse parsers and corpora with discourse annotations. In this paper we explore the potential of massively multilingual lexical knowledge graphs to induce multilingual discourse marker lexicons using concept propagation methods as previously developed in the context of translation inference across dictionaries. Given one or multiple source languages with discourse marker inventories that discourse relations as senses of potential discourse markers, as well as a large number of bilingual dictionaries that link them – directly or indirectly – with the target language, we specifically study to what extent discourse marker induction can benefit from the integration of information from different sources, the impact of sense granularity and what limiting factors may need to be considered. Our study uses discourse marker inventories from nine European languages normalized against the discourse relation inventory of the Penn Discourse Treebank (PDTB), as well as three collections of machine-readable dictionaries with different characteristics, so that the interplay of a large number of factors can be studied

    A Lexicon of Discourse Markers for Portuguese – LDM-PT

    Get PDF
    We present LDM-PT, a lexicon of discourse markers for European Portuguese, composed of 252 pairs of discourse marker/rhetorical sense. The lexicon covers conjunctions, prepositions, adverbs, adverbial phrases and alternative lexicalizations with a connective function, as in the PDTB (Prasad et al., 2008; Prasad et al., 2010). For each discourse marker in the lexicon, there is information regarding its type, category, mood and tense restrictions over the sentence it introduces, rhetorical sense, following the PDTB 3.0 sense hierarchy (Webber et al., 2016), as well as a link to an English near-synonym and a corpus example. The lexicon is compiled in a single excel spread sheet that is later converted to an XML scheme compatible with the DiMLex format (Stede, 2002). We give a detailed description of the contents and format of the lexicon, and discuss possible applications of this resource for discourse studies and discourse processing tools for Portuguese.info:eu-repo/semantics/publishedVersio

    Learning Explicit and Implicit Arabic Discourse Relations.

    Get PDF
    We propose in this paper a supervised learning approach to identify discourse relations in Arabic texts. To our knowledge, this work represents the first attempt to focus on both explicit and implicit relations that link adjacent as well as non adjacent Elementary Discourse Units (EDUs) within the Segmented Discourse Representation Theory (SDRT). We use the Discourse Arabic Treebank corpus (D-ATB) which is composed of newspaper documents extracted from the syntactically annotated Arabic Treebank v3.2 part3 where each document is associated with complete discourse graph according to the cognitive principles of SDRT. Our list of discourse relations is composed of a three-level hierarchy of 24 relations grouped into 4 top-level classes. To automatically learn them, we use state of the art features whose efficiency has been empirically proved. We investigate how each feature contributes to the learning process. We report our experiments on identifying fine-grained discourse relations, mid-level classes and also top-level classes. We compare our approach with three baselines that are based on the most frequent relation, discourse connectives and the features used by Al-Saif and Markert (2011). Our results are very encouraging and outperform all the baselines with an F-score of 78.1% and an accuracy of 80.6%

    Sentiment polarity shifters : creating lexical resources through manual annotation and bootstrapped machine learning

    Get PDF
    Alleviating pain is good and abandoning hope is bad. We instinctively understand how words like "alleviate" and "abandon" affect the polarity of a phrase, inverting or weakening it. When these words are content words, such as verbs, nouns and adjectives, we refer to them as polarity shifters. Shifters are a frequent occurrence in human language and an important part of successfully modeling negation in sentiment analysis; yet research on negation modeling has focussed almost exclusively on a small handful of closed class negation words, such as "not", "no" and "without. A major reason for this is that shifters are far more lexically diverse than negation words, but no resources exist to help identify them. We seek to remedy this lack of shifter resources. Our most central step towards this is the creation of a large lexicon of polarity shifters that covers verbs, nouns and adjectives. To reduce the prohibitive cost of such a large annotation task, we develop a bootstrapping approach that combines automatic classification with human verification. This ensures the high quality of our lexicon while reducing annotation cost by over 70%. In designing the bootstrap classifier we develop a variety of features which use both existing semantic resources and linguistically informed text patterns. In addition we investigate how knowledge about polarity shifters might be shared across different parts of speech, highlighting both the potential and limitations of such an approach. The applicability of our bootstrapping approach extends beyond the creation of a single resource. We show how it can further be used to introduce polarity shifter resources for other languages. Through the example case of German we show that all our features are transferable to other languages. Keeping in mind the requirements of under-resourced languages, we also explore how well a classifier would do when relying only on data- but not resource-driven features. We also introduce ways to use cross-lingual information, leveraging the shifter resources we previously created for other languages. Apart from the general question of which words can be polarity shifters, we also explore a number of other factors. One of these is the matter of shifting directions, which indicates whether a shifter affects positive polarities, negative polarities or whether it can shift in either direction. Using a supervised classifier we add shifting direction information to our bootstrapped lexicon. For other aspects of polarity shifting, manual annotation is preferable to automatic classification. Not every word that can cause polarity shifting does so for every of its word senses. As word sense disambiguation technology is not robust enough to allow the automatic handling of such nuances, we manually create a complete sense-level annotation of verbal polarity shifters. To verify the usefulness of the lexica which we create, we provide an extrinsic evaluation in which we apply them to a sentiment analysis task. In this task the different lexica are not only compared amongst each other, but also against a state-of-the-art compositional polarity neural network classifier that has been shown to be able to implicitly learn the negating effect of negation words from a training corpus. However, we find that the same is not true for the far more lexically diverse polarity shifters. Instead, the use of the explicit knowledge provided by our shifter lexica brings clear gains in performance.Deutsche Forschungsgesellschaf
    corecore