
    Cross-lingual RST Discourse Parsing

    Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks exist for other languages, including Spanish, German, Basque, Dutch and Brazilian Portuguese. These treebanks share the same underlying linguistic theory but differ slightly in how documents are annotated. In this paper, we present (a) a new discourse parser that is simpler than, yet competitive with, the state of the art for English (significantly better on 2 of 3 metrics), (b) a harmonization of discourse treebanks across languages, enabling us to present (c) what are, to the best of our knowledge, the first experiments on cross-lingual discourse parsing. Comment: To be published in EACL 2017, 13 pages

    Self-Supervised and Controlled Multi-Document Opinion Summarization

    We address the problem of unsupervised abstractive summarization of collections of user-generated reviews with self-supervision and control. We propose a self-supervised setup that considers an individual document as a target summary for a set of similar documents. This setting makes training simpler than in previous approaches by relying only on a standard log-likelihood loss. We address the problem of hallucinations through the use of control codes that steer generation towards more coherent and relevant summaries. Finally, we extend the Transformer architecture to allow for multiple reviews as input. Our benchmarks on two datasets against graph-based and recent neural abstractive unsupervised models show that our proposed method generates summaries of superior quality and relevance. This is confirmed by our human evaluation, which focuses explicitly on the faithfulness of generated summaries. We also provide an ablation study showing the importance of the control setup in limiting hallucinations and achieving high sentiment and topic alignment of the summaries with the input reviews. Comment: 18 pages including 5-page appendix
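    The self-supervised pairing idea above can be sketched in a few lines. This is a hypothetical illustration, not the paper's method: each review serves as the pseudo-summary target for its most similar neighbouring reviews, with similarity approximated here by plain token overlap (Jaccard) rather than the paper's actual selection criterion.

```python
def jaccard(a: set, b: set) -> float:
    """Token-set overlap between two reviews."""
    return len(a & b) / len(a | b) if a | b else 0.0

def build_pairs(reviews, k=2):
    """For each review, pick the k most similar other reviews as its
    input set. Returns (input_reviews, target_summary) training pairs."""
    token_sets = [set(r.lower().split()) for r in reviews]
    pairs = []
    for i, target in enumerate(reviews):
        neighbours = sorted(
            (j for j in range(len(reviews)) if j != i),
            key=lambda j: jaccard(token_sets[i], token_sets[j]),
            reverse=True,
        )
        pairs.append(([reviews[j] for j in neighbours[:k]], target))
    return pairs

reviews = [
    "great battery life and solid build",
    "battery life is great, build feels solid",
    "terrible customer service experience",
    "the screen is bright and sharp",
]
pairs = build_pairs(reviews, k=2)
# The two near-duplicate battery reviews select each other first,
# so each can act as a plausible pseudo-summary of the other.
```

    A real setup would replace the Jaccard heuristic with a learned or retrieval-based similarity and feed the resulting pairs to a sequence-to-sequence model trained with log-likelihood loss.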

    Neural Greedy Constituent Parsing with Dynamic Oracles

    Dynamic oracle training has shown substantial improvements for dependency parsing in various settings, but has not been explored for constituent parsing. The present article introduces a dynamic oracle for transition-based constituent parsing. Experiments on the 9 languages of the SPMRL dataset show that a neural greedy parser with morphological features, trained with a dynamic oracle, reaches accuracies comparable with the best non-reranking and non-ensemble parsers.

    Structured Prediction for Transition-Based Constituent Parsing: Dense Models and Sparse Models

    This article presents a transition-based constituent parsing method in which analyses are scored by deep learning, compared against a more classical structured-perceptron scoring method. We first introduce a parser scored by a local, greedy neural network that relies on word embeddings. We then present its extension to a global model with beam search. Comparison with a global beam-search parser from the perceptron family highlights the surprisingly good properties of the greedy neural model.

    Discontinuous Constituency Parsing with a Stack-Free Transition System and a Dynamic Oracle

    We introduce a novel transition system for discontinuous constituency parsing. Instead of storing subtrees in a stack (i.e. a data structure with linear-time sequential access), the proposed system uses a set of parsing items with constant-time random access. This change makes it possible to construct any discontinuous constituency tree in exactly 4n - 2 transitions for a sentence of length n. At each parsing step, the parser considers every item in the set as a candidate to be combined with a focus item to construct a new constituent in a bottom-up fashion. The parsing strategy is based on the assumption that most syntactic structures can be parsed incrementally and that the set, the memory of the parser, remains reasonably small on average. Moreover, we introduce a provably correct dynamic oracle for the new transition system and present the first experiments in discontinuous constituency parsing using a dynamic oracle. Our parser obtains state-of-the-art results on three English and German discontinuous treebanks. Comment: Accepted for publication at NAACL 2019; 14 pages
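    The stack-versus-set contrast can be made concrete with a toy memory structure. This is a schematic illustration only, not the paper's transition system; the class and item names are invented. The point is that a set-valued memory lets the parser combine a focus item with any other item, which is what makes discontinuous constituents reachable.

```python
class SetMemory:
    """Parser memory with constant-time random access to any item,
    unlike a stack, where only the top item is reachable."""

    def __init__(self):
        self.items = {}      # item id -> subtree
        self.next_id = 0

    def add(self, subtree):
        self.items[self.next_id] = subtree
        self.next_id += 1
        return self.next_id - 1

    def combine(self, focus_id, other_id, label):
        """Bottom-up step: merge the focus item with any other item
        into a new labelled constituent."""
        left = self.items.pop(other_id)
        right = self.items.pop(focus_id)
        return self.add((label, left, right))

mem = SetMemory()
a = mem.add("a"); b = mem.add("b"); c = mem.add("c")
# With a stack, only 'c' (the most recent item) would be accessible;
# here the non-adjacent items 'a' and 'c' combine directly, skipping 'b'.
ac = mem.combine(c, a, "X")
```

    In the paper's actual system the choice of which item to combine is scored by the model, and the provably correct dynamic oracle supervises that choice at every configuration.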

    On Detecting Policy-Related Political Ads: An Exploratory Analysis of Meta Ads in 2022 French Election

    Online political advertising has become the cornerstone of political campaigns. The budget spent solely on political advertising in the U.S. increased by more than 100%, from $700 million during the 2017-2018 U.S. election cycle to $1.6 billion during the 2020 U.S. presidential elections. Naturally, the capacity offered by online platforms to micro-target ads with political content has worried lawmakers, journalists, and online platforms, especially after the 2016 U.S. presidential election, where Cambridge Analytica targeted voters with political ads congruent with their personality. To curb such risks, both online platforms and regulators (through the Digital Services Act proposed by the European Commission) have agreed that researchers, journalists, and civil society need to be able to scrutinize the political ads running on large online platforms. Consequently, online platforms such as Meta and Google have implemented Ad Libraries that contain information about all political ads running on their platforms. This is the first step on a long path: due to the volume of available data, it is impossible to go through these ads manually, and automated methods and tools are now needed to assist in the scrutiny of political ads. In this paper, we focus on political ads that are related to policy. Understanding which policies politicians or organizations promote, and to whom, is essential in determining dishonest representations. This paper proposes automated methods based on pre-trained models to classify ads into the 14 main policy groups identified by the Comparative Agendas Project (CAP). We discuss several inherent challenges that arise. Finally, we analyze policy-related ads featured on Meta platforms during the 2022 French presidential election period. Comment: Proceedings of the ACM Web Conference 2023 (WWW '23), May 1-5, 2023, Austin, TX, USA
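    The shape of the classification pipeline can be sketched with a self-contained stand-in. The paper uses pre-trained models; here, purely for illustration, an ad text is scored against a few policy groups using invented keyword lists (the real CAP codebook covers 14 major groups and is far richer).

```python
# Tiny illustrative subset of policy groups; keywords are hypothetical.
CAP_KEYWORDS = {
    "Health": {"hospital", "healthcare", "vaccine", "doctors"},
    "Environment": {"climate", "emissions", "renewable", "pollution"},
    "Immigration": {"border", "asylum", "migrants", "visa"},
}

def classify_ad(text: str) -> str:
    """Assign the ad to the policy group whose keywords it overlaps
    most; fall back to 'Other' when nothing matches."""
    tokens = set(text.lower().split())
    scores = {group: len(tokens & kws) for group, kws in CAP_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Other"

print(classify_ad("Invest in renewable energy and cut emissions now"))
# -> Environment
```

    A pre-trained model replaces the keyword lookup with learned representations, which is what lets the classifier generalize beyond literal vocabulary matches, one of the challenges the paper discusses.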

    Privacy-preserving Neural Representations of Text

    This article deals with adversarial attacks on deep learning systems for Natural Language Processing (NLP), in the context of privacy protection. We study a specific type of attack: an attacker eavesdrops on the hidden representations of a neural text classifier and tries to recover information about the input text. Such a scenario may arise when the computation of a neural network is shared across multiple devices, e.g. some hidden representation is computed by a user's device and sent to a cloud-based model. We measure the privacy of a hidden representation by the ability of an attacker to accurately predict specific private information from it, and we characterize the trade-off between the privacy and the utility of neural representations. Finally, we propose several defense methods based on modified training objectives and show that they improve the privacy of neural representations. Comment: EMNLP 201
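    The eavesdropping threat model can be illustrated with a toy attacker. This sketch is not the paper's attacker or defense: the attacker sees only hidden-state vectors, and a simple nearest-centroid classifier stands in for the learned probe that predicts a private attribute from them.

```python
def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(xs) / len(xs) for xs in zip(*vectors)]

def attack(train, query):
    """train: {attribute_value: [hidden_representation, ...]}.
    Predict the private attribute of `query` by squared distance
    to each class centroid."""
    cents = {label: centroid(vs) for label, vs in train.items()}
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(cents, key=lambda label: sq_dist(query, cents[label]))

# Hypothetical hidden states that leak a binary private attribute:
train = {"A": [[0.9, 0.1], [0.8, 0.2]], "B": [[0.1, 0.9], [0.2, 0.8]]}
# If the attacker succeeds here, the representation is not private;
# the paper's defenses retrain the encoder so this prediction fails.
print(attack(train, [0.85, 0.15]))   # -> A
```

    The privacy/utility trade-off the abstract mentions corresponds to making this attack fail while the representation still supports the original classification task.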