Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and
argumentative structure in documents. Most previous research has focused on
inducing and evaluating models from the English RST Discourse Treebank.
However, discourse treebanks for other languages exist, including Spanish,
German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same
underlying linguistic theory, but differ slightly in the way documents are
annotated. In this paper, we present (a) a new discourse parser that is simpler than, yet competitive with, the state of the art for English (significantly better on two of three metrics); (b) a harmonization of discourse treebanks across languages, enabling us to present (c) what are, to the best of our knowledge, the first experiments on cross-lingual discourse parsing.
Comment: To be published in EACL 2017, 13 pages
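The harmonization step described above amounts to mapping each treebank's native relation inventory onto a shared label set. A minimal sketch, assuming hypothetical label names (the paper's actual mapping is not reproduced here):

```python
# Hedged sketch: harmonizing discourse relation labels across RST treebanks.
# The language codes and relation labels below are illustrative placeholders,
# not the mapping used in the paper.

# Per-treebank mapping from native relation labels to a shared coarse set.
HARMONIZATION_MAP = {
    "en": {"Elaboration": "elaboration", "Attribution": "attribution"},
    "es": {"elaboracion": "elaboration", "atribucion": "attribution"},
    "de": {"Elaboration": "elaboration", "Attribution": "attribution"},
}

def harmonize(language: str, relation: str) -> str:
    """Map a treebank-specific relation label to the shared label set.

    Unknown labels fall back to a generic 'other' class, so a parser
    trained on one treebank can be evaluated on another.
    """
    return HARMONIZATION_MAP.get(language, {}).get(relation, "other")
```

With all treebanks expressed in one label set, a single parser can be trained and evaluated across languages.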
Self-Supervised and Controlled Multi-Document Opinion Summarization
We address the problem of unsupervised abstractive summarization of collections of user-generated reviews with self-supervision and control. We propose a self-supervised setup that treats an individual document as a target summary for a set of similar documents. This setting makes training simpler than in previous approaches by relying only on a standard log-likelihood loss. We address the problem of hallucinations through the use of control codes, which steer the generation towards more coherent and relevant summaries. Finally, we extend the Transformer architecture to allow for multiple reviews as input. Our benchmarks on two datasets against graph-based and recent neural abstractive unsupervised models show that our proposed method generates summaries of superior quality and relevance. This is confirmed by our human evaluation, which focuses explicitly on the faithfulness of the generated summaries. We also provide an ablation study showing the importance of the control setup in limiting hallucinations and achieving high sentiment and topic alignment between the summaries and the input reviews.
Comment: 18 pages, including a 5-page appendix
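The self-supervised setup can be sketched as follows: within a cluster of similar reviews, each review in turn plays the role of the target "summary", the remaining reviews form the input, and control-code tokens are prepended to the source. The code tokens and separator below are assumed placeholders, not the paper's exact vocabulary:

```python
# Hedged sketch of the self-supervised training-pair construction.
# Control codes (here "<pos>", "<food>") and the "</s>" separator are
# illustrative stand-ins for whatever the trained model actually uses.

def make_training_pairs(cluster, codes):
    """Build (input, target) pairs from a cluster of similar reviews.

    Each review serves once as the target; the other reviews in the
    cluster, prefixed by control codes, form the model input.
    """
    pairs = []
    for i, target in enumerate(cluster):
        sources = [r for j, r in enumerate(cluster) if j != i]
        inp = " ".join(codes) + " " + " </s> ".join(sources)
        pairs.append((inp, target))
    return pairs

pairs = make_training_pairs(
    ["good service", "great food", "bad vibes"], ["<pos>", "<food>"]
)
```

Training then maximizes the log-likelihood of each target given its input, with no summarization-specific loss.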
Neural Greedy Constituent Parsing with Dynamic Oracles
Dynamic oracle training has shown substantial improvements for dependency parsing in various settings, but has not been explored for constituent parsing. The present article introduces a dynamic oracle for transition-based constituent parsing. Experiments on the 9 languages of the SPMRL dataset show that a neural greedy parser with morphological features, trained with a dynamic oracle, achieves accuracies comparable with the best non-reranking and non-ensemble parsers.
Prédiction structurée pour l’analyse syntaxique en constituants par transitions : modèles denses et modèles creux
This article presents a transition-based constituent parsing method in which analyses are scored by a deep learning model. It is compared to a more classical scoring method based on a structured perceptron. We first introduce a parser scored by a local, greedy neural network that relies on word embeddings. We then present its extension to a global model with beam search. The comparison with a global, beam-search parser from the perceptron family highlights the surprisingly good properties of the greedy neural model.
Discontinuous Constituency Parsing with a Stack-Free Transition System and a Dynamic Oracle
We introduce a novel transition system for discontinuous constituency
parsing. Instead of storing subtrees in a stack --i.e. a data structure with
linear-time sequential access-- the proposed system uses a set of parsing
items, with constant-time random access. This change makes it possible to
construct any discontinuous constituency tree in a number of transitions
linear in the length of the sentence. At each parsing step, the parser considers every
item in the set to be combined with a focus item and to construct a new
constituent in a bottom-up fashion. The parsing strategy is based on the
assumption that most syntactic structures can be parsed incrementally and that
the set --the memory of the parser-- remains reasonably small on average.
Moreover, we introduce a provably correct dynamic oracle for the new transition
system, and present the first experiments in discontinuous constituency parsing
using a dynamic oracle. Our parser obtains state-of-the-art results on three
English and German discontinuous treebanks.
Comment: Accepted for publication at NAACL 2019; 14 pages
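The core data-structure idea above, replacing a stack with a randomly accessible set of items whose yields need not be contiguous, can be illustrated with a toy sketch. This is only an illustration of the item representation, not the paper's full transition system:

```python
# Hedged toy illustration of the stack-free idea: parsing items live in a
# collection with constant-time random access, and a new constituent is
# built bottom-up by combining a focus item with any other item.

class Item:
    """A parsing item: a label plus a (possibly discontinuous) yield,
    represented as a set of token positions."""
    def __init__(self, label, span):
        self.label = label
        self.span = frozenset(span)

def combine(focus, other, label):
    """Merge two items into a new (possibly discontinuous) constituent."""
    return Item(label, focus.span | other.span)

# Items need not be adjacent: positions {0} and {2} form a
# discontinuous yield {0, 2}, which a plain stack cannot build directly.
a, b = Item("NP", {0}), Item("PP", {2})
vp = combine(a, b, "VP")
```

Because any item in the set can be combined with the focus item, discontinuity requires no stack gymnastics; the transition system's cost is instead governed by how large the item set grows.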
On Detecting Policy-Related Political Ads: An Exploratory Analysis of Meta Ads in 2022 French Election
Online political advertising has become the cornerstone of political campaigns. The budget spent solely on political advertising in the U.S. has increased by more than 100%, from $700 million during the 2017-2018 U.S. election cycle to $1.6 billion during the 2020 U.S. presidential elections. Naturally, the capacity offered by online platforms to micro-target ads with political content has been worrying lawmakers, journalists, and online platforms, especially after the 2016 U.S. presidential election, in which Cambridge Analytica targeted voters with political ads congruent with their personality.
To curb such risks, both online platforms and regulators (through the Digital Services Act proposed by the European Commission) have agreed that researchers, journalists,
and civil society need to be able to scrutinize the political ads running on
large online platforms. Consequently, online platforms such as Meta and Google
have implemented Ad Libraries that contain information about all political ads
running on their platforms. This is the first step on a long path. Due to the
volume of available data, it is impossible to go through these ads manually,
and we now need automated methods and tools to assist in the scrutiny of
political ads.
In this paper, we focus on political ads that are related to policy. Understanding which policies politicians or organizations promote, and to whom, is essential for detecting dishonest representations. This paper proposes automated methods based on pre-trained models to classify ads into the 14 main policy groups identified by the Comparative Agendas Project (CAP). We discuss several inherent challenges that arise. Finally, we analyze policy-related ads featured on Meta platforms during the 2022 French presidential election period.
Comment: Proceedings of the ACM Web Conference 2023 (WWW '23), May 1-5, 2023, Austin, TX, US
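The classification task above, assigning each ad text to one policy group, can be sketched with a deliberately simple, dependency-free baseline. The paper relies on fine-tuned pre-trained models; this stand-in scores an ad by word overlap with bag-of-words centroids of labeled examples, and the texts and group names are toy illustrations, not CAP data:

```python
# Hedged baseline sketch for assigning ad texts to policy groups.
# A real system would fine-tune a pre-trained language model instead.

from collections import Counter

# Toy labeled ads; real CAP data has 14 major policy groups.
TRAIN = {
    "economy": ["lower taxes for small businesses", "cut corporate tax rates"],
    "health":  ["invest in public hospitals", "free healthcare for citizens"],
}

def centroid(texts):
    """Bag-of-words counts aggregated over a group's example ads."""
    c = Counter()
    for t in texts:
        c.update(t.lower().split())
    return c

CENTROIDS = {group: centroid(texts) for group, texts in TRAIN.items()}

def classify(ad):
    """Pick the group whose centroid shares the most word mass with the ad."""
    words = ad.lower().split()
    scores = {g: sum(c[w] for w in words) for g, c in CENTROIDS.items()}
    return max(scores, key=scores.get)

pred = classify("Reduce income tax for families")
```

Even this crude overlap baseline shows the shape of the pipeline: text in, one of a fixed set of policy groups out, which is what makes large-scale automated scrutiny of ad libraries feasible.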
Privacy-preserving Neural Representations of Text
This article deals with adversarial attacks on deep learning systems for Natural Language Processing (NLP), in the context of privacy protection. We study a specific type of attack: an attacker eavesdrops on the hidden representations of a neural text classifier and tries to recover information about the input text. Such a scenario may arise when the computation of a neural network is shared across multiple devices, e.g. some hidden representation is computed by a user's device and sent to a cloud-based model. We measure the privacy of a hidden representation by the ability of an attacker to accurately predict specific private information from it, and we characterize the tradeoff between the privacy and the utility of neural representations. Finally, we propose several defense methods based on modified training objectives and show that they improve the privacy of neural representations.
Comment: EMNLP 2018
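The privacy measurement described above can be sketched concretely: an attacker observes hidden representations and predicts a private attribute, and privacy is quantified by the attacker's accuracy. The 1-nearest-neighbor attacker and toy vectors below are illustrative stand-ins for the trained attackers in the paper:

```python
# Hedged sketch: measuring the privacy of hidden representations by the
# accuracy of an eavesdropping attacker. Toy 2-d "representations" stand in
# for real encoder states; the attacker here is 1-nearest-neighbor.

def nearest_neighbor_attack(train_reps, train_attrs, test_reps):
    """Predict each test vector's private attribute from its closest
    training vector (squared Euclidean distance)."""
    preds = []
    for x in test_reps:
        dists = [sum((a - b) ** 2 for a, b in zip(x, r)) for r in train_reps]
        preds.append(train_attrs[dists.index(min(dists))])
    return preds

def attacker_accuracy(preds, gold):
    """Higher attacker accuracy means the representation leaks more."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

# Toy representations that leak a binary private attribute: the two
# attribute values form two well-separated clusters.
train = [(0.0, 0.1), (0.1, 0.0), (1.0, 0.9), (0.9, 1.0)]
attrs = [0, 0, 1, 1]
preds = nearest_neighbor_attack(train, attrs, [(0.05, 0.05), (0.95, 0.95)])
```

A defense by modified training objective would penalize exactly this separability during encoder training, pushing attacker accuracy toward chance while preserving task utility.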