48 research outputs found

    The Impact of Word Representations on Sequential Neural MWE Identification

    Get PDF
    International audience
    Recent initiatives such as the PARSEME shared task have allowed the rapid development of MWE identification systems. Many of those are based on recent NLP advances, using neural sequence models that take continuous word representations as input. We study two related questions in neural verbal MWE identification: (a) the use of lemmas and/or surface forms as input features, and (b) the use of word-based or character-based embeddings to represent them. Our experiments on Basque, French, and Polish show that character-based representations yield systematically better results than word-based ones. In some cases, character-based representations of surface forms can be used as a proxy for lemmas, depending on the morphological complexity of the language.
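To illustrate why character-based representations of surface forms can act as a proxy for lemmas, here is a small sketch (not the paper's implementation): inflected forms of the same lemma share most of their character n-grams, so a character-level model sees them as near-neighbours. The trigram scheme and the example words are illustrative assumptions.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, with boundary marks."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def overlap(w1, w2, n=3):
    """Jaccard overlap between the character n-gram sets of two words."""
    a, b = char_ngrams(w1, n), char_ngrams(w2, n)
    return len(a & b) / len(a | b)

# Two French surface forms of the lemma "prendre" share most trigrams,
# while an unrelated word shares almost none.
print(overlap("prendre", "prendra"))   # high overlap
print(overlap("prendre", "maison"))    # near zero
```

In a morphologically rich language, this shared subword signal is what lets surface forms approximate the generalization normally obtained from lemmatization.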

    A comparative study of different features for efficient automatic hate speech detection

    Get PDF
    International audience
    Commonly, Hate Speech (HS) is defined as any communication that disparages a person or a group on the basis of some characteristic (race, colour, ethnicity, gender, sexual orientation, nationality, etc.) (Nockleby, 2000). Due to the massive volume of user-generated content on social networks (around 500 million tweets per day), hate speech is continuously increasing on the web. Recent initiatives, such as SemEval 2019 shared task 5 Hateval2019 (Basile et al., 2019), contribute to the development of automatic hate speech detection (HSD) systems by making annotated hateful corpora available. We focus our research on the automatic classification of hateful tweets, which is the first sub-task of Hateval2019. The best Hateval2019 HSD system was FERMI (Indurthi et al., 2019), with a 65.1% macro-F1 score on the test corpus. This system used sentence embeddings from the Universal Sentence Encoder (USE) (Cer et al., 2018) as input to a Support Vector Machine classifier.
    In this article, we study the impact of different features on an HSD system. We use a deep neural network (DNN) based classifier with USE. We investigate word-level features, such as a lexicon of hateful words (HFW), Part of Speech (POS), uppercase letters (UP), punctuation marks (PUNCT), the ratio of the number of times a word appears in hateful tweets to the total number of times that word appears (RatioHW), and emojis (EMO). We think these features are relevant because they convey feelings. For instance, case (UP) and punctuation (PUNCT) can carry the intonation of a tweet and can be used to express hateful content. For HFW features, we tag each word of a tweet as hateful or not using the Hatebase lexicon (Hatebase.org) and associate a binary value with each word. For POS features, we use twpipe (Liu et al., 2018) to tag the words, and this information is encoded as a one-hot vector. For emojis, we generate an embedding vector using the emoji2vec tool (Eisner et al., 2016).
    The input of our neural network consists of the USE vector and our additional features. We use convolutional neural networks (CNN) as a binary classifier. We performed experiments on the Hateval2019 corpus to study the influence of each proposed feature. Our baseline system without the proposed features achieves a 65.7% macro-F1 score on the test corpus. Surprisingly, HFW degrades the system performance, decreasing the macro-F1 by 14 points compared to the baseline. This may be because some words are hateful only in a particular context. UP, RatioHW and PUNCT slightly degrade the baseline system. The POS features do not change the baseline result and are therefore probably not correlated with hate speech. The best result is obtained using EMO features, with a macro-F1 of 66.0%. Emojis are widely used to transmit emotions; in our system, they are modeled by a specific embedding vector. USE does not take emojis into account, so EMO features give USE additional information about the hateful content of tweets.
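The feature assembly described above can be sketched as follows. This is a hedged illustration, not the authors' code: the toy lexicon, the POS tag set, and the feature ordering are placeholder assumptions, and only the 512-dimensional USE vector size matches the public USE model.

```python
# Stand-ins for the Hatebase lexicon and a POS tag set (illustrative only).
HATE_LEXICON = {"badword"}
POS_TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]

def word_features(word, pos, hate_count=0, total_count=1):
    """Concatenate the binary HFW flag, one-hot POS, RatioHW and UP flag."""
    hfw = [1.0 if word.lower() in HATE_LEXICON else 0.0]
    pos_onehot = [0.0] * len(POS_TAGS)
    pos_onehot[POS_TAGS.index(pos if pos in POS_TAGS else "OTHER")] = 1.0
    ratio = [hate_count / max(total_count, 1)]   # RatioHW
    up = [1.0 if word.isupper() else 0.0]        # UP
    return hfw + pos_onehot + ratio + up

# The classifier input is the USE sentence vector concatenated with the
# additional word-level features.
use_vector = [0.0] * 512   # placeholder for the real USE embedding
model_input = use_vector + word_features("BADWORD", "NOUN", hate_count=3, total_count=4)
print(len(model_input))    # 519
```

The concatenated vector would then be fed to the CNN binary classifier.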

    Lower Respiratory Tract Infection and Short-Term Outcome in Patients With Acute Respiratory Distress Syndrome

    Get PDF
    To assess whether ventilator-associated lower respiratory tract infections (VA-LRTIs) are associated with mortality in critically ill patients with acute respiratory distress syndrome (ARDS), we performed a post hoc analysis of a multicenter prospective observational cohort study of mechanically ventilated patients (the TAVeM study). VA-LRTI was defined as either ventilator-associated tracheobronchitis (VAT) or ventilator-associated pneumonia (VAP), based on clinical criteria and microbiological confirmation. The association between VA-LRTI and intensive care unit (ICU) mortality in patients with ARDS was assessed through logistic regression controlling for relevant confounders. The association between VA-LRTI and the duration of mechanical ventilation and ICU stay was assessed through competing risk analysis. The contribution of VA-LRTI to a mortality model over time was assessed through sequential random forest models. The cohort included 2960 patients, of which 524 fulfilled the criteria for ARDS; 21% had VA-LRTI (VAT = 10.3% and VAP = 10.7%). After controlling for illness severity and baseline health status, we could not find an association between VA-LRTI and ICU mortality (odds ratio: 1.07; 95% confidence interval: 0.62-1.83; P = .796); VA-LRTI was also not associated with a prolonged ICU length of stay or duration of mechanical ventilation. The relative contribution of VA-LRTI to the random forest mortality model remained constant over time. The attributable VA-LRTI mortality for ARDS was higher than the attributable mortality for VA-LRTI alone. After controlling for relevant confounders, we could not find an association between the occurrence of VA-LRTI and ICU mortality in patients with ARDS.
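For context on the kind of estimate reported here, the unadjusted odds ratio and its Wald 95% confidence interval from a 2x2 exposure/outcome table can be computed as below. The counts are made up for illustration; the study's reported estimate is additionally adjusted for confounders via logistic regression, which this sketch does not do.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table.
    a, b: deaths / survivors with exposure (e.g. VA-LRTI);
    c, d: deaths / survivors without exposure."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Toy counts, not the study's data:
print(odds_ratio_ci(30, 80, 90, 250))
```

A confidence interval straddling 1, as in the study's 0.62-1.83, is what "no association found" refers to.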

    SN 2009E: a faint clone of SN 1987A

    Get PDF
    In this paper we investigate the properties of SN 2009E, which exploded in a relatively nearby spiral galaxy (NGC 4141) and is probably the faintest 1987A-like supernova discovered so far. Spectroscopic observations, which started about 2 months after the supernova explosion, highlight significant differences between SN 2009E and the prototypical SN 1987A. Modelling the data of SN 2009E allows us to constrain the explosion parameters and the properties of the progenitor star, and to compare the inferred estimates with those available for the similar SNe 1987A and 1998A. The light curve of SN 2009E is less luminous than that of SN 1987A and the other members of this class, and the light curve peak is reached at a slightly later epoch than in SN 1987A. Late-time photometric observations suggest that SN 2009E ejected about 0.04 solar masses of 56Ni, the smallest 56Ni mass in our sample of 1987A-like events. Modelling the observations with a radiation hydrodynamics code, we infer for SN 2009E a kinetic plus thermal energy of about 0.6 foe, an initial radius of ~7 x 10^12 cm and an ejected mass of ~19 solar masses. The photospheric spectra show a number of narrow (v~1800 km/s) metal lines, with unusually strong Ba II lines. The nebular spectrum displays narrow emission lines of H, Na I, [Ca II] and [O I], with the [O I] feature being relatively strong compared to the [Ca II] doublet. The overall spectroscopic evolution is reminiscent of that of the faint 56Ni-poor type II-plateau supernovae. This suggests that SN 2009E belongs to the low-luminosity, low-56Ni-mass, low-energy tail of the distribution of 1987A-like objects, in the same manner as SN 1997D and similar events represent the faint tail in the distribution of physical properties for normal type II-plateau supernovae. Comment: 19 pages, 9 figures (+7 in appendix); accepted for publication in A&A on 3 November 201

    Current and emerging developments in subseasonal to decadal prediction

    Get PDF
    Weather and climate variations on subseasonal to decadal timescales can have enormous social, economic and environmental impacts, making skillful predictions on these timescales a valuable tool for decision makers. As such, there is growing interest in the scientific, operational and applications communities in developing forecasts to improve our foreknowledge of extreme events. On subseasonal to seasonal (S2S) timescales, these include high-impact meteorological events such as tropical cyclones, extratropical storms, floods, droughts, and heat and cold waves. On seasonal to decadal (S2D) timescales, while the focus remains broadly similar (e.g., on precipitation, surface and upper-ocean temperatures and their effects on the probabilities of high-impact meteorological events), understanding the roles of internal and externally forced variability, such as anthropogenic warming, in forecasts also becomes important. The S2S and S2D communities share common scientific and technical challenges. These include forecast initialization and ensemble generation; initialization shock and drift; understanding the onset of model systematic errors; bias correction, calibration and forecast quality assessment; model resolution; atmosphere-ocean coupling; sources of and expectations for predictability; and linking research, operational forecasting, and end-user needs. In September 2018, a coordinated pair of international conferences, framed by the above challenges, was organized jointly by the World Climate Research Programme (WCRP) and the World Weather Research Programme (WWRP). These conferences surveyed the state of S2S and S2D prediction, ongoing research, and future needs, providing an ideal basis for synthesizing current and emerging developments in these areas that promise to enhance future operational services. This article provides such a synthesis.

    Improving Hate Speech Detection with Self-Attention Mechanism and Multi-Task Learning

    No full text
    International audience
    Hate speech detection is a challenging natural language processing task. Recently, some works have focused on the use of multiword expressions (MWEs) for hate speech detection. In this paper, we propose to use an auxiliary task, multiword expression identification, to improve hate speech detection. Our proposed system, based on multi-task learning with self-attention, outperforms a state-of-the-art system based on MWE features on four hate speech corpora.
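The self-attention mechanism named above can be sketched minimally as scaled dot-product attention with queries, keys and values all equal to the input. This is a generic illustration of the mechanism only, not the paper's multi-task architecture; the token count and dimensionality are arbitrary.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # token-token similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over tokens
    return weights @ X                               # context-mixed vectors

X = np.random.default_rng(0).standard_normal((5, 16))   # 5 tokens, 16 dims
out = self_attention(X)
print(out.shape)   # (5, 16)
```

In a multi-task setup, a shared encoder with such attention layers would feed two output heads, one per task (hate speech classification and MWE identification).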

    Multiword Expression Features for Automatic Hate Speech Detection

    No full text
    International audience
    The task of automatically detecting hate speech in social media is gaining more and more attention. Given the enormous volume of content posted daily, human monitoring of hate speech is unfeasible. In this work, we propose new word-level features for automatic hate speech detection (HSD): multiword expressions (MWEs). MWEs are lexical units larger than a word, whose meaning can be idiomatic or compositional. We propose to integrate MWE features into a deep neural network-based HSD framework. Our baseline HSD system relies on the Universal Sentence Encoder (USE). To incorporate MWE features, we create a three-branch deep neural network: one branch for USE, one for MWE categories, and one for MWE embeddings. We conduct experiments on two hate speech tweet corpora with different MWE categories and with two types of MWE embeddings, word2vec and BERT. Our experiments demonstrate that the proposed HSD system with MWE features significantly outperforms the baseline system in terms of macro-F1.
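The three-branch idea can be sketched as one projection per feature type, concatenated before the classifier. This is a hedged illustration with random, untrained weights; the branch widths, the 8-way category set, and the word2vec-sized embedding are placeholder assumptions, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, out_dim):
    """One dense ReLU branch with random (untrained) placeholder weights."""
    w = rng.standard_normal((x.shape[0], out_dim)) * 0.01
    return np.maximum(x @ w, 0.0)

use_vec = rng.standard_normal(512)   # branch 1: sentence-level USE vector
mwe_cat = np.eye(8)[2]               # branch 2: one-hot MWE category (toy tag set)
mwe_emb = rng.standard_normal(300)   # branch 3: MWE embedding (word2vec-sized)

merged = np.concatenate([branch(use_vec, 64),
                         branch(mwe_cat, 64),
                         branch(mwe_emb, 64)])
logit = merged @ rng.standard_normal(merged.shape[0])
prob = 1.0 / (1.0 + np.exp(-logit))  # binary hateful / not-hateful score
print(merged.shape, float(prob))
```

Keeping the branches separate until the merge lets each feature type learn its own projection before the classifier combines them.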

    A comparative study of different state-of-the-art NLP models for efficient automatic hate speech detection

    No full text
    International audience
    Hate speech (HS) is legally punished in many countries. Manual moderation of hate messages on social networks is no longer possible due to the huge number of messages posted every day, so automatic methods are needed to remove harmful messages. In this article, we are interested in HS detection (HSD) in social media (Twitter). Hateful content is more than just a matter of keyword detection: hate may be implied, tweets can be grammatically incorrect, and abbreviations and slang may be numerous. Under these conditions, HSD is a complex task. Recently, natural language processing (NLP) methods have been proposed for the detection of HS. In particular, systems based on deep neural networks (DNN) have shown notable performance for HSD.
    In this paper, we study different state-of-the-art NLP models for the HSD task. Powerful new models based on transformers have recently emerged in the literature, such as BERT, HateBERT, BERTweet, SemBERT and USE (Universal Sentence Encoder). These models are trained on different generic corpora collected from various sources, and the BERT-based models can be fine-tuned for a specific task. Some models have particularities. The BERT, HateBERT and BERTweet models can be used to extract features at the word or sentence level. The SemBERT model produces only word-level features and further incorporates explicit contextual semantics. The USE model generates sentence-level features. The HateBERT model is trained on a Reddit corpus with a high potential for hateful content. The BERTweet model is trained on more than 840M tweets. The goal of this article is to study the generalizability of these models for HSD in tweets and to investigate the impact of sentence-level and word-level features. We investigate different DNN structures for HSD using the transformer-based models and assess the impact of word-based and sentence-based methods.
    Our experiments were performed on two HS corpora extracted from Twitter: Founta and Davidson (Founta et al., 2018; Davidson et al., 2017). The best performances were obtained with the BERTweet model on both corpora (77.3% and 78.5% macro-F1 on the Founta and Davidson test sets, respectively). This work was performed in the context of the French-German ANR-DFG project M-PHASIS.
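All of the systems above are compared by macro-averaged F1, which gives each class equal weight regardless of its frequency. A minimal reference implementation of the metric is shown below for clarity; it is a generic sketch, not the authors' evaluation code.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    f1s = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

print(macro_f1([0, 1, 0, 1], [0, 1, 0, 1]))   # 1.0 for perfect predictions
```

Because hate speech corpora are typically class-imbalanced, macro-F1 is preferred over accuracy: a classifier that ignores the minority (hateful) class is penalized.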

    Veyn at PARSEME Shared Task 2018: Recurrent Neural Networks for VMWE Identification

    No full text
    International audience
    This paper describes the Veyn system, submitted to the closed track of the PARSEME Shared Task 2018 on automatic identification of verbal multiword expressions (VMWEs). Veyn is based on a sequence tagger using recurrent neural networks. We represent VMWEs using a variant of the begin-inside-outside (BIO) encoding scheme combined with the VMWE category tag. In addition to the system description, we present development experiments to determine the best tagging scheme. Veyn is freely available, covers 19 languages, and was ranked ninth (MWE-based) and eighth (token-based) among 13 submissions, considering macro-averaged F1 across languages.
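A plain version of the BIO-plus-category tagging scheme described above can be sketched as follows. The example sentence and the "LVC" (light verb construction) category label are illustrative; the paper's encoding is a variant of this basic scheme.

```python
def bio_encode(tokens, mwe_spans):
    """Tag tokens with BIO labels carrying the VMWE category.
    mwe_spans: list of (start, end, category) over token indices, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, cat in mwe_spans:
        tags[start] = f"B-{cat}"                 # begin of the expression
        for i in range(start + 1, end):
            tags[i] = f"I-{cat}"                 # inside the expression
    return tags

tokens = ["She", "took", "a", "walk", "yesterday"]
print(bio_encode(tokens, [(1, 4, "LVC")]))
# → ['O', 'B-LVC', 'I-LVC', 'I-LVC', 'O']
```

Merging the category into the tag lets a single sequence tagger jointly locate VMWEs and classify them, at the cost of a larger tag set.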