48 research outputs found

    The Impact of Word Representations on Sequential Neural MWE Identification

    Get PDF
    International audience
    Recent initiatives such as the PARSEME shared task have allowed the rapid development of MWE identification systems. Many of those are based on recent NLP advances, using neural sequence models that take continuous word representations as input. We study two related questions in neural verbal MWE identification: (a) the use of lemmas and/or surface forms as input features, and (b) the use of word-based or character-based embeddings to represent them. Our experiments on Basque, French, and Polish show that character-based representations yield systematically better results than word-based ones. In some cases, character-based representations of surface forms can be used as a proxy for lemmas, depending on the morphological complexity of the language.
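To illustrate why character-based representations of surface forms can act as a proxy for lemmas, here is a small sketch (not the paper's implementation): inflected forms of the same lemma share most of their character n-grams, so a character-level model sees them as near-neighbours. The trigram scheme and the example words are illustrative assumptions.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, with boundary marks."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def overlap(w1, w2, n=3):
    """Jaccard overlap between the character n-gram sets of two words."""
    a, b = char_ngrams(w1, n), char_ngrams(w2, n)
    return len(a & b) / len(a | b)

# Two French surface forms of the lemma "prendre" share most trigrams,
# while an unrelated word shares almost none.
print(overlap("prendre", "prendra"))   # high overlap
print(overlap("prendre", "maison"))    # near zero
```

In a morphologically rich language, this shared subword signal is what lets surface forms approximate the generalization normally obtained from lemmatization.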

    A comparative study of different features for efficient automatic hate speech detection

    Get PDF
    International audience
    Commonly, Hate Speech (HS) is defined as any communication that disparages a person or a group on the basis of some characteristic (race, colour, ethnicity, gender, sexual orientation, nationality, etc.) (Nockleby, 2000). Due to the massive volume of user-generated content on social networks (around 500 million tweets per day), hate speech is continuously increasing on the web. Recent initiatives, such as SemEval 2019 shared task 5 Hateval2019 (Basile et al., 2019), contribute to the development of automatic hate speech detection (HSD) systems by making annotated hateful corpora available. We focus our research on the automatic classification of hateful tweets, which is the first sub-task of Hateval2019. The best Hateval2019 HSD system was FERMI (Indurthi et al., 2019), with a 65.1% macro-F1 score on the test corpus. This system used sentence embeddings from the Universal Sentence Encoder (USE) (Cer et al., 2018) as input to a Support Vector Machine classifier.
    In this article, we study the impact of different features on an HSD system. We use a deep neural network (DNN) based classifier with USE. We investigate word-level features, such as a lexicon of hateful words (HFW), Part of Speech (POS), uppercase letters (UP), punctuation marks (PUNCT), the ratio of the number of times a word appears in hateful tweets to the total number of times that word appears (RatioHW), and emojis (EMO). We think these features are relevant because they convey feelings. For instance, case (UP) and punctuation (PUNCT) can carry the intonation of a tweet and can be used to express hateful content. For HFW features, we tag each word of a tweet as hateful or not using the Hatebase lexicon (Hatebase.org) and associate a binary value with each word. For POS features, we use twpipe (Liu et al., 2018) to tag the words, and this information is encoded as a one-hot vector. For emojis, we generate an embedding vector using the emoji2vec tool (Eisner et al., 2016).
    The input of our neural network consists of the USE vector and our additional features. We use convolutional neural networks (CNN) as a binary classifier. We performed experiments on the Hateval2019 corpus to study the influence of each proposed feature. Our baseline system without the proposed features achieves a 65.7% macro-F1 score on the test corpus. Surprisingly, HFW degrades the system performance, decreasing the macro-F1 by 14 points compared to the baseline. This may be because some words are hateful only in a particular context. UP, RatioHW and PUNCT slightly degrade the baseline system. The POS features do not change the baseline result and are therefore probably not correlated with hate speech. The best result is obtained using EMO features, with a macro-F1 of 66.0%. Emojis are widely used to transmit emotions; in our system, they are modeled by a specific embedding vector. USE does not take emojis into account, so EMO features give USE additional information about the hateful content of tweets.
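The feature assembly described above can be sketched as follows. This is a hedged illustration, not the authors' code: the toy lexicon, the POS tag set, and the feature ordering are placeholder assumptions, and only the 512-dimensional USE vector size matches the public USE model.

```python
# Stand-ins for the Hatebase lexicon and a POS tag set (illustrative only).
HATE_LEXICON = {"badword"}
POS_TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]

def word_features(word, pos, hate_count=0, total_count=1):
    """Concatenate the binary HFW flag, one-hot POS, RatioHW and UP flag."""
    hfw = [1.0 if word.lower() in HATE_LEXICON else 0.0]
    pos_onehot = [0.0] * len(POS_TAGS)
    pos_onehot[POS_TAGS.index(pos if pos in POS_TAGS else "OTHER")] = 1.0
    ratio = [hate_count / max(total_count, 1)]   # RatioHW
    up = [1.0 if word.isupper() else 0.0]        # UP
    return hfw + pos_onehot + ratio + up

# The classifier input is the USE sentence vector concatenated with the
# additional word-level features.
use_vector = [0.0] * 512   # placeholder for the real USE embedding
model_input = use_vector + word_features("BADWORD", "NOUN", hate_count=3, total_count=4)
print(len(model_input))    # 519
```

The concatenated vector would then be fed to the CNN binary classifier.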

    Lower Respiratory Tract Infection and Short-Term Outcome in Patients With Acute Respiratory Distress Syndrome

    Get PDF
    To assess whether ventilator-associated lower respiratory tract infections (VA-LRTIs) are associated with mortality in critically ill patients with acute respiratory distress syndrome (ARDS), we performed a post hoc analysis of a multicenter prospective observational cohort study of mechanically ventilated patients (the TAVeM study). VA-LRTI was defined as either ventilator-associated tracheobronchitis (VAT) or ventilator-associated pneumonia (VAP), based on clinical criteria and microbiological confirmation. The association between VA-LRTI and intensive care unit (ICU) mortality in patients with ARDS was assessed through logistic regression controlling for relevant confounders. The association between VA-LRTI and the duration of mechanical ventilation and ICU stay was assessed through competing risk analysis. The contribution of VA-LRTI to a mortality model over time was assessed through sequential random forest models. The cohort included 2960 patients, of which 524 fulfilled the criteria for ARDS; 21% had VA-LRTI (VAT = 10.3% and VAP = 10.7%). After controlling for illness severity and baseline health status, we could not find an association between VA-LRTI and ICU mortality (odds ratio: 1.07; 95% confidence interval: 0.62-1.83; P = .796); VA-LRTI was also not associated with a prolonged ICU length of stay or duration of mechanical ventilation. The relative contribution of VA-LRTI to the random forest mortality model remained constant over time. The attributable VA-LRTI mortality for ARDS was higher than the attributable mortality for VA-LRTI alone. After controlling for relevant confounders, we could not find an association between the occurrence of VA-LRTI and ICU mortality in patients with ARDS.
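For context on the kind of estimate reported here, the unadjusted odds ratio and its Wald 95% confidence interval from a 2x2 exposure/outcome table can be computed as below. The counts are made up for illustration; the study's reported estimate is additionally adjusted for confounders via logistic regression, which this sketch does not do.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table.
    a, b: deaths / survivors with exposure (e.g. VA-LRTI);
    c, d: deaths / survivors without exposure."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Toy counts, not the study's data:
print(odds_ratio_ci(30, 80, 90, 250))
```

A confidence interval straddling 1, as in the study's 0.62-1.83, is what "no association found" refers to.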

    SN 2009E: a faint clone of SN 1987A

    Get PDF
    In this paper we investigate the properties of SN 2009E, which exploded in a relatively nearby spiral galaxy (NGC 4141) and is probably the faintest 1987A-like supernova discovered so far. Spectroscopic observations, which started about 2 months after the supernova explosion, highlight significant differences between SN 2009E and the prototypical SN 1987A. Modelling the data of SN 2009E allows us to constrain the explosion parameters and the properties of the progenitor star, and to compare the inferred estimates with those available for the similar SNe 1987A and 1998A. The light curve of SN 2009E is less luminous than that of SN 1987A and the other members of this class, and the light curve peak is reached at a slightly later epoch than in SN 1987A. Late-time photometric observations suggest that SN 2009E ejected about 0.04 solar masses of 56Ni, the smallest 56Ni mass in our sample of 1987A-like events. Modelling the observations with a radiation hydrodynamics code, we infer for SN 2009E a kinetic plus thermal energy of about 0.6 foe, an initial radius of ~7 x 10^12 cm and an ejected mass of ~19 solar masses. The photospheric spectra show a number of narrow (v~1800 km/s) metal lines, with unusually strong Ba II lines. The nebular spectrum displays narrow emission lines of H, Na I, [Ca II] and [O I], with the [O I] feature being relatively strong compared to the [Ca II] doublet. The overall spectroscopic evolution is reminiscent of that of the faint 56Ni-poor type II-plateau supernovae. This suggests that SN 2009E belongs to the low-luminosity, low-56Ni-mass, low-energy tail of the distribution of 1987A-like objects, in the same manner as SN 1997D and similar events represent the faint tail in the distribution of physical properties for normal type II-plateau supernovae. Comment: 19 pages, 9 figures (+7 in appendix); accepted for publication in A&A on 3 November 201

    Current and emerging developments in subseasonal to decadal prediction

    Get PDF
    Weather and climate variations on subseasonal to decadal timescales can have enormous social, economic and environmental impacts, making skillful predictions on these timescales a valuable tool for decision makers. As such, there is growing interest in the scientific, operational and applications communities in developing forecasts to improve our foreknowledge of extreme events. On subseasonal to seasonal (S2S) timescales, these include high-impact meteorological events such as tropical cyclones, extratropical storms, floods, droughts, and heat and cold waves. On seasonal to decadal (S2D) timescales, while the focus remains broadly similar (e.g., on precipitation, surface and upper-ocean temperatures and their effects on the probabilities of high-impact meteorological events), understanding the roles of internal and externally forced variability, such as anthropogenic warming, in forecasts also becomes important. The S2S and S2D communities share common scientific and technical challenges. These include forecast initialization and ensemble generation; initialization shock and drift; understanding the onset of model systematic errors; bias correction, calibration and forecast quality assessment; model resolution; atmosphere-ocean coupling; sources of and expectations for predictability; and linking research, operational forecasting, and end-user needs. In September 2018, a coordinated pair of international conferences, framed by the above challenges, was organized jointly by the World Climate Research Programme (WCRP) and the World Weather Research Programme (WWRP). These conferences surveyed the state of S2S and S2D prediction, ongoing research, and future needs, providing an ideal basis for synthesizing current and emerging developments in these areas that promise to enhance future operational services. This article provides such a synthesis.

    Improving Hate Speech Detection with Self-Attention Mechanism and Multi-Task Learning

    No full text
    International audience
    Hate speech detection is a challenging natural language processing task. Recently, some works have focused on the use of multiword expressions (MWEs) for hate speech detection. In this paper, we propose to use an auxiliary task, multiword expression identification, to improve hate speech detection. Our proposed system, based on multi-task learning with self-attention, outperforms a state-of-the-art system based on MWE features on four hate speech corpora.
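The self-attention mechanism named above can be sketched minimally as scaled dot-product attention with queries, keys and values all equal to the input. This is a generic illustration of the mechanism only, not the paper's multi-task architecture; the token count and dimensionality are arbitrary.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # token-token similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over tokens
    return weights @ X                               # context-mixed vectors

X = np.random.default_rng(0).standard_normal((5, 16))   # 5 tokens, 16 dims
out = self_attention(X)
print(out.shape)   # (5, 16)
```

In a multi-task setup, a shared encoder with such attention layers would feed two output heads, one per task (hate speech classification and MWE identification).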

    Multiword Expression Features for Automatic Hate Speech Detection

    No full text
    International audience
    The task of automatically detecting hate speech in social media is gaining more and more attention. Given the enormous volume of content posted daily, human monitoring of hate speech is unfeasible. In this work, we propose new word-level features for automatic hate speech detection (HSD): multiword expressions (MWEs). MWEs are lexical units larger than a word, whose meaning can be idiomatic or compositional. We propose to integrate MWE features into a deep neural network-based HSD framework. Our baseline HSD system relies on the Universal Sentence Encoder (USE). To incorporate MWE features, we create a three-branch deep neural network: one branch for USE, one for MWE categories, and one for MWE embeddings. We conduct experiments on two hate speech tweet corpora with different MWE categories and with two types of MWE embeddings, word2vec and BERT. Our experiments demonstrate that the proposed HSD system with MWE features significantly outperforms the baseline system in terms of macro-F1.
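The three-branch idea can be sketched as one projection per feature type, concatenated before the classifier. This is a hedged illustration with random, untrained weights; the branch widths, the 8-way category set, and the word2vec-sized embedding are placeholder assumptions, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, out_dim):
    """One dense ReLU branch with random (untrained) placeholder weights."""
    w = rng.standard_normal((x.shape[0], out_dim)) * 0.01
    return np.maximum(x @ w, 0.0)

use_vec = rng.standard_normal(512)   # branch 1: sentence-level USE vector
mwe_cat = np.eye(8)[2]               # branch 2: one-hot MWE category (toy tag set)
mwe_emb = rng.standard_normal(300)   # branch 3: MWE embedding (word2vec-sized)

merged = np.concatenate([branch(use_vec, 64),
                         branch(mwe_cat, 64),
                         branch(mwe_emb, 64)])
logit = merged @ rng.standard_normal(merged.shape[0])
prob = 1.0 / (1.0 + np.exp(-logit))  # binary hateful / not-hateful score
print(merged.shape, float(prob))
```

Keeping the branches separate until the merge lets each feature type learn its own projection before the classifier combines them.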

    A comparative study of different state-of-the-art NLP models for efficient automatic hate speech detection

    No full text
    International audience
    Hate speech (HS) is legally punished in many countries. Manual moderation of hate messages on social networks is no longer possible due to the huge number of messages posted every day, so automatic methods are needed to remove harmful messages. In this article, we are interested in HS detection (HSD) in social media (Twitter). Hateful content is more than just a matter of keyword detection: hate may be implied, tweets can be grammatically incorrect, and abbreviations and slang may be numerous. Under these conditions, HSD is a complex task. Recently, natural language processing (NLP) methods have been proposed for the detection of HS. In particular, systems based on deep neural networks (DNN) have shown notable performance for HSD.
    In this paper, we study different state-of-the-art NLP models for the HSD task. Powerful new models based on transformers have recently emerged in the literature, such as BERT, HateBERT, BERTweet, SemBERT and USE (Universal Sentence Encoder). These models are trained on different generic corpora collected from various sources, and the BERT-based models can be fine-tuned for a specific task. Some models have particularities. The BERT, HateBERT and BERTweet models can be used to extract features at the word or sentence level. The SemBERT model produces only word-level features and further incorporates explicit contextual semantics. The USE model generates sentence-level features. The HateBERT model is trained on a Reddit corpus with a high potential for hateful content. The BERTweet model is trained on more than 840M tweets. The goal of this article is to study the generalizability of these models for HSD in tweets and to investigate the impact of sentence-level and word-level features. We investigate different DNN structures for HSD using the transformer-based models and assess the impact of word-based and sentence-based methods.
    Our experiments were performed on two HS corpora extracted from Twitter: Founta and Davidson (Founta et al., 2018; Davidson et al., 2017). The best performances were obtained with the BERTweet model on both corpora (77.3% and 78.5% macro-F1 on the Founta and Davidson test sets, respectively). This work was performed in the context of the French-German ANR-DFG project M-PHASIS.
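All of the systems above are compared by macro-averaged F1, which gives each class equal weight regardless of its frequency. A minimal reference implementation of the metric is shown below for clarity; it is a generic sketch, not the authors' evaluation code.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    f1s = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

print(macro_f1([0, 1, 0, 1], [0, 1, 0, 1]))   # 1.0 for perfect predictions
```

Because hate speech corpora are typically class-imbalanced, macro-F1 is preferred over accuracy: a classifier that ignores the minority (hateful) class is penalized.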

    Veyn at PARSEME Shared Task 2018: Recurrent Neural Networks for VMWE Identification

    No full text
    International audience
    This paper describes the Veyn system, submitted to the closed track of the PARSEME Shared Task 2018 on automatic identification of verbal multiword expressions (VMWEs). Veyn is based on a sequence tagger using recurrent neural networks. We represent VMWEs using a variant of the begin-inside-outside (BIO) encoding scheme combined with the VMWE category tag. In addition to the system description, we present development experiments to determine the best tagging scheme. Veyn is freely available, covers 19 languages, and was ranked ninth (MWE-based) and eighth (token-based) among 13 submissions, considering macro-averaged F1 across languages.
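A plain version of the BIO-plus-category tagging scheme described above can be sketched as follows. The example sentence and the "LVC" (light verb construction) category label are illustrative; the paper's encoding is a variant of this basic scheme.

```python
def bio_encode(tokens, mwe_spans):
    """Tag tokens with BIO labels carrying the VMWE category.
    mwe_spans: list of (start, end, category) over token indices, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, cat in mwe_spans:
        tags[start] = f"B-{cat}"                 # begin of the expression
        for i in range(start + 1, end):
            tags[i] = f"I-{cat}"                 # inside the expression
    return tags

tokens = ["She", "took", "a", "walk", "yesterday"]
print(bio_encode(tokens, [(1, 4, "LVC")]))
# → ['O', 'B-LVC', 'I-LVC', 'I-LVC', 'O']
```

Merging the category into the tag lets a single sequence tagger jointly locate VMWEs and classify them, at the cost of a larger tag set.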