137 research outputs found
Toward a unifying model for Opinion, Sentiment and Emotion information extraction
International audienceThis paper presents a logical formalization of a set 20 semantic categories related to opinion, emotion and sentiment. Our formalization is based on the BDI model (Belief, Desire and Intetion) and constitues a first step toward a unifying model for subjective information extraction. The separability of the subjective classes that we propose was assessed both formally and on two subjective reference corpora
The NLP4NLP Corpus (I): 50 Years of Publication, Collaboration and Citation in Speech and Language Processing
This paper introduces the NLP4NLP corpus, which contains articles published in 34 major conferences and journals in the field of speech and natural language processing over a period of 50 years (1965â2015), comprising 65,000 documents, gathering 50,000 authors, including 325,000 references and representing ~270 million words. Most of these publications are in English, some are in French, German, or Russian. Some are open access, others have been provided by the publishers. In order to constitute and analyze this corpus several tools have been used or developed. Many of them use Natural Language Processing methods that have been published in the corpus, hence its name. The paper presents the corpus and some findings regarding its content (evolution over time of the number of articles and authors, collaborations between authors, citations between papers and authors), in the context of a global or comparative analysis between sources. Numerous manual corrections were necessary, which demonstrated the importance of establishing standards for uniquely identifying authors, articles, or publications
AppFM, une plate-forme de gestion de modules de TAL
International audienceAppFM is a tool between a NLP pipeline framework and a system service management. It allows integration of applications with complex dependencies into functional modules workflows of convenient usage within multiples interfaces.AppFM est un outil à mi-chemin entre un environnement de création de chaßnes modulaires de TAL et un gestionnaire de services systÚmes. Il permet l'intégration d'applications ayant des dépendances complexes en des chaßnes de traitements réutilisables facilement par le biais de multiples interfaces
NLP4NLP+5: The Deep (R)evolution in Speech and Language Processing
This paper aims at analyzing the changes in the fields of speech and natural language processing over the recent past 5 years (2016â2020). It is in continuation of a series of two papers that we published in 2019 on the analysis of the NLP4NLP corpus, which contained articles published in 34 major conferences and journals in the field of speech and natural language processing, over a period of 50 years (1965â2015), and analyzed with the methods developed in the field of NLP, hence its name. The extended NLP4NLP+5 corpus now covers 55 years, comprising close to 90,000 documents [+30% compared with NLP4NLP: as many articles have been published in the single year 2020 than over the first 25 years (1965â1989)], 67,000 authors (+40%), 590,000 references (+80%), and approximately 380 million words (+40%). These analyses are conducted globally or comparatively among sources and also with the general scientific literature, with a focus on the past 5 years. It concludes in identifying profound changes in research topics as well as in the emergence of a new generation of authors and the appearance of new publications around artificial intelligence, neural networks, machine learning, and word embedding
Natural Language Processing for Cognitive Analysis of Emotions
International audienc
Natural Language Processing for Cognitive Analysis of Emotions
International audienceEmotion analysis in texts suffers from two major limitations: annotated gold-standard corpora are mostly small and homogeneous, and emotion identification is often simplified as a sentence-level classification problem. To address these issues, we introduce a new annotation scheme for exploring emotions and their causes, along with a new French dataset composed of autobiographical accounts of an emotional scene. The texts were collected by applying the Cognitive Analysis of Emotions developed by A. Finkel to help people improve on their emotion management. The method requires the manual analysis of an emotional event by a coach trained in Cognitive Analysis. We present a rule-based approach to automatically annotate emotions and their semantic roles (e.g. emotion causes) to facilitate the identification of relevant aspects by the coach. We investigate future directions for emotion analysis using graph structures
FRASQUES, le systĂšme du groupe LIR
National audienceno abstrac
Actes de la conférence Traitement Automatique de la Langue Naturelle, TALN 2018: Volume 2 : Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT
International audienc
A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages
User-generated data sources have gained significance in uncovering Adverse
Drug Reactions (ADRs), with an increasing number of discussions occurring in
the digital world. However, the existing clinical corpora predominantly revolve
around scientific articles in English. This work presents a multilingual corpus
of texts concerning ADRs gathered from diverse sources, including patient fora,
social media, and clinical reports in German, French, and Japanese. Our corpus
contains annotations covering 12 entity types, four attribute types, and 13
relation types. It contributes to the development of real-world multilingual
language models for healthcare. We provide statistics to highlight certain
challenges associated with the corpus and conduct preliminary experiments
resulting in strong baselines for extracting entities and relations between
these entities, both within and across languages.Comment: Accepted at LREC-COLING 202
- âŠ