16 research outputs found
Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?
Recent advances in natural language processing (NLP) have led to the
development of large language models (LLMs) such as ChatGPT. This paper
proposes a methodology for developing and evaluating ChatGPT detectors for
French text, with a focus on investigating their robustness on out-of-domain
data and against common attack schemes. The proposed method involves
translating an English dataset into French and training a classifier on the
translated data. Results show that the detectors can effectively detect
ChatGPT-generated text, with a degree of robustness against basic attack
techniques in in-domain settings. However, vulnerabilities are evident in
out-of-domain contexts, highlighting the challenge of detecting adversarial
text. The study emphasizes caution when applying in-domain testing results to a
wider variety of content. We provide our translated datasets and models as
open-source resources. https://gitlab.inria.fr/wantoun/robust-chatgpt-detection
Comment: Accepted to TALN 202
Extension dynamique de lexiques morphologiques pour le français à partir d'un flux textuel
Lexical incompleteness is a recurring problem when dealing with natural language and its variability. Lexica used by tools that process large amounts of textual data need to be validated and extended regularly, and this is even more true when processing real-time text flows. In this context, our paper introduces techniques aimed at addressing words unknown to a lexicon. We first study neology (from a theoretical and corpus-based point of view) and describe the modules we have developed for detecting neologisms and inferring information about them (lemma, category, inflectional class). We show that, using various modules for analyzing derived and compound neologisms, we are able to generate candidate lexical entries in real time and with good precision.
The French Social Media Bank: a Treebank of Noisy User Generated Content
In recent years, statistical parsers have reached high performance levels on well-edited texts. Domain adaptation techniques have improved parsing results on text genres differing from the journalistic data most parsers are trained on. However, such corpora usually comply with standard linguistic, spelling and typographic conventions. In the meantime, the emergence of Web 2.0 communication media has led to new types of online textual data. Although valuable, e.g., for data mining and sentiment analysis, such user-generated content rarely complies with standard conventions: it is noisy. This prevents most NLP tools, especially treebank-based parsers, from performing well on such data. For this reason, we have developed the French Social Media Bank, the first user-generated content treebank for French, a morphologically rich language (MRL). The first release of this resource contains 1,700 sentences from various Web 2.0 sources, including data specifically chosen for their high noisiness. We describe here how we created this treebank and present the methodology we used for fully annotating it. We also provide baseline POS tagging and statistical constituency parsing results, which are far lower than usual results on edited texts. This highlights the high difficulty of automatically processing such noisy data in an MRL.
KYSTE PLEURO-PERICARDIQUE (A PROPOS D'UN CAS TRAITE PAR DRAINAGES ITERATIFS)
CLERMONT FD-BCIU-Santé (631132104) / Sudoc; PARIS-BIUM (751062103) / Sudoc; Sudoc; France; F
Towards a Robust Detection of Language Model-Generated Text: Is ChatGPT that easy to detect?
Recent advances in natural language processing (NLP) have led to the development of large language models (LLMs) such as ChatGPT. This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes. The proposed method involves translating an English dataset into French and training a classifier on the translated data. Results show that the detectors can effectively detect ChatGPT-generated text, with a degree of robustness against basic attack techniques in in-domain settings. However, vulnerabilities are evident in out-of-domain contexts, highlighting the challenge of detecting adversarial text. The study emphasizes caution when applying in-domain testing results to a wider variety of content. We provide our translated datasets and models as open-source resources.
Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?
Recent advances in natural language processing (NLP) have led to the development of large language models (LLMs) such as ChatGPT. This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes. The proposed method involves translating an English dataset into French and training a classifier on the translated data. Results show that the detectors can effectively detect ChatGPT-generated text, with a degree of robustness against basic attack techniques in in-domain settings. However, vulnerabilities are evident in out-of-domain contexts, highlighting the challenge of detecting adversarial text. The study emphasizes caution when applying in-domain testing results to a wider variety of content. We provide our translated datasets and models as open-source resources.
L’annonce d’un décès à la famille : vécu des internes de médecine générale et perspectives d’amélioration
Background: Residents are confronted with announcing deaths early in their training, although they are not prepared for it. Aim: To explore the experience and expectations of general practice residents when announcing a death to the family. Methods: Qualitative study using individual semi-structured interviews with general practice residents in Auvergne. A double analysis was conducted using the grounded theory method. Results: 18 interviews were conducted. A death confronts the resident with mortality and with the limits of medicine, generating feelings of helplessness, failure and guilt. During the announcement, residents feel alone, and their position between inexperience and responsibility is uncomfortable. They are rarely satisfied with their announcement, owing to the lack of availability caused by their workload. The announcement confronts them with the families' suffering and arouses feelings of sadness, embarrassment and doubt, hence the emergence of defense mechanisms. Their experience is influenced by the circumstances of the death, but also by the interpersonal and professional skills acquired through experience and training. Debriefing the event, role modeling and supervision of the student are teaching methods that facilitate the resident's learning and experience. Conclusion: Death confronts residents with mortality and shatters the illusion of medicine's omnipotence, giving rise to feelings of helplessness and guilt. They express the need for theoretical reference points, better supervision during placements and a debriefing of the event.
The placenta: phenotypic and epigenetic modifications induced by Assisted Reproductive Technologies throughout pregnancy.
Today, there is growing interest in the potential epigenetic risk related to assisted reproductive technologies (ART). Much evidence in the literature supports the hypothesis that adverse pregnancy outcomes linked to ART are associated with abnormal trophoblastic invasion. The aim of this review is to investigate the relationship between epigenetic dysregulation caused by ART and the subsequent placental response. The dialogue between the endometrium and the embryo is a crucial step in achieving successful trophoblastic invasion, thus ensuring an uncomplicated pregnancy and healthy offspring. However, as described in this review, ART could impair both actors involved in this dialogue. First, ART may induce epigenetic defects in the conceptus by modifying the embryo environment. Second, as a result of hormone treatments, ART may impair endometrial receptivity. In some cases, this results in embryonic growth arrest but, when the development of the embryo continues, the placenta may mount adaptive responses throughout pregnancy. Among the mechanisms involved, epigenetic regulation, notably through a finely tuned network of imprinted genes stimulated by foetal signals, may modify nutrient transfer, placental growth and vascularization. If these coping mechanisms are overwhelmed, improper maternal-foetal exchanges occur, potentially leading to adverse pregnancy outcomes such as abortion, preeclampsia or intra-uterine growth restriction. But in most cases, successful placental adaptation enables normal progress of the pregnancy. Nevertheless, the risks induced by these modifications during pregnancy are not fully understood. Metabolic diseases later in life could be exacerbated through the memory of epigenetic adaptation mechanisms established during pregnancy. Thus, more research is still needed to better understand abnormal interactions between the embryo and the milieu in artificial conditions. 
As trophectoderm cells are in direct contact with the environment, they deserve to be studied in more detail. The ultimate goal of these studies will be to make ART protocols safer. Optimizing the environment will be the key to improving the dialogue between the endometrium and the embryo, so as to ensure that placentation after ART is similar to that following natural conception.