3 research outputs found

    Methodical evaluation of Arabic word embeddings

    Many unsupervised learning techniques have been proposed to obtain meaningful representations of words from text. In this study, we evaluate these various techniques when used to generate Arabic word embeddings. We first build a benchmark for the Arabic language that can be utilized to perform intrinsic evaluation of different word embeddings. We then perform additional extrinsic evaluations of the embeddings based on two NLP tasks. © 2017 Association for Computational Linguistics. This work was made possible by NPRP 6-716-1-138 grant from the Qatar National Research Fund (a member of Qatar Foundation).
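
    The intrinsic evaluation described here typically follows the standard word-similarity protocol: score each word pair with the embeddings, then rank-correlate those scores with human judgments. Below is a minimal sketch of that protocol; the word pairs, ratings, and randomly initialized embedding table are illustrative placeholders, not the paper's benchmark data.

        import numpy as np
        from scipy.stats import spearmanr

        # Hypothetical benchmark entries: (word1, word2, human similarity rating).
        pairs = [
            ("kitab", "qalam", 5.2),
            ("shams", "qamar", 7.8),
            ("bahr", "sayyara", 1.1),
        ]

        # Placeholder embedding table; in practice these vectors come from a
        # trained model (word2vec, GloVe, fastText, ...).
        rng = np.random.default_rng(0)
        emb = {w: rng.standard_normal(300) for p in pairs for w in p[:2]}

        def cosine(u, v):
            return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

        model_scores = [cosine(emb[w1], emb[w2]) for w1, w2, _ in pairs]
        human_scores = [r for _, _, r in pairs]

        # Intrinsic score: rank correlation between model and human similarities.
        rho, _ = spearmanr(model_scores, human_scores)
        print(f"Spearman rho = {rho:.3f}")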

    Evaluation and Improvement of the Reasoning Abilities of Language Models

    This thesis focuses on evaluating and improving the reasoning abilities of Smaller Language Models (SLMs) and Large Language Models (LLMs). It explores SLMs' performance on complex tasks and their limitations on simpler ones. The thesis introduces LogiTorch, a Python library that facilitates the training of models on various reasoning tasks with minimal coding. It also presents TINA, a negated data augmentation technique that improves SLMs' robustness to negation in textual entailment tasks. Further, the thesis explores LLMs' capabilities through MAFALDA, a new benchmark for identifying and classifying reasoning fallacies, proposing a new annotation scheme and an evaluation metric that accounts for subjectivity in reasoning. The findings indicate that humans outperform both SLMs and LLMs on this reasoning task. We propose several research directions that merit further investigation, such as Neuro-symbolic AI and improving the reasoning abilities of low-resource LLMs.
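
    To make the idea of negated data augmentation concrete, here is a toy sketch in the spirit of TINA: if a premise entails a hypothesis, the premise typically contradicts the negated hypothesis, so each entailment pair can yield an extra contradiction pair. The naive negation rule and label mapping below are simplified assumptions for illustration, not the thesis's actual procedure.

        def negate(sentence: str) -> str:
            """Naively negate a copular sentence (placeholder rule)."""
            for aux in (" is ", " are ", " was ", " were "):
                if aux in sentence:
                    return sentence.replace(aux, aux.rstrip() + " not ", 1)
            return "It is not the case that " + sentence[0].lower() + sentence[1:]

        def augment(premise: str, hypothesis: str, label: str):
            """Yield the original pair plus a negated variant with an adjusted label."""
            yield premise, hypothesis, label
            if label == "entailment":
                # If the premise entails H, it typically contradicts not-H.
                yield premise, negate(hypothesis), "contradiction"

        for example in augment("A man is sleeping on a bench.",
                               "A man is resting.", "entailment"):
            print(example)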

    MAFALDA: A Benchmark and Comprehensive Study of Fallacy Detection and Classification

    We introduce MAFALDA, a benchmark for fallacy classification that merges and unites previous fallacy datasets. It comes with a taxonomy that aligns, refines, and unifies existing classifications of fallacies. We further provide a manual annotation of a part of the dataset, together with manual explanations for each annotation. We propose a new annotation scheme tailored for subjective NLP tasks, and a new evaluation method designed to handle subjectivity. We then evaluate several language models in a zero-shot learning setting, as well as human performance, on MAFALDA to assess their capability to detect and classify fallacies.
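
    One common way to handle subjectivity in evaluation is to let each gold span carry a set of acceptable labels and count a prediction as correct if it matches any of them. The sketch below illustrates that principle only; it is not MAFALDA's actual metric, and the spans and labels are invented for the example.

        def precision_recall_f1(predictions, gold):
            """predictions: {span: label}; gold: {span: set of acceptable labels}."""
            tp = sum(1 for span, label in predictions.items()
                     if label in gold.get(span, set()))
            precision = tp / len(predictions) if predictions else 0.0
            recall = tp / len(gold) if gold else 0.0
            f1 = (2 * precision * recall / (precision + recall)
                  if precision + recall else 0.0)
            return precision, recall, f1

        gold = {(0, 42): {"appeal to authority", "appeal to false authority"}}
        pred = {(0, 42): "appeal to authority", (50, 80): "strawman"}
        print(precision_recall_f1(pred, gold))  # (0.5, 1.0, 0.666...)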