
    Collective moderation of hate, toxicity, and extremity in online discussions

    How can citizens moderate hate, toxicity, and extremism in online discourse? We analyze a large corpus of more than 130,000 discussions on German Twitter over a turbulent four-year period marked by the migrant crisis and political upheavals. With the help of human annotators, language models, machine learning classifiers, and longitudinal statistical analyses, we discern the dynamics of different dimensions of discourse. We find that expressing simple opinions, not necessarily supported by facts but also free of insults, is associated with the least hate, toxicity, and extremity of speech and speakers in subsequent discussions. Sarcasm also helps achieve those outcomes, particularly in the presence of organized extreme groups. More constructive comments, such as providing facts or exposing contradictions, can backfire and attract more extremity. Mentioning either outgroups or ingroups is typically related to a deterioration of discourse in the long run. A pronounced emotional tone, whether negative such as anger or fear, or positive such as enthusiasm and pride, also leads to worse outcomes. Going beyond one-shot analyses of smaller samples of discourse, our findings have implications for the successful management of online commons through collective civic moderation.
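
    The abstract does not disclose the authors' exact pipeline; the Python sketch below is only a hypothetical illustration of the kind of longitudinal analysis described, regressing the toxicity of the subsequent discussion on the share of discourse features in the current one. The feature names follow the abstract, but the synthetic data, the model choice (OLS via statsmodels), and all variable names are assumptions, not the authors' code.

    # Hypothetical sketch, not the authors' code: regress next-discussion
    # toxicity on discourse features of the current discussion.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 1000  # number of discussions (synthetic)

    # Share of comments in each discussion exhibiting a discourse dimension.
    features = {
        "opinion": rng.uniform(0, 1, n),
        "facts": rng.uniform(0, 1, n),
        "sarcasm": rng.uniform(0, 1, n),
        "outgroup_mention": rng.uniform(0, 1, n),
        "anger": rng.uniform(0, 1, n),
    }
    X = np.column_stack(list(features.values()))

    # Toxicity of the subsequent discussion (synthetic, for illustration only).
    y = (0.5 - 0.3 * features["opinion"] + 0.4 * features["outgroup_mention"]
         + 0.2 * features["anger"] + rng.normal(0, 0.1, n))

    model = sm.OLS(y, sm.add_constant(X)).fit()
    print(model.summary(xname=["const"] + list(features)))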

    Applying Human-in-the-Loop to construct a dataset for determining content reliability to combat fake news

    Annotated corpora are indispensable for training computational models in Natural Language Processing. However, more complex semantic annotation is a costly, arduous, and time-consuming task, resulting in a shortage of resources for training Machine Learning and Deep Learning algorithms. Accordingly, this work proposes a methodology, based on the human-in-the-loop paradigm, for the semi-automatic annotation of complex tasks. The methodology is applied to the construction of a reliability dataset of Spanish news in order to combat disinformation and fake news. By implementing the proposed semi-automatic annotation methodology we obtain a high-quality resource, increasing annotator efficacy and speed with fewer examples. The methodology consists of three incremental phases and results in the construction of the RUN dataset. The quality of the resource was evaluated in terms of time reduction (almost 64% less annotation time than fully manual annotation), annotation quality (consistency of annotation and inter-annotator agreement), and performance of a model trained on the semi-automatic RUN dataset (accuracy 95%, F1 95%), validating the suitability of the proposal.

    This research work is funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR” through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). It is also funded by Generalitat Valenciana, Spain, through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21), and the grant ACIF/2020/177.
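
    The paper's three incremental phases are not reproduced in the abstract; the sketch below is a minimal, hypothetical human-in-the-loop loop in Python in the same spirit: a classifier trained on a small seed set auto-accepts high-confidence labels and routes uncertain items to a human annotator. The threshold, model choice, and example texts are assumptions, not the RUN pipeline.

    # Minimal hypothetical human-in-the-loop annotation loop (not the RUN
    # pipeline): auto-accept confident model labels, ask a human otherwise.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    seed_texts = ["verified official report", "fabricated viral claim",
                  "sourced investigative story", "invented conspiracy rumor"]
    seed_labels = [1, 0, 1, 0]  # 1 = reliable, 0 = unreliable (toy seed set)
    unlabeled = ["official report with sources", "viral invented claim",
                 "ambiguous unsourced piece"]

    vec = TfidfVectorizer()
    clf = LogisticRegression().fit(vec.fit_transform(seed_texts), seed_labels)

    THRESHOLD = 0.9  # assumed confidence cut-off, not taken from the paper

    def ask_human(text):
        # Stand-in for the manual annotation interface.
        return int(input(f"Label for '{text}' (1=reliable, 0=unreliable): "))

    for text in unlabeled:
        proba = clf.predict_proba(vec.transform([text]))[0]
        label = int(proba.argmax()) if proba.max() >= THRESHOLD else ask_human(text)
        seed_texts.append(text)
        seed_labels.append(label)

    # Each incremental phase would retrain the model on the grown annotated set.
    clf = LogisticRegression().fit(vec.fit_transform(seed_texts), seed_labels)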

    Knowledge-Infused Learning

    In DARPA’s view of the three waves of AI, the first wave, symbolic AI, focused on explicit knowledge. The second and current wave is termed statistical AI. Deep learning techniques have exploited large amounts of data and massive computational power to reach human levels of performance on narrowly defined tasks. Separately, knowledge graphs have emerged as a powerful tool to capture and exploit a variety of explicit knowledge, helping algorithms better apprehend content and enabling the next generation of data processing, such as semantic search. After initial hesitancy about the scalability of the knowledge-creation process, the last decade has seen significant growth in developing and applying knowledge, usually in the form of knowledge graphs. Examples range from the use of DBpedia in IBM’s Watson, to the Google Knowledge Graph in Google Semantic Search, to the application of the Protein Data Bank in AlphaFold, recognized by many as the most significant AI breakthrough. Furthermore, numerous domain-specific knowledge graphs and sources have been applied to improve AI methods in diverse domains such as medicine, healthcare, finance, manufacturing, and defense. Now we move towards the third wave of AI, built on the neuro-symbolic approach, which combines the strengths of statistical and symbolic AI. Combining the respective powers and benefits of knowledge graphs and deep learning is particularly attractive, and has led to the development of an approach and practice in computer science termed knowledge-infused (deep) learning (KiL).

    This dissertation serves as a primer on methods that use diverse forms of knowledge: linguistic, commonsense, broad-based, and domain-specific, and provides novel evaluation metrics to assess knowledge-infusion algorithms on various datasets, such as social media, clinical interviews, electronic health records, and information-seeking dialogues. Specifically, it provides the necessary grounding in shallow infusion, semi-deep infusion, and a more advanced form called deep infusion, to alleviate five bottlenecks in statistical AI: (1) Context Sensitivity, (2) Handling Uncertainty and Risk, (3) Interpretability, (4) User-level Explainability, and (5) Task Transferability. Further, the dissertation introduces a new theoretical and conceptual approach called Process Knowledge Infusion, which enforces semantic flow in AI algorithms by altering their learning behavior with procedural knowledge. Such knowledge is manifested in questionnaires and guidelines that are usable by AI (or KiL) systems for sensible and safety-constrained response generation.

    A hurdle in proving the acceptability of KiL to the AI and natural language understanding community lies in the absence of realistic datasets that can demonstrate the five bottlenecks in statistical AI. The dissertation describes the process of constructing a wide variety of gold-standard datasets using expert knowledge, questionnaires, guidelines, and knowledge graphs. These datasets challenge statistical AI on explainability, interpretability, uncertainty, and context sensitivity, and showcase remarkable performance gains obtained by KiL-based algorithms. The dissertation terms these gold-standard datasets Knowledge-intensive Language Understanding (KILU) tasks and considers them complementary to the well-adopted General Language Understanding Evaluation (GLUE) benchmark.

    On KILU and GLUE datasets, KiL-based algorithms outperformed the existing state of the art in natural language generation and classification problems. Furthermore, KiL-based algorithms provided user-understandable explanations in sensitive problems such as mental health by highlighting the concepts that depict the reason behind a model’s prediction or generation. Mapping these concepts to entities in an external knowledge source can support experts with user-level explanations and reasoning. A cohort-based qualitative evaluation indicated that KiL should support stronger interleaving of a greater variety of knowledge, at different levels of abstraction, with the layers of a deep learning architecture; this would enforce controlled knowledge infusion and prevent the model from extrapolating or overgeneralizing.

    The dissertation opens up future research questions on neural models within the domain of natural language understanding. For instance: (a) which layers within deep neural language models (NLMs) require knowledge? (b) NLMs are known to learn by abstraction; how can the inherent abstraction of external knowledge be leveraged to enhance the context of the learned statistical representation? (c) layered knowledge infusion might result in high-energy nodes contributing to the outcome, counter to current softmax-based predictions; how should the most probable outcome then be picked? This dissertation provides a first step towards addressing these questions; however, more efficient methods are needed that provide user-level explanations, are interpretable, and propel safe AI.
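
    The abstract does not detail how infusion is implemented; as a hypothetical illustration of the shallow end of the spectrum it describes, the PyTorch sketch below concatenates frozen, pre-trained knowledge-graph entity embeddings with learned token embeddings before a classification head. All dimensions, names, and the entity-linking step (assumed to happen upstream) are illustrative assumptions, not the dissertation's implementation.

    # Hypothetical sketch of shallow knowledge infusion (illustrative only):
    # concatenate knowledge-graph entity vectors with token embeddings
    # before a classifier head.
    import torch
    import torch.nn as nn

    class ShallowKiL(nn.Module):
        def __init__(self, vocab_size, kg_embeddings, text_dim=64, num_classes=2):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, text_dim)
            # Frozen, pre-trained KG entity vectors (e.g., from a TransE model).
            self.kg = nn.Embedding.from_pretrained(kg_embeddings, freeze=True)
            self.head = nn.Linear(text_dim + kg_embeddings.size(1), num_classes)

        def forward(self, token_ids, entity_ids):
            text = self.tok(token_ids).mean(dim=1)  # mean-pooled token embeddings
            know = self.kg(entity_ids).mean(dim=1)  # mean-pooled linked entities
            return self.head(torch.cat([text, know], dim=-1))

    # Toy usage: 100 KG entities with 32-dimensional vectors (random here).
    model = ShallowKiL(vocab_size=1000, kg_embeddings=torch.randn(100, 32))
    logits = model(torch.randint(0, 1000, (4, 12)),  # 4 texts, 12 tokens each
                   torch.randint(0, 100, (4, 3)))    # 3 linked entities each
    print(logits.shape)  # torch.Size([4, 2])

    Semi-deep and deep infusion variants would instead inject knowledge at intermediate layers of the network; the concatenation above only illustrates the simplest case.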