6,081 research outputs found

    Deep Learning for User Comment Moderation

    Experimenting with a new dataset of 1.6M user comments from a Greek news portal and existing datasets of English Wikipedia comments, we show that an RNN outperforms the previous state of the art in moderation. A deep, classification-specific attention mechanism further improves the overall performance of the RNN. We also compare against a CNN and a word-list baseline, considering both fully automatic and semi-automatic moderation.
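    A minimal sketch of this kind of model is given below: a bidirectional GRU whose hidden states are pooled by a learned attention layer before a binary accept/reject decision. It is illustrative only; the GRU choice, layer sizes, and class names are assumptions rather than the paper's exact architecture.

    ```python
    # Sketch of an RNN-with-attention comment classifier (PyTorch); illustrative only.
    import torch
    import torch.nn as nn

    class AttentionRNNModerator(nn.Module):
        def __init__(self, vocab_size, embed_dim=200, hidden_dim=128):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
            # Classification-specific attention: score each time step's state.
            self.attn = nn.Linear(2 * hidden_dim, 1)
            self.out = nn.Linear(2 * hidden_dim, 1)  # logit for "reject"

        def forward(self, token_ids):
            states, _ = self.rnn(self.embedding(token_ids))    # (B, T, 2H)
            weights = torch.softmax(self.attn(states), dim=1)  # (B, T, 1)
            context = (weights * states).sum(dim=1)            # (B, 2H)
            return self.out(context).squeeze(-1)               # (B,) logits

    # Toy usage: random token ids; train with nn.BCEWithLogitsLoss against 0/1 labels.
    logits = AttentionRNNModerator(vocab_size=50000)(torch.randint(1, 50000, (4, 30)))
    ```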

    A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian

    Abusive speech in social media, including profanities, derogatory and hate speech, has reached the level of a pandemic. A system able to detect such texts could help make the Internet and social media a better and more respectful virtual space. Research and commercial applications in this area have so far focused mainly on the English language. This paper presents the work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated tweets, of which 1,416 were labelled as containing some kind of abusive speech. Those 1,416 tweets were further sub-classified, for instance into those using vulgar language, hate speech, derogatory language, etc. In this paper, we explain the process of data acquisition, annotation, and corpus construction. We also discuss the results of an initial analysis of the annotation quality. Finally, we present the structure of an abusive speech lexicon and its enrichment with abusive triggers extracted from the AbCoSER dataset.
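    As a rough illustration of how such a trigger lexicon can be applied, the sketch below flags a tweet with the lexicon categories whose triggers it contains. The category names and trigger words are placeholders, not actual AbCoSER entries.

    ```python
    # Lexicon-based flagging with abusive "triggers"; entries are placeholders.
    import re

    lexicon = {
        "vulgar": {"trigger_a", "trigger_b"},
        "hate_speech": {"trigger_c"},
        "derogatory": {"trigger_d", "trigger_e"},
    }

    def flag_tweet(text):
        """Return the lexicon categories whose triggers appear in the tweet."""
        tokens = set(re.findall(r"\w+", text.lower()))
        return sorted(cat for cat, triggers in lexicon.items() if tokens & triggers)

    print(flag_tweet("an example tweet containing trigger_c"))  # ['hate_speech']
    ```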

    Approaches to Automated Detection of Cyberbullying: A Survey

    Research into cyberbullying detection has increased in recent years, due in part to the proliferation of cyberbullying across social media and its detrimental effect on young people. A growing body of work is emerging on automated approaches to cyberbullying detection. These approaches utilise machine learning and natural language processing techniques to identify the characteristics of a cyberbullying exchange and automatically detect cyberbullying by matching textual data to the identified traits. In this paper, we present a systematic review of published research (as identified via the Scopus, ACM and IEEE Xplore bibliographic databases) on cyberbullying detection approaches. On the basis of our extensive literature review, we categorise existing approaches into four main classes, namely: supervised learning, lexicon-based, rule-based and mixed-initiative approaches. Supervised learning-based approaches typically use classifiers such as SVM and Naïve Bayes to develop predictive models for cyberbullying detection. Lexicon-based systems utilise word lists and use the presence of words within the lists to detect cyberbullying. Rule-based approaches match text to predefined rules to identify bullying, and mixed-initiative approaches combine human-based reasoning with one or more of the aforementioned approaches. We found that a lack of quality representative labelled datasets and the non-holistic consideration of cyberbullying by researchers when developing detection systems are two key challenges facing cyberbullying detection research. This paper essentially maps out the state of the art in cyberbullying detection research and serves as a resource for researchers to determine where to best direct their future research efforts in this field.
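    To make the first of these classes concrete, the sketch below shows a minimal supervised-learning baseline of the kind the survey describes: TF-IDF features fed to a linear SVM in scikit-learn. The toy texts and labels are placeholders, not data from any of the surveyed studies.

    ```python
    # Minimal TF-IDF + linear SVM cyberbullying classifier (scikit-learn); toy data.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = ["you are great", "everyone hates you, loser", "see you at practice"]
    labels = [0, 1, 0]  # 1 = cyberbullying, 0 = benign (toy labels)

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(texts, labels)
    print(model.predict(["nobody likes you"]))
    ```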

    Offensive Language Detection in Arabic Social Networks Using Evolutionary-Based Classifiers Learned From Fine-Tuned Embeddings

    Social networks facilitate communication between people from all over the world. Unfortunately, the excessive use of social networks leads to the rise of antisocial behaviors such as the spread of online offensive language, cyberbullying (CB), and hate speech (HS). Therefore, detecting abusive/offensive language and hate speech has become a crucial part of tackling cyberharassment. Manual detection of cyberharassment is cumbersome, slow, and not even feasible for rapidly growing data. In this study, we addressed the challenges of automatically detecting offensive tweets in the Arabic language. The main contribution of this study is to design and implement an intelligent prediction system encompassing a two-stage optimization approach to identify and classify offensive from non-offensive text. In the first stage, the proposed approach fine-tuned the pre-trained word embedding models by training them for several epochs on the training dataset. The embeddings of the vocabulary in the new dataset are trained and added to the old embeddings. In the second stage, it employed a hybrid approach of two classifiers, namely XGBoost and SVM, and a genetic algorithm (GA) to mitigate the drawback of the classifiers in finding the optimal hyperparameter values for the proposed approach. We tested the proposed approach on the Arabic Cyberbullying Corpus (ArCybC), which contains tweets collected from four Twitter domains: gaming, sports, news, and celebrities. The ArCybC dataset has four categories: sexual, racial, intelligence, and appearance. The proposed approach produced superior results, in which the SVM algorithm with the Aravec SkipGram word embedding model achieved an accuracy of 88.2% and an F1-score of 87.8%. Funding: Ministerio Espanol de Ciencia e Innovacion (DemocratAI::UGR) PID2020-115570GB-C2.
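    The sketch below illustrates the second stage only: an evolutionary search (mutation plus elitist selection; crossover is omitted) over the SVM hyperparameters C and gamma, scored by cross-validated accuracy on synthetic data. It is a simplified stand-in for the paper's GA, which also tunes XGBoost and operates on the fine-tuned tweet embeddings.

    ```python
    # Toy evolutionary search over SVM hyperparameters; a simplified stand-in
    # for the paper's GA-based tuning (population sizes and ranges are illustrative).
    import random
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)

    def fitness(genes):
        C, gamma = genes
        return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

    def mutate(genes):
        return [max(1e-4, g * random.uniform(0.5, 2.0)) for g in genes]

    population = [[10 ** random.uniform(-2, 2), 10 ** random.uniform(-4, 0)]
                  for _ in range(8)]
    for _ in range(10):                                  # generations
        population.sort(key=fitness, reverse=True)
        parents = population[:3]                         # elitist selection
        children = [mutate(random.choice(parents)) for _ in range(5)]
        population = parents + children

    best = max(population, key=fitness)
    print("best C=%.3g gamma=%.3g acc=%.3f" % (best[0], best[1], fitness(best)))
    ```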

    Linguistic Threat Assessment: Understanding Targeted Violence through Computational Linguistics

    Language alluding to possible violence is widespread online, and security professionals are increasingly faced with the issue of understanding and mitigating this phenomenon. The volume of extremist and violent online data presents a workload that is unmanageable for traditional, manual threat assessment. Computational linguistics may be of particular relevance to understanding threats of grievance-fuelled targeted violence on a large scale. This thesis seeks to advance knowledge on the possibilities and pitfalls of threat assessment through automated linguistic analysis. Based on in-depth interviews with expert threat assessment practitioners, three areas of language are identified which can be leveraged for automation of threat assessment, namely, linguistic content, style, and trajectories. Implementations of each area are demonstrated in three subsequent quantitative chapters. First, linguistic content is utilised to develop the Grievance Dictionary, a psycholinguistic dictionary aimed at measuring concepts related to grievance-fuelled violence in text. Thereafter, linguistic content is supplemented with measures of linguistic style in order to examine the feasibility of author profiling (determining gender, age, and personality) in abusive texts. Lastly, linguistic trajectories are measured over time in order to assess the effect of an external event on an extremist movement. Collectively, the chapters in this thesis demonstrate that linguistic automation of threat assessment is indeed possible. The concluding chapter describes the limitations of the proposed approaches and illustrates where future potential lies to improve automated linguistic threat assessment. Ideally, developers of computational implementations for threat assessment strive for explainability and transparency. Furthermore, it is argued that computational linguistics holds particular promise for large-scale measurement of grievance-fuelled language, but is perhaps less suited to prediction of actual violent behaviour. Lastly, researchers and practitioners involved in threat assessment are urged to collaboratively and critically evaluate novel computational tools which may emerge in the future.
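    As an illustration of the dictionary-based measurement the Grievance Dictionary enables, the sketch below reports, per category, the proportion of tokens in a text that match that category's word list. The categories and words shown are placeholders, not the dictionary's actual contents.

    ```python
    # Psycholinguistic-dictionary scoring: per-category share of matching tokens.
    # Categories and word lists are placeholders, not the Grievance Dictionary itself.
    import re

    dictionary = {
        "weaponry": {"gun", "rifle", "ammunition"},
        "desperation": {"hopeless", "nothing", "end"},
    }

    def score(text):
        tokens = re.findall(r"[a-z']+", text.lower())
        total = len(tokens) or 1
        return {cat: sum(t in words for t in tokens) / total
                for cat, words in dictionary.items()}

    print(score("There is nothing left, it feels hopeless."))
    ```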

    Personality Disruption as Mental Torture: The CIA, Interrogational Abuse, and the U.S. Torture Act

    This Article is a contribution to the torture debate. It argues that the abusive interrogation tactics used by the United States in what was then called the “global war on terrorism” are, unequivocally, torture under U.S. law. To some readers, this might sound like déjà vu all over again. Hasn’t this issue been picked over for nearly fifteen years? It has, but we think the legal analysis we offer has been mostly overlooked. We argue that the basic character of the CIA’s interrogation of so-called “high-value detainees” has been misunderstood: both lawyers and commentators have placed far too much emphasis on the dozen or so “enhanced interrogation techniques” (EITs) short-listed in government “torture memos,” and far too little emphasis on other forms of physical violence, psychological stressors, environmental manipulations, and abusive conditions of confinement that are crucial to the question of whether the detainees were tortured. Furthermore, we dispute one of the standard narratives about the origins of the program: that it was the brainchild of civilian contractor psychologists because—in the CIA’s words—“[n]on-standard interrogation methodologies were not an area of expertise of CIA officers or of the US Government generally.” This narrative ignores the CIA’s role in devising these methods, in spite of the decades of prior CIA research and doctrine about forcing interrogation subjects into a state of extreme psychological debilitation, and about how to do so—by making them physically weak, intensely fearful and anxious, and helplessly dependent. By neglecting this history and focusing on the contractors and the EITs they devised, this narrative contributes to the misunderstanding that the torture debate is about EITs and nothing else. In effect, a “torture debate” about EITs and the torture memos neglects the purloined letter in front of our eyes: the abusive conditions the CIA inflicted on prisoners even when they were not subject to EITs, including abuses that the torture memos never bothered to discuss. Unpacking what this debate is really about turns out to be crucial to understanding that such interrogation methods are torture under existing U.S. law. The U.S. Torture Act includes a clause in its definition of mental torture that was intended to ban exactly the kind of interrogation methods the CIA had researched, out of concern that our Cold War adversaries were using them: mind-altering procedures “calculated to disrupt profoundly the senses or the personality.” That is precisely the “non-standard interrogation methodology” the CIA employed after 9/11.