10,224 research outputs found

    Automatic Detection of Online Jihadist Hate Speech

    We have developed a system that automatically detects online jihadist hate speech with over 80% accuracy, using techniques from Natural Language Processing and Machine Learning. The system is trained on a corpus of 45,000 subversive Twitter messages collected from October 2014 to December 2016. We present a qualitative and quantitative analysis of the jihadist rhetoric in the corpus, examine the network of Twitter users, outline the technical procedure used to train the system, and discuss examples of use. Comment: 31 pages

    Terrorists, fanatics, and extremists: The language of anti-Muslim prejudice

    This paper examines contemporary expressions of anti-Muslim prejudice in Western society. Representations of "Islam" and "Muslim" were collected from a 9.87-billion-word corpus of web-based newspapers and magazines published between 2010 and 2020, in order to identify and analyze usage and connotation. The paper adopts a corpus linguistics approach, analyzing collocation (co-occurring words) and concordance (contextual) data. The results reveal that "Islam" and "Muslim" are frequently framed negatively (e.g., as "radical", "extremist", "terrorist", and "violent"), and that other negative stereotypes and images of Islam and Muslim people are also frequently attested in the data. The paper further explores anti-Muslim linguicism in Anglophone countries and makes an original contribution to the wider debate on prejudice against Muslim people.

    The Israeli-Palestinian Conflict in American, Arab, and British Media: Corpus-Based Critical Discourse Analysis

    The Israeli-Palestinian conflict is one of the longest and most violent conflicts in modern history. The language used to represent this important conflict in the media is frequently commented on by scholars and political commentators (e.g., Ackerman, 2001; Fisk, 2001; Mearsheimer & Walt, 2007). To date, however, few studies in the field of applied linguistics have attempted a thorough investigation of the language used to represent the conflict in influential media outlets using systematic methods of linguistic analysis. The current study aims to partially bridge this gap by combining methods and analytical frameworks from Critical Discourse Analysis (CDA) and Corpus Linguistics (CL) to analyze the discursive representation of the Israeli-Palestinian conflict in American, Arab, and British media, represented by CNN, Al-Jazeera Arabic, and the BBC respectively. CDA, which is primarily interested in studying how power and ideology are enacted and resisted in the use of language in social and political contexts, has frequently been criticized, mainly for the arbitrary selection of a small number of texts or text fragments to be analyzed. In order to strengthen CDA analysis, Stubbs (1997) suggested that CDA analysts should utilize techniques from CL, which employs computational approaches to perform quantitative and qualitative analysis of actual patterns of use occurring in a large and principled collection of natural texts. In this study, the corpus-based keyword technique is initially used to identify the topics that tend to be emphasized, downplayed, and/or left out in the coverage of the Israeli-Palestinian conflict in three corpora compiled from the news websites of Al-Jazeera, CNN, and the BBC. Topics such as terrorism, occupation, settlements, and the recent Israeli disengagement plan, which were found to be key in the coverage of the conflict, are further studied in context using several other corpus tools, especially the concordancer and the collocation finder. The analysis reveals some of the strategies employed by each news website to control the positive or negative representation of the different actors involved in the conflict. The corpus findings are interpreted using some informative CDA frameworks, especially Van Dijk's (1998) ideological square framework.

    Terror and tourism: the economic consequences of media coverage

    Multilingual Cross-domain Perspectives on Online Hate Speech

    In this report, we present a study of eight corpora of online hate speech, demonstrating the NLP techniques that we used to collect and analyze the jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their hateful rhetoric. To expose the main features, we have focused on text classification, text profiling, and keyword and collocation extraction, along with manual annotation and qualitative study. Comment: 24 pages

    Understanding violence through social media

    While social media analysis has been widely utilized to predict various market and political trends, its utilization to improve geospatial conflict prediction in contested environments remains understudied. To determine the feasibility of social media utilization in conflict prediction, we compared historical conflict data and social media metadata, utilizing over 829,537 geo-referenced messages sent through the Twitter network within Iraq from August 2013 to July 2014. From our research, we conclude that social media metadata has a positive impact on conflict prediction when compared with historical conflict data. Additionally, we find that utilizing the most extreme negative terminology from a locally derived social media lexicon provided the most significant predictive accuracy for determining areas that would experience subsequent violence. We suggest future research projects center on improving the conflict prediction capability of social media data and include social media analysis in operational assessments. http://archive.org/details/understandingvio1094556920 Major, United States Army; Lieutenant Commander, United States Navy. Approved for public release; distribution is unlimited.

    Syria: the war of constructing identities in the digital space and the power of discursive practices

    How have Syrians discursively constructed their identities on the social network Facebook between 2011 and 2018? How have various conflict parties used identity politics as a means of mobilization, and how have such practices deflected rightful demands? Can linguistics, using a data-evidence approach, help us better understand and analyse conflict and identify conflict resolution intervention points? This research tries to answer these questions, among others, in a series of attempts to show the potential of a multidisciplinary approach to conflict analysis for peace interventions through big data, discursive practices, history, and the power of the archive. This paper looks at self and group identity practices within the Syrian conflict by investigating the notion of identity formation from a data-driven perspective. The data consist of published institutional content and comments by ordinary citizens on 296 Facebook pages related to the Syrian conflict, posted between February 2011 and May 2018. The analysis shows four main clusters of social-group ideologies, with certain overlaps and strong fragmentation within the Syrian revolution/opposition cluster. Institutions and members in all clusters have used different rhetorical and linguistic devices in representing their own groups' identities and those of other groups. While the roots of the conflict are structural in nature, mainly on an ethnic-religious ideational basis, institutional political messages had a clear role in triggering inflammatory discussions about these identity dimensions. Both the Syrian government and Islamist groups had relatively clear objectives stemming from clear ideologies and explicit communication models. Possessing the needed resources, both have operated within relatively formal structures. This enabled them to continue to construct cultural hegemony through various practices and to disseminate discourses via institutions.