18 research outputs found

    Automatic Detection of Online Jihadist Hate Speech

    We have developed a system that automatically detects online jihadist hate speech with over 80% accuracy, using techniques from Natural Language Processing and Machine Learning. The system is trained on a corpus of 45,000 subversive Twitter messages collected from October 2014 to December 2016. We present a qualitative and quantitative analysis of the jihadist rhetoric in the corpus, examine the network of Twitter users, outline the technical procedure used to train the system, and discuss examples of use. Comment: 31 pages

    Understanding the Roots of Radicalisation on Twitter

    In an increasingly digital world, identifying signs of online extremism sits at the top of the priority list for counter-extremist agencies. Researchers and governments are investing in the creation of advanced information technologies to identify and counter extremism through intelligent large-scale analysis of online data. However, to the best of our knowledge, these technologies are neither based on, nor do they take advantage of, the existing theories and studies of radicalisation. In this paper we propose a computational approach for detecting and predicting the radicalisation influence a user is exposed to, grounded in the notion of 'roots of radicalisation' from social science models. This approach has been applied to analyse and compare the radicalisation level of 112 pro-ISIS vs. 112 "general" Twitter users. Our results show the effectiveness of our proposed algorithms in detecting and predicting radicalisation influence, obtaining up to 0.9 F1 measure for detection and between 0.7 and 0.8 precision for prediction. While this is an initial attempt towards the effective combination of social and computational perspectives, more work is needed to bridge these disciplines, and to build on their strengths to target the problem of online radicalisation.

    Detecting Textual Propaganda Using Machine Learning Techniques

    Social networking has come to dominate the world by providing a platform for disseminating information. People often share information without knowing whether it is true. Social networks are now used to gain influence in many fields, such as elections and advertising, so it is not surprising that social media has become a weapon for manipulating sentiment by spreading disinformation. Propaganda is a systematic and deliberate attempt to influence people for political or religious gain. This paper classifies propagandist versus non-propagandist text using supervised machine learning algorithms. Data was collected from news sources from July 2018 to August 2018. After the text is annotated, feature engineering is performed using term frequency/inverse document frequency (TF/IDF) and bag of words (BOW). The relevant features are supplied to support vector machine (SVM) and Multinomial Naïve Bayes (MNB) classifiers. The SVM is tuned by trying linear, polynomial, and RBF kernels. SVM outperformed MNB, with a precision of 70%, recall of 76.5%, F1 score of 69.5%, and overall accuracy of 69.2%.
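    The TF/IDF weighting mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the whitespace tokenizer, the unsmoothed `log(N / df)` form of IDF, the function names, and the three sample sentences are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper does not specify one.
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample texts, invented for illustration only.
docs = [
    "the enemy must be destroyed",
    "the council met on tuesday",
    "the enemy council",
]
vectors = tf_idf(docs)
```

    A term that occurs in every document (here "the") receives weight zero, while a term unique to one document receives the largest weight. The resulting vectors are the kind of input a classifier such as an SVM would be trained on.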

    Organized Behavior Classification of Tweet Sets using Supervised Learning Methods

    During the 2016 US elections Twitter experienced unprecedented levels of propaganda and fake news through the collaboration of bots and hired persons, the ramifications of which are still being debated. This work proposes an approach to identify the presence of organized behavior in tweets. The Random Forest, Support Vector Machine, and Logistic Regression algorithms are each used to train a model with a data set of 850 records consisting of 299 features extracted from tweets gathered during the 2016 US presidential election. The features represent user and temporal synchronization characteristics to capture coordinated behavior. These models are trained to classify tweet sets among the categories: organic vs organized, political vs non-political, and pro-Trump vs pro-Hillary vs neither. The random forest algorithm performs best, with greater than 95% average accuracy and F-measure scores for each category. The most valuable features for classification are identified as user-based features, with media use and marking tweets as favorite being the most dominant. Comment: 51 pages, 5 figures

    (De)constructing difference: a qualitative review of the ‘othering’ of UK Muslim communities, extremism, soft harms, and Twitter analytics

    There is some evidence that, in the UK, current counter terrorism initiatives reproduce and amplify both real and imagined differences between Muslim and anti-Muslim groups, leading in turn to social and community polarisation and isolation. It is far from clear whether these changing perceptions always lead to increased ethnic and religious violence or increased radicalisation. However, more worrying is the potential for the development of 'soft harms' among those 'suspect communities', for example reduced social integration, withdrawal from British cultural life, hate crime, forced marriage and domestic violence. There has to date been little interrogation of the scale of 'soft harm' among Muslim communities. Within this paper, the author offers a qualitative review of how the Muslim 'other' has become an ascribed category reproduced through an endemic 'Muslim common sense'. Following that, the author suggests that Twitter analytics may be harnessed to analyse the attitudes, current condition, and reactions of suspect other communities through the tweeting of everyday events. The aim in doing so is to develop a series of proposals to counter the ideological underpinnings of difference and contribute to current debates on counter terrorism policy in the UK.

    Leveraging Natural Language Processing to Analyse the Temporal Behavior of Extremists on Social Media

    Aiming at achieving sustainability and quality of life for citizens, future smart cities adopt a data-centric approach to decision making in which assets, people, and events are constantly monitored to inform decisions. Public opinion monitoring is of particular importance to governments and intelligence agencies, who seek to monitor extreme views and attempts of radicalizing individuals in society. While social media platforms provide increased visibility and a platform to express public views freely, such platforms can also be used to manipulate public opinion, spread hate speech, and radicalize others. Natural language processing and data mining techniques have gained popularity for the analysis of social media content and the detection of extremists and radical views expressed online. However, existing approaches simplify the concept of radicalization to a binary problem in which individuals are classified as extremists or non-extremists. Such binary approaches do not capture the radicalization process's complexity that is influenced by many aspects such as social interactions, the impact of opinion leaders, and peer pressure. Moreover, the longitudinal analysis of users' interactions and profile evolution over time is lacking in the literature. Aiming at addressing those limitations, this work proposes a sophisticated framework for the analysis of the temporal behavior of extremists on social media platforms. Far-right extremism during the Trump presidency was used as a case study, and a large dataset of over 259,000 tweets was collected to train and test our models. The results obtained are very promising and encourage the use of advanced social media analytics in the support of effective and timely decision-making.

    Analisis Sentimen Konten Radikal Melalui Dokumen Twitter Menggunakan Metode Backpropagation

    Twitter is a social networking service on which users post and interact with messages known as "tweets". Some people also use Twitter to voice their opinions on a topic, sometimes excessively, and tweets with radical overtones are occasionally found. Radical acts on social media are usually referred to as radical content. Such content can harm some parties, and certain parties exploit radical content to achieve particular goals. This study therefore attempts to analyse Indonesian-language tweets that contain the word "radical" and to determine whether they are positive or negative radical content. Tweets obtained from Twitter that contain public opinion leaning towards radical content are classified. Each tweet, which can be called a document or data item, first goes through preprocessing. The document is then split into 6 word types, namely nouns, verbs, and adjectives, each of which is further divided into positive and negative. After the split, the number of words of each type in each document is counted, so that the document can be converted into numbers that can then be fed into the algorithm's formula
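    The feature-extraction step described in this abstract, counting positive and negative nouns, verbs, and adjectives per document, can be sketched as follows. This is only an illustration under stated assumptions: the tiny English lexicon, the feature ordering, and the function name `count_features` are hypothetical, and the study itself works on Indonesian tweets with its own word lists.

```python
from collections import Counter

# Assumed fixed ordering of the 6 word types (part of speech x polarity).
FEATURE_ORDER = [
    ("noun", "positive"), ("noun", "negative"),
    ("verb", "positive"), ("verb", "negative"),
    ("adjective", "positive"), ("adjective", "negative"),
]

# Hypothetical toy lexicon; a real system would need a tagged sentiment lexicon.
LEXICON = {
    "peace": ("noun", "positive"),
    "war": ("noun", "negative"),
    "help": ("verb", "positive"),
    "destroy": ("verb", "negative"),
    "good": ("adjective", "positive"),
    "evil": ("adjective", "negative"),
}


def count_features(tokens, lexicon=LEXICON):
    """Map a preprocessed token list to a 6-element count vector."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return [counts[key] for key in FEATURE_ORDER]


tokens = "destroy the evil war and help good peace peace".split()
features = count_features(tokens)
print(features)  # [2, 1, 1, 1, 1, 1]
```

    Each document becomes a fixed-length numeric vector, which is the form of input a backpropagation-trained network expects. Words missing from the lexicon are simply ignored in this sketch.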

    The Potential Impact of Big Data in International Development and Humanitarian Aid

    Honors (Bachelor's). International Studies, University of Michigan. https://deepblue.lib.umich.edu/bitstream/2027.42/139612/1/emjabs.pd

    Modeling Islamist Extremist Communications on Social Media using Contextual Dimensions: Religion, Ideology, and Hate

    Terror attacks have been linked in part to online extremist content. Although tens of thousands of Islamist extremism supporters consume such content, they are a small fraction relative to peaceful Muslims. The efforts to contain the ever-evolving extremism on social media platforms have remained inadequate and mostly ineffective. Divergent extremist and mainstream contexts challenge machine interpretation, with a particular threat to the precision of classification algorithms. Our context-aware computational approach to the analysis of extremist content on Twitter breaks down this persuasion process into building blocks that acknowledge inherent ambiguity and sparsity that likely challenge both manual and automated classification. We model this process using a combination of three contextual dimensions (religion, ideology, and hate), each elucidating a degree of radicalization and highlighting independent features to render them computationally accessible. We utilize domain-specific knowledge resources for each of these contextual dimensions, such as the Qur'an for religion, the books of extremist ideologues and preachers for political ideology, and a social media hate speech corpus for hate. Our study makes three contributions to reliable analysis: (i) development of a computational approach rooted in the contextual dimensions of religion, ideology, and hate that reflects strategies employed by online Islamist extremist groups, (ii) an in-depth analysis of relevant tweet datasets with respect to these dimensions to exclude likely mislabeled users, and (iii) a framework for understanding online radicalization as a process to assist counter-programming. Given the potentially significant social impact, we evaluate the performance of our algorithms to minimize mislabeling, where our approach outperforms a competitive baseline by 10.2% in precision. Comment: 22 pages