1,003 research outputs found

    Misinformation Detection in Social Media

    The pervasive use of social media gives it a crucial role in helping the public find reliable information. Meanwhile, the openness and timeliness of social networking sites also allow for the rapid creation and dissemination of misinformation. It becomes increasingly difficult for online users to find accurate and trustworthy information. As recent incidents have shown, misinformation escalates quickly, can harm social media users, and can wreak havoc almost instantaneously. In contrast to the settings studied in existing psychology and social science research on misinformation, social media platforms pose unprecedented challenges for misinformation detection. First, intentional spreaders of misinformation actively disguise themselves. Second, the content of misinformation may be manipulated to avoid detection, while abundant contextual information may play a vital role in detecting it. Third, beyond accuracy, the earliness of a detection method is also important in keeping misinformation from going viral. Fourth, social media platforms have been used as a fundamental data source for various disciplines, and this research may have been conducted in the presence of misinformation. To tackle these challenges, we focus on developing machine learning algorithms that are robust to adversarial manipulation and data scarcity. The main objective of this dissertation is to provide a systematic study of misinformation detection in social media. To tackle the challenge of adversarial attacks, I propose adaptive detection algorithms to deal with the active manipulations of misinformation spreaders via content and networks. To facilitate content-based approaches, I analyze the contextual data of misinformation and propose to incorporate the specific contextual patterns of misinformation into a principled detection framework. Considering its rapidly growing nature, I study how misinformation can be detected at an early stage. 
In particular, I focus on the challenge of data scarcity and propose a novel framework that enables historical data to be used for emerging incidents that are seemingly unrelated. With misinformation going viral, applications that rely on social media data face the challenge of corrupted data. To this end, I present robust statistical relational learning and personalization algorithms to minimize the negative effect of misinformation. Dissertation/Thesis, Doctoral Dissertation, Computer Science, 201

    State of the art 2015: a literature review of social media intelligence capabilities for counter-terrorism

    Overview: This paper reviews how information and insight can be drawn from open social media sources. It focuses on the specific research techniques that have emerged, the capabilities they provide, the possible insights they offer, and the ethical and legal questions they raise. These techniques are considered relevant and valuable in so far as they can help to maintain public safety by preventing terrorism, preparing for it, protecting the public from it and pursuing its perpetrators. The report also considers how far this can be achieved against the backdrop of radically changing technology and public attitudes towards surveillance. This is an updated version of a 2013 paper on the same subject, State of the Art. Since 2013, there have been significant changes in social media, how it is used by terrorist groups, and the methods being developed to make sense of it. The paper is structured as follows: Part 1 is an overview of social media use, focused on how it is used by groups of interest to those involved in counter-terrorism. This includes new sections on trends in social media platforms and on Islamic State (IS). Part 2 provides an introduction to the key approaches of social media intelligence (henceforth ‘SOCMINT’) for counter-terrorism. Part 3 sets out a series of SOCMINT techniques. For each technique, a series of capabilities and insights is considered, the validity and reliability of the method are assessed, and its possible application to counter-terrorism work is explored. Part 4 outlines a number of important legal, ethical and practical considerations when undertaking SOCMINT work.

    Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data

    Thesis (Ph.D.) - Indiana University, Computer Sciences, 2015. As Big Data processing problems evolve, many modern applications demonstrate special characteristics. Data exists in the form of both large historical datasets and high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets according to certain criteria. Correspondingly, integrated support for efficient queries and post-query analysis is required. To address the system-level requirements brought by such characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain - social media data analysis - and tackle related research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems. In the storage layer, we reveal that existing text indexing techniques do not work well for the unique queries of social data, which put constraints on both textual content and social context. To address this issue, we propose a flexible indexing framework over NoSQL databases to support fully customizable index structures, which can embed necessary social context information for efficient queries. The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, which are suitable for different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN, and make novel use of customized indices in developing sophisticated analysis algorithms. 
In the streaming analysis module, the high-dimensional data representation of social media streams poses special challenges to the problem of parallel stream clustering. Due to the sparsity of the high-dimensional data, the traditional synchronization method becomes expensive and severely limits the scalability of the algorithm. Therefore, we design a novel strategy that broadcasts the incremental changes rather than the whole centroids of the clusters to achieve scalable parallel stream clustering algorithms. Performance tests using real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations using current state-of-the-art technologies.
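The idea of broadcasting incremental changes instead of whole centroids can be illustrated with a minimal sketch. It assumes centroids are stored as sparse dictionaries mapping a feature to a weight; the function names `sparse_delta` and `apply_delta` are hypothetical and are not taken from the dissertation, whose actual synchronization protocol is not described in the abstract.

```python
# Minimal sketch (assumed representation): a centroid is a sparse dict {feature: weight}.

def sparse_delta(old, new, eps=1e-12):
    """Return only the coordinates that differ between two versions of a centroid."""
    delta = {}
    for key in set(old) | set(new):
        change = new.get(key, 0.0) - old.get(key, 0.0)
        if abs(change) > eps:
            delta[key] = change
    return delta

def apply_delta(centroid, delta, eps=1e-12):
    """Apply a received delta to a local replica of the centroid, in place."""
    for key, change in delta.items():
        value = centroid.get(key, 0.0) + change
        if abs(value) > eps:
            centroid[key] = value
        else:
            centroid.pop(key, None)  # keep the replica sparse
    return centroid

# One worker updates its centroid; only the changed coordinates are sent.
old = {"quake": 2.0, "tokyo": 1.0}
new = {"quake": 3.0, "tokyo": 1.0, "alert": 1.0}
delta = sparse_delta(old, new)          # two entries instead of three
replica = apply_delta(dict(old), delta)  # another worker's copy catches up
```

When each update touches only a handful of dimensions, the message size scales with the number of changed coordinates rather than with the full centroid dimensionality, which is the motivation the abstract gives for the strategy.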

    What’s Happening Around the World? A Survey and Framework on Event Detection Techniques on Twitter

    © 2019, Springer Nature B.V. In the last few years, Twitter has become a popular platform for sharing opinions, experiences, news, and views in real time. Twitter presents an interesting opportunity for detecting events happening around the world. The content (tweets) published on Twitter is short and poses diverse challenges for detecting and interpreting event-related information. This article provides insights into ongoing research and helps in understanding recent research trends and techniques used for event detection using Twitter data. We classify techniques and methodologies according to event types, orientation of content, event detection tasks, their evaluation, and common practices. We highlight the limitations of existing techniques and accordingly propose solutions to address the shortcomings. We propose a framework called EDoT based on the research trends, common practices, and techniques used for detecting events on Twitter. EDoT can serve as a guideline for developing event detection methods, especially for researchers who are new to this area. We also describe and compare data collection techniques, the effectiveness and shortcomings of various Twitter and non-Twitter-based features, and discuss various evaluation measures and benchmarking methodologies. Finally, we discuss the trends, limitations, and future directions for detecting events on Twitter.

    Exploring Cyberterrorism, Topic Models and Social Networks of Jihadists Dark Web Forums: A Computational Social Science Approach

    This three-article dissertation focuses on cyber-related topics on terrorist groups, specifically Jihadists’ use of technology, the application of natural language processing, and social networks in analyzing text data derived from terrorists’ Dark Web forums. The first article explores cybercrime and cyberterrorism. As technology progresses, it facilitates new forms of behavior, including tech-related crimes known as cybercrime and cyberterrorism. In this article, I provide an analysis of the problems of cybercrime and cyberterrorism within the field of criminology by reviewing existing literature focusing on (a) the issues in defining terrorism, cybercrime, and cyberterrorism, (b) ways that cybercriminals commit a crime in cyberspace, and (c) ways that cyberterrorists attack critical infrastructure, including computer systems, data, websites, and servers. The second article is a methodological study examining the application of natural language processing computational techniques, specifically latent Dirichlet allocation (LDA) topic models and topic network analysis of text data. I demonstrate the potential of topic models by inductively analyzing large-scale textual data of Jihadist groups and supporters from three Dark Web forums to uncover underlying topics. The Dark Web forums are dedicated to discussions of Islam and the Islamic world. Some members of these forums sympathize with and support terrorist organizations. Results indicate that topic modeling can be applied to analyze text data automatically; the most prevalent topic in all forums was religion. Forum members also discussed terrorism and terrorist attacks, supporting the Mujahideen fighters. A few of the discussions were related to relationships and marriages, advice, seeking help, health, food, selling electronics, and identity cards. LDA topic modeling is significant for finding topics from larger corpora such as the Dark Web forums. 
Implications for counterterrorism include the use of topic modeling in real-time classification and removal of online terrorist content and the monitoring of religious forums, as terrorist groups use religion to justify their goals and recruit supporters in such forums. The third article builds on the second article, exploring the network structures of terrorist groups on the Dark Web forums. The two Dark Web forums’ interaction networks were created, and network properties were measured using social network analysis. A member is considered connected and interacting with other forum members when they post in the same threads, forming an interaction network. Results reveal that the network structure is decentralized, sparse, and divided based on topics (religion, terrorism, current events, and relationships) and the members’ interests in participating in the threads. As participation in forums is an active process, users tend to select platforms most compatible with their views, forming a subgroup or community. However, some members are essential and influential in the flow of information and resources within the networks. The key members frequently posted about religion, terrorism, and relationships in multiple threads. Identifying key members is significant for counterterrorism, as mapping network structures and key users is essential for removing and destabilizing terrorist networks. Taken together, this dissertation applies a computational social science approach to the analysis of cyberterrorism and the use of Dark Web forums by jihadists.
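The co-posting rule described above (members are connected when they post in the same thread) can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's code: the input format of `(thread_id, member)` pairs, the undirected edges weighted by number of shared threads, and the helper names `interaction_network`, `degree`, and `density` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def interaction_network(posts):
    """Build undirected edges between members who posted in the same thread.

    posts: iterable of (thread_id, member) pairs (assumed input format).
    Returns a Counter mapping (member_a, member_b) to the number of shared threads.
    """
    threads = {}
    for thread_id, member in posts:
        threads.setdefault(thread_id, set()).add(member)
    edges = Counter()
    for members in threads.values():
        for a, b in combinations(sorted(members), 2):
            edges[(a, b)] += 1
    return edges

def degree(edges):
    """Number of distinct neighbours per member; high values hint at key members."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def density(edges, n_members):
    """Share of possible member pairs that are actually connected."""
    possible = n_members * (n_members - 1) / 2
    return len(edges) / possible if possible else 0.0

posts = [("t1", "a"), ("t1", "b"), ("t1", "c"),
         ("t2", "a"), ("t2", "b"),
         ("t3", "d"), ("t3", "a")]
edges = interaction_network(posts)
print(degree(edges).most_common(1))  # member with the most distinct neighbours
print(density(edges, n_members=4))
```

A low density value corresponds to the sparse structure the abstract reports, and degree is only one of several centrality measures that could be used to flag influential members.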

    Political discussions in online oppositional communities in the non-democratic context

    Taking into account YouTube’s specific role in the Russian media system and the increasing level of political polarization in the country, this study examines the role of incivility in discussions and whether discussions in an anti-government community represent a place for disagreement between pro-opposition and pro-government users. I argue that an online environment helps these sides meet each other rather than creating echo chambers of like-minded users. Moreover, in the quite restrictive Russian context for political deliberation, the incivility of messages plays a role in further involving commenters in discussions. Using the corpus of comments posted in the discussion section of opposition leader Alexei Navalny’s YouTube channel, I exploited class affinity modeling to identify pro-government and pro-opposition stances. Incivility was studied based on Google’s Perspective API toxicity classifier. I found that users avoid extreme forms of incivility when interacting with other commenters, but uncivil comments are more likely to start discussion threads. Furthermore, the level of incivility in comments gets higher over time after a video release. Pro-government sentiments, on the one hand, are associated with a subsequent response from Navalny’s supporters to the out-group criticism and, on the other hand, contribute to the further formation of hubs with a pro-government narrative. This research contributes to the extant literature on affective polarization on social media, shedding light on political discussions within an oppositional community in a non-democracy

    Predicting Behavior Deterioration Using Machine Learning (Prédiction de la détérioration du comportement à l’aide de l’apprentissage automatique)

    Social media platforms bring individuals together to interact in a friendly and civil manner while holding diverse convictions and beliefs. Some people adopt reprehensible behaviors that disturb the peace and negatively affect the equanimity of other users. Some instances of misconduct may initially have small statistical effects, but their persistent accumulation could lead to major and devastating consequences. The persistent accumulation of bad behavior can be a valid predictor of risk factors for behavior deterioration. The problem of behavior deterioration has not been widely studied in the context of social media. Early detection of behavior deterioration can be crucially important in preventing individuals’ bad behavior from worsening. This thesis addresses the problem of behavior deterioration in the context of social media. We propose new machine-learning-based methods that (1) explore behavioral sequences and their temporal patterns to facilitate understanding of the behaviors individuals exhibit and (2) predict behavior deterioration from consecutive combinations of sequential patterns corresponding to inappropriate behaviors. We conduct extensive experiments on real-world datasets and demonstrate the ability of our models to predict behavior deterioration with a high degree of accuracy, that is, F-1 scores above 0.8. 
In addition, we examine the trajectory of behavior deterioration to uncover the emotional states that individuals progressively exhibit and to assess whether these emotional states lead to behavior deterioration over time. Our results suggest that anger could be a potential emotional state that contributes substantially to behavior deterioration.
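For reference, the F-1 score cited in the abstract above is the harmonic mean of precision and recall. The sketch below computes it for binary labels; the function name `f1_score` and the toy labels are illustrative assumptions and do not reproduce the thesis's data or evaluation setup.

```python
def f1_score(y_true, y_pred, positive=1):
    """F-1 = 2 * precision * recall / (precision + recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels (hypothetical): 1 = behavior deteriorated, 0 = it did not.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)  # 3 true positives, 1 false positive, 1 false negative
```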