744 research outputs found
POISED: Spotting Twitter Spam Off the Beaten Paths
Cybercriminals have found in online social networks a propitious medium to
spread spam and malicious content. Existing techniques for detecting spam
include predicting the trustworthiness of accounts and analyzing the content of
these messages. However, advanced attackers can still successfully evade these
defenses.
Online social networks bring people who have personal connections or share
common interests to form communities. In this paper, we first show that users
within a networked community share some topics of interest. Moreover, content
shared on these social network tend to propagate according to the interests of
people. Dissemination paths may emerge where some communities post similar
messages, based on the interests of those communities. Spam and other malicious
content, on the other hand, follow different spreading patterns.
In this paper, we follow this insight and present POISED, a system that
leverages the differences in propagation between benign and malicious messages
on social networks to identify spam and other unwanted content. We test our
system on a dataset of 1.3M tweets collected from 64K users, and we show that
our approach is effective in detecting malicious messages, reaching 91%
precision and 93% recall. We also show that POISED's detection is more
comprehensive than previous systems, by comparing it to three state-of-the-art
spam detection systems that have been proposed by the research community in the
past. POISED significantly outperforms each of these systems. Moreover, through
simulations, we show how POISED is effective in the early detection of spam
messages and how it is resilient against two well-known adversarial machine
learning attacks
Organized Behavior Classification of Tweet Sets using Supervised Learning Methods
During the 2016 US elections Twitter experienced unprecedented levels of
propaganda and fake news through the collaboration of bots and hired persons,
the ramifications of which are still being debated. This work proposes an
approach to identify the presence of organized behavior in tweets. The Random
Forest, Support Vector Machine, and Logistic Regression algorithms are each
used to train a model with a data set of 850 records consisting of 299 features
extracted from tweets gathered during the 2016 US presidential election. The
features represent user and temporal synchronization characteristics to capture
coordinated behavior. These models are trained to classify tweet sets among the
categories: organic vs organized, political vs non-political, and pro-Trump vs
pro-Hillary vs neither. The random forest algorithm performs better with
greater than 95% average accuracy and f-measure scores for each category. The
most valuable features for classification are identified as user based
features, with media use and marking tweets as favorite to be the most
dominant.Comment: 51 pages, 5 figure
Emotions behind drive-by download propagation on Twitter
Twitter has emerged as one of the most popular platforms to get updates on entertainment and current events. However, due to its 280 character restriction and automatic shortening of URLs, it is continuously targeted by cybercriminals to carry out drive-by download attacks, where a user’s system is infected by merely visiting a Web page. Popular events that attract a large number of users are used by cybercriminals to infect and propagate malware by using popular hashtags and creating misleading tweets to lure users to malicious Web pages. A drive-by download attack is carried out by obfuscating a malicious URL in an enticing tweet and used as clickbait to lure users to a malicious Web page. In this paper we answer the following two questions: Why are certain malicious tweets retweeted more than others? Do emotions reflecting in a tweet drive virality? We gathered tweets from seven different sporting events over three years and identified those tweets that used to carry to out a drive-by download attack. From the malicious (N=105,642) and benign (N=169,178) data sample identified, we built models to predict information flow size and survival. We define size as the number of retweets of an original tweet, and survival as the duration of the original tweet’s presence in the study window. We selected the zero-truncated negative binomial (ZTNB) regression method for our analysis based on the distribution exhibited by our dependent size measure and the comparison of results with other predictive models. We used the Cox regression technique to model the survival of information flows as it estimates proportional hazard rates for independent measures. Our results show that both social and content factors are statistically significant for the size and survival of information flows for both malicious and benign tweets. In the benign data sample, positive emotions and positive sentiment reflected in the tweet significantly predict size and survival. In contrast, for the malicious data sample, negative emotions, especially fear, are associated with both size and survival of information flows
routing in mobile opportunistic social networks with selfish nodes
When the connection to Internet is not available during networking activities, an opportunistic approach exploits the encounters between mobile human-carried devices for exchanging information. When users encounter each other, their handheld devices can communicate in a cooperative way, using the encounter opportunities for forwarding their messages, in a wireless manner. But, analyzing real behaviors, most of the nodes exhibit selfish behaviors, mostly to preserve the limited resources (data buffers and residual energy). That is the reason why node selfishness should be taken into account when describing networking activities: in this paper, we first evaluate the effects of node selfishness in opportunistic networks. Then, we propose a routing mechanism for managing node selfishness in opportunistic communications, namely, SORSI (Social-based Opportunistic Routing with Selfishness detection and Incentive mechanisms). SORSI exploits the social-based nature of node mobility and other social features of nodes to optimize message dissemination together with a selfishness detection mechanism, aiming at discouraging selfish behaviors and boosting data forwarding. Simulating several percentages of selfish nodes, our results on real-world mobility traces show that SORSI is able to outperform the social-based schemes Bubble Rap and SPRINT-SELF, employing also selfishness management in terms of message delivery ratio, overhead cost, and end-to-end average latency. Moreover, SORSI achieves delivery ratios and average latencies comparable to Epidemic Routing while having a significant lower overhead cost
NLP-Based Techniques for Cyber Threat Intelligence
In the digital era, threat actors employ sophisticated techniques for which,
often, digital traces in the form of textual data are available. Cyber Threat
Intelligence~(CTI) is related to all the solutions inherent to data collection,
processing, and analysis useful to understand a threat actor's targets and
attack behavior. Currently, CTI is assuming an always more crucial role in
identifying and mitigating threats and enabling proactive defense strategies.
In this context, NLP, an artificial intelligence branch, has emerged as a
powerful tool for enhancing threat intelligence capabilities. This survey paper
provides a comprehensive overview of NLP-based techniques applied in the
context of threat intelligence. It begins by describing the foundational
definitions and principles of CTI as a major tool for safeguarding digital
assets. It then undertakes a thorough examination of NLP-based techniques for
CTI data crawling from Web sources, CTI data analysis, Relation Extraction from
cybersecurity data, CTI sharing and collaboration, and security threats of CTI.
Finally, the challenges and limitations of NLP in threat intelligence are
exhaustively examined, including data quality issues and ethical
considerations. This survey draws a complete framework and serves as a valuable
resource for security professionals and researchers seeking to understand the
state-of-the-art NLP-based threat intelligence techniques and their potential
impact on cybersecurity
- …