A Decade of Social Bot Detection
On the morning of November 9, 2016, the world woke up to the shocking
outcome of the US Presidential election: Donald Trump was the 45th President
of the United States of America. It was an unexpected event that still has
tremendous consequences all over the world. Today, we know that a minority of social bots,
automated social media accounts mimicking humans, played a central role in
spreading divisive messages and disinformation, possibly contributing to
Trump's victory. In the aftermath of the 2016 US elections, the world started
to realize the gravity of widespread deception in social media. In the wake of
Trump's victory, we witnessed the emergence of a strident dissonance between
the multitude of efforts for detecting and removing bots, and the increasing
effects that these malicious actors seem to have on our societies. This paradox
opens a burning question: What strategies should we enforce in order to stop
this social bot pandemic? Now, in the run-up to the 2020 US
elections, the question appears more crucial than ever. What struck social,
political, and economic analysts after 2016, namely deception and automation, had
however been a matter of study for computer scientists since at least 2010. In this
work, we briefly survey the first decade of research in social bot detection.
Via a longitudinal analysis, we discuss the main trends of research in the
fight against bots, the major results that were achieved, and the factors that
make this never-ending battle so challenging. Capitalizing on lessons learned
from our extensive analysis, we suggest possible innovations that could give us
the upper hand against deception and manipulation. Studying a decade of
endeavours at social bot detection can also inform strategies for detecting and
mitigating the effects of other, more recent, forms of online deception, such
as strategic information operations and political trolls.
Comment: Forthcoming in Communications of the ACM
Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling
Spambot detection in online social networks is a long-lasting challenge
involving the study and design of detection techniques capable of efficiently
identifying ever-evolving spammers. Recently, a new wave of social spambots has
emerged, with advanced human-like characteristics that allow them to go
undetected even by current state-of-the-art algorithms. In this paper, we show
that efficient spambot detection can be achieved via an in-depth analysis of
their collective behaviors, exploiting the digital DNA technique for modeling
the behaviors of social network users. Inspired by its biological counterpart,
in the digital DNA representation the behavioral lifetime of a digital account
is encoded in a sequence of characters. Then, we define a similarity measure
for such digital DNA sequences. We build upon digital DNA and the similarity
between groups of users to characterize both genuine accounts and spambots.
Leveraging such characterization, we design the Social Fingerprinting
technique, which is able to discriminate between spambots and genuine accounts in
both a supervised and an unsupervised fashion. We finally evaluate the
effectiveness of Social Fingerprinting and we compare it with three
state-of-the-art detection algorithms. Among the peculiarities of our approach
are the possibility of applying off-the-shelf DNA analysis techniques to study
online users' behaviors and the ability to rely efficiently on a limited number of
lightweight account characteristics.
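The core idea can be sketched as follows (a minimal illustration under assumed details, not the authors' implementation: the action alphabet and the normalization of the similarity measure are simplified assumptions):

```python
# Sketch of digital-DNA behavioral encoding and group similarity.
# Hypothetical alphabet: one character per action type (illustrative only).
ALPHABET = {"tweet": "A", "reply": "C", "retweet": "T"}

def encode(actions):
    """Encode an account's behavioral lifetime as a digital-DNA string."""
    return "".join(ALPHABET[a] for a in actions)

def longest_common_substring(s1, s2):
    """Length of the longest common substring, via dynamic programming."""
    best = 0
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        cur = [0] * (len(s2) + 1)
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def similarity(s1, s2):
    """Normalized similarity in [0, 1]: long shared behavioral substrings
    across accounts suggest coordinated, bot-like behavior."""
    return longest_common_substring(s1, s2) / max(len(s1), len(s2))

bot1 = encode(["retweet", "retweet", "tweet", "retweet", "retweet"])
bot2 = encode(["retweet", "retweet", "tweet", "retweet", "reply"])
human = encode(["tweet", "reply", "retweet", "tweet", "reply"])
```

Accounts in a coordinated group score higher similarity with each other than with genuine users, which is the signal the group-level characterization builds on.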
Graph Mining for Cybersecurity: A Survey
The explosive growth of cyber attacks such as malware, spam, and
intrusions has caused severe consequences for society. Securing cyberspace has
become an utmost concern for organizations and governments. Traditional Machine
Learning (ML) based methods are extensively used in detecting cyber threats,
but they hardly model the correlations between real-world cyber entities. In
recent years, with the proliferation of graph mining techniques, many
researchers investigated these techniques for capturing correlations between
cyber entities and achieving high performance. It is imperative to summarize
existing graph-based cybersecurity solutions to provide a guide for future
studies. Therefore, as a key contribution of this paper, we provide a
comprehensive review of graph mining for cybersecurity, including an overview
of cybersecurity tasks, the typical graph mining techniques, and the general
process of applying them to cybersecurity, as well as various solutions for
different cybersecurity tasks. For each task, we probe into relevant methods
and highlight the graph types, graph approaches, and task levels in their
modeling. Furthermore, we collect open datasets and toolkits for graph-based
cybersecurity. Finally, we outline potential directions of this field for
future research.
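A toy example (an assumed scenario, not taken from the survey) shows why modeling correlations between cyber entities as a graph helps: a structural signal such as sharing neighbors with known-malicious entities propagates suspicion that per-entity features would miss.

```python
# Toy entity graph: nodes are hosts and domains, edges are observed
# connections. All names below are hypothetical.
from collections import defaultdict

edges = [("host1", "evil.example"), ("host2", "evil.example"),
         ("host1", "cdn.example"), ("host3", "cdn.example")]
known_bad = {"evil.example"}

graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

def suspicion(node):
    """Fraction of a node's neighbors that are known malicious -- a
    minimal stand-in for the neighborhood-based scoring that graph
    mining methods generalize."""
    nbrs = graph[node]
    return len(nbrs & known_bad) / len(nbrs) if nbrs else 0.0
```

Here `host2` (connected only to the malicious domain) scores higher than `host1`, and `host3` scores zero, even though no per-host feature distinguishes them.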
MaMaDroid: Detecting Android malware by building markov chains of behavioral models (extended version)
As Android has become increasingly popular, so has malware targeting it, thus motivating the research community
to propose different detection techniques. However, the constant evolution of the Android ecosystem,
and of malware itself, makes it hard to design robust tools that can operate for long periods of time without
the need for modifications or costly re-training. Aiming to address this issue, we set out to detect malware from
a behavioral point of view, modeled as the sequence of abstracted API calls. We introduce MaMaDroid, a
static-analysis-based system that abstracts an app's API calls to their class, package, or family, and builds a model
from their sequences obtained from the call graph of an app as Markov chains. This ensures that the model is
more resilient to API changes and the feature set is of manageable size. We evaluate MaMaDroid using a
dataset of 8.5K benign and 35.5K malicious apps collected over a period of six years, showing that it effectively
detects malware (with up to 0.99 F-measure) and keeps its detection capabilities for long periods of time
(up to 0.87 F-measure two years after training). We also show that MaMaDroid significantly outperforms
DroidAPIMiner, a state-of-the-art detection system that relies on the frequency of (raw) API calls. Aiming to
assess whether MaMaDroid's effectiveness mainly stems from the API abstraction or from the sequence
modeling, we also evaluate a variant that uses the frequency (instead of the sequences) of abstracted API calls.
We find that it is not as accurate, failing to capture maliciousness when trained on malware samples that
include API calls that are equally or more frequently used by benign apps.
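The Markov-chain feature idea can be sketched in a few lines (a simplified illustration: MaMaDroid abstracts calls from an app's call graph using Android package families, whereas the states and the sample sequence below are assumptions for demonstration):

```python
# Build a vector of Markov-chain transition probabilities from a
# sequence of abstracted API calls.
from collections import defaultdict
from itertools import product

def abstract_call(api_call):
    """Abstract a raw API call to its top-level family (simplified:
    here, just the first package component)."""
    return api_call.split(".")[0]

def markov_features(call_sequence, states):
    """Flatten P(next_state | current_state) into a feature vector,
    ordered by (src, dst) pairs over `states`."""
    abstracted = [abstract_call(c) for c in call_sequence]
    counts = defaultdict(int)
    for src, dst in zip(abstracted, abstracted[1:]):
        counts[(src, dst)] += 1
    totals = defaultdict(int)
    for (src, _), n in counts.items():
        totals[src] += n
    return [counts[(s, d)] / totals[s] if totals[s] else 0.0
            for s, d in product(states, states)]

states = ["android", "java", "self"]  # hypothetical abstraction families
seq = ["android.telephony.SmsManager.sendTextMessage",
       "java.net.URL.openConnection",
       "android.location.LocationManager.getLastKnownLocation",
       "self.defined.method"]
vector = markov_features(seq, states)
```

Because the features are transition probabilities between coarse families rather than raw API names, the model degrades gracefully when individual APIs are added, renamed, or removed, which is the resilience property the evaluation measures over time.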
A Deep Learning Based Approach To Detect Covert Channels Attacks and Anomaly In New Generation Internet Protocol IPv6
The increased dependence on internet-based technologies in all facets of life
challenges governments and policymakers with the need for effective shielding mechanisms
against passive and active violations. In line with the Qatar National Vision 2030
and its objective of “Achieving Security, stability and maintaining public safety”,
the present paper aims to propose a model for safeguarding information and
monitoring internet communications effectively. The current study utilizes a deep-learning-based
approach for detecting malicious communications in network traffic. Considering
the efficiency of deep learning in data analysis and classification, a convolutional neural
network model was proposed. The suggested model is equipped for detecting attacks in
IPv6. The performance of the proposed detection algorithm was validated using a number
of datasets, including a newly created dataset. The performance of the model was evaluated
for covert channel detection, DDoS attack detection in IPv6, and anomaly detection. The
performance assessment produced accuracies of 100%, 85%, and 98% for covert channel
detection, DDoS detection, and anomaly detection, respectively. The project puts forward a
novel approach for detecting suspicious communications in network traffic.
Large Language Model Alignment: A Survey
Recent years have witnessed remarkable progress made in large language models
(LLMs). Such advancements, while garnering significant attention, have
concurrently elicited various concerns. The potential of these models is
undeniably vast; however, they may yield texts that are imprecise, misleading,
or even detrimental. Consequently, it becomes paramount to employ alignment
techniques to ensure that these models exhibit behaviors consistent with human
values.
This survey endeavors to furnish an extensive exploration of alignment
methodologies designed for LLMs, in conjunction with the extant capability
research in this domain. Adopting the lens of AI alignment, we categorize the
prevailing methods and emergent proposals for the alignment of LLMs into outer
and inner alignment. We also probe into salient issues including the models'
interpretability, and potential vulnerabilities to adversarial attacks. To
assess LLM alignment, we present a wide variety of benchmarks and evaluation
methodologies. After discussing the state of alignment research for LLMs, we
finally cast a vision toward the future, contemplating the promising avenues of
research that lie ahead.
Our aspiration for this survey extends beyond merely spurring research
interest in this realm. We also envision bridging the gap between the AI
alignment research community and the researchers engrossed in the capability
exploration of LLMs, in pursuit of LLMs that are both capable and safe.
Comment: 76 pages