52 research outputs found
Detecting Spammers via Aggregated Historical Data Set
The battle between email service providers and senders of mass unsolicited
emails (Spam) continues to gain traction. Vast numbers of Spam emails are sent
mainly from automatic botnets distributed over the world. One method for
mitigating Spam in a computationally efficient manner is fast and accurate
blacklisting of the senders. In this work we propose a new sender reputation
mechanism that is based on an aggregated historical data-set which encodes the
behavior of mail transfer agents over time. A historical data-set is created
from labeled logs of received emails. We use machine learning algorithms to
build a model that predicts the spammingness of mail transfer agents in
the near future. The proposed mechanism is targeted mainly at large enterprises
and email service providers and can be used for updating both the black and the
white lists. We evaluate the proposed mechanism using 9.5M anonymized log
entries obtained from the biggest Internet service provider in Europe.
Experiments show that the proposed method detects more than 94% of the Spam emails
that escaped the blacklist (i.e., TPR), while having less than 0.5%
false alarms. Therefore, the effectiveness of the proposed method is much
higher than that of previously reported reputation mechanisms, which rely on email
logs. In addition, the proposed method, when used for updating both the black
and white lists, eliminated the need for automatic content inspection of 4 out
of 5 incoming emails, which resulted in a dramatic reduction in the filtering
computational load.
Comment: This is a conference version of the HDS research. 13 pages 10 figure
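The aggregation step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the features or window size, so the one-hour window, the `(sender, timestamp, is_spam)` tuple layout, and the function names below are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict

def aggregate_history(log_entries, window_seconds=3600):
    """Group labeled log entries by (sender, time window) and count spam and total mail.

    log_entries: iterable of (sender, timestamp_in_seconds, is_spam) tuples (assumed layout).
    """
    counts = defaultdict(lambda: [0, 0])  # (sender, window) -> [spam_count, total_count]
    for sender, timestamp, is_spam in log_entries:
        window = int(timestamp // window_seconds)
        cell = counts[(sender, window)]
        cell[0] += 1 if is_spam else 0
        cell[1] += 1
    return counts

def spam_ratio_features(counts, sender, windows):
    """Spam ratio of `sender` in each requested window (0.0 when no mail was seen)."""
    features = []
    for w in windows:
        spam, total = counts.get((sender, w), (0, 0))
        features.append(spam / total if total else 0.0)
    return features

# Toy labeled log: two spam messages in window 0, one legitimate message in window 1.
log = [
    ("10.0.0.1", 100, True),
    ("10.0.0.1", 200, True),
    ("10.0.0.1", 4000, False),
    ("10.0.0.2", 150, False),
]
counts = aggregate_history(log)
features = spam_ratio_features(counts, "10.0.0.1", windows=[0, 1, 2])
print(features)
```

A feature vector of this kind, built per sender over past windows, is the sort of input a classifier could use to predict behavior in the next window.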
Deployment Optimization of IoT Devices through Attack Graph Analysis
The Internet of things (IoT) has become an integral part of our life at both
work and home. However, these IoT devices are prone to vulnerability exploits
due to their low cost, low resources, the diversity of vendors, and proprietary
firmware. Moreover, short range communication protocols (e.g., Bluetooth or
ZigBee) open additional opportunities for the lateral movement of an attacker
within an organization. Thus, the type and location of IoT devices may
significantly change the level of network security of the organizational
network. In this paper, we quantify the level of network security based on an
augmented attack graph analysis that accounts for the physical location of IoT
devices and their communication capabilities. We use the depth-first branch and
bound (DFBnB) heuristic search algorithm to solve two optimization problems:
Full Deployment with Minimal Risk (FDMR) and Maximal Utility without Risk
Deterioration (MURD). An admissible heuristic is proposed to accelerate the
search. The proposed method is evaluated using a real network with simulated
deployment of IoT devices. The results demonstrate (1) the contribution of the
augmented attack graphs to quantifying the impact of IoT devices deployed
within the organization on security, and (2) the effectiveness of the optimized
IoT deployment
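As a rough illustration of depth-first branch and bound with an admissible heuristic, the sketch below solves a toy placement problem. The risk matrix, the one-device-per-location constraint, and the function name `dfbnb_assign` are hypothetical; the paper's attack-graph-based risk model is not reproduced here.

```python
def dfbnb_assign(risk):
    """Depth-first branch and bound for a toy device placement problem.

    risk[d][l] is the (hypothetical) risk of placing device d at location l.
    Each location hosts at most one device, and there must be at least as
    many locations as devices. Returns (best_cost, best_placement).
    """
    n_devices = len(risk)
    n_locations = len(risk[0])
    # Admissible heuristic: every remaining device takes its cheapest location,
    # ignoring conflicts, so the estimate never exceeds the true remaining cost.
    min_risk = [min(row) for row in risk]
    suffix_bound = [0.0] * (n_devices + 1)
    for d in range(n_devices - 1, -1, -1):
        suffix_bound[d] = suffix_bound[d + 1] + min_risk[d]

    best = [float("inf"), None]  # [cost, placement] of the best solution so far

    def search(d, used, cost, placement):
        # Prune any branch whose optimistic total cannot beat the incumbent.
        if cost + suffix_bound[d] >= best[0]:
            return
        if d == n_devices:
            best[0], best[1] = cost, list(placement)
            return
        # Try cheaper locations first so a good incumbent is found early.
        for loc in sorted(range(n_locations), key=lambda l: risk[d][l]):
            if loc in used:
                continue
            used.add(loc)
            placement.append(loc)
            search(d + 1, used, cost + risk[d][loc], placement)
            placement.pop()
            used.remove(loc)

    search(0, set(), 0.0, [])
    return best[0], best[1]

risk = [
    [4.0, 1.0, 3.0],
    [2.0, 0.5, 5.0],
    [3.0, 2.0, 2.0],
]
best_cost, best_placement = dfbnb_assign(risk)
print(best_cost, best_placement)
```

Because the heuristic never overestimates the remaining risk, pruning cannot discard the optimal placement.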
Has the Online Discussion Been Manipulated? Quantifying Online Discussion Authenticity within Online Social Media
Online social media (OSM) has an enormous influence in today's world. Some
individuals view OSM as fertile ground for abuse and use it to disseminate
misinformation and political propaganda, slander competitors, and spread spam.
The crowdturfing industry employs large numbers of bots and human workers to
manipulate OSM and misrepresent public opinion. The detection of online
discussion topics manipulated by OSM abusers is an emerging issue
attracting significant attention. In this paper, we propose an approach for
quantifying the authenticity of online discussions based on the similarity of
OSM accounts participating in the discussion to known abusers and legitimate
accounts. Our method uses several similarity functions for the analysis and
classification of OSM accounts. The proposed methods are demonstrated using
Twitter data collected for this study and a previously published Arabic
honeypot dataset. The former includes manually labeled accounts and abusers
who participated in crowdturfing platforms. Evaluation of the topic's
authenticity, derived from account similarity functions, shows that the
suggested approach is effective for discriminating between topics that were
strongly promoted by abusers and topics that attracted authentic public
interest
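One way to picture a similarity-based authenticity score is sketched below. The abstract does not name its similarity functions, so Jaccard similarity over sets of account attributes, the "closer to legitimate" vote, and all function names are assumptions chosen for illustration only.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def closer_to_legitimate(account, abusers, legitimate):
    """True if the account's best match among legitimate accounts beats its best abuser match."""
    sim_abuse = max((jaccard(account, x) for x in abusers), default=0.0)
    sim_legit = max((jaccard(account, x) for x in legitimate), default=0.0)
    return sim_legit > sim_abuse

def topic_authenticity(participants, abusers, legitimate):
    """Fraction of participating accounts that resemble legitimate accounts more than abusers."""
    if not participants:
        return 0.0
    votes = sum(closer_to_legitimate(p, abusers, legitimate) for p in participants)
    return votes / len(participants)

# Accounts are represented by hypothetical sets of frequently used terms.
abusers = [{"buy", "cheap", "followers"}]
legitimate = [{"news", "sports", "weather"}]
participants = [
    {"buy", "cheap", "deal"},
    {"news", "sports"},
    {"news", "weather", "music"},
    {"cheap", "followers"},
]
score = topic_authenticity(participants, abusers, legitimate)
print(score)
```

A low score would suggest that a topic is driven mostly by accounts resembling known abusers.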
Detecting Clickbait in Online Social Media: You Won't Believe How We Did It
In this paper, we propose an approach for the detection of clickbait posts in
online social media (OSM). Clickbait posts are short catchy phrases that
attract a user's attention and entice a click through to an article. The approach is based on a
machine learning (ML) classifier capable of distinguishing between clickbait
and legitimate posts published in OSM. The suggested classifier is based on a
variety of features, including image related features, linguistic analysis, and
methods for abuser detection. In order to evaluate our method, we used two
datasets provided by Clickbait Challenge 2017. The best performance obtained by
the ML classifier was an AUC of 0.8, an accuracy of 0.812, precision of 0.819,
and recall of 0.966. In addition, as opposed to previous studies, we found that
clickbait post titles are statistically significantly shorter than legitimate
post titles. Finally, we found that counting the number of formal English words
in the given content is useful for clickbait detection
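Two of the signals mentioned in this abstract, title length and the count of formal English words, can be extracted with a few lines of code. This is only a sketch: the tiny `FORMAL_WORDS` set stands in for a real English word list, and the tokenization rule and feature names are assumptions.

```python
import re

# Tiny stand-in vocabulary; a real system would load a full English word list.
FORMAL_WORDS = {"the", "government", "announces", "new", "policy", "on", "you", "will", "this"}

def title_features(title, vocabulary=FORMAL_WORDS):
    """Return simple length and vocabulary features for a post title."""
    # Lowercase and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", title.lower())
    formal = sum(1 for t in tokens if t in vocabulary)
    return {
        "n_chars": len(title),
        "n_tokens": len(tokens),
        "n_formal_words": formal,
    }

feats = title_features("OMG you won't believe this")
print(feats)
```

Features like these would be combined with the image-related and abuser-detection features before training a classifier.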
ATHAFI: Agile Threat Hunting And Forensic Investigation
Attackers rapidly change their attacks to evade detection. Even the most
sophisticated Intrusion Detection Systems that are based on artificial
intelligence and advanced data analytics cannot keep pace with the rapid
development of new attacks. When standard detection mechanisms fail or do not
provide sufficient forensic information to investigate and mitigate attacks,
targeted threat hunting performed by competent personnel is used.
Unfortunately, many organizations do not have enough security analysts to
perform threat hunting tasks and today the level of automation of threat
hunting is low.
In this paper we describe a framework for agile threat hunting and forensic
investigation (ATHAFI), which automates the threat hunting process at multiple
levels. Adaptive targeted data collection, attack hypotheses generation,
hypotheses testing, and continuous threat intelligence feeds allow simple
investigations to be performed in a fully automated manner. The increased level of
automation will significantly boost the analyst's productivity during
investigation of the harshest cases.
A special Workflow Generation module adapts the threat hunting procedures
either to the latest Threat Intelligence obtained from external sources (e.g.
National CERT) or to the likeliest attack hypotheses generated by the Attack
Hypotheses Generation module. The combination of Attack Hypotheses Generation
and Workflow Generation enables intelligent adjustment of workflows, which
react to emerging threats effectively
Organization Mining Using Online Social Networks
Mature social networking services are one of the greatest assets of today's
organizations. This valuable asset, however, can also be a threat to an
organization's confidentiality. Members of social networking websites expose
not only their personal information, but also details about the organizations
for which they work. In this paper we analyze several commercial organizations
by mining data which their employees have exposed on Facebook, LinkedIn, and
other publicly available sources. Using a web crawler designed for this
purpose, we extract a network of informal social relationships among employees
of a given target organization. Our results, obtained using centrality analysis
and Machine Learning techniques applied to the structure of the informal
relationships network, show that it is possible to identify leadership roles
within the organization solely by this means. It is also possible to gain
valuable non-trivial insights on an organization's structure by clustering its
social network and gathering publicly available information on the employees
within each cluster. Organizations wanting to conceal their internal structure,
identity of leaders, location and specialization of branch offices, etc.,
must enforce strict policies to control the use of social media by their
employees.
Comment: Draft Versio
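The centrality analysis mentioned in this abstract can be sketched on a toy relationship graph. The abstract does not say which centrality measure was used, so degree centrality is an assumption here, and the employee names and edges are made up.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree centrality of each node in an undirected graph given as (a, b) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Normalize by the maximum possible number of neighbors.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

# Hypothetical informal relationships among employees.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("alice", "dave"),
    ("bob", "carol"),
]
centrality = degree_centrality(edges)
ranking = sorted(centrality, key=centrality.get, reverse=True)
print(ranking[0], centrality[ranking[0]])
```

Nodes at the top of such a ranking are candidates for leadership roles, which a classifier could then confirm using further structural features.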
How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning
Searching for information about a specific person is an online activity
frequently performed by many users. In most cases, users submit queries
containing a name to Web search engines to find what they are looking
for. Typically, Web search engines provide just a few accurate results
associated with a name-containing query. Currently, most solutions for
suggesting synonyms in online search are based on pattern matching and phonetic
encoding; however, the performance of such solutions is very often less than
optimal. In this paper, we propose SpokenName2Vec, a novel and generic approach
which addresses the similar name suggestion problem by utilizing automated
speech generation and deep learning to produce spoken name embeddings. These
embeddings capture the way people pronounce names
in any language and accent. Utilizing the name pronunciation can be helpful for
both differentiating and detecting names that sound alike, but are written
differently. The proposed approach was demonstrated on a large-scale dataset
consisting of 250,000 forenames and evaluated using a machine learning
classifier and 7,399 names with their verified synonyms. The performance of the
proposed approach was found to be superior to 10 other algorithms evaluated in
this study, including widely used phonetic and string similarity algorithms, and
two recently proposed algorithms. The results obtained suggest that the
proposed approach could serve as a useful and valuable tool for solving the
similar name suggestion problem.
Comment: arXiv admin note: text overlap with arXiv:1912.0400
It Runs in the Family: Searching for Synonyms Using Digitized Family Trees
Searching for a person's name is a common online activity. However, Web
search engines provide few accurate results to queries containing names. In
contrast to a general word which has only one correct spelling, there are
several legitimate spellings of a given name. Today, most techniques used to
suggest synonyms in online search are based on pattern matching and phonetic
encoding; however, they often perform poorly. As a result, there is a need for
an effective tool for improved synonym suggestion. In this paper, we propose a
revolutionary approach for tackling the problem of synonym suggestion. Our
novel algorithm, GRAFT, utilizes historical data collected from genealogy
websites, along with network algorithms. GRAFT is a general algorithm that
suggests synonyms using a graph based on names derived from digitized ancestral
family trees. Synonyms are extracted from this graph, which is constructed
using generic ordering functions that outperform other algorithms that suggest
synonyms based on a single dimension, a factor that limits their performance.
We evaluated GRAFT's performance on three ground truth datasets of forenames
and surnames, including a large-scale online genealogy dataset with over 16
million profiles and more than 700,000 unique forenames and 500,000 surnames.
We compared GRAFT's performance at suggesting synonyms to 10 other algorithms,
including phonetic encoding, string similarity algorithms, and machine and deep
learning algorithms. The results show GRAFT's superiority with respect to both
forenames and surnames and demonstrate its use as a tool to improve synonym
suggestion.
Comment: 20 page
PALE: Partially Asynchronous Agile Leader Election
Many tasks executed in dynamic distributed systems, such as sensor networks
or enterprise environments with bring-your-own-device policy, require central
coordination by a leader node. In the past it has been proven that distributed
leader election in dynamic environments with constant changes and asynchronous
communication is not possible. Thus, state-of-the-art leader election
algorithms are not applicable in asynchronous environments with constant
network changes. Some algorithms converge only after the network stabilizes (an
unrealistic requirement in many dynamic environments). Other algorithms reach
consensus in the presence of network changes but require a global clock or some
level of communication synchronization.
Determining the weakest assumptions under which leader election is possible
remains an unresolved problem. In this study we present a leader election
algorithm that operates in the presence of changes and under weak (realistic)
assumptions regarding message delays and regarding the clock drifts of the
distributed nodes. The proposed algorithm is self-sufficient, easy to implement
and can be extended to support multiple regions, self-stabilization, and
wireless ad-hoc networks. We prove the algorithm's correctness and provide a
complexity analysis of the time, space, and number of messages required to
elect a leader.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessibl
The Chameleon Attack: Manipulating Content Display in Online Social Media
Online social networks (OSNs) are ubiquitous, attracting millions of users all
over the world. As a popular communication medium, OSNs are exploited in a
variety of cyber attacks. In this article, we discuss the Chameleon attack
technique, a new type of OSN-based trickery where malicious posts and profiles
change the way they are displayed to OSN users to conceal themselves before the
attack or avoid detection. Using this technique, adversaries can, for example,
avoid censorship by concealing true content when it is about to be inspected;
acquire social capital to promote new content while piggybacking a trending
one; or cause embarrassment and serious reputation damage by tricking a victim into
liking, retweeting, or commenting on a message that they normally would not, without any
indication of the trickery within the OSN. An experiment performed with closed
Facebook groups of sports fans shows that (1) Chameleon pages can pass by the
moderation filters by changing the way their posts are displayed and (2)
moderators do not distinguish between regular and Chameleon pages. We list the
OSN weaknesses that facilitate the Chameleon attack and propose a set of
mitigation guidelines