Search CORE

52 research outputs found

Detecting Spammers via Aggregated Historical Data Set

Author: Menahem Eitan
Puzis Rami
Publication venue
Publication date: 07/05/2012
Field of study

The battle between email service providers and senders of mass unsolicited emails (Spam) continues to gain traction. Vast numbers of Spam emails are sent mainly from automatic botnets distributed over the world. One method for mitigating Spam in a computationally efficient manner is fast and accurate blacklisting of the senders. In this work we propose a new sender reputation mechanism that is based on an aggregated historical data-set which encodes the behavior of mail transfer agents over time. A historical data-set is created from labeled logs of received emails. We use machine learning algorithms to build a model that predicts the \emph{spammingness} of mail transfer agents in the near future. The proposed mechanism is targeted mainly at large enterprises and email service providers and can be used for updating both the black and the white lists. We evaluate the proposed mechanism using 9.5M anonymized log entries obtained from the biggest Internet service provider in Europe. Experiments show that proposed method detects more than 94% of the Spam emails that escaped the blacklist (i.e., TPR), while having less than 0.5% false-alarms. Therefore, the effectiveness of the proposed method is much higher than of previously reported reputation mechanisms, which rely on emails logs. In addition, the proposed method, when used for updating both the black and white lists, eliminated the need in automatic content inspection of 4 out of 5 incoming emails, which resulted in dramatic reduction in the filtering computational load.Comment: This is a conference version of the HDS research. 13 pages 10 figure

arXiv.org e-Print Archive

Deployment Optimization of IoT Devices through Attack Graph Analysis

Author: Agmon Noga
Puzis Rami
Shabtai Asaf
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/04/2019
Field of study

The Internet of things (IoT) has become an integral part of our life at both work and home. However, these IoT devices are prone to vulnerability exploits due to their low cost, low resources, the diversity of vendors, and proprietary firmware. Moreover, short range communication protocols (e.g., Bluetooth or ZigBee) open additional opportunities for the lateral movement of an attacker within an organization. Thus, the type and location of IoT devices may significantly change the level of network security of the organizational network. In this paper, we quantify the level of network security based on an augmented attack graph analysis that accounts for the physical location of IoT devices and their communication capabilities. We use the depth-first branch and bound (DFBnB) heuristic search algorithm to solve two optimization problems: Full Deployment with Minimal Risk (FDMR) and Maximal Utility without Risk Deterioration (MURD). An admissible heuristic is proposed to accelerate the search. The proposed method is evaluated using a real network with simulated deployment of IoT devices. The results demonstrate (1) the contribution of the augmented attack graphs to quantifying the impact of IoT devices deployed within the organization on security, and (2) the effectiveness of the optimized IoT deployment

arXiv.org e-Print Archive

Has the Online Discussion Been Manipulated? Quantifying Online Discussion Authenticity within Online Social Media

Author: Bendahan Jorge
Elyashar Aviad
Puzis Rami
Publication venue
Publication date: 04/01/2018
Field of study

Online social media (OSM) has a enormous influence in today's world. Some individuals view OSM as fertile ground for abuse and use it to disseminate misinformation and political propaganda, slander competitors, and spread spam. The crowdturfing industry employs large numbers of bots and human workers to manipulate OSM and misrepresent public opinion. The detection of online discussion topics manipulated by OSM \emph{abusers} is an emerging issue attracting significant attention. In this paper, we propose an approach for quantifying the authenticity of online discussions based on the similarity of OSM accounts participating in the discussion to known abusers and legitimate accounts. Our method uses several similarity functions for the analysis and classification of OSM accounts. The proposed methods are demonstrated using Twitter data collected for this study and previously published \emph{Arabic honeypot dataset}. The former includes manually labeled accounts and abusers who participated in crowdturfing platforms. Evaluation of the topic's authenticity, derived from account similarity functions, shows that the suggested approach is effective for discriminating between topics that were strongly promoted by abusers and topics that attracted authentic public interest

arXiv.org e-Print Archive

Detecting Clickbait in Online Social Media: You Won't Believe How We Did It

Author: Bendahan Jorge
Elyashar Aviad
Puzis Rami
Publication venue
Publication date: 18/10/2017
Field of study

In this paper, we propose an approach for the detection of clickbait posts in online social media (OSM). Clickbait posts are short catchy phrases that attract a user's attention to click to an article. The approach is based on a machine learning (ML) classifier capable of distinguishing between clickbait and legitimate posts published in OSM. The suggested classifier is based on a variety of features, including image related features, linguistic analysis, and methods for abuser detection. In order to evaluate our method, we used two datasets provided by Clickbait Challenge 2017. The best performance obtained by the ML classifier was an AUC of 0.8, an accuracy of 0.812, precision of 0.819, and recall of 0.966. In addition, as opposed to previous studies, we found that clickbait post titles are statistically significant shorter than legitimate post titles. Finally, we found that counting the number of formal English words in the given content is useful for clickbait detection

arXiv.org e-Print Archive

ATHAFI: Agile Threat Hunting And Forensic Investigation

Author: Elovici Yuval
Puzis Rami
Zilberman Polina
Publication venue
Publication date: 07/03/2020
Field of study

Attackers rapidly change their attacks to evade detection. Even the most sophisticated Intrusion Detection Systems that are based on artificial intelligence and advanced data analytic cannot keep pace with the rapid development of new attacks. When standard detection mechanisms fail or do not provide sufficient forensic information to investigate and mitigate attacks, targeted threat hunting performed by competent personnel is used. Unfortunately, many organization do not have enough security analysts to perform threat hunting tasks and today the level of automation of threat hunting is low. In this paper we describe a framework for agile threat hunting and forensic investigation (ATHAFI), which automates the threat hunting process at multiple levels. Adaptive targeted data collection, attack hypotheses generation, hypotheses testing, and continuous threat intelligence feeds allow to perform simple investigations in a fully automated manner. The increased level of automation will significantly boost the analyst's productivity during investigation of the harshest cases. Special Workflow Generation module adapts the threat hunting procedures either to the latest Threat Intelligence obtained from external sources (e.g. National CERT) or to the likeliest attack hypotheses generated by the Attack Hypotheses Generation module. The combination of Attack Hypotheses Generation and Workflows Generation enables intelligent adjustment of workflows, which react to emerging threats effectively

arXiv.org e-Print Archive

Organization Mining Using Online Social Networks

Author: Elovici Yuval
Fire Michael
Puzis Rami
Publication venue
Publication date: 02/09/2013
Field of study

Mature social networking services are one of the greatest assets of today's organizations. This valuable asset, however, can also be a threat to an organization's confidentiality. Members of social networking websites expose not only their personal information, but also details about the organizations for which they work. In this paper we analyze several commercial organizations by mining data which their employees have exposed on Facebook, LinkedIn, and other publicly available sources. Using a web crawler designed for this purpose, we extract a network of informal social relationships among employees of a given target organization. Our results, obtained using centrality analysis and Machine Learning techniques applied to the structure of the informal relationships network, show that it is possible to identify leadership roles within the organization solely by this means. It is also possible to gain valuable non-trivial insights on an organization's structure by clustering its social network and gathering publicly available information on the employees within each cluster. Organizations wanting to conceal their internal structure, identity of leaders, location and specialization of branches offices, etc., must enforce strict policies to control the use of social media by their employees.Comment: Draft Versio

arXiv.org e-Print Archive

How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning

Author: Elyashar Aviad
Fire Michael
Puzis Rami
Publication venue
Publication date: 21/07/2020
Field of study

Searching for information about a specific person is an online activity frequently performed by many users. In most cases, users are aided by queries containing a name and sending back to the web search engines for finding their will. Typically, Web search engines provide just a few accurate results associated with a name-containing query. Currently, most solutions for suggesting synonyms in online search are based on pattern matching and phonetic encoding, however very often, the performance of such solutions is less than optimal. In this paper, we propose SpokenName2Vec, a novel and generic approach which addresses the similar name suggestion problem by utilizing automated speech generation, and deep learning to produce spoken name embeddings. This sophisticated and innovative embeddings captures the way people pronounce names in any language and accent. Utilizing the name pronunciation can be helpful for both differentiating and detecting names that sound alike, but are written differently. The proposed approach was demonstrated on a large-scale dataset consisting of 250,000 forenames and evaluated using a machine learning classifier and 7,399 names with their verified synonyms. The performance of the proposed approach was found to be superior to 10 other algorithms evaluated in this study, including well used phonetic and string similarity algorithms, and two recently proposed algorithms. The results obtained suggest that the proposed approach could serve as a useful and valuable tool for solving the similar name suggestion problem.Comment: arXiv admin note: text overlap with arXiv:1912.0400

arXiv.org e-Print Archive

It Runs in the Family: Searching for Synonyms Using Digitized Family Trees

Author: Elyashar Aviad
Fire Michael
Puzis Rami
Publication venue
Publication date: 29/01/2021
Field of study

Searching for a person's name is a common online activity. However, Web search engines provide few accurate results to queries containing names. In contrast to a general word which has only one correct spelling, there are several legitimate spellings of a given name. Today, most techniques used to suggest synonyms in online search are based on pattern matching and phonetic encoding, however they often perform poorly. As a result, there is a need for an effective tool for improved synonym suggestion. In this paper, we propose a revolutionary approach for tackling the problem of synonym suggestion. Our novel algorithm, GRAFT, utilizes historical data collected from genealogy websites, along with network algorithms. GRAFT is a general algorithm that suggests synonyms using a graph based on names derived from digitized ancestral family trees. Synonyms are extracted from this graph, which is constructed using generic ordering functions that outperform other algorithms that suggest synonyms based on a single dimension, a factor that limits their performance. We evaluated GRAFT's performance on three ground truth datasets of forenames and surnames, including a large-scale online genealogy dataset with over 16 million profiles and more than 700,000 unique forenames and 500,000 surnames. We compared GRAFT's performance at suggesting synonyms to 10 other algorithms, including phonetic encoding, string similarity algorithms, and machine and deep learning algorithms. The results show GRAFT's superiority with respect to both forenames and surnames and demonstrate its use as a tool to improve synonym suggestion.Comment: 20 page

arXiv.org e-Print Archive

PALE: Partially Asynchronous Agile Leader Election

Author: Elovici Yuval
Puzis Rami
Sidik Bronislav
Zilberman Polina
Publication venue
Publication date: 11/01/2018
Field of study

Many tasks executed in dynamic distributed systems, such as sensor networks or enterprise environments with bring-your-own-device policy, require central coordination by a leader node. In the past it has been proven that distributed leader election in dynamic environments with constant changes and asynchronous communication is not possible. Thus, state-of-the-art leader election algorithms are not applicable in asynchronous environments with constant network changes. Some algorithms converge only after the network stabilizes (an unrealistic requirement in many dynamic environments). Other algorithms reach consensus in the presence of network changes but require a global clock or some level of communication synchronization. Determining the weakest assumptions, under which leader election is possible, remains an unresolved problem. In this study we present a leader election algorithm that operates in the presence of changes and under weak (realistic) assumptions regarding message delays and regarding the clock drifts of the distributed nodes. The proposed algorithm is self-sufficient, easy to implement and can be extended to support multiple regions, self-stabilization, and wireless ad-hoc networks. We prove the algorithm's correctness and provide a complexity analysis of the time, space, and number of messages required to elect a leader.Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibl

arXiv.org e-Print Archive

The Chameleon Attack: Manipulating Content Display in Online Social Media

Author: Elyashar Aviad
Paradise Abigail
Puzis Rami
Uziel Sagi
Publication venue
Publication date: 24/01/2020
Field of study

Online social networks (OSNs) are ubiquitous attracting millions of users all over the world. Being a popular communication media OSNs are exploited in a variety of cyber attacks. In this article, we discuss the Chameleon attack technique, a new type of OSN-based trickery where malicious posts and profiles change the way they are displayed to OSN users to conceal themselves before the attack or avoid detection. Using this technique, adversaries can, for example, avoid censorship by concealing true content when it is about to be inspected; acquire social capital to promote new content while piggybacking a trending one; cause embarrassment and serious reputation damage by tricking a victim to like, retweet, or comment a message that he wouldn't normally do without any indication for the trickery within the OSN. An experiment performed with closed Facebook groups of sports fans shows that (1) Chameleon pages can pass by the moderation filters by changing the way their posts are displayed and (2) moderators do not distinguish between regular and Chameleon pages. We list the OSN weaknesses that facilitate the Chameleon attack and propose a set of mitigation guidelines

arXiv.org e-Print Archive