INSTANT MESSAGING SPAM DETECTION IN LONG TERM EVOLUTION NETWORKS
The lack of efficient spam detection modules for packet data communication results in increased threat exposure for both telecommunication network users and service providers. In this thesis, we propose a novel approach to classifying spam at the server side by intercepting packet-data communication among instant messaging applications. Spam detection is performed using machine learning techniques on packet headers and contents (if unencrypted) in two phases: offline training and online classification. The contribution of this study is threefold. First, it identifies the scope for deploying a spam detection module in a state-of-the-art telecommunication architecture. Second, it compares the usefulness of various existing machine learning algorithms for intercepting and classifying data packets in the near real-time communication of instant messengers. Finally, it evaluates the accuracy and classification time of spam detection using our approach in a simulated environment of continuous packet data communication. Our research results are mainly generated by executing instances of a peer-to-peer instant messaging application prototype within a simulated Long Term Evolution (LTE) telecommunication network environment. This prototype is modeled and executed using the OPNET network modeling and simulation tools. The research produces considerable knowledge on monitoring unsolicited packets in instant messaging and similar applications.
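As a hedged illustration of the two-phase scheme just described (offline training, then online classification), the sketch below trains a classifier on packet-header features and then scores packets one at a time. The feature set, the synthetic data, and the random-forest choice are illustrative assumptions; the thesis's actual features, models, and OPNET integration are not reproduced here.

```python
# Minimal two-phase sketch: offline training, then online classification.
# Features are hypothetical header statistics, not the thesis's feature set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# --- Offline training phase: labeled header features -> classifier ---
# Hypothetical columns: payload size, inter-arrival time, destination-port entropy.
X = rng.random((1000, 3))
y = rng.integers(0, 2, 1000)  # 1 = spam, 0 = legitimate (placeholder labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# --- Online classification phase: score packets as they arrive ---
def classify_packet(features):
    """Return True if a packet's header features look like spam."""
    return bool(clf.predict([features])[0])

print(accuracy_score(y_test, clf.predict(X_test)))
```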
In Tags We Trust: Trust modeling in social tagging of multimedia content
Tagging in online social networks is very popular these days, as it facilitates search and retrieval of multimedia content. However, noisy and spam annotations often make it difficult to perform an efficient search. Users may make mistakes in tagging, and irrelevant tags and content may be maliciously added for advertisement or self-promotion. This article surveys recent advances in techniques for combating such noise and spam in social tagging. We classify the state-of-the-art approaches into a few categories and study representative examples in each. We also qualitatively compare and contrast them and outline open issues for future research.
Trust aware system for social networks: A comprehensive survey
Social networks are platforms on which users connect with other users based on their interests and lifestyles. Existing social networks have millions of users, and the data they generate is so large that it is difficult to differentiate real users from fake ones. Hence, a trustworthy system is recommended for differentiating real and fake users. Social networking enables users to send friend requests, upload photos, tag their friends, and even suggest web links based on users' interests. The friends recommended, the photos tagged, and the web links suggested may carry malware or represent untrusted activity. Users on social networks are authorised by providing personal data. This personal raw data is available to all other users online, and there are no protections or methods to secure it from unknown users. Hence, to provide a trustworthy system and to enable real users' activities, this paper examines different methods for achieving trustworthy social networking systems.
Combating Threats to the Quality of Information in Social Systems
Many large-scale social systems such as Web-based social networks, online social media sites and Web-scale crowdsourcing systems have been growing rapidly, enabling millions of human participants to generate, share and consume content on a massive scale. This reliance on users can lead to many positive effects, including large-scale growth in the size and content of the community, bottom-up discovery of "citizen-experts", serendipitous discovery of new resources beyond the scope of the system designers, and new social-based information search and retrieval algorithms. But the relative openness and reliance on users, coupled with the widespread interest in and growth of these social systems, carries risks and raises growing concerns over the quality of information in these systems.
In this dissertation research, we focus on countering threats to the quality of information in self-managing social systems. Concretely, we identify three classes of threats to these systems: (i) content pollution by social spammers, (ii) coordinated campaigns for strategic manipulation, and (iii) threats to collective attention. To combat these threats, we propose three inter-related methods for detecting evidence of these threats, mitigating their impact, and improving the quality of information in social systems. We augment this three-fold defense with an exploration of their origins in "crowdturfing", a sinister counterpart to the enormous positive opportunities of crowdsourcing. In particular, this dissertation research makes four unique contributions:
• The first contribution of this dissertation research is a framework for detecting and filtering social spammers and content polluters in social systems. To detect and filter individual social spammers and content polluters, we propose and evaluate a novel social honeypot-based approach.
• Second, we present a set of methods and algorithms for detecting coordinated campaigns in large-scale social systems. We propose and evaluate a content-driven framework for effectively linking free-text posts with common "talking points" and extracting campaigns from large-scale social systems (a minimal sketch of this idea follows this list).
• Third, we present a dual study of the robustness of social systems to collective attention threats, through both a data-driven modeling approach and deployment over a real system trace. We evaluate the effectiveness of countermeasures deployed based on the first moments of a bursting phenomenon in a real system.
• Finally, we study the underlying ecosystem of crowdturfing for engaging in each of the three threat types. We present a framework for "pulling back the curtain" on crowdturfers to reveal their underlying ecosystem on both crowdsourcing sites and social media.
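As a rough illustration of the content-driven campaign idea in the second contribution, the sketch below clusters near-duplicate posts by their TF-IDF vectors, so posts pushing the same talking point fall into one candidate campaign. The sample posts, the DBSCAN choice, and the distance threshold are assumptions for illustration, not the dissertation's actual framework.

```python
# Group posts that share a "talking point" by clustering similar TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import DBSCAN

posts = [
    "Buy cheap meds now at example-pharma",
    "Cheap meds, buy now from example-pharma!",
    "Great weather in Austin today",
    "Buy now: cheap meds at example-pharma",
]

# Posts pushing the same talking point get similar TF-IDF vectors;
# DBSCAN with cosine distance groups them into candidate campaigns.
vectors = TfidfVectorizer().fit_transform(posts)
labels = DBSCAN(eps=0.5, min_samples=2, metric="cosine").fit_predict(vectors)
print(labels)  # posts sharing a cluster label form a candidate campaign; -1 = noise
```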
Information quality in online social media and big data collection: an example of Twitter spam detection
The popularity of OSM is mainly conditioned by the integrity and the quality of UGC as well as the protection of users' privacy.
Based on the definition of information quality as fitness for use, the high usability and accessibility of OSM have exposed many information quality (IQ) problems, which consequently decrease the performance of OSM-dependent applications. Such problems are caused by ill-intentioned individuals who misuse OSM services to spread different kinds of noisy information, including fake information, illegal commercial content, drug sales, malware downloads, and phishing links. The propagation and spreading of noisy information cause enormous drawbacks related to resource consumption, decreased quality of service of OSM-based applications, and wasted human effort.
The majority of popular social networks (e.g., Facebook, Twitter) over the Web 2.0 are attacked daily by an enormous number of ill-intentioned users. However, those popular social networks are ineffective in handling the noisy information, requiring several weeks or months to detect it. Moreover, several challenges stand in the way of building complete OSM-based noisy-information filtering methods that can overcome the shortcomings of OSM information filters. These challenges are: (i) big data; (ii) privacy and security; (iii) structure heterogeneity; (iv) UGC format diversity; (v) subjectivity and objectivity; and (vi) service limitations.
In this thesis, we focus on increasing the quality of social UGC that is published and publicly accessible in the form of posts and profiles over OSM, by addressing the stated challenges in depth. As social spam is the most common IQ problem appearing over OSM, we introduce the design of two generic approaches for detecting and filtering out spam content. The first approach detects spam posts (e.g., spam tweets) in a real-time stream, while the second is dedicated to handling big data collections of social profiles (e.g., Twitter accounts). For filtering spam content in real time, we introduce an unsupervised collective-based framework that automatically adapts a supervised spam tweet classification function, yielding a continuously updated real-time classifier that requires no manually annotated data sets. In the second approach, we treat big data collections by minimizing the search space of profiles that need advanced analysis, instead of processing every user profile in the collection. Each profile falling in the reduced search space is then analyzed in depth to produce an accurate decision using a binary classification model.
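One plausible reading of the first approach is a self-training loop: a lightweight classifier is seeded with heuristically labeled examples and then retrained on its own confident predictions over the incoming stream, so no manually annotated data set is needed. The sketch below follows that reading; the seeding heuristic, feature extraction, and confidence threshold are illustrative assumptions, not the thesis's actual design.

```python
# Self-training sketch: retrain a streaming classifier on its own
# confident predictions, avoiding manually annotated data sets.
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vec = HashingVectorizer(n_features=2**16)
clf = SGDClassifier(loss="log_loss")

# Seed with a tiny heuristically labeled batch (e.g., blacklisted-URL tweets).
seed_texts = ["win FREE iphone click http://spam.example", "meeting moved to 3pm"]
clf.partial_fit(vec.transform(seed_texts), [1, 0], classes=[0, 1])

def process_batch(tweets, confidence=0.9):
    """Label an incoming batch, then retrain on the confident predictions."""
    X = vec.transform(tweets)
    proba = clf.predict_proba(X)
    confident = np.max(proba, axis=1) >= confidence
    if confident.any():
        # The classifier adapts as spammer strategies shift in the stream.
        clf.partial_fit(X[confident], proba[confident].argmax(axis=1))
    return proba.argmax(axis=1)  # 1 = spam, 0 = non-spam
```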
The experiments conducted on the Twitter online social network have shown that the unsupervised collective-based framework is able to produce an up-to-date and effective real-time binary tweet classification function that adapts to the rapid evolution of social spammers' strategies on Twitter, outperforming two existing real-time spam detection methods. The results of the second approach have demonstrated that performing a preprocessing step to extract spammy metadata values and leveraging them in the retrieval process is a feasible solution for handling large collections of Twitter profiles, as an alternative to processing all profiles in the input data collection.
The introduced approaches open up opportunities for information science researchers to leverage our solutions in other information filtering problems and applications. Our long-term perspective consists of (i) developing a generic platform covering the most common OSM for instantly checking the quality of a given piece of information, where the input may take the form of profiles, website links, posts, or plain text; and (ii) transforming and adapting our methods to handle additional IQ problems such as rumors and information overload.
The entangled cyberspace: an integrated approach for predicting cyber-attacks
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Significant studies in cyber defence analysis have predominantly revolved around a single linear analysis of information from a single source of evidence (the network). These studies were limited in their ability to understand the dynamics of entanglements related to cyber-incidents. This research integrates evidence beyond the network in an attempt to understand and predict phases of the kill-chain across the information space.
This research provides a multi-dimensional phased analysis of the traditional kill-chain model using structural vector autoregressive models. In the "Entangled Cyberspace Framework", each phase of the kill-chain corresponds to a single dimension of the information space based on time observations of certain events. Events are represented as time signals, where each phase is characterised by multiple time signals representing multiple events in that phase. The multiple time signals are analysed using structural models for multiple time series analysis (vector autoregressive models). At each phase of the kill-chain, we perform a lagged co-integration analysis of events across the information space. This analysis detects hidden entanglements that characterise events in the kill-chain beyond the network. The prediction accuracy and error measured at each stage of the experiment represent the usefulness of the selected events in characterising the defined stage of the kill-chain.
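To make the two building blocks named above concrete, the toy example below fits a vector autoregression over two synthetic event signals and runs a pairwise cointegration test with statsmodels. The signal names are hypothetical stand-ins for per-dimension event counts; this is an illustration of the underlying statistics, not the thesis's SVAR framework.

```python
# Toy VAR fit and cointegration test over two synthetic event signals.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(1)
n = 200
social = np.cumsum(rng.normal(size=n))             # e.g., hostile-sentiment volume
network = social + rng.normal(scale=0.5, size=n)   # e.g., scan/probe counts
signals = pd.DataFrame({"social": social, "network": network})

# Lagged multi-signal structure: fit a VAR on differenced data, pick lag by AIC.
results = VAR(signals.diff().dropna()).fit(maxlags=5, ic="aic")

# Cointegration between two event signals across dimensions.
t_stat, p_value, _ = coint(signals["social"], signals["network"])
print(results.k_ar, p_value)  # low p-value suggests an entangled (cointegrated) pair
```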
The entangled cyberspace, in theory, is the fusion of three conceptual foundations: (a) a multi-dimensional characterisation of cyberspace, (b) a sequential phased model for perpetrating cyber-attacks, and (c) a structural model for integrating and simultaneously analysing multiple sources of evidence. The framework starts with the characterisation of the information space into different dimensions of interest. It then identifies evidence sources across these dimensions and integrates them in the analytical context under consideration (e.g., malware injection).
The concrete findings show that our approach and analytical methodology are capable of detecting entanglements when applied to a set of entangled activities across the information space. The findings also show that activities beyond the network have significant effects on the nature of the unfolding cyber-attack vector. The predictive features of events across the kill-chain were also presented in this research, as opinion and emotion drivers on the social dimension, packet data details, and social and cultural events on the economic layer. Finally, the co-integration detected between events across and within dimensions of the information space demonstrates the existence of both inter-dimensional and intra-dimensional entanglements that affect the nature of events unfolding during the kill-chain (from the adversary's point of view).
The novelty of this research rests in the ability to hop across the information space to detect evidential clues of activities related to cyber-incidents. This research also expands the standard multi-dimensional information space to include SPEC factors as indicators of cyber-incidents. It improves the current information security management model, specifically in the monitoring, analysis and detection phases, and provides a methodology that accommodates a robust evidence base for understanding the attack surface. Practically, this research provides a basis for creating applications and tools for protecting critical national infrastructure by integrating data from social platforms, real-world political, cultural and economic events, and the cyber-physical dimension.
Enhancing Web Browsing Security
Web browsing has become an integral part of our lives, and we use browsers to perform many important activities almost every day and everywhere. However, due to the vulnerabilities in Web browsers and Web applications, and also due to Web users' lack of security knowledge, browser-based attacks are rampant over the Internet and have caused substantial damage to both Web users and service providers. Enhancing Web browsing security is therefore of great need and importance.

This dissertation concentrates on enhancing Web browsing security through exploring and experimenting with new approaches and software systems. Specifically, we have systematically studied four challenging Web browsing security problems: HTTP cookie management, phishing, insecure JavaScript practices, and browsing on untrusted public computers. We have proposed new approaches to address these problems and built unique systems to validate our approaches.

To manage HTTP cookies, we have proposed an approach to automatically validate the usefulness of HTTP cookies at the client side on behalf of users. By automatically removing useless cookies, our approach helps a user strike an appropriate balance between maximizing usability and minimizing security risks. To protect against phishing attacks, we have proposed an approach to transparently feed a relatively large number of bogus credentials into a suspected phishing site. Using those bogus credentials, our approach conceals victims' real credentials and enables a legitimate website to identify stolen credentials in a timely manner. To identify insecure JavaScript practices, we have proposed an execution-based measurement approach and performed a large-scale measurement study. Our work sheds light on insecure JavaScript practices and especially reveals the severity and nature of insecure JavaScript inclusion and dynamic generation practices on the Web. To achieve secure and convenient Web browsing on untrusted public computers, we have proposed a simple approach that enables an extended browser on a mobile device and a regular browser on a public computer to collaboratively support a Web session. A user can securely perform sensitive interactions on the mobile device and conveniently perform other browsing interactions on the public computer.
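A minimal sketch of the cookie-usefulness idea, under an assumed URL and cookie names: fetch a page with and without a given cookie and compare the responses; a cookie whose removal leaves the page unchanged is a candidate for deletion. The real client-side system described in the dissertation is considerably more involved.

```python
# Heuristic cookie-usefulness probe: does removing one cookie change the page?
import requests

def cookie_seems_useful(url, cookies, name):
    """Return True if removing the named cookie changes the server's response."""
    with_cookie = requests.get(url, cookies=cookies, timeout=10)
    reduced = {k: v for k, v in cookies.items() if k != name}
    without_cookie = requests.get(url, cookies=reduced, timeout=10)
    return with_cookie.text != without_cookie.text

# Hypothetical usage:
# cookie_seems_useful("https://example.com/account",
#                     {"sess": "abc", "tracker": "x"}, "tracker")
```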
AN ENHANCEMENT ON TARGETED PHISHING ATTACKS IN THE STATE OF QATAR
The latest report by Kaspersky on spam and phishing listed Qatar as one of the top 10 countries by percentage of email phishing and targeted phishing attacks. Since the Qatari economy has grown exponentially and become increasingly global in nature, email phishing and targeted phishing attacks have the capacity to be devastating to the Qatari economy, yet there are no adequate measures in place, such as awareness training programmes, to minimise these threats to the state of Qatar. Therefore, this research aims to explore targeted attacks on specific organisations in the state of Qatar by presenting a new technique to prevent them. This novel enterprise-wide email phishing detection system has been used by organisations and individuals not only in the state of Qatar but also in the UK. The detection system is based on domain names, exploiting the fact that attackers carefully register domain names that victims trust. The results show that this detection system has proven its ability to reduce email phishing attacks. Moreover, this research aims to develop email phishing awareness training techniques specifically designed for the state of Qatar, to complement the presented technique, increase email phishing awareness focused on targeted attacks and their content, and reduce the impact of phishing email attacks. This was carried out by developing an interactive email phishing awareness training website that has been tested by organisations in the state of Qatar. The results of this training programme proved effective in training users to spot email phishing and targeted attacks.
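A hedged sketch in the spirit of the domain-name-based detection described above: flag sender domains that are close, but not identical, to a trusted domain, since attackers register look-alike names that victims trust. The trusted list, the similarity measure, and the threshold are illustrative assumptions rather than the system's actual rules.

```python
# Flag sender domains that nearly match, but differ from, a trusted domain.
from difflib import SequenceMatcher

# Hypothetical whitelist of domains the organisation trusts.
TRUSTED = {"qatarbank.com", "gov.qa", "example-university.edu"}

def looks_like_spoof(sender_domain, threshold=0.85):
    """Return the trusted domain a near-miss sender domain imitates, if any."""
    for trusted in TRUSTED:
        ratio = SequenceMatcher(None, sender_domain, trusted).ratio()
        if sender_domain != trusted and ratio >= threshold:
            return trusted
    return None

print(looks_like_spoof("qatarbamk.com"))  # -> "qatarbank.com"
```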