1,313 research outputs found

    Spam Filtering using Support Vector Machine

    Get PDF
    The traditional anti-spam techniques like Black and White List is not up to the mark in current scenario. The goal of Spam Classification is to distinguish between spam and legitimate mail message. But with the popularization of the Internet, it is challenging to develop spam filters that can effectively eliminate the increasing volumes of unwanted mails automatically before they enter a user\u27s mailbox. Many researchers have been trying to separate spam from legitimate emails using machine learning algorithms based on statistical learning methods. In this paper, we evaluate the performance of Non Linear SVM based classifiers with various kernel functions over Enron Dataset

    Personalized question-based cybersecurity recommendation systems

    Full text link
    En ces temps de pandémie Covid19, une énorme quantité de l’activité humaine est modifiée pour se faire à distance, notamment par des moyens électroniques. Cela rend plusieurs personnes et services vulnérables aux cyberattaques, d’où le besoin d’une éducation généralisée ou du moins accessible sur la cybersécurité. De nombreux efforts sont entrepris par les chercheurs, le gouvernement et les entreprises pour protéger et assurer la sécurité des individus contre les pirates et les cybercriminels. En raison du rôle important joué par les systèmes de recommandation dans la vie quotidienne de l'utilisateur, il est intéressant de voir comment nous pouvons combiner les systèmes de cybersécurité et de recommandation en tant que solutions alternatives pour aider les utilisateurs à comprendre les cyberattaques auxquelles ils peuvent être confrontés. Les systèmes de recommandation sont couramment utilisés par le commerce électronique, les réseaux sociaux et les plateformes de voyage, et ils sont basés sur des techniques de systèmes de recommandation traditionnels. Au vu des faits mentionnés ci-dessus, et le besoin de protéger les internautes, il devient important de fournir un système personnalisé, qui permet de partager les problèmes, d'interagir avec un système et de trouver des recommandations. Pour cela, ce travail propose « Cyberhelper », un système de recommandation de cybersécurité personnalisé basé sur des questions pour la sensibilisation à la cybersécurité. De plus, la plateforme proposée est équipée d'un algorithme hybride associé à trois différents algorithmes basés sur la connaissance, les utilisateurs et le contenu qui garantit une recommandation personnalisée optimale en fonction du modèle utilisateur et du contexte. Les résultats expérimentaux montrent que la précision obtenue en appliquant l'algorithme proposé est bien supérieure à la précision obtenue en utilisant d'autres mécanismes de système de recommandation traditionnels. Les résultats suggèrent également qu'en adoptant l'approche proposée, chaque utilisateur peut avoir une expérience utilisateur unique, ce qui peut l'aider à comprendre l'environnement de cybersécurité.With the proliferation of the virtual universe and the multitude of services provided by the World Wide Web, a major concern arises: Security and privacy have never been more in jeopardy. Nowadays, with the Covid 19 pandemic, the world faces a new reality that pushed the majority of the workforce to telecommute. This thereby creates new vulnerabilities for cyber attackers to exploit. It’s important now more than ever, to educate and offer guidance towards good cybersecurity hygiene. In this context, a major effort has been dedicated by researchers, governments, and businesses alike to protect people online against hackers and cybercriminals. With a focus on strengthening the weakest link in the cybersecurity chain which is the human being, educational and awareness-raising tools have been put to use. However, most researchers focus on the “one size fits all” solutions which do not focus on the intricacies of individuals. This work aims to overcome that by contributing a personalized question-based recommender system. Named “Cyberhelper”, this work benefits from an existing mature body of research on recommender system algorithms along with recent research on non-user-specific question-based recommenders. The reported proof of concept holds potential for future work in adapting Cyberhelper as an everyday assistant for different types of users and different contexts

    Clustering and Classification of Email Contents

    Get PDF
    Information users depend heavily on emails\u27 system as one of the major sources of communication. Its importance and usage are continuously growing despite the evolution of mobile applications, social networks, etc. Emails are used on both the personal and professional levels. They can be considered as official documents in communication among users. Emails\u27 data mining and analysis can be conducted for several purposes such as: Spam detection and classification, subject classification, etc. In this paper, a large set of personal emails is used for the purpose of folder and subject classifications. Algorithms are developed to perform clustering and classification for this large text collection. Classification based on NGram is shown to be the best for such large text collection especially as text is Bi-language (i.e. with English and Arabic content)

    Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion

    Full text link
    Large Language Models (LLMs) excel at tackling various natural language tasks. However, due to the significant costs involved in re-training or fine-tuning them, they remain largely static and difficult to personalize. Nevertheless, a variety of applications could benefit from generations that are tailored to users' preferences, goals, and knowledge. Among them is web search, where knowing what a user is trying to accomplish, what they care about, and what they know can lead to improved search experiences. In this work, we propose a novel and general approach that augments an LLM with relevant context from users' interaction histories with a search engine in order to personalize its outputs. Specifically, we construct an entity-centric knowledge store for each user based on their search and browsing activities on the web, which is then leveraged to provide contextually relevant LLM prompt augmentations. This knowledge store is light-weight, since it only produces user-specific aggregate projections of interests and knowledge onto public knowledge graphs, and leverages existing search log infrastructure, thereby mitigating the privacy, compliance, and scalability concerns associated with building deep user profiles for personalization. We then validate our approach on the task of contextual query suggestion, which requires understanding not only the user's current search context but also what they historically know and care about. Through a number of experiments based on human evaluation, we show that our approach is significantly better than several other LLM-powered baselines, generating query suggestions that are contextually more relevant, personalized, and useful

    Analyzing Social and Stylometric Features to Identify Spear phishing Emails

    Full text link
    Spear phishing is a complex targeted attack in which, an attacker harvests information about the victim prior to the attack. This information is then used to create sophisticated, genuine-looking attack vectors, drawing the victim to compromise confidential information. What makes spear phishing different, and more powerful than normal phishing, is this contextual information about the victim. Online social media services can be one such source for gathering vital information about an individual. In this paper, we characterize and examine a true positive dataset of spear phishing, spam, and normal phishing emails from Symantec's enterprise email scanning service. We then present a model to detect spear phishing emails sent to employees of 14 international organizations, by using social features extracted from LinkedIn. Our dataset consists of 4,742 targeted attack emails sent to 2,434 victims, and 9,353 non targeted attack emails sent to 5,912 non victims; and publicly available information from their LinkedIn profiles. We applied various machine learning algorithms to this labeled data, and achieved an overall maximum accuracy of 97.76% in identifying spear phishing emails. We used a combination of social features from LinkedIn profiles, and stylometric features extracted from email subjects, bodies, and attachments. However, we achieved a slightly better accuracy of 98.28% without the social features. Our analysis revealed that social features extracted from LinkedIn do not help in identifying spear phishing emails. To the best of our knowledge, this is one of the first attempts to make use of a combination of stylometric features extracted from emails, and social features extracted from an online social network to detect targeted spear phishing emails.Comment: Detection of spear phishing using social media feature

    Combating User Misbehavior on Social Media

    Get PDF
    Social media encourages user participation and facilitates user’s self-expression like never before. While enriching user behavior in a spectrum of means, many social media platforms have become breeding grounds for user misbehavior. In this dissertation we focus on understanding and combating three specific threads of user misbehaviors that widely exist on social media — spamming, manipulation, and distortion. First, we address the challenge of detecting spam links. Rather than rely on traditional blacklist-based or content-based methods, we examine the behavioral factors of both who is posting the link and who is clicking on the link. The core intuition is that these behavioral signals may be more difficult to manipulate than traditional signals. We find that this purely behavioral approach can achieve good performance for robust behavior-based spam link detection. Next, we deal with uncovering manipulated behavior of link sharing. We propose a four-phase approach to model, identify, characterize, and classify organic and organized groups who engage in link sharing. The key motivating insight is that group-level behavioral signals can distinguish manipulated user groups. We find that levels of organized behavior vary by link type and that the proposed approach achieves good performance measured by commonly-used metrics. Finally, we investigate a particular distortion behavior: making bullshit (BS) statements on social media. We explore the factors impacting the perception of BS and what leads users to ultimately perceive and call a post BS. We begin by preparing a crowdsourced collection of real social media posts that have been called BS. We then build a classification model that can determine what posts are more likely to be called BS. Our experiments suggest our classifier has the potential of leveraging linguistic cues for detecting social media posts that are likely to be called BS. We complement these three studies with a cross-cutting investigation of learning user topical profiles, which can shed light into what subjects each user is associated with, which can benefit the understanding of the connection between user and misbehavior. Concretely, we propose a unified model for learning user topical profiles that simultaneously considers multiple footprints and we show how these footprints can be embedded in a generalized optimization framework. Through extensive experiments on millions of real social media posts, we find our proposed models can effectively combat user misbehavior on social media

    Syntactic and Semantic Analysis and Visualization of Unstructured English Texts

    Get PDF
    People have complex thoughts, and they often express their thoughts with complex sentences using natural languages. This complexity may facilitate efficient communications among the audience with the same knowledge base. But on the other hand, for a different or new audience this composition becomes cumbersome to understand and analyze. Analysis of such compositions using syntactic or semantic measures is a challenging job and defines the base step for natural language processing. In this dissertation I explore and propose a number of new techniques to analyze and visualize the syntactic and semantic patterns of unstructured English texts. The syntactic analysis is done through a proposed visualization technique which categorizes and compares different English compositions based on their different reading complexity metrics. For the semantic analysis I use Latent Semantic Analysis (LSA) to analyze the hidden patterns in complex compositions. I have used this technique to analyze comments from a social visualization web site for detecting the irrelevant ones (e.g., spam). The patterns of collaborations are also studied through statistical analysis. Word sense disambiguation is used to figure out the correct sense of a word in a sentence or composition. Using textual similarity measure, based on the different word similarity measures and word sense disambiguation on collaborative text snippets from social collaborative environment, reveals a direction to untie the knots of complex hidden patterns of collaboration
    • …
    corecore