92 research outputs found

    Clustering and classification methods for spam analysis

    Get PDF
    Spam emails are a major tool for criminals to distribute malware, conduct fraudulent activity, sell counterfeit products, etc. Thus, security companies are interested in researching spam. Unfortunately, due to the spammers' detection-avoidance techniques, most of the existing tools for spam analysis are not able to provide accurate information about spam campaigns. Moreover, they are not able to link together campaigns initiated by the same sender. F-Secure, a cybersecurity company, collects vast amounts of spam for analysis. The threat intelligence collection from these messages currently involves a lot of manual work. In this thesis we apply state-of-the-art data-analysis techniques to increase the level of automation in the analysis process, thus enabling the human experts to focus on high-level information such as campaigns and actors. The thesis discusses a novel method of spam analysis in which email messages are clustered by different characteristics and the clusters are presented as a graph. The graph representation allows the analyst to see evolving campaigns and even connections between related messages which themselves have no features in common. This makes our analysis tool more powerful than previous methods that simply cluster emails to sets. We implemented a proof of concept version of the analysis tool to evaluate the usefulness of the approach. Experiments show that the graph representation and clustering by different features makes it possible to link together large and complex spam campaigns that were previously not detected. The tools also found evidence that different campaigns were likely to be organized by the same spammer. The results indicate that the graph-based approach is able to extract new, useful information about spam campaigns

    Resilient and Scalable Android Malware Fingerprinting and Detection

    Get PDF
    Malicious software (Malware) proliferation reaches hundreds of thousands daily. The manual analysis of such a large volume of malware is daunting and time-consuming. The diversity of targeted systems in terms of architecture and platforms compounds the challenges of Android malware detection and malware in general. This highlights the need to design and implement new scalable and robust methods, techniques, and tools to detect Android malware. In this thesis, we develop a malware fingerprinting framework to cover accurate Android malware detection and family attribution. In this context, we emphasize the following: (i) the scalability over a large malware corpus; (ii) the resiliency to common obfuscation techniques; (iii) the portability over different platforms and architectures. In the context of bulk and offline detection on the laboratory/vendor level: First, we propose an approximate fingerprinting technique for Android packaging that captures the underlying static structure of the Android apps. We also propose a malware clustering framework on top of this fingerprinting technique to perform unsupervised malware detection and grouping by building and partitioning a similarity network of malicious apps. Second, we propose an approximate fingerprinting technique for Android malware's behavior reports generated using dynamic analyses leveraging natural language processing techniques. Based on this fingerprinting technique, we propose a portable malware detection and family threat attribution framework employing supervised machine learning techniques. Third, we design an automatic framework to produce intelligence about the underlying malicious cyber-infrastructures of Android malware. We leverage graph analysis techniques to generate relevant, actionable, and granular intelligence that can be used to identify the threat effects induced by malicious Internet activity associated to Android malicious apps. In the context of the single app and online detection on the mobile device level, we further propose the following: Fourth, we design a portable and effective Android malware detection system that is suitable for deployment on mobile and resource constrained devices, using machine learning classification on raw method call sequences. Fifth, we elaborate a framework for Android malware detection that is resilient to common code obfuscation techniques and adaptive to operating systems and malware change overtime, using natural language processing and deep learning techniques. We also evaluate the portability of the proposed techniques and methods beyond Android platform malware, as follows: Sixth, we leverage the previously elaborated techniques to build a framework for cross-platform ransomware fingerprinting relying on raw hybrid features in conjunction with advanced deep learning techniques

    Classifying spam emails using agglomerative hierarchical clustering and a topic-based approach

    Get PDF
    [EN] Spam emails are unsolicited, annoying and sometimes harmful messages which may contain malware, phishing or hoaxes. Unlike most studies that address the design of efficient anti-spam filters, we approach the spam email problem from a different and novel perspective. Focusing on the needs of cybersecurity units, we follow a topic-based approach for addressing the classification of spam email into multiple categories. We propose SPEMC-15K-E and SPEMC-15K-S, two novel datasets with approximately 15K emails each in English and Spanish, respectively, and we label them using agglomerative hierarchical clustering into 11 classes. We evaluate 16 pipelines, combining four text representation techniques -Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words, Word2Vec and BERT- and four classifiers: Support Vector Machine, Näive Bayes, Random Forest and Logistic Regression. Experimental results show that the highest performance is achieved with TF-IDF and LR for the English dataset, with a F1 score of 0.953 and an accuracy of 94.6%, and while for the Spanish dataset, TF-IDF with NB yields a F1 score of 0.945 and 98.5% accuracy. Regarding the processing time, TF-IDF with LR leads to the fastest classification, processing an English and Spanish spam email in 2ms and 2.2ms on average, respectively.S

    Darknet as a Source of Cyber Threat Intelligence: Investigating Distributed and Reflection Denial of Service Attacks

    Get PDF
    Cyberspace has become a massive battlefield between computer criminals and computer security experts. In addition, large-scale cyber attacks have enormously matured and became capable to generate, in a prompt manner, significant interruptions and damage to Internet resources and infrastructure. Denial of Service (DoS) attacks are perhaps the most prominent and severe types of such large-scale cyber attacks. Furthermore, the existence of widely available encryption and anonymity techniques greatly increases the difficulty of the surveillance and investigation of cyber attacks. In this context, the availability of relevant cyber monitoring is of paramount importance. An effective approach to gather DoS cyber intelligence is to collect and analyze traffic destined to allocated, routable, yet unused Internet address space known as darknet. In this thesis, we leverage big darknet data to generate insights on various DoS events, namely, Distributed DoS (DDoS) and Distributed Reflection DoS (DRDoS) activities. First, we present a comprehensive survey of darknet. We primarily define and characterize darknet and indicate its alternative names. We further list other trap-based monitoring systems and compare them to darknet. In addition, we provide a taxonomy in relation to darknet technologies and identify research gaps that are related to three main darknet categories: deployment, traffic analysis, and visualization. Second, we characterize darknet data. Such information could generate indicators of cyber threat activity as well as provide in-depth understanding of the nature of its traffic. Particularly, we analyze darknet packets distribution, its used transport, network and application layer protocols and pinpoint its resolved domain names. Furthermore, we identify its IP classes and destination ports as well as geo-locate its source countries. We further investigate darknet-triggered threats. The aim is to explore darknet inferred threats and categorize their severities. Finally, we contribute by exploring the inter-correlation of such threats, by applying association rule mining techniques, to build threat association rules. Specifically, we generate clusters of threats that co-occur targeting a specific victim. Third, we propose a DDoS inference and forecasting model that aims at providing insights to organizations, security operators and emergency response teams during and after a DDoS attack. Specifically, this work strives to predict, within minutes, the attacks’ features, namely, intensity/rate (packets/sec) and size (estimated number of compromised machines/bots). The goal is to understand the future short-term trend of the ongoing DDoS attacks in terms of those features and thus provide the capability to recognize the current as well as future similar situations and hence appropriately respond to the threat. Further, our work aims at investigating DDoS campaigns by proposing a clustering approach to infer various victims targeted by the same campaign and predicting related features. To achieve our goal, our proposed approach leverages a number of time series and fluctuation analysis techniques, statistical methods and forecasting approaches. Fourth, we propose a novel approach to infer and characterize Internet-scale DRDoS attacks by leveraging the darknet space. Complementary to the pioneer work on inferring DDoS activities using darknet, this work shows that we can extract DoS activities without relying on backscattered analysis. The aim of this work is to extract cyber security intelligence related to DRDoS activities such as intensity, rate and geographic location in addition to various network-layer and flow-based insights. To achieve this task, the proposed approach exploits certain DDoS parameters to detect the attacks and the expectation maximization and k-means clustering techniques in an attempt to identify campaigns of DRDoS attacks. Finally, we conclude this work by providing some discussions and pinpointing some future work

    Security techniques for intelligent spam sensing and anomaly detection in online social platforms

    Get PDF
    Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved. The recent advances in communication and mobile technologies made it easier to access and share information for most people worldwide. Among the most powerful information spreading platforms are the Online Social Networks (OSN)s that allow Internet-connected users to share different information such as instant messages, tweets, photos, and videos. Adding to that many governmental and private institutions use the OSNs such as Twitter for official announcements. Consequently, there is a tremendous need to provide the required level of security for OSN users. However, there are many challenges due to the different protocols and variety of mobile apps used to access OSNs. Therefore, traditional security techniques fail to provide the needed security and privacy, and more intelligence is required. Computational intelligence adds high-speed computation, fault tolerance, adaptability, and error resilience when used to ensure security in OSN apps. This research provides a comprehensive related work survey and investigates the application of artificial neural networks for intrusion detection systems and spam filtering for OSNs. In addition, we use the concept of social graphs and weighted cliques in the detection of suspicious behavior of certain online groups and to prevent further planned actions such as cyber/terrorist attacks before they happen

    Security techniques for intelligent spam sensing and anomaly detection in online social platforms

    Get PDF
    Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved. The recent advances in communication and mobile technologies made it easier to access and share information for most people worldwide. Among the most powerful information spreading platforms are the Online Social Networks (OSN)s that allow Internet-connected users to share different information such as instant messages, tweets, photos, and videos. Adding to that many governmental and private institutions use the OSNs such as Twitter for official announcements. Consequently, there is a tremendous need to provide the required level of security for OSN users. However, there are many challenges due to the different protocols and variety of mobile apps used to access OSNs. Therefore, traditional security techniques fail to provide the needed security and privacy, and more intelligence is required. Computational intelligence adds high-speed computation, fault tolerance, adaptability, and error resilience when used to ensure security in OSN apps. This research provides a comprehensive related work survey and investigates the application of artificial neural networks for intrusion detection systems and spam filtering for OSNs. In addition, we use the concept of social graphs and weighted cliques in the detection of suspicious behavior of certain online groups and to prevent further planned actions such as cyber/terrorist attacks before they happen

    Approaches and Techniques for Fingerprinting and Attributing Probing Activities by Observing Network Telescopes

    Get PDF
    The explosive growth, complexity, adoption and dynamism of cyberspace over the last decade has radically altered the globe. A plethora of nations have been at the very forefront of this change, fully embracing the opportunities provided by the advancements in science and technology in order to fortify the economy and to increase the productivity of everyday's life. However, the significant dependence on cyberspace has indeed brought new risks that often compromise, exploit and damage invaluable data and systems. Thus, the capability to proactively infer malicious activities is of paramount importance. In this context, generating cyber threat intelligence related to probing or scanning activities render an effective tactic to achieve the latter. In this thesis, we investigate such malicious activities, which are typically the precursors of various amplified, debilitating and disrupting cyber attacks. To achieve this task, we analyze real Internet-scale traffic targeting network telescopes or darknets, which are defined by routable, allocated yet unused Internet Protocol addresses. First, we present a comprehensive survey of the entire probing topic. Specifically, we categorize this topic by elaborating on the nature, strategies and approaches of such probing activities. Additionally, we provide the reader with a classification and an exhaustive review of various techniques that could be employed in such malicious activities. Finally, we depict a taxonomy of the current literature by focusing on distributed probing detection methods. Second, we focus on the problem of fingerprinting probing activities. To this end, we design, develop and validate approaches that can identify such activities targeting enterprise networks as well as those targeting the Internet-space. On one hand, the corporate probing detection approach uniquely exploits the information that could be leaked to the scanner, inferred from the internal network topology, to perform the detection. On the other hand, the more darknet tailored probing fingerprinting approach adopts a statistical approach to not only detect the probing activities but also identify the exact technique that was employed in the such activities. Third, for attribution purposes, we propose a correlation approach that fuses probing activities with malware samples. The approach aims at detecting whether Internet-scale machines are infected or not as well as pinpointing the exact malware type/family, if the machines were found to be compromised. To achieve the intended goals, the proposed approach initially devises a probabilistic model to filter out darknet misconfiguration traffic. Consequently, probing activities are correlated with malware samples by leveraging fuzzy hashing and entropy based techniques. To this end, we also investigate and report a rare Internet-scale probing event by proposing a multifaceted approach that correlates darknet, malware and passive dns traffic. Fourth, we focus on the problem of identifying and attributing large-scale probing campaigns, which render a new era of probing events. These are distinguished from previous probing incidents as (1) the population of the participating bots is several orders of magnitude larger, (2) the target scope is generally the entire Internet Protocol (IP) address space, and (3) the bots adopt well-orchestrated, often botmaster coordinated, stealth scan strategies that maximize targets' coverage while minimizing redundancy and overlap. To this end, we propose and validate three approaches. On one hand, two of the approaches rely on a set of behavioral analytics that aim at scrutinizing the generated traffic by the probing sources. Subsequently, they employ data mining and graph theoretic techniques to systematically cluster the probing sources into well-defined campaigns possessing similar behavioral similarity. The third approach, on the other hand, exploit time series interpolation and prediction to pinpoint orchestrated probing campaigns and to filter out non-coordinated probing flows. We conclude this thesis by highlighting some research gaps that pave the way for future work

    Distributed Mail Transfer Agent

    Get PDF
    Technological advances have provided society with the means to easily communicate through several channels, starting off in radio and television stations, moving on through E-mail and SMS, and nowadays targeting Internet surfing through channels such as Google Ads and Webpush notifications. Digital marketing has flooded these channels for product promotion and customer engaging purposes in order to provide the customers with the best the organizations have to offer. E-goi is a web platform whose main objective is to facilitate digital marketing to all its customers, ranging from SMB to Corporate/Enterprise, and aid them to strengthen their relationships with its customers through digital communication. The platform’s most widely used channel is E-mail which is responsible for about fifteen million deliveries per day. The email delivery system currently employed by E-goi is functional and fault-tolerant to a certain degree, however, it has several flaws, such as its monolithic architecture, which is responsible for high hardware usage and lack of layer centralization, and the lack of deliverability related functionalities. This thesis aims to analyze and improve the E-goi’s e-mail delivery system architecture, which represents a critical system and of most importance and value for the product and the company. Business analysis tools will be used in this analysis to prove the value created for the company and its product, aiming at maintenance and infrastructure cost reduction as well as the increment in functionalities, both of which comprise valid points for creating business value. The project main objectives comprise an extensive analysis of the currently employed solution and the context to which it belongs to, followed up by a comparative discussion of currently existent competitors and technologies which may be of aid in the development of a new solution. Moving on, the solution’s functional and non-functional requirements gathering will take place. These requirements will dictate how the solution shall be developed. A thorough analysis of the project’s value will follow, discussing which solution will bring the most value to E-goi as a product and organization. Upon deciding on the best solution, its design will be developed based on the previously gathered requirements and the best software design patterns, and will support the implementation phase which follows. Once implemented, the solution will need to surpass several defined tests and hypothesis which will ensure its performance and robustness. Finally, the conclusion will summarize all the project results and define future work for the newly created solution.O avanço tecnológico forneceu à sociedade a facilidade de comunicação através dos demais canais, começando em rádios e televisões, passando pelo E-mail e SMS, atingindo, hoje em dia, a própria navegação na Internet através dos mais diversos canais como o Google Ads e notificações Webpush. Todos estes canais de comunicação são hoje em dia usados como base da promoção, o marketing digital invadiu estes canais de maneira a conseguir alcançar os mais diversos tipos de clientes e lhes proporcionar o melhor que as organizações têm para oferecer. A E-goi é uma plataforma web que pretende facilitar o marketing digital a todos os seus clientes, desde a PME à Enterprise, e ajudá-los a fortalecer as relações com os seus clientes através de comunicação digital. O canal mais usado da plataforma é o E-mail, totalizando, hoje em dia, cerca de quinze milhões de entregas por dia. O sistema de envio de e-mails usado hoje em dia pelo produto E-goi é funcional e tolerante a falhas até um certo nível, no entanto, apresenta diversas lacunas tanto na arquitetura monolítica do mesmo, responsável por um uso de hardware elevado e falta de centralização de camadas, como em funcionalidades ligadas à entregabilidade. O presente projeto visa a análise e melhoria da arquitetura do sistema de envio de e-mails da plataforma E-goi, um sistema crítico e de alta importância e valor para a empresa. Ao longo desta análise, serão usadas ferramentas de análise de negócio para provar o valor criado para a organização e para o produto com vista à redução de custos de manutenção e infraestrutura bem como o aumento de funcionalidades, ambos pontos válidos na adição de valor organizacional. Os objetivos do projeto passarão por uma análise extensiva da solução presente e do contexto em que a mesma se insere, passando a uma comparação com soluções concorrentes e tecnologias, existentes no mercado de hoje em dia, que possam ajudar no desenvolvimento de uma nova solução. Seguir-se-á um levantamento dos requisitos, tanto funcionais como não-funcionais do sistema que ditarão os moldes sobre os quais o novo sistema deverá ser desenvolvido. Após isto, dar-se-á uma extensa análise do valor do projecto e da solução que mais valor adicionará à E-goi, quer como produto e como organização. De seguida efectuar-se-á o Design da solução com base nos requisitos definidos e nas melhores práticas de engenharia informática, design este que servirá de base à implementação que se dará de seguida e será provada através da elaboração de diversos testes que garantirão a performance, robustez e validade do sistema criado. Finalmente seguir-se-á a conclusão que visa resumir os resultados do projecto e definir trabalho futuro para a solução criada

    eXplainable data processing

    Get PDF
    Seminario realizado en U & P U Patel Department of Computer Engineering, Chandubhai S. Patel Institute of Technology, Charotar University of Science And Technology (CHARUSAT), Changa-388421, Gujarat, India 2021[EN]Deep Learning y has created many new opportunities, it has unfortunately also become a means for achieving ill-intentioned goals. Fake news, disinformation campaigns, and manipulated images and videos have plagued the internet which has had serious consequences on our society. The myriad of information available online means that it may be difficult to distinguish between true and fake news, leading many users to unknowingly share fake news, contributing to the spread of misinformation. The use of Deep Learning to create fake images and videos has become known as deepfake. This means that there are ever more effective and realistic forms of deception on the internet, making it more difficult for internet users to distinguish reality from fictio
    corecore