21 research outputs found

    Reasoning about Cyber Threat Actors

    Get PDF
    abstract: Reasoning about the activities of cyber threat actors is critical to defend against cyber attacks. However, this task is difficult for a variety of reasons. In simple terms, it is difficult to determine who the attacker is, what the desired goals are of the attacker, and how they will carry out their attacks. These three questions essentially entail understanding the attacker’s use of deception, the capabilities available, and the intent of launching the attack. These three issues are highly inter-related. If an adversary can hide their intent, they can better deceive a defender. If an adversary’s capabilities are not well understood, then determining what their goals are becomes difficult as the defender is uncertain if they have the necessary tools to accomplish them. However, the understanding of these aspects are also mutually supportive. If we have a clear picture of capabilities, intent can better be deciphered. If we understand intent and capabilities, a defender may be able to see through deception schemes. In this dissertation, I present three pieces of work to tackle these questions to obtain a better understanding of cyber threats. First, we introduce a new reasoning framework to address deception. We evaluate the framework by building a dataset from DEFCON capture-the-flag exercise to identify the person or group responsible for a cyber attack. We demonstrate that the framework not only handles cases of deception but also provides transparent decision making in identifying the threat actor. The second task uses a cognitive learning model to determine the intent – goals of the threat actor on the target system. The third task looks at understanding the capabilities of threat actors to target systems by identifying at-risk systems from hacker discussions on darkweb websites. To achieve this task we gather discussions from more than 300 darkweb websites relating to malicious hacking.Dissertation/ThesisDoctoral Dissertation Computer Engineering 201

    NLP-Based Techniques for Cyber Threat Intelligence

    Full text link
    In the digital era, threat actors employ sophisticated techniques for which, often, digital traces in the form of textual data are available. Cyber Threat Intelligence~(CTI) is related to all the solutions inherent to data collection, processing, and analysis useful to understand a threat actor's targets and attack behavior. Currently, CTI is assuming an always more crucial role in identifying and mitigating threats and enabling proactive defense strategies. In this context, NLP, an artificial intelligence branch, has emerged as a powerful tool for enhancing threat intelligence capabilities. This survey paper provides a comprehensive overview of NLP-based techniques applied in the context of threat intelligence. It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets. It then undertakes a thorough examination of NLP-based techniques for CTI data crawling from Web sources, CTI data analysis, Relation Extraction from cybersecurity data, CTI sharing and collaboration, and security threats of CTI. Finally, the challenges and limitations of NLP in threat intelligence are exhaustively examined, including data quality issues and ethical considerations. This survey draws a complete framework and serves as a valuable resource for security professionals and researchers seeking to understand the state-of-the-art NLP-based threat intelligence techniques and their potential impact on cybersecurity

    Network Infrastructures in the Dark Web

    Get PDF
    With the appearance of the Internet, open to everyone in 1991, criminals saw a big opportunity in moving their organisations to the World Wide Web, taking advantage of these infrastructures as it allowed higher mobility and scalability. Later on, in the year 2000, the first system appeared, creating what is known today as the Dark Web. This layer of the World Wide Web became quickly the option to go when criminals wanted to sell and deliver content such as match-fixing, children pornography, drugs market, guns market, etc. This obscure side of the Dark Web, makes it a relevant topic to study in order to tackle this huge network and help to identify these malicious activities and actors. In this master thesis, it is shown through the study of two datasets from the Dark Web, that we are surrounded by capable technologies that can be applied to these types of problems in order to increase our knowledge about the data and reveal interesting characteristics in an interactive and useful way. One dataset has 10 000 relations from domains living in the Dark Web, and the other dataset has thousands of data from just 11 specific domains from the Dark Web. We reveal detailed information about each dataset by applying di↵erent analysis and data mining algorithms. For the first dataset we studied domains availability patterns with temporal analysis, we categorised domains with machine learning neural networks and we reveal the network topology and nodes relevance with social networks analysis and core-periphery model. Regarding the second dataset, we created a cross matching information web graph and applied a name entity recognition algorithm which ended in a tool for identifying entities within dark web’s domains. All of these approaches culminated in an interactive web application where we publicly not only display the entire research but also the tools developed along with the project (https://darkor.org).Com o surgimento da Internet, aberta a todos em 1991, os criminosos viram uma grande oportunidade em passar as suas organizações para a World Wide Web, aproveitando-se assim dessas infraestruturas que permitiam uma maior mobilidade e escalabilidade. Mais tarde, no ano 2000, surgiu o primeiro sistema, criando o que hoje é conhecido como a Dark Web. Essa camada da World Wide Web tornou-se rapidamente a opção a seguir quando os criminosos queriam vender e entregar conteúdo como combinação de resultados, pornografia infantil, mercado de drogas, mercado de armas, etc. Este lado obscuro da Dark Web, torna-a num tema relevante de estudo a fim de ajudar a identificar atividades e atores maliciosos. Nesta dissertação de mestrado é mostrado, através do estudo de dois conjuntos de dados da Dark Web, que estamos rodeados de tecnologias que podem ser aplicadas neste tipo de problemas de forma a aumentar o nosso conhecimento sobre os dados e revelar características interessantes de forma interativa e útil. Um conjunto de dados tem 10 000 relações de domínios que vivem na Dark Web enquanto que o outro conjunto de dados tem milhares de dados de apenas 11 domínios específicos da Dark Web. Neste estudo revelamos informações detalhadas sobre cada conjunto de dados aplicando diferentes análises e algoritmos de data mining. Para o primeiro conjunto de dados, estudamos padrões de disponibilidade de domínios com análise temporal, categorizamos domínios com o auxílio de redes neuronais e revelamos a topologia da rede e a relevância dos nós com análise de redes sociais e a aplicação de um modelo núcleo-periferia. Em relação ao segundo conjunto de dados, criamos um grafo da rede com cruzamento de dados e aplicamos um algoritmo de reconhecimento de entidades que resultou em uma ferramenta para identificar entidades dentro dos domínios da Dark Web estudados. Todas estas abordagens culminaram em uma aplicação web interativa onde exibimos publicamente não apenas todo o estudo, mas também as ferramentas desenvolvidas ao longo do projeto (https://darkor.org)

    Detection of Software Vulnerability Communication in Expert Social Media Channels: A Data-driven Approach

    Get PDF
    Conceptually, a vulnerability is: A flaw or weakness in a system’s design, implementation,or operation and management that could be exploited to violate the system’s security policy .Some of these flaws can go undetected and exploited for long periods of time after soft-ware release. Although some software providers are making efforts to avoid this situ-ation, inevitability, users are still exposed to vulnerabilities that allow criminal hackersto take advantage. These vulnerabilities are constantly discussed in specialised forumson social media. Therefore, from a cyber security standpoint, the information found inthese places can be used for countermeasures actions against malicious exploitation ofsoftware. However, manual inspection of the vast quantity of shared content in socialmedia is impractical. For this reason, in this thesis, we analyse the real applicability ofsupervised classification models to automatically detect software vulnerability com-munication in expert social media channels. We cover the following three principal aspects: Firstly, we investigate the applicability of classification models in a range of 5 differ-ent datasets collected from 3 Internet Domains: Dark Web, Deep Web and SurfaceWeb. Since supervised models require labelled data, we have provided a systematiclabelling process using multiple annotators to guarantee accurate labels to carry outexperiments. Using these datasets, we have investigated the classification models withdifferent combinations of learning-based algorithms and traditional features represen-tation. Also, by oversampling the positive instances, we have achieved an increaseof 5% in Positive Recall (on average) in these models. On top of that, we have appiiplied Feature Reduction, Feature Extraction and Feature Selection techniques, whichprovided a reduction on the dimensionality of these models without damaging the accuracy, thus, providing computationally efficient models. Furthermore, in addition to traditional features representation, we have investigated the performance of robust language models, such as Word Embedding (WEMB) andSentence Embedding (SEMB) on the accuracy of classification models. RegardingWEMB, our experiment has shown that this model trained with a small security-vocabulary dataset provides comparable results with WEMB trained in a very large general-vocabulary dataset. Regarding SEMB model, our experiment has shown thatits use overcomes WEMB model in detecting vulnerability communication, recording 8% of Avg. Class Accuracy and 74% of Positive Recall. In addition, we investigate twoDeep Learning algorithms as classifiers, text CNN (Convolutional Neural Network)and RNN (Recurrent Neural Network)-based algorithms, which have improved ourmodel, resulting in the best overall performance for our task

    Exploring Text Mining and Analytics for Applications in Public Security: An in-depth dive into a systematic literature review

    Get PDF
    Text mining and related analytics emerge as a technological approach to support human activities in extracting useful knowledge through texts in several formats. From a managerial point of view, it can help organizations in planning and decision-making processes, providing information that was not previously evident through textual materials produced internally or even externally. In this context, within the public/governmental scope, public security agencies are great beneficiaries of the tools associated with text mining, in several aspects, from applications in the criminal area to the collection of people's opinions and sentiments about the actions taken to promote their welfare. This article reports details of a systematic literature review focused on identifying the main areas of text mining application in public security, the most recurrent technological tools, and future research directions. The searches covered four major article bases (Scopus, Web of Science, IEEE Xplore, and ACM Digital Library), selecting 194 materials published between 2014 and the first half of 2021, among journals, conferences, and book chapters. There were several findings concerning the targets of the literature review, as presented in the results of this article

    A systematic survey of online data mining technology intended for law enforcement

    Get PDF
    As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspections becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists which examines their techniques, applications and rigour. This article remedies this gap through a systematic mapping study describing online data-mining literature which visibly targets law enforcement applications, using evidence-based practices in survey making to produce a replicable analysis which can be methodologically examined for deficiencies

    You Can Tell a Cybercriminal by the Company they Keep: A Framework to Infer the Relevance of Underground Communities to the Threat Landscape

    Full text link
    The criminal underground is populated with forum marketplaces where, allegedly, cybercriminals share and trade knowledge, skills, and cybercrime products. However, it is still unclear whether all marketplaces matter the same in the overall threat landscape. To effectively support trade and avoid degenerating into scams-for-scammers places, underground markets must address fundamental economic problems (such as moral hazard, adverse selection) that enable the exchange of actual technology and cybercrime products (as opposed to repackaged malware or years-old password databases). From the relevant literature and manual investigation, we identify several mechanisms that marketplaces implement to mitigate these problems, and we condense them into a market evaluation framework based on the Business Model Canvas. We use this framework to evaluate which mechanisms `successful' marketplaces have in place, and whether these differ from those employed by `unsuccessful' marketplaces. We test the framework on 23 underground forum markets by searching 836 aliases of indicted cybercriminals to identify `successful' marketplaces. We find evidence that marketplaces whose administrators are impartial in trade, verify their sellers, and have the right economic incentives to keep the market functional are more likely to be credible sources of threat.Comment: The 22nd Workshop on the Economics of Information Security (WEIS'23), July 05--08, 2023, Geneva, Switzerlan

    An Approach to Guide Users Towards Less Revealing Internet Browsers

    Get PDF
    When browsing the Internet, HTTP headers enable both clients and servers send extra data in their requests or responses such as the User-Agent string. This string contains information related to the sender’s device, browser, and operating system. Previous research has shown that there are numerous privacy and security risks result from exposing sensitive information in the User-Agent string. For example, it enables device and browser fingerprinting and user tracking and identification. Our large analysis of thousands of User-Agent strings shows that browsers differ tremendously in the amount of information they include in their User-Agent strings. As such, our work aims at guiding users towards using less exposing browsers. In doing so, we propose to assign an exposure score to browsers based on the information they expose and vulnerability records. Thus, our contribution in this work is as follows: first, provide a full implementation that is ready to be deployed and used by users. Second, conduct a user study to identify the effectiveness and limitations of our proposed approach. Our implementation is based on using more than 52 thousand unique browsers. Our performance and validation analysis show that our solution is accurate and efficient. The source code and data set are publicly available and the solution has been deployed
    corecore