4,005 research outputs found

    NLP-Based Techniques for Cyber Threat Intelligence

    In the digital era, threat actors employ sophisticated techniques that often leave digital traces in the form of textual data. Cyber Threat Intelligence (CTI) covers the solutions for data collection, processing, and analysis that help to understand a threat actor's targets and attack behavior. CTI is assuming an increasingly crucial role in identifying and mitigating threats and enabling proactive defense strategies. In this context, Natural Language Processing (NLP), a branch of artificial intelligence, has emerged as a powerful tool for enhancing threat intelligence capabilities. This survey paper provides a comprehensive overview of NLP-based techniques applied in the context of threat intelligence. It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets. It then undertakes a thorough examination of NLP-based techniques for CTI data crawling from Web sources, CTI data analysis, Relation Extraction from cybersecurity data, CTI sharing and collaboration, and security threats of CTI. Finally, the challenges and limitations of NLP in threat intelligence are exhaustively examined, including data quality issues and ethical considerations. This survey draws a complete framework and serves as a valuable resource for security professionals and researchers seeking to understand the state-of-the-art NLP-based threat intelligence techniques and their potential impact on cybersecurity.

    iGen: Toward Automatic Generation and Analysis of Indicators of Compromise (IOCs) using Convolutional Neural Network

    abstract: The field of cyber threats is evolving rapidly, and every day a multitude of new information about malware and Advanced Persistent Threats (APTs) is generated in the form of malware reports, blog articles, forum posts, etc. However, current Threat Intelligence (TI) systems have several limitations. First, most TI systems rely on analysts to examine and interpret data manually. Second, some of them generate Indicators of Compromise (IOCs) directly using regular expressions, without understanding the contextual meaning of those IOCs in the data sources, which lets many false positives through. Third, many TI systems consider only one or two data sources for IOC generation and miss some of the most valuable IOCs from other data sources. To overcome these limitations, we propose iGen, a novel approach to fully automate the process of IOC generation and analysis. The proposed approach is based on the idea that a model can understand English text much as a human does and extract IOCs from different data sources intelligently. IOCs are identified on the basis of the syntax and semantics of the sentence as well as the context words (e.g., "attacked", "suspicious") present in it, which helps the approach work on any kind of data source. Our technique first removes tokens with no contextual meaning, such as stop words and punctuation. Then, using the remaining words in the sentence and the output label (IOC or non-IOC sentence), the model learns to classify sentences into IOC and non-IOC sentences. Once IOC sentences are identified using this learned Convolutional Neural Network (CNN) based approach, the next step is to identify the IOC tokens (such as domains, IPs, and URLs) in those sentences. This CNN-based classification model helps in removing false positives (such as IPs that are not malicious).
Afterwards, IOCs extracted from different data sources are correlated to find the links between thousands of apparently unrelated attack instances, particularly infrastructure shared between them. Our approach fully automates the process of IOC generation, from gathering data from different sources to creating rules (e.g., OpenIOC, Snort rules, STIX rules) for deployment on the security infrastructure. iGen has collected around 400K IOCs so far with a precision of 95%, better than any state-of-the-art method. Dissertation/Thesis Masters Thesis Computer Science 201
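The two-stage idea in this abstract (first decide whether a sentence is an IOC sentence, then pull IOC tokens only from those sentences) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the CNN sentence classifier is replaced by a plain context-word lookup, and the word list, regular expressions, and function names are all assumptions made for the example.

```python
import re

# Hypothetical context words. The abstract names only "attacked" and
# "suspicious" as examples; the rest are assumptions for this sketch.
CONTEXT_WORDS = {"attacked", "suspicious", "malicious", "compromised"}

# Simplified token patterns (assumptions, not the thesis's expressions).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"\bhttps?://[^\s\"'<>]+")


def is_ioc_sentence(sentence):
    """Stand-in for the CNN classifier: true if any context word occurs."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(tokens & CONTEXT_WORDS)


def extract_iocs(text):
    """Return (type, value) pairs found in sentences flagged as IOC sentences."""
    iocs = []
    # Split only where sentence punctuation is followed by whitespace,
    # so the dots inside an IP address do not end a sentence.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not is_ioc_sentence(sentence):
            continue  # regex hits in non-IOC sentences are treated as false positives
        iocs += [("url", m.rstrip(".,;")) for m in URL_RE.findall(sentence)]
        iocs += [("ip", m) for m in IPV4_RE.findall(sentence)]
    return iocs


report = "The host 10.0.0.5 attacked our server. DNS resolver 8.8.8.8 was used."
print(extract_iocs(report))
```

Here the second sentence contains an IP address but no context word, so it is skipped; this is the kind of false positive that a regex-only extractor would keep.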

    Artificial intelligence and UK national security: Policy considerations

    RUSI was commissioned by GCHQ to conduct an independent research study into the use of artificial intelligence (AI) for national security purposes. The aim of this project is to establish an independent evidence base to inform future policy development regarding national security uses of AI. The findings are based on in-depth consultation with stakeholders from across the UK national security community, law enforcement agencies, private sector companies, academic and legal experts, and civil society representatives. This was complemented by a targeted review of existing literature on the topic of AI and national security. The research has found that AI offers numerous opportunities for the UK national security community to improve the efficiency and effectiveness of existing processes. AI methods can rapidly derive insights from large, disparate datasets and identify connections that would otherwise go unnoticed by human operators. However, in the context of national security and the powers given to UK intelligence agencies, use of AI could give rise to additional privacy and human rights considerations which would need to be assessed within the existing legal and regulatory framework. For this reason, enhanced policy and guidance are needed to ensure the privacy and human rights implications of national security uses of AI are reviewed on an ongoing basis as new analysis methods are applied to data.

    Network entity characterization and attack prediction

    The devastating effects of cyber-attacks highlight the need for novel attack detection and prevention techniques. In recent years, considerable work has been done in the areas of attack detection and collaborative defense. However, an analysis of the state of the art suggests that many challenges remain in prioritizing alert data and in studying the relation between a recently discovered attack and the probability of it occurring again. In this article, we propose a system for characterizing network entities and the likelihood that they will behave maliciously in the future. Our system, the Network Entity Reputation Database System (NERDS), takes into account all the available information regarding a network entity (e.g., an IP address) to calculate the probability that it will act maliciously. This probability is estimated using machine learning. Our experimental results show that it is indeed possible to precisely estimate the probability of future attacks from each entity using information about its previous malicious behavior and other characteristics. Ranking the entities by this probability has practical applications in alert prioritization, assembly of highly effective blacklists of a limited length, and other use cases. Comment: 30 pages, 8 figure
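The ranking step described in this abstract (score each entity with a probability of future malicious behavior, then keep the top entries as a limited-length blacklist) can be sketched as follows. The feature names, weights, and logistic form are assumptions chosen for illustration; the abstract says only that the probability is estimated with machine learning.

```python
import math

# Hypothetical feature weights; a real system would learn these from data.
WEIGHTS = {"alerts_last_7d": 0.8, "on_other_blacklist": 1.5}
BIAS = -3.0


def attack_probability(features):
    """Logistic score in (0, 1) computed from an entity's feature dict."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def build_blacklist(entities, max_length):
    """Rank entities by estimated probability and keep the top max_length."""
    ranked = sorted(
        entities,
        key=lambda entity: attack_probability(entities[entity]),
        reverse=True,
    )
    return ranked[:max_length]


# Example entities use documentation-range IP addresses.
entities = {
    "192.0.2.1": {"alerts_last_7d": 5, "on_other_blacklist": 1},
    "192.0.2.2": {"alerts_last_7d": 0},
    "192.0.2.3": {"alerts_last_7d": 2},
}
print(build_blacklist(entities, max_length=2))
```

Capping the list at `max_length` reflects the "blacklists of a limited length" use case: only the entities judged most likely to attack again are kept.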

    Cybersecurity Information Exchange with Privacy (CYBEX-P) and TAHOE – A Cyberthreat Language

    Cybersecurity information sharing (CIS) is envisioned to protect organizations more effectively from advanced cyberattacks. However, completely automated CIS platforms are not widely adopted. The major challenges are: (1) the absence of advanced data analytics capabilities and (2) the absence of a robust cyberthreat language (CTL). This work introduces Cybersecurity Information Exchange with Privacy (CYBEX-P), a CIS framework, to tackle these challenges. CYBEX-P allows organizations to share heterogeneous data from various sources. It correlates the data to automatically generate intuitive reports and defensive rules. To achieve such versatility, we have developed TAHOE, a graph-based CTL. TAHOE is a structure for storing, sharing, and analyzing threat data. It also intrinsically correlates the data. We have further developed a universal Threat Data Query Language (TDQL). In this work, we propose the system architecture for CYBEX-P. We then discuss its scalability along with a protocol to correlate attributes of threat data. We further introduce TAHOE and TDQL as better alternatives to existing CTLs and formulate ThreatRank, an algorithm to detect new malicious events. We have developed CYBEX-P as a complete CIS platform not only for data sharing but also for advanced threat data analysis. To that end, we have developed two frameworks that use CYBEX-P infrastructure as a service (IaaS). The first is a phishing URL detector that uses machine learning to detect new phishing URLs. This real-time system adapts to the ever-changing landscape of phishing URLs and maintains an accuracy of 86%. The second models attacker behavior in a botnet. It combines heterogeneous threat data and analyzes them together to predict the behavior of an attacker in a host infected by bot malware. We have achieved a prediction accuracy of 85-97% using our methodology.
These two frameworks establish the feasibility of CYBEX-P for advanced threat data analysis for future researchers.

    CSM Automated Confidence Score Measurement of Threat Indicators

    abstract: The volume and frequency of cyber attacks have exploded in recent years. Organizations subscribe to multiple threat intelligence feeds to increase their knowledge base and better equip their security teams with the latest information in the threat intelligence domain. Though such subscriptions add intelligence and can help in making more informed decisions, organizations have to put considerable effort into processing and analyzing a large number of threat indicators. The problem is made worse by the large number of false positives and irrelevant events that existing threat feed sources report as threat indicators. Given the staggering volume of indicators, it is often neither practical nor cost-effective to analyze every single alert. This motivates solving the overcrowded threat indicator problem by prioritizing and filtering indicators. To overcome this issue, I explain the necessity of determining how likely a reported indicator is to be malicious given the evidence, and of prioritizing it based on that determination. The Confidence Score Measurement system (CSM) introduces the concept of a confidence score: it assigns a threat indicator a score of being malicious based on the evaluation of different threat intelligence systems. An indicator propagates maliciousness to adjacent indicators based on relationships determined from the behavior of the indicator. The propagation algorithm derives a final confidence that determines the overall maliciousness of the threat indicator. CSM can prioritize the indicators based on confidence score; however, an analyst may not be interested in the entire result set, so CSM narrows down the results based on analyst-driven input. To this end, CSM introduces the concept of a relevance score, which combines the confidence score with analyst-driven search by applying full-text search techniques. It prioritizes the results based on relevance score to provide meaningful results to the analyst.
The analysis shows that the propagation algorithm of CSM scales linearly with larger datasets and achieves 92% accuracy in determining threat indicators. The evaluation of the results demonstrates the effectiveness and practicality of the approach. Dissertation/Thesis Masters Thesis Computer Science 201
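The propagation idea in this abstract (an indicator passes some of its maliciousness to adjacent indicators, and the result is a final confidence per indicator) can be sketched as follows. The abstract does not give the actual update rule, so the damped "take the strongest neighbor" rule, the `damping` parameter, and all names here are assumptions for illustration only.

```python
def propagate_confidence(initial, edges, damping=0.5, rounds=1):
    """Spread confidence scores across an undirected indicator graph.

    initial: dict mapping indicator -> starting confidence in [0, 1]
    edges:   iterable of (indicator_a, indicator_b) relationships; both
             endpoints must already be keys of `initial`
    damping: assumed attenuation applied to scores inherited from neighbors
    """
    neighbors = {node: [] for node in initial}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    scores = dict(initial)
    for _ in range(rounds):
        updated = {}
        for node, score in scores.items():
            # Strongest neighbor score, attenuated; 0.0 if the node is isolated.
            inherited = damping * max(
                (scores[n] for n in neighbors[node]), default=0.0
            )
            # A node never loses confidence it already had.
            updated[node] = max(score, inherited)
        scores = updated
    return scores


initial = {"evil.example": 0.9, "198.51.100.7": 0.1, "benign.example": 0.2}
edges = [("evil.example", "198.51.100.7")]
final = propagate_confidence(initial, edges)

# Prioritize indicators by final confidence, highest first.
print(sorted(final, key=final.get, reverse=True))
```

In this example the IP address linked to the high-confidence domain is raised above the unrelated domain, which is the prioritization effect the abstract describes.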

    Feature trade-off analysis for reconnaissance detection.

    An effective cyber early warning system (CEWS) should pick up threat activity at an early stage, with an emphasis on establishing hypotheses and predictions as well as generating alerts on (unclassified) situations based on preliminary indications. The design and implementation of such early warning systems involve numerous challenges, such as defining a generic set of indicators, intelligence gathering, uncertainty reasoning, and information fusion. This chapter begins with an account of the behaviours of intruders, reviews related literature, and then presents the proposed methodology, which uses a Bayesian inference-based system. It also includes a carefully deployed empirical analysis on a data set labelled for reconnaissance activity. Finally, the chapter concludes with a discussion of results, research challenges, and suggestions for moving this line of research forward.

    An artificial intelligence-based collaboration approach in industrial IoT manufacturing : key concepts, architectural extensions and potential applications

    The digitization of the manufacturing industry has led to leaner and more efficient production under the Industry 4.0 concept. Nowadays, datasets collected from shop floor assets and information technology (IT) systems are used in data-driven analytics efforts to support more informed business intelligence decisions. However, these results are currently only used in isolated and dispersed parts of the production process. At the same time, full integration of artificial intelligence (AI) in all parts of manufacturing systems is currently lacking. In this context, the goal of this manuscript is to present a more holistic integration of AI by promoting collaboration. To this end, collaboration is understood as a multi-dimensional conceptual term that covers all important enablers for AI adoption in manufacturing contexts and is promoted in terms of business intelligence optimization, human-in-the-loop, and secure federation across manufacturing sites. To address these challenges, the proposed architectural approach builds on three technical pillars: (1) components that extend the functionality of the existing layers in the Reference Architectural Model for Industry 4.0; (2) definition of new layers for collaboration by means of human-in-the-loop and federation; (3) security concerns with AI-powered mechanisms. In addition, system implementation aspects are discussed, and potential applications in industrial environments, as well as business impacts, are presented.