Search CORE

3,130 research outputs found

Mining Threat Intelligence about Open-Source Projects and Libraries from Code Repository Issues and Bug Reports

Author: Joshi Anupam
Mittal Sudip
Neil Lorenzo
Publication date: 09/08/2018

Open-source projects and libraries are widely used in software development, yet they also carry multiple security vulnerabilities. Relying on this third-party ecosystem creates a new kind of attack surface for a product in development. An intelligent attacker can attack a product by exploiting one of the vulnerabilities present in linked projects and libraries. In this paper, we mine threat intelligence about open-source projects and libraries from bugs and issues reported on public code repositories. We also track library and project dependencies for installed software on a client machine. We represent and store this threat intelligence, along with the software dependencies, in a security knowledge graph. Security analysts and developers can then query the knowledge graph and receive alerts if any threat intelligence is found about linked libraries and projects used in their products.
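The idea in this abstract, storing dependencies and mined threat intelligence in one graph and alerting on linked libraries, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `SecurityKnowledgeGraph`, the predicate names `dependsOn` and `hasThreatIntel`, and all package names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class SecurityKnowledgeGraph:
    """Toy triple store; the schema here is an assumption, not the paper's ontology."""

    def __init__(self):
        # Maps (subject, predicate) to the set of objects linked by that predicate.
        self._out = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._out[(subject, predicate)].add(obj)

    def objects(self, subject, predicate):
        # .get avoids creating empty entries in the defaultdict on lookup.
        return self._out.get((subject, predicate), set())

    def transitive_dependencies(self, product):
        """Follow dependsOn edges depth-first and return every reachable library."""
        seen = set()
        stack = [product]
        while stack:
            node = stack.pop()
            for dep in self.objects(node, "dependsOn"):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen

    def alerts(self, product):
        """Return {library: [threat reports]} for direct and indirect dependencies."""
        result = {}
        for dep in self.transitive_dependencies(product):
            reports = self.objects(dep, "hasThreatIntel")
            if reports:
                result[dep] = sorted(reports)
        return result


# Hypothetical data: one product, three libraries, one mined issue report.
kg = SecurityKnowledgeGraph()
kg.add("my-app", "dependsOn", "web-framework")
kg.add("web-framework", "dependsOn", "xml-parser")
kg.add("my-app", "dependsOn", "logging-lib")
kg.add("xml-parser", "hasThreatIntel", "issue-report: entity expansion crash")

print(kg.alerts("my-app"))
```

The point of the sketch is that `xml-parser` is only an indirect dependency of `my-app`, so the alert surfaces only because the graph is traversed transitively.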

arXiv.org e-Print Archive

Crossref

Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media

Author: Becker Hila
Flora
Ji Heng
Khandpur Rupinder P.
Lee Wenke
Li Frank
Liu Yang
Modi A.
Muthiah Sathappan
Ovelgonne Michael
Rehurek Radim
Sabottke Carl
Soska Kyle
Tanev Hristo
Weller-Fahy David J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/02/2017

Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDoS) attacks, data breaches, and account hijacking) in an unsupervised manner using just a limited fixed set of seed event triggers. A new query expansion strategy based on convolutional kernels and dependency parses helps model reporting structure and aids in identifying key event characteristics. Through a large-scale analysis over Twitter, we demonstrate that our approach consistently identifies and encodes events, outperforming existing methods. Comment: 13 single column pages, 5 figures, submitted to KDD 201

arXiv.org e-Print Archive

Crossref

Knowledge mining of unstructured information: application to cyber-domain

Author: Bhattacharya Kunal
Cederberg Aapo
Jalasvirta Pertti
Kaski Kimmo
Lehto Martti
Takko Tuomas
Publication date: 01/08/2022

Information on cyber-related crimes, incidents, and conflicts is abundantly available in numerous open online sources. However, processing the large volumes and streams of data is a challenging task for analysts and experts, and calls for newer methods and techniques. In this article we present and implement a novel knowledge graph and knowledge mining framework for extracting the relevant information from free-form text about incidents in the cyber-domain. The framework includes a machine-learning-based pipeline for generating graphs of organizations, countries, industries, products and attackers with a non-technical cyber-ontology. The extracted knowledge graph is utilized to estimate the incidence of cyberattacks on a given graph configuration. We use publicly available collections of real cyber-incident reports to test the efficacy of our methods. The knowledge extraction is found to be sufficiently accurate, and the graph-based threat estimation demonstrates a level of correlation with the actual records of attacks. In practical use, an analyst utilizing the presented framework can infer additional information from the current cyber-landscape in terms of risk to various entities and propagation of the risk heuristic between industries and countries.

arXiv.org e-Print Archive

Jyväskylä University Digital Archive

Directory of Open Access Journals

PubMed Central

Aaltodoc Publication Archive

Recognizing and Extracting Cybersecurity-relevant Entities from Text

Author: Finin Tim
Hanks Casey
Joshi Anupam
Maiden Michael
Ranade Priyanka
Publication date: 02/08/2022

Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources, which we are using to train and test cybersecurity entity models with the spaCy framework, and we are exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools and create methods for continuous integration of new information extracted from text.

arXiv.org e-Print Archive

Towards a relation extraction framework for cyber-security concepts

Author: Bridges R. A.
Brin S.
Carlson A.
de Lacalle O. L.
Jones R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/04/2015

In order to assist security analysts in obtaining information pertaining to their network, such as novel vulnerabilities, exploits, or patches, information retrieval methods tailored to the security domain are needed. As labeled text data is scarce and expensive, we follow developments in semi-supervised Natural Language Processing and implement a bootstrapping algorithm for extracting security entities and their relationships from text. The algorithm requires little input data, specifically, a few relations or patterns (heuristics for identifying relations), and incorporates an active learning component which queries the user on the most important decisions to prevent drifting from the desired relations. Preliminary testing on a small corpus shows promising results, obtaining precision of 0.82. Comment: 4 pages in Cyber & Information Security Research Conference 2015, AC
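The bootstrapping loop described in this abstract (seed relations yield patterns, patterns yield new relations, and a user check limits drift) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: patterns are just the literal text between two entities, entities are single whitespace-delimited tokens, and the `approve` callback is a hypothetical stand-in for the active learning component.

```python
import re


def learn_patterns(sentences, pairs):
    """Collect the text found between the two entities of each known pair."""
    patterns = set()
    for sentence in sentences:
        for left, right in pairs:
            regex = re.escape(left) + r"\s+(.+?)\s+" + re.escape(right)
            match = re.search(regex, sentence)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(sentences, patterns):
    """Extract (left, right) token pairs that surround any accepted pattern."""
    pairs = set()
    for sentence in sentences:
        for pattern in patterns:
            regex = r"(\S+)\s+" + re.escape(pattern) + r"\s+(\S+)"
            match = re.search(regex, sentence)
            if match:
                # Strip trailing punctuation from the right-hand token.
                pairs.add((match.group(1), match.group(2).rstrip(".,;")))
    return pairs


def bootstrap(sentences, seed_pairs, approve, rounds=2):
    """Alternate pattern learning and pair extraction; approve() vets new patterns."""
    pairs = set(seed_pairs)
    patterns = set()
    for _ in range(rounds):
        candidates = learn_patterns(sentences, pairs) - patterns
        patterns |= {p for p in candidates if approve(p)}
        pairs |= apply_patterns(sentences, patterns)
    return pairs, patterns


# Hypothetical three-sentence corpus and a single seed relation.
corpus = [
    "CVE-2014-0160 affects OpenSSL.",
    "CVE-2014-6271 affects Bash.",
    "CVE-2014-6271 was discussed alongside Bash.",
]
seeds = {("CVE-2014-0160", "OpenSSL")}

# Simulated user feedback: reject a pattern that does not express the relation.
pairs, patterns = bootstrap(corpus, seeds, approve=lambda p: p != "was discussed alongside")
print(sorted(pairs))
print(sorted(patterns))
```

Without the `approve` check, the second round would accept "was discussed alongside" as a pattern, which is the kind of drift the abstract's active learning step is meant to prevent.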

arXiv.org e-Print Archive

Crossref

NLP-Based Techniques for Cyber Threat Intelligence

Author: A. Rafidha Rehiman K.
Arazzi Marco
Arikkat Dincy R.
Conti Mauro
Nicolazzo Serena
Nocera Antonino
P. Vinod
Publication date: 15/11/2023

In the digital era, threat actors employ sophisticated techniques that often leave digital traces in the form of textual data. Cyber Threat Intelligence (CTI) covers the solutions for data collection, processing, and analysis that help to understand a threat actor's targets and attack behavior. Currently, CTI is assuming an increasingly crucial role in identifying and mitigating threats and enabling proactive defense strategies. In this context, NLP, a branch of artificial intelligence, has emerged as a powerful tool for enhancing threat intelligence capabilities. This survey paper provides a comprehensive overview of NLP-based techniques applied in the context of threat intelligence. It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets. It then undertakes a thorough examination of NLP-based techniques for CTI data crawling from Web sources, CTI data analysis, Relation Extraction from cybersecurity data, CTI sharing and collaboration, and security threats of CTI. Finally, the challenges and limitations of NLP in threat intelligence are exhaustively examined, including data quality issues and ethical considerations. This survey draws a complete framework and serves as a valuable resource for security professionals and researchers seeking to understand the state-of-the-art NLP-based threat intelligence techniques and their potential impact on cybersecurity.

arXiv.org e-Print Archive