Mining Threat Intelligence about Open-Source Projects and Libraries from Code Repository Issues and Bug Reports
Open-source projects and libraries are widely used in software development,
yet they can carry multiple security vulnerabilities. This reliance on a
third-party ecosystem creates a new kind of attack surface for a product in
development. An intelligent attacker can attack a product by exploiting one of
the vulnerabilities present in its linked projects and libraries.
In this paper, we mine threat intelligence about open-source projects and
libraries from bugs and issues reported on public code repositories. We also
track library and project dependencies for installed software on a client
machine. We represent and store this threat intelligence, along with the
software dependencies, in a security knowledge graph. Security analysts and
developers can then query the knowledge graph and receive alerts if any
threat intelligence is found about the linked libraries and projects used in
their products.
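The dependency-tracking and alerting idea above can be sketched in a few lines
of Python. This is a minimal sketch, not the authors' implementation: plain
dictionaries stand in for the security knowledge graph, and the names
`DEPENDENCIES`, `THREAT_INTEL`, `transitive_dependencies`, and `alerts_for`, as
well as every library name and issue text, are hypothetical. Dependencies are
walked transitively, on the assumption that an indirect dependency is also part
of the product's attack surface.

```python
from __future__ import annotations

from collections import deque

# Hypothetical dependency data for one client machine: each installed product
# or library maps to its direct dependencies.
DEPENDENCIES: dict[str, list[str]] = {
    "my-app": ["lib-http", "lib-template"],
    "lib-http": ["lib-url"],
    "lib-template": ["lib-escape"],
    "lib-url": [],
    "lib-escape": [],
}

# Hypothetical threat intelligence mined from issue trackers:
# library name -> security-relevant issue summaries.
THREAT_INTEL: dict[str, list[str]] = {
    "lib-url": ["crafted URL bypasses host validation"],
    "lib-escape": ["escaping skipped for some inputs"],
    "lib-unused": ["not linked by my-app, so never alerted"],
}


def transitive_dependencies(product: str, deps: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk returning every library reachable from `product`."""
    seen: set[str] = set()
    queue = deque(deps.get(product, []))
    while queue:
        lib = queue.popleft()
        if lib in seen:  # guard against shared or cyclic dependencies
            continue
        seen.add(lib)
        queue.extend(deps.get(lib, []))
    return seen


def alerts_for(product: str) -> dict[str, list[str]]:
    """Collect threat-intelligence entries for every library `product` links to."""
    linked = transitive_dependencies(product, DEPENDENCIES)
    return {lib: THREAT_INTEL[lib] for lib in sorted(linked) if lib in THREAT_INTEL}


if __name__ == "__main__":
    for lib, issues in alerts_for("my-app").items():
        print(f"ALERT {lib}: {issues}")
```

Because the walk is transitive, `my-app` is alerted about `lib-url` and
`lib-escape` even though it depends on neither directly.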
Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media
Social media is often viewed as a sensor into various societal events such as
disease outbreaks, protests, and elections. We describe the use of social media
as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our
approach detects a broad range of cyber-attacks (e.g., distributed denial of
service (DDoS) attacks, data breaches, and account hijacking) in an
unsupervised manner using just a limited fixed set of seed event triggers. A
new query expansion strategy based on convolutional kernels and dependency
parses helps model reporting structure and aids in identifying key event
characteristics. Through a large-scale analysis over Twitter, we demonstrate
that our approach consistently identifies and encodes events, outperforming
existing methods. Comment: 13 single-column pages, 5 figures, submitted to KDD 201
Knowledge mining of unstructured information: application to cyber-domain
Information on cyber-related crimes, incidents, and conflicts is abundantly
available in numerous open online sources. However, processing the large
volumes and streams of data is a challenging task for analysts and experts,
and calls for newer methods and techniques. In this article we
present and implement a novel knowledge graph and knowledge mining framework
for extracting the relevant information from free-form text about incidents in
the cyber domain. The framework includes a machine-learning-based pipeline for
generating graphs of organizations, countries, industries, products and
attackers with a non-technical cyber-ontology. The extracted knowledge graph is
utilized to estimate the incidence of cyberattacks on a given graph
configuration. We use publicly available collections of real cyber-incident
reports to test the efficacy of our methods. The knowledge extraction is found
to be sufficiently accurate, and the graph-based threat estimation demonstrates
a level of correlation with the actual records of attacks. In practical use, an
analyst using the presented framework can infer additional information about
the current cyber landscape in terms of risk to various entities and the
propagation of the risk heuristic between industries and countries.
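The abstract does not specify how risk propagates through the extracted graph.
As a purely illustrative stand-in, the sketch below spreads a seed risk score
across a weighted graph of countries, industries, and organizations with a
decay factor. The graph, the edge weights, the `decay` and `rounds`
parameters, and the max-based update rule are all assumptions; this is a
generic propagation heuristic, not the framework's threat estimator.

```python
from __future__ import annotations

# Hypothetical entity graph: (node, node, weight) with weights in [0, 1]
# expressing how strongly risk is assumed to spread along each link.
EDGES: list[tuple[str, str, float]] = [
    ("country:A", "industry:finance", 0.8),
    ("industry:finance", "org:bank-1", 0.9),
    ("industry:finance", "industry:insurance", 0.5),
    ("industry:insurance", "country:B", 0.6),
]


def propagate_risk(
    seed: dict[str, float],
    edges: list[tuple[str, str, float]],
    decay: float = 0.5,
    rounds: int = 3,
) -> dict[str, float]:
    """Spread seed risk scores over an undirected weighted graph.

    In each round every node keeps the maximum of its current score and
    decay * edge_weight * neighbor_score. Using max instead of a sum keeps
    scores within [0, 1] whenever the seed scores are in [0, 1].
    """
    neighbors: dict[str, list[tuple[str, float]]] = {}
    for u, v, w in edges:
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))
    risk = {node: seed.get(node, 0.0) for node in neighbors}
    for _ in range(rounds):
        # Update synchronously from the previous round's scores.
        previous = dict(risk)
        for node, links in neighbors.items():
            for nbr, w in links:
                risk[node] = max(risk[node], decay * w * previous[nbr])
    return risk


if __name__ == "__main__":
    scores = propagate_risk({"country:A": 1.0}, EDGES)
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{node:20s} {score:.3f}")
```

With one seed at `country:A`, risk fades with each hop: the finance industry
scores higher than the insurance industry it is linked to, and `country:B`,
three hops away, receives only a small residual score.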
Recognizing and Extracting Cybersecurity-relevant Entities from Text
Cyber Threat Intelligence (CTI) is information describing threat vectors,
vulnerabilities, and attacks, and it is often used as training data for AI-based
cyber defense systems such as Cybersecurity Knowledge Graphs (CKGs). There is a
strong need to develop community-accessible datasets to train existing AI-based
cybersecurity pipelines to efficiently and accurately extract meaningful
insights from CTI. We have created an initial unstructured CTI corpus from a
variety of open sources that we are using to train and test cybersecurity
entity models using the spaCy framework and exploring self-learning methods to
automatically recognize cybersecurity entities. We also describe methods to
apply cybersecurity domain entity linking with existing world knowledge from
Wikidata. Our future work will survey and test spaCy NLP tools and create
methods for continuous integration of new information extracted from text.
Towards a relation extraction framework for cyber-security concepts
In order to assist security analysts in obtaining information pertaining to
their network, such as novel vulnerabilities, exploits, or patches, information
retrieval methods tailored to the security domain are needed. As labeled text
data is scarce and expensive, we follow developments in semi-supervised Natural
Language Processing and implement a bootstrapping algorithm for extracting
security entities and their relationships from text. The algorithm requires
little input data, specifically, a few relations or patterns (heuristics for
identifying relations), and incorporates an active learning component which
queries the user on the most important decisions to prevent drifting from the
desired relations. Preliminary testing on a small corpus shows promising
results, with a precision of 0.82. Comment: 4 pages in Cyber & Information Security Research Conference 2015, AC
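A bootstrapping loop of the kind described above, in the spirit of
DIPRE/Snowball-style pattern induction, can be sketched as follows. The
assumptions are: a pattern is the exact text between two known entity
mentions, entity mentions come from a fixed list rather than a trained
recognizer, and the active learning component is reduced to an `accept`
callback that asks the analyst about each new pattern. The corpus, entity
names, and function names are hypothetical; this is not the paper's algorithm
and says nothing about its reported precision.

```python
from __future__ import annotations

from typing import Callable, Iterator

# Hypothetical toy corpus and entity list; a real system would obtain entity
# mentions from a security-domain entity recognizer.
CORPUS = [
    "VULN-1 affects LibAlpha before 2.0",
    "VULN-2 was found in LibBeta",
    "VULN-1 was found in LibAlpha",
    "VULN-3 was found in LibGamma",
    "VULN-1 and LibAlpha were both trending",
    "VULN-3 and LibBeta were both trending",
]
ENTITIES = {"VULN-1", "VULN-2", "VULN-3", "LibAlpha", "LibBeta", "LibGamma"}
SEEDS = {("VULN-1", "LibAlpha")}  # seed instances of an "affects" relation


def entity_pairs(sentence: str, entities: set[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (left, middle_text, right) for each ordered pair of entity
    mentions, using the first mention of each entity in the sentence."""
    found = sorted((sentence.find(e), e) for e in entities if e in sentence)
    for i, (start_a, a) in enumerate(found):
        end_a = start_a + len(a)
        for start_b, b in found[i + 1:]:
            if start_b >= end_a:  # skip overlapping mentions
                yield a, sentence[end_a:start_b].strip(), b


def bootstrap(
    corpus: list[str],
    seeds: set[tuple[str, str]],
    entities: set[str],
    accept: Callable[[str], bool],
    iterations: int = 2,
) -> tuple[set[tuple[str, str]], set[str]]:
    """Alternate between inducing patterns from known pairs and using the
    accepted patterns to harvest new pairs."""
    relations = set(seeds)
    patterns: set[str] = set()
    rejected: set[str] = set()
    for _ in range(iterations):
        # 1. Induce candidate patterns from sentences mentioning a known pair.
        for sentence in corpus:
            for a, middle, b in entity_pairs(sentence, entities):
                if (a, b) in relations and middle not in patterns and middle not in rejected:
                    # 2. Active-learning step: ask the analyst about each new pattern.
                    if accept(middle):
                        patterns.add(middle)
                    else:
                        rejected.add(middle)
        # 3. Apply accepted patterns to extract new relation instances.
        for sentence in corpus:
            for a, middle, b in entity_pairs(sentence, entities):
                if middle in patterns:
                    relations.add((a, b))
    return relations, patterns


def analyst(pattern: str) -> bool:
    """A fixed rule standing in for the human analyst in this demo."""
    return pattern in {"affects", "was found in"}


if __name__ == "__main__":
    relations, patterns = bootstrap(CORPUS, SEEDS, ENTITIES, analyst)
    print(sorted(patterns))
    print(sorted(relations))
```

Rejecting the pattern "and" is what keeps the unrelated pair (VULN-3, LibBeta)
out of the results, which is the kind of drift the active learning component
is meant to prevent.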
NLP-Based Techniques for Cyber Threat Intelligence
In the digital era, threat actors employ sophisticated techniques that often
leave digital traces in the form of textual data. Cyber Threat Intelligence
(CTI) encompasses all the solutions for data collection, processing, and
analysis that help to understand a threat actor's targets and attack behavior.
CTI is playing an increasingly crucial role in identifying and mitigating
threats and enabling proactive defense strategies. In this context, Natural
Language Processing (NLP), a branch of artificial intelligence, has emerged as a
powerful tool for enhancing threat intelligence capabilities. This survey paper
provides a comprehensive overview of NLP-based techniques applied in the
context of threat intelligence. It begins by describing the foundational
definitions and principles of CTI as a major tool for safeguarding digital
assets. It then undertakes a thorough examination of NLP-based techniques for
CTI data crawling from Web sources, CTI data analysis, relation extraction from
cybersecurity data, CTI sharing and collaboration, and security threats to CTI.
Finally, the challenges and limitations of NLP in threat intelligence are
exhaustively examined, including data quality issues and ethical
considerations. This survey outlines a complete framework and serves as a valuable
resource for security professionals and researchers seeking to understand the
state-of-the-art NLP-based threat intelligence techniques and their potential
impact on cybersecurity.