Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts
Word embedding is a Natural Language Processing (NLP) technique that
automatically maps words from a vocabulary to vectors of real numbers in an
embedding space. It has been widely used in recent years to boost the
performance of a variety of NLP tasks such as Named Entity Recognition,
Syntactic Parsing and Sentiment Analysis. Classic word embedding methods such
as Word2Vec and GloVe work well when they are given a large text corpus. When
the input texts are sparse as in many specialized domains (e.g.,
cybersecurity), these methods often fail to produce high-quality vectors. In
this paper, we describe a novel method to train domain-specific word embeddings
from sparse texts. In addition to domain texts, our method also leverages
diverse types of domain knowledge such as domain vocabulary and semantic
relations. Specifically, we first propose a general framework to encode
diverse types of domain knowledge as text annotations. Then we develop a novel
Word Annotation Embedding (WAE) algorithm to incorporate diverse types of text
annotations in word embedding. We have evaluated our method on two
cybersecurity text corpora: a malware description corpus and a Common
Vulnerabilities and Exposures (CVE) corpus. Our evaluation results have
demonstrated the effectiveness of our method in learning domain-specific word
embeddings.
BowTie - A deep learning feedforward neural network for sentiment analysis
How to model and encode the semantics of human-written text and select the
type of neural network to process it are not settled issues in sentiment
analysis. Accuracy and transferability are critical issues in machine learning
in general. These properties are closely related to the loss estimates for the
trained model. I present a computationally efficient and accurate feedforward
neural network for sentiment prediction capable of maintaining low losses. When
coupled with an effective semantics model of the text, it provides highly
accurate models with low losses. Experimental results on representative
benchmark datasets and comparisons to other methods show the advantages of the
new approach. Comment: 12 pages, 7 figures, 4 tables
Cybersecurity knowledge graphs
Cybersecurity knowledge graphs, which represent cyber-knowledge with a graph-based data model, provide holistic approaches for processing massive volumes of complex cybersecurity data derived from diverse sources. They can help security analysts obtain cyberthreat intelligence, achieve a high level of cyber-situational awareness, discover new cyber-knowledge, visualize networks, data flow, and attack paths, and understand data correlations by aggregating and fusing data. This paper reviews the most prominent graph-based data models used in this domain, along with knowledge organization systems that define concepts and properties utilized in formal cyber-knowledge representation for both background knowledge and specific expert knowledge about an actual system or attack. It also discusses how cybersecurity knowledge graphs enable machine learning and facilitate automated reasoning over cyber-knowledge.
Studying Ransomware Attacks Using Web Search Logs
Cyber attacks are becoming increasingly prevalent and causing significant
damage to individuals, businesses and even countries. In particular, ransomware
attacks have grown significantly over the last decade. We present the first study on
mining insights about ransomware attacks by analyzing query logs from the Bing web
search engine. We first extract ransomware-related queries and then build a
machine learning model to identify queries where users are seeking support for
ransomware attacks. We show that user search behavior and characteristics are
correlated with ransomware attacks. We also analyse trends in the temporal and
geographical space and validate our findings against publicly available
information. Lastly, we do a case study on 'Nemty', a popular ransomware, to
show that it is possible to derive accurate insights about cyber attacks by
query log analysis. Comment: To appear in the proceedings of SIGIR 202