107,993 research outputs found
Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts
Word embedding is a Natural Language Processing (NLP) technique that
automatically maps words from a vocabulary to vectors of real numbers in an
embedding space. It has been widely used in recent years to boost the
performance of a vari-ety of NLP tasks such as Named Entity Recognition,
Syntac-tic Parsing and Sentiment Analysis. Classic word embedding methods such
as Word2Vec and GloVe work well when they are given a large text corpus. When
the input texts are sparse as in many specialized domains (e.g.,
cybersecurity), these methods often fail to produce high-quality vectors. In
this pa-per, we describe a novel method to train domain-specificword embeddings
from sparse texts. In addition to domain texts, our method also leverages
diverse types of domain knowledge such as domain vocabulary and semantic
relations. Specifi-cally, we first propose a general framework to encode
diverse types of domain knowledge as text annotations. Then we de-velop a novel
Word Annotation Embedding (WAE) algorithm to incorporate diverse types of text
annotations in word em-bedding. We have evaluated our method on two
cybersecurity text corpora: a malware description corpus and a Common
Vulnerability and Exposure (CVE) corpus. Our evaluation re-sults have
demonstrated the effectiveness of our method in learning domain-specific word
embeddings
Interaction Embeddings for Prediction and Explanation in Knowledge Graphs
Knowledge graph embedding aims to learn distributed representations for
entities and relations, and is proven to be effective in many applications.
Crossover interactions --- bi-directional effects between entities and
relations --- help select related information when predicting a new triple, but
haven't been formally discussed before. In this paper, we propose CrossE, a
novel knowledge graph embedding which explicitly simulates crossover
interactions. It not only learns one general embedding for each entity and
relation as most previous methods do, but also generates multiple triple
specific embeddings for both of them, named interaction embeddings. We evaluate
embeddings on typical link prediction tasks and find that CrossE achieves
state-of-the-art results on complex and more challenging datasets. Furthermore,
we evaluate embeddings from a new perspective --- giving explanations for
predicted triples, which is important for real applications. In this work, an
explanation for a triple is regarded as a reliable closed-path between the head
and the tail entity. Compared to other baselines, we show experimentally that
CrossE, benefiting from interaction embeddings, is more capable of generating
reliable explanations to support its predictions.Comment: This paper is accepted by WSDM201
- …