Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts

Author: Pan Shimei
Park Youngja
Roy Arpita
Publication date: 05/07/2017

Word embedding is a Natural Language Processing (NLP) technique that automatically maps words from a vocabulary to vectors of real numbers in an embedding space. It has been widely used in recent years to boost the performance of a variety of NLP tasks such as Named Entity Recognition, Syntactic Parsing and Sentiment Analysis. Classic word embedding methods such as Word2Vec and GloVe work well when they are given a large text corpus. When the input texts are sparse, as in many specialized domains (e.g., cybersecurity), these methods often fail to produce high-quality vectors. In this paper, we describe a novel method to train domain-specific word embeddings from sparse texts. In addition to domain texts, our method also leverages diverse types of domain knowledge such as domain vocabulary and semantic relations. Specifically, we first propose a general framework to encode diverse types of domain knowledge as text annotations. Then we develop a novel Word Annotation Embedding (WAE) algorithm to incorporate diverse types of text annotations in word embedding. We have evaluated our method on two cybersecurity text corpora: a malware description corpus and a Common Vulnerabilities and Exposures (CVE) corpus. Our evaluation results have demonstrated the effectiveness of our method in learning domain-specific word embeddings.

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare
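The abstract does not give the WAE update rule, so the Python sketch below is only a simplified stand-in for the idea of "encoding domain knowledge as text annotations": each domain label is treated as an extra context token in skip-gram with negative sampling, so words that share a label are nudged toward the same region of the embedding space. The function names, the `@LABEL` token convention, the toy vocabulary and all hyperparameters are hypothetical; this is not the authors' WAE algorithm.

```python
import math
import random

def annotate(tokens, domain_vocab):
    """Pair each token with its domain label (None if the token is not in the vocabulary)."""
    return [(tok, domain_vocab.get(tok)) for tok in tokens]

def training_pairs(annotated, window=2):
    """Skip-gram (target, context) pairs, plus one (target, '@LABEL') pair per annotated token."""
    pairs = []
    for i, (word, label) in enumerate(annotated):
        lo, hi = max(0, i - window), min(len(annotated), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((word, annotated[j][0]))
        if label is not None:
            # Hypothetical convention: the annotation acts as an extra context token.
            pairs.append((word, "@" + label))
    return pairs

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_sgns(pairs, dim=8, epochs=50, lr=0.05, negatives=3, seed=0):
    """Plain skip-gram with negative sampling over the combined word/annotation pairs."""
    rng = random.Random(seed)
    pairs = list(pairs)  # shuffle a copy, not the caller's list
    words = sorted({w for w, _ in pairs})
    contexts = sorted({c for _, c in pairs})
    W = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in words}
    C = {c: [0.0] * dim for c in contexts}
    for _ in range(epochs):
        rng.shuffle(pairs)
        for w, c in pairs:
            v = W[w]
            grad_v = [0.0] * dim
            # one positive context plus `negatives` randomly drawn contexts
            samples = [(c, 1.0)] + [(rng.choice(contexts), 0.0) for _ in range(negatives)]
            for ctx, target in samples:
                u = C[ctx]
                g = lr * (target - sigmoid(sum(a * b for a, b in zip(v, u))))
                for k in range(dim):
                    grad_v[k] += g * u[k]
                    u[k] += g * v[k]
            for k in range(dim):
                v[k] += grad_v[k]
    return W

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

For example, with a hypothetical vocabulary `{"wannacry": "MALWARE", "smb": "PROTOCOL"}`, `training_pairs(annotate(tokens, vocab))` adds `("wannacry", "@MALWARE")` alongside the ordinary context pairs, so even a rarely seen malware name receives a training signal shared with every other word carrying the same label.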

Interaction Embeddings for Prediction and Explanation in Knowledge Graphs

Author: Bordes Antoine
Glorot Xavier
Ji Guoliang
Lin Yankai
Liu Hanxiao
Nickel Maximilian
Shi Baoxu
Srivastava Nitish
Trouillon Théo
Wang Quan
Wang Zhen
Xie Ruobing
Yang Bishan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 15/02/2019

Knowledge graph embedding aims to learn distributed representations for entities and relations, and has proven effective in many applications. Crossover interactions, that is, bi-directional effects between entities and relations, help select related information when predicting a new triple, but have not been formally discussed before. In this paper, we propose CrossE, a novel knowledge graph embedding model that explicitly simulates crossover interactions. It not only learns one general embedding for each entity and relation, as most previous methods do, but also generates multiple triple-specific embeddings for both of them, named interaction embeddings. We evaluate the embeddings on typical link prediction tasks and find that CrossE achieves state-of-the-art results on complex and more challenging datasets. Furthermore, we evaluate embeddings from a new perspective: giving explanations for predicted triples, which is important for real applications. In this work, an explanation for a triple is regarded as a reliable closed path between the head and the tail entity. Compared to other baselines, we show experimentally that CrossE, benefiting from interaction embeddings, is more capable of generating reliable explanations to support its predictions.

Comment: This paper is accepted by WSDM201

arXiv.org e-Print Archive

Crossref

ZORA
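The abstract names interaction embeddings and closed-path explanations without giving formulas. The Python sketch below assumes a CrossE-style score in which a relation-specific vector `c_r` filters the head embedding (h_I = c_r ∘ h), the result in turn filters the relation (r_I = h_I ∘ r), and the plausibility is σ(tanh(h_I + r_I + b) · t); treat this exact form as an assumption to check against the paper. `closed_paths` is a hypothetical helper that only enumerates candidate relation paths from head to tail; it does not judge which of them are reliable.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hadamard(a, b):
    """Element-wise product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]

def crosse_score(h, r, t, c_r, b):
    """Assumed CrossE-style score: sigmoid(tanh(c_r*h + c_r*h*r + b) . t), * element-wise."""
    h_i = hadamard(c_r, h)  # head interaction embedding: head as seen by relation r
    r_i = hadamard(h_i, r)  # relation interaction embedding: relation as seen by this head
    q = [math.tanh(x + y + z) for x, y, z in zip(h_i, r_i, b)]
    return sigmoid(sum(x * y for x, y in zip(q, t)))

def closed_paths(triples, head, tail, max_len=2):
    """Enumerate relation paths head -> ... -> tail with at most max_len hops (hypothetical helper)."""
    out = {}
    for s, r, o in triples:
        out.setdefault(s, []).append((r, o))
    paths, stack = [], [(head, [])]
    while stack:
        node, path = stack.pop()
        if path and node == tail:
            paths.append(path)  # reached the tail: record this path, do not extend it
            continue
        if len(path) == max_len:
            continue
        for r, o in out.get(node, []):
            stack.append((o, path + [r]))
    return paths
```

Because the same head vector is multiplied by a different `c_r` for each relation, one entity yields a different interaction embedding per triple, which is the sense in which these embeddings are triple-specific. For a toy graph containing `alice born_in paris`, `paris city_of france` and `alice lives_in france`, the two-hop path `born_in, city_of` is a candidate closed path that could support a prediction linking `alice` to `france`.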