
Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts

Author: Pan Shimei
Park Youngja
Roy Arpita
Publication date: 05/07/2017

Word embedding is a Natural Language Processing (NLP) technique that automatically maps words from a vocabulary to vectors of real numbers in an embedding space. It has been widely used in recent years to boost the performance of a variety of NLP tasks such as Named Entity Recognition, Syntactic Parsing and Sentiment Analysis. Classic word embedding methods such as Word2Vec and GloVe work well when they are given a large text corpus. When the input texts are sparse, as in many specialized domains (e.g., cybersecurity), these methods often fail to produce high-quality vectors. In this paper, we describe a novel method to train domain-specific word embeddings from sparse texts. In addition to domain texts, our method also leverages diverse types of domain knowledge such as domain vocabulary and semantic relations. Specifically, we first propose a general framework to encode diverse types of domain knowledge as text annotations. Then we develop a novel Word Annotation Embedding (WAE) algorithm to incorporate diverse types of text annotations in word embedding. We have evaluated our method on two cybersecurity text corpora: a malware description corpus and a Common Vulnerability and Exposure (CVE) corpus. Our evaluation results have demonstrated the effectiveness of our method in learning domain-specific word embeddings.

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

Recommended from our members

Conceptual situation spaces for situation-driven processes

Author: Dietze Stefan
Domingue John
Gugliotta Alessio
Publication date: 01/01/2008

Open Research Online (The Open University)

An information retrieval approach to ontology mapping

Author: Gulla J.A.
Su X.
Publication venue: Elsevier
Publication date: 01/01/2006

In this paper, we present a heuristic mapping method and a prototype mapping system that support the process of semi-automatic ontology mapping for the purpose of improving semantic interoperability in heterogeneous systems. The approach is based on the idea of semantic enrichment, i.e., using instance information of the ontology to enrich the original ontology and calculate similarities between concepts in two ontologies. The functional settings for the mapping system are discussed and the evaluation of the prototype implementation of the approach is reported.

CiteSeerX

University of Twente Research Information

Learning Knowledge-Level Domain Dynamics

Author: Mourao Kira
Petrick Ron
Publication date: 01/06/2013

Edinburgh Research Explorer