32,079 research outputs found
ATTACK2VEC: Leveraging Temporal Word Embeddings to Understand the Evolution of Cyberattacks
Despite the fact that cyberattacks are constantly growing in complexity, the
research community still lacks effective tools to easily monitor and understand
them. In particular, there is a need for techniques that are able to not only
track how prominently certain malicious actions, such as the exploitation of
specific vulnerabilities, are exploited in the wild, but also (and more
importantly) how these malicious actions factor in as attack steps in more
complex cyberattacks. In this paper we present ATTACK2VEC, a system that uses
temporal word embeddings to model how attack steps are exploited in the wild,
and track how they evolve. We test ATTACK2VEC on a dataset of billions of
security events collected from the customers of a commercial Intrusion
Prevention System over a period of two years, and show that our approach is
effective in monitoring the emergence of new attack strategies in the wild and
in flagging which attack steps are often used together by attackers (e.g.,
vulnerabilities that are frequently exploited together). ATTACK2VEC provides a
useful tool for researchers and practitioners to better understand cyberattacks
and their evolution, and use this knowledge to improve situational awareness
and develop proactive defenses
Company2Vec -- German Company Embeddings based on Corporate Websites
With Company2Vec, the paper proposes a novel application in representation
learning. The model analyzes business activities from unstructured company
website data using Word2Vec and dimensionality reduction. Company2Vec maintains
semantic language structures and thus creates efficient company embeddings in
fine-granular industries. These semantic embeddings can be used for various
applications in banking. Direct relations between companies and words allow
semantic business analytics (e.g. top-n words for a company). Furthermore,
industry prediction is presented as a supervised learning application and
evaluation method. The vectorized structure of the embeddings allows measuring
companies similarities with the cosine distance. Company2Vec hence offers a
more fine-grained comparison of companies than the standard industry labels
(NACE). This property is relevant for unsupervised learning tasks, such as
clustering. An alternative industry segmentation is shown with k-means
clustering on the company embeddings. Finally, this paper proposes three
algorithms for (1) firm-centric, (2) industry-centric and (3) portfolio-centric
peer-firm identification.Comment: Accepted for Publication in: International Journal of Information
Technology & Decision Making (2023
Implicit learning of recursive context-free grammars
Context-free grammars are fundamental for the description of linguistic syntax. However, most artificial grammar learning
experiments have explored learning of simpler finite-state grammars, while studies exploring context-free grammars have
not assessed awareness and implicitness. This paper explores the implicit learning of context-free grammars employing
features of hierarchical organization, recursive embedding and long-distance dependencies. The grammars also featured
the distinction between left- and right-branching structures, as well as between centre- and tail-embedding, both
distinctions found in natural languages. People acquired unconscious knowledge of relations between grammatical classes
even for dependencies over long distances, in ways that went beyond learning simpler relations (e.g. n-grams) between
individual words. The structural distinctions drawn from linguistics also proved important as performance was greater for
tail-embedding than centre-embedding structures. The results suggest the plausibility of implicit learning of complex
context-free structures, which model some features of natural languages. They support the relevance of artificial grammar
learning for probing mechanisms of language learning and challenge existing theories and computational models of
implicit learning
- …