Characterizing Phishing Threats with Natural Language Processing
Spear phishing is a widespread concern in the modern network security
landscape, but there are few metrics that measure the extent to which
reconnaissance is performed on phishing targets. Spear phishing emails closely
match the expectations of the recipient, based on details of their experiences
and interests, making them a popular propagation vector for harmful malware. In
this work we use Natural Language Processing techniques to investigate a
specific real-world phishing campaign and quantify attributes that indicate a
targeted spear phishing attack. Our phishing campaign data sample comprises 596
emails - all containing a web bug and a Curriculum Vitae (CV) PDF attachment -
sent to our institution by a foreign IP space. The campaign was found to
exclusively target specific demographics within our institution. Performing a
semantic similarity analysis between the senders' CV attachments and the
recipients' LinkedIn profiles, we conclude with high statistical certainty (p
) that the attachments contain targeted rather than randomly
selected material. Latent Semantic Analysis further demonstrates that
individuals who were a primary focus of the campaign received CVs that are
highly topically clustered. These findings differentiate this campaign from one
that leverages random spam.
Comment: This paper has been accepted for publication by the IEEE Conference
on Communications and Network Security in September 2015 at Florence, Italy.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
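The core statistical question in the phishing study (are the CVs more similar to their recipients' profiles than chance would predict?) can be illustrated with a permutation test. The sketch below is an assumption-laden stand-in, not the authors' pipeline: it uses raw term-frequency cosine similarity rather than any particular semantic model, and the function names and toy texts are hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokens(text: str) -> list:
    """Lowercase alphabetic word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mean_pair_similarity(cvs, profiles, pairing) -> float:
    """Average similarity of CV i against profile pairing[i]."""
    return sum(cosine(cvs[i], profiles[j]) for i, j in enumerate(pairing)) / len(cvs)


def permutation_test(cv_texts, profile_texts, n_perm=1000, seed=0):
    """Return (observed mean similarity, one-sided permutation p-value).

    Assumes cv_texts[i] was sent to the person described by profile_texts[i];
    shuffling that pairing destroys any targeting.
    """
    cvs = [Counter(tokens(t)) for t in cv_texts]
    profiles = [Counter(tokens(t)) for t in profile_texts]
    observed = mean_pair_similarity(cvs, profiles, range(len(cvs)))
    rng = random.Random(seed)
    pairing = list(range(len(profiles)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pairing)  # a fresh uniformly random pairing each round
        if mean_pair_similarity(cvs, profiles, pairing) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_perm + 1)


# Invented toy data: each CV loosely matches the profile at the same index.
cvs = [
    "network security researcher, firewall, intrusion detection",
    "marine biology, coral reef ecology",
    "compiler design, type systems, parsing",
]
profiles = [
    "intrusion detection and network security",
    "coral reef ecology fieldwork",
    "type systems and compiler design",
]
observed, p_value = permutation_test(cvs, profiles, n_perm=200)
print(f"observed={observed:.3f} p={p_value:.3f}")
```

With only three pairs the true pairing is one of just six possible ones, so the p-value cannot fall much below 1/6; a test of this kind only becomes informative with many emails, such as the 596 in the campaign sample.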
Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods
Measuring the similarity of short written contexts is a fundamental problem
in Natural Language Processing. This article provides a unifying framework by
which short context problems can be categorized both by their intended
application and proposed solution. The goal is to show that various problems
and methodologies that appear quite different on the surface are in fact very
closely related. The axes by which these categorizations are made include the
format of the contexts (headed versus headless), the way in which the contexts
are to be measured (first-order versus second-order similarity), and the
information used to represent the features in the contexts (micro versus macro
views). The unifying thread that binds together many short context applications
and methods is the fact that similarity decisions must be made between contexts
that share few (if any) words in common.
Comment: 23 pages
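The distinction between first-order and second-order similarity can be made concrete with a toy example. In this sketch (an illustration, not a method taken from the review), first-order similarity compares the words of two contexts directly, while second-order similarity replaces each word with a vector of the words it co-occurs with in a small background corpus and compares those aggregates. The corpus, contexts, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def first_order(ctx1: str, ctx2: str) -> float:
    """First-order: compare the words that literally occur in each context."""
    return cosine(Counter(ctx1.split()), Counter(ctx2.split()))


def cooccurrence_vectors(corpus):
    """Map each word to counts of the other words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for w in words:
            for other in words:
                if other != w:
                    vectors[w][other] += 1
    return vectors


def second_order(ctx1: str, ctx2: str, vectors) -> float:
    """Second-order: represent each context by summing its words' co-occurrence vectors."""
    def context_vector(ctx: str) -> Counter:
        total = Counter()
        for w in ctx.split():
            total.update(vectors.get(w, Counter()))  # unseen words contribute nothing
        return total
    return cosine(context_vector(ctx1), context_vector(ctx2))


# Invented background corpus in which "doctor" and "physician" keep the same company.
corpus = [
    "doctor hospital nurse",
    "physician hospital nurse",
    "doctor clinic patient",
    "physician clinic patient",
]
vectors = cooccurrence_vectors(corpus)
ctx_a, ctx_b = "the doctor arrived", "a physician came"
print(first_order(ctx_a, ctx_b))           # 0.0: no shared words
print(second_order(ctx_a, ctx_b, vectors))  # 1.0: identical co-occurrence profiles
```

The two contexts share no words, so their first-order similarity is zero, yet their second-order similarity is maximal because "doctor" and "physician" appear alongside the same words in the corpus. This is the situation the review emphasizes: contexts that share few (if any) words.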
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
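Task (i) for links, predicting whether a link exists, can be sketched with the common-neighbors heuristic, one of the simplest link-prediction baselines: an unlinked pair of nodes that shares many neighbors is ranked as a likely missing link. This is an illustrative choice, not necessarily one of the approaches the article surveys, and the graph and function name are made up.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Rank every unlinked node pair by the number of neighbors it shares."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v not in neighbors[u]:  # only candidate (missing) links are scored
            scores[(u, v)] = len(neighbors[u] & neighbors[v])
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical undirected graph given as an edge list.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
for pair, score in common_neighbor_scores(edges):
    print(pair, score)
```

Normalized variants such as the Jaccard coefficient divide the shared-neighbor count by the size of the union of the two neighborhoods, which keeps high-degree nodes from dominating the ranking.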
How are topics born? Understanding the research dynamics preceding the emergence of new areas
The ability to promptly recognise new research trends is strategic for many stakeholders, including universities, institutional funding bodies, academic publishers and companies. While the literature describes several approaches which aim to identify the emergence of new research topics early in their lifecycle, these rely on the assumption that the topic in question is already associated with a number of publications and consistently referred to by a community of researchers. Hence, detecting the emergence of a new research area at an embryonic stage, i.e., before the topic has been consistently labelled by a community of researchers and associated with a number of publications, is still an open challenge. In this paper, we begin to address this challenge by performing a study of the dynamics preceding the creation of new topics. This study indicates that the emergence of a new topic is anticipated by a significant increase in the pace of collaboration between relevant research areas, which can be seen as the ‘parents’ of the new topic. These initial findings (i) confirm our hypothesis that it is possible in principle to detect the emergence of a new topic at the embryonic stage, (ii) provide new empirical evidence supporting relevant theories in Philosophy of Science, and also (iii) suggest that new topics tend to emerge in an environment in which weakly interconnected research areas begin to cross-fertilise.
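The signal described above, a rising pace of collaboration between 'parent' research areas, can be approximated crudely by counting, for each pair of areas, how many papers per year are tagged with both. The sketch below is a hypothetical proxy, not the authors' measure; the paper records, area labels, function names, and the linear "pace" definition are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_counts_by_year(papers):
    """papers: iterable of (year, areas). Returns {(area_a, area_b): Counter(year -> papers)}."""
    counts = {}
    for year, areas in papers:
        for a, b in combinations(sorted(set(areas)), 2):
            counts.setdefault((a, b), Counter())[year] += 1
    return counts


def pace(series: Counter, start: int, end: int) -> float:
    """Average yearly change in co-tagged papers between two years (a linear proxy)."""
    return (series[end] - series[start]) / (end - start)


# Invented paper records: (publication year, research areas the paper is tagged with).
papers = [
    (2010, {"topic A", "topic B"}),
    (2011, {"topic A", "topic B"}),
    (2011, {"topic A", "topic B", "topic C"}),
    (2012, {"topic A", "topic B"}),
    (2012, {"topic A", "topic B"}),
    (2012, {"topic A", "topic B"}),
    (2012, {"topic D"}),  # single-area paper: contributes no pair
]
counts = pair_counts_by_year(papers)
series = counts[("topic A", "topic B")]
print(dict(series), pace(series, 2010, 2012))
```

A steadily positive pace for a pair of previously weakly connected areas is the kind of pattern the study associates with an emerging topic; a real analysis would also need to normalize for overall publication growth.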
- …