1,124 research outputs found
POISED: Spotting Twitter Spam Off the Beaten Paths
Cybercriminals have found in online social networks a propitious medium to
spread spam and malicious content. Existing techniques for detecting spam
include predicting the trustworthiness of accounts and analyzing the content of
these messages. However, advanced attackers can still successfully evade these
defenses.
Online social networks bring people who have personal connections or share
common interests to form communities. In this paper, we first show that users
within a networked community share some topics of interest. Moreover, content
shared on these social network tend to propagate according to the interests of
people. Dissemination paths may emerge where some communities post similar
messages, based on the interests of those communities. Spam and other malicious
content, on the other hand, follow different spreading patterns.
In this paper, we follow this insight and present POISED, a system that
leverages the differences in propagation between benign and malicious messages
on social networks to identify spam and other unwanted content. We test our
system on a dataset of 1.3M tweets collected from 64K users, and we show that
our approach is effective in detecting malicious messages, reaching 91%
precision and 93% recall. We also show that POISED's detection is more
comprehensive than previous systems, by comparing it to three state-of-the-art
spam detection systems that have been proposed by the research community in the
past. POISED significantly outperforms each of these systems. Moreover, through
simulations, we show how POISED is effective in the early detection of spam
messages and how it is resilient against two well-known adversarial machine
learning attacks
Graph Theory and Networks in Biology
In this paper, we present a survey of the use of graph theoretical techniques
in Biology. In particular, we discuss recent work on identifying and modelling
the structure of bio-molecular networks, as well as the application of
centrality measures to interaction networks and research on the hierarchical
structure of such networks and network motifs. Work on the link between
structural network properties and dynamics is also described, with emphasis on
synchronization and disease propagation.Comment: 52 pages, 5 figures, Survey Pape
Online Spectral Clustering on Network Streams
Graph is an extremely useful representation of a wide variety of practical systems in data analysis. Recently, with the fast accumulation of stream data from various type of networks, significant research interests have arisen on spectral clustering for network streams (or evolving networks). Compared with the general spectral clustering problem, the data analysis of this new type of problems may have additional requirements, such as short processing time, scalability in distributed computing environments, and temporal variation tracking. However, to design a spectral clustering method to satisfy these requirements certainly presents non-trivial efforts. There are three major challenges for the new algorithm design. The first challenge is online clustering computation. Most of the existing spectral methods on evolving networks are off-line methods, using standard eigensystem solvers such as the Lanczos method. It needs to recompute solutions from scratch at each time point. The second challenge is the parallelization of algorithms. To parallelize such algorithms is non-trivial since standard eigen solvers are iterative algorithms and the number of iterations can not be predetermined. The third challenge is the very limited existing work. In addition, there exists multiple limitations in the existing method, such as computational inefficiency on large similarity changes, the lack of sound theoretical basis, and the lack of effective way to handle accumulated approximate errors and large data variations over time. In this thesis, we proposed a new online spectral graph clustering approach with a family of three novel spectrum approximation algorithms. Our algorithms incrementally update the eigenpairs in an online manner to improve the computational performance. Our approaches outperformed the existing method in computational efficiency and scalability while retaining competitive or even better clustering accuracy. We derived our spectrum approximation techniques GEPT and EEPT through formal theoretical analysis. The well established matrix perturbation theory forms a solid theoretic foundation for our online clustering method. We facilitated our clustering method with a new metric to track accumulated approximation errors and measure the short-term temporal variation. The metric not only provides a balance between computational efficiency and clustering accuracy, but also offers a useful tool to adapt the online algorithm to the condition of unexpected drastic noise. In addition, we discussed our preliminary work on approximate graph mining with evolutionary process, non-stationary Bayesian Network structure learning from non-stationary time series data, and Bayesian Network structure learning with text priors imposed by non-parametric hierarchical topic modeling
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities has intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.Comment: 96 pages, 14 figures, 333 reference
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed
Characterizing and Detecting Unrevealed Elements of Network Systems
This dissertation addresses the problem of discovering and characterizing unknown elements in network systems. Klir (1985) provides a general definition of a system as “... a set of some things and a relation among the things (p. 4). A system, where the `things\u27, i.e. nodes, are related through links is a network system (Klir, 1985). The nodes can represent a range of entities such as machines or people (Pearl, 2001; Wasserman & Faust, 1994). Likewise, links can represent abstract relationships such as causal influence or more visible ties such as roads (Pearl, 1988, pp. 50-51; Wasserman & Faust, 1994; Winston, 1994, p. 394). It is not uncommon to have incomplete knowledge of network systems due to either passive circumstances, e.g. limited resources to observe a network, active circumstances, e.g. intentional acts of concealment, or some combination of active and passive influences (McCormick & Owen, 2000, p. 175; National Research Council, 2005, pp. 7, 11). This research provides statistical and graph theoretic approaches for such situations, including those in which nodes are causally related (Geiger & Pearl, 1990, pp. 3, 10; Glymour, Scheines, Spirtes, & Kelly, 1987, pp. 75-86, 178183; Murphy, 1998; Verma & Pearl, 1991, pp. 257, 260, 264-265). A related aspect of this research is accuracy assessment. It is possible an analyst could fail to detect a network element, or be aware of network elements, but incorrectly conclude the associated network system structure (Borgatti, Carley, & Krackhardt, 2006). The possibilities require assessment of the accuracy of the observed and conjectured network systems, and this research provides a means to do so (Cavallo & Klir, 1979, p. 143; Kelly, 1957, p. 968)
- …