393 research outputs found
Efficient Node Proximity and Node Significance Computations in Graphs
abstract: Node proximity measures are commonly used for quantifying how nearby or otherwise related to two or more nodes in a graph are. Node significance measures are mainly used to find how much nodes are important in a graph. The measures of node proximity/significance have been highly effective in many predictions and applications. Despite their effectiveness, however, there are various shortcomings. One such shortcoming is a scalability problem due to their high computation costs on large size graphs and another problem on the measures is low accuracy when the significance of node and its degree in the graph are not related. The other problem is that their effectiveness is less when information for a graph is uncertain. For an uncertain graph, they require exponential computation costs to calculate ranking scores with considering all possible worlds.
In this thesis, I first introduce Locality-sensitive, Re-use promoting, approximate Personalized PageRank (LR-PPR) which is an approximate personalized PageRank calculating node rankings for the locality information for seeds without calculating the entire graph and reusing the precomputed locality information for different locality combinations. For the identification of locality information, I present Impact Neighborhood Indexing (INI) to find impact neighborhoods with nodes' fingerprints propagation on the network. For the accuracy challenge, I introduce Degree Decoupled PageRank (D2PR) technique to improve the effectiveness of PageRank based knowledge discovery, especially considering the significance of neighbors and degree of a given node. To tackle the uncertain challenge, I introduce Uncertain Personalized PageRank (UPPR) to approximately compute personalized PageRank values on uncertainties of edge existence and Interval Personalized PageRank with Integration (IPPR-I) and Interval Personalized PageRank with Mean (IPPR-M) to compute ranking scores for the case when uncertainty exists on edge weights as interval values.Dissertation/ThesisDoctoral Dissertation Computer Science 201
A graph oriented approach for network forensic analysis
Network forensic analysis is a process that analyzes intrusion evidence captured from networked environment to identify suspicious entities and stepwise actions in an attack scenario. Unfortunately, the overwhelming amount and low quality of output from security sensors make it difficult for analysts to obtain a succinct high-level view of complex multi-stage intrusions.
This dissertation presents a novel graph based network forensic analysis system. The evidence graph model provides an intuitive representation of collected evidence as well as the foundation for forensic analysis. Based on the evidence graph, we develop a set of analysis components in a hierarchical reasoning framework. Local reasoning utilizes fuzzy inference to infer the functional states of an host level entity from its local observations. Global reasoning performs graph structure analysis to identify the set of highly correlated hosts that belong to the coordinated attack scenario. In global reasoning, we apply spectral clustering and Pagerank methods for generic and targeted investigation
respectively. An interactive hypothesis testing procedure is developed to identify hidden attackers from non-explicit-malicious evidence. Finally, we introduce the notion of target-oriented effective event sequence (TOEES) to semantically reconstruct stealthy attack scenarios with less dependency on ad-hoc expert knowledge. Well established computation methods used in our approach provide the scalability needed to perform
post-incident analysis in large networks. We evaluate the techniques with a number of intrusion detection datasets and the experiment results show that our approach is effective in identifying complex multi-stage attacks
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.Comment: 13 figures, 35 reference
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. The focus of our chapter is to pinpoint the main graph
summarization methods, but especially to focus on the most recent approaches
and novel research trends on this topic, not yet covered by previous surveys.Comment: To appear in the Encyclopedia of Big Data Technologie
Web3Recommend: Decentralised recommendations with trust and relevance
Web3Recommend is a decentralized Social Recommender System implementation
that enables Web3 Platforms on Android to generate recommendations that balance
trust and relevance. Generating recommendations in decentralized networks is a
non-trivial problem because these networks lack a global perspective due to the
absence of a central authority. Further, decentralized networks are prone to
Sybil Attacks in which a single malicious user can generate multiple fake or
Sybil identities. Web3Recommend relies on a novel graph-based content
recommendation design inspired by GraphJet, a recommendation system used in
Twitter enhanced with MeritRank, a decentralized reputation scheme that
provides Sybil-resistance to the system. By adding MeritRank's decay parameters
to the vanilla Social Recommender Systems' personalized SALSA graph algorithm,
we can provide theoretical guarantees against Sybil Attacks in the generated
recommendations. Similar to GraphJet, we focus on generating real-time
recommendations by only acting on recent interactions in the social network,
allowing us to cater temporally contextual recommendations while keeping a
tight bound on the memory usage in resource-constrained devices, allowing for a
seamless user experience. As a proof-of-concept, we integrate our system with
MusicDAO, an open-source Web3 music-sharing platform, to generate personalized,
real-time recommendations. Thus, we provide the first Sybil-resistant Social
Recommender System, allowing real-time recommendations beyond classic
user-based collaborative filtering. The system is also rigorously tested with
extensive unit and integration tests. Further, our experiments demonstrate the
trust-relevance balance of recommendations against multiple adversarial
strategies in a test network generated using data from real music platforms
Semi-supervised Eigenvectors for Large-scale Locally-biased Learning
In many applications, one has side information, e.g., labels that are
provided in a semi-supervised manner, about a specific target region of a large
data set, and one wants to perform machine learning and data analysis tasks
"nearby" that prespecified target region. For example, one might be interested
in the clustering structure of a data graph near a prespecified "seed set" of
nodes, or one might be interested in finding partitions in an image that are
near a prespecified "ground truth" set of pixels. Locally-biased problems of
this sort are particularly challenging for popular eigenvector-based machine
learning and data analysis tools. At root, the reason is that eigenvectors are
inherently global quantities, thus limiting the applicability of
eigenvector-based methods in situations where one is interested in very local
properties of the data.
In this paper, we address this issue by providing a methodology to construct
semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these
locally-biased eigenvectors can be used to perform locally-biased machine
learning. These semi-supervised eigenvectors capture
successively-orthogonalized directions of maximum variance, conditioned on
being well-correlated with an input seed set of nodes that is assumed to be
provided in a semi-supervised manner. We show that these semi-supervised
eigenvectors can be computed quickly as the solution to a system of linear
equations; and we also describe several variants of our basic method that have
improved scaling properties. We provide several empirical examples
demonstrating how these semi-supervised eigenvectors can be used to perform
locally-biased learning; and we discuss the relationship between our results
and recent machine learning algorithms that use global eigenvectors of the
graph Laplacian
- …