Probabilistic Random Walk Models for Comparative Network Analysis

Author: Jeong Hyundoo
Publication date: 24/02/2020

Graph-based systems and data analysis methods have become critical tools in many fields because they provide an intuitive way of representing and analyzing interactions between variables. Owing to advances in measurement techniques, a massive amount of labeled data that can be represented as nodes on a graph (or network) has been archived in databases. Additionally, novel data without label information are steadily being generated and archived. Labeling and identifying the characteristics of novel data is an important first step toward using these data effectively and meaningfully. Comparative network analysis is an effective computational means of identifying and predicting the properties of unlabeled data by comparing the similarities and differences between well-studied and less-studied networks. It aims to identify matching nodes and conserved subnetworks across multiple networks so that the properties of nodes in the less-studied networks can be predicted from the properties of their matching nodes in the well-studied networks (i.e., transferring knowledge between networks). A fundamental question in comparative network analysis is how to accurately estimate node-to-node correspondence, as it is a critical clue in analyzing the similarities and differences between networks. Node correspondence is a comprehensive similarity that integrates various types of similarity measurements in a balanced manner. However, there are several challenges in accurately estimating node correspondence for large-scale networks. First, the scale of the networks is a critical issue: because networks generally include a large number of nodes, an extremely large search space must be examined, which poses a computational challenge due to the combinatorial nature of the problem. Furthermore, although matching nodes and conserved subnetworks exist in different networks, structural variations such as node insertions and deletions make it difficult to integrate topological similarity.
In this dissertation, novel probabilistic random walk models are proposed to accurately estimate node-to-node correspondence between networks. First, we propose a context-sensitive random walk (CSRW) model. In the CSRW model, the random walker analyzes the context of its current position and can switch between a simultaneous walk on both networks and an individual walk on one of the networks. This context-sensitive behavior enables the method to effectively integrate different types of similarities while coping with structural variations. Second, we propose the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) model. In the CUFID model, we construct an integrated network by inserting pseudo edges between potential matching nodes in different networks. We then design the random walk protocol so that the walker transits more frequently between potential matching nodes as their node similarity increases and as they share more matching neighboring nodes. We apply the proposed random walk models to two comparative network analysis problems: global network alignment and network querying. Through extensive performance evaluations, we demonstrate that the proposed random walk models can accurately estimate node correspondence, which leads to improved and more reliable network comparison results.

Probabilistic Random Walk Models for Comparative Network Analysis

Author: B-J Yoon
C-S Liao
CB Do
F Alkan
H Ogata
H-M Woo
HTT Phan
J Hu
M Ashburner
N Malod-Dognin
N Mamano
O Kuchaiev
R Durbin
R Patro
R Sharan
R Singh
S Hashemifar
SME Sahraeian
U Roshan
V Gligorijević
V Memišević
V Saraph
V Vijayan
Publication date: 24/02/2020

Crossref

Texas A&M Repository

Fast Computation Techniques for Personalized PageRank on Large Graphs

Author: 박성찬
Publication venue: Graduate School, Seoul National University
Publication date: 01/08/2020

Thesis (Ph.D.) -- Graduate School, Seoul National University: College of Engineering, Department of Electrical and Computer Engineering, August 2020. 이상구.
Computation of Personalized PageRank (PPR) in graphs is an important function that is widely used in application domains such as search, recommendation, and knowledge discovery. Because computing PPR is expensive, a good number of innovative and efficient algorithms for it have been developed. However, efficient computation of PPR on very large graphs with millions of nodes or more is still an open problem. Moreover, previously proposed algorithms cannot handle updates efficiently, which severely limits their ability to handle dynamic graphs. In this work, we present a fast-converging algorithm that guarantees high and controllable precision. We improve the convergence rate of the traditional Power Iteration method by adopting successive over-relaxation together with initial guess revision, a vector reuse strategy. The proposed method vastly improves on traditional Power Iteration in terms of convergence rate and computation time while retaining its simplicity and strictness. Because it can reuse previously computed and stored vectors when refreshing PPR vectors, its update performance is also greatly enhanced. In addition, because the algorithm halts as soon as it reaches a given error threshold, the trade-off between accuracy and time can be controlled flexibly, a feature lacking in both sampling-based approximation methods and fully exact (matrix-inversion-based) methods. Experiments show that the proposed algorithm is at least 20 times faster than Power Iteration and outperforms other state-of-the-art algorithms.
Contents:
1 Introduction
2 Preliminaries: Personalized PageRank
2.1 Random Walk, PageRank, and Personalized PageRank
2.1.1 Basics on Random Walk
2.1.2 PageRank
2.1.3 Personalized PageRank
2.2 Characteristics of Personalized PageRank
2.3 Applications of Personalized PageRank
2.4 Previous Work on Personalized PageRank Computation
2.4.1 Basic Algorithms
2.4.2 Enhanced Power Iteration
2.4.3 Bookmark Coloring Algorithm
2.4.4 Dynamic Programming
2.4.5 Monte-Carlo Sampling
2.4.6 Enhanced Direct Solving
2.5 Summary
3 Personalized PageRank Computation with Initial Guess Revision
3.1 Initial Guess Revision and Relaxation
3.2 Finding Optimal Weight of Successive Over Relaxation for PPR
3.3 Initial Guess Construction Algorithm for Personalized PageRank
4 Fully Personalized PageRank Algorithm with Initial Guess Revision
4.1 FPPR with IGR
4.2 Optimization
4.3 Experiments
5 Personalized PageRank Query Processing with Initial Guess Revision
5.1 PPR Query Processing with IGR
5.2 Optimization
5.3 Experiments
6 Conclusion
Bibliography
Appendix
Abstract (In Korean)
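The Power Iteration baseline, the error-threshold stopping rule, and the idea of reusing a previously computed vector as the initial guess can be illustrated with a minimal sketch. This is not the thesis's algorithm: successive over-relaxation and the actual initial guess revision procedure are omitted, and the function name, the adjacency-list input format, the restart probability `alpha`, and the choice to send mass from dangling nodes back to the seed are all assumptions made for illustration.

```python
def personalized_pagerank(out_links, seed, alpha=0.15, tol=1e-10,
                          init=None, max_iter=10_000):
    """Plain power iteration for x = alpha * e_seed + (1 - alpha) * P^T x.

    out_links[u] lists the nodes that u points to (hypothetical input format).
    init optionally supplies a starting vector, e.g. a PPR vector computed
    before the graph was updated. Returns (vector, iterations used).
    """
    n = len(out_links)
    if init is not None:
        x = list(init)
    else:
        x = [1.0 if i == seed else 0.0 for i in range(n)]

    for iteration in range(1, max_iter + 1):
        nxt = [0.0] * n
        nxt[seed] = alpha  # restart mass always returns to the seed
        for u, nbrs in enumerate(out_links):
            walk_mass = (1.0 - alpha) * x[u]
            if nbrs:
                share = walk_mass / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                # Assumption: a node without out-links jumps back to the seed.
                nxt[seed] += walk_mass
        # L1 change between successive iterates; halt once it is below tol.
        err = sum(abs(a - b) for a, b in zip(nxt, x))
        x = nxt
        if err < tol:
            return x, iteration
    return x, max_iter
```

Calling `personalized_pagerank(graph, seed=0)` on the original graph and then passing the returned vector as `init` after an edge is added mimics the vector-reuse setting; raising `tol` trades accuracy for time. Whether the warm start actually saves iterations depends on how much the update changes the graph.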

SNU Open Repository and Archive

Graph based Anomaly Detection and Description: A Survey

Author: Danai Koutra
Hanghang Tong
Leman Akoglu
Publication date: 28/04/2014

Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, graph data are becoming ubiquitous, and techniques for structured graph data have recently come into focus. As objects in graphs have long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. Moreover, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.

arXiv.org e-Print Archive

CiteSeerX

Accurate multiple network alignment through context-sensitive random walk

Author: Jeong Hyundoo
Yoon Byung-Jun
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

BACKGROUND: Comparative network analysis can provide an effective means of analyzing large-scale biological networks and gaining novel insights into their structure and organization. Global network alignment aims to predict the best overall mapping between a given set of biological networks, thereby identifying important similarities as well as differences among the networks. It has been shown that network alignment methods can be used to detect pathways or network modules that are conserved across different networks. To date, a number of network alignment algorithms have been proposed based on different formulations and approaches, many of them focusing on pairwise alignment. RESULTS: In this work, we propose a novel multiple network alignment algorithm based on a context-sensitive random walk model. The random walker employed in the proposed algorithm switches between two different modes, namely, an individual walk on a single network and a simultaneous walk on two networks. The switching decision is made in a context-sensitive manner by examining the current neighborhood, which is effective for quantitatively estimating the degree of correspondence between nodes that belong to different networks, in a manner that sensibly integrates node similarity and topological similarity. The resulting node correspondence scores are then used to predict the maximum expected accuracy (MEA) alignment of the given networks. CONCLUSIONS: Performance evaluation based on synthetic networks as well as real protein-protein interaction networks shows that the proposed algorithm can construct more accurate multiple network alignments compared to other leading methods.

Springer - Publisher Connector

Texas A&M Repository

PubMed Central

Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment

Author: Brignell Christopher J.
Czogiel Irina
Dryden Ian L.
Publication venue: Institute of Mathematical Statistics
Publication date: 01/03/2012

Statistical methodology is proposed for comparing unlabeled marked point sets, with an application to aligning steroid molecules in chemoinformatics. Methods from statistical shape analysis are combined with techniques for predicting random fields in spatial statistics in order to define a suitable measure of similarity between two marked point sets. Bayesian modeling of the predicted field overlap between pairs of point sets is proposed, and posterior inference of the alignment is carried out using Markov chain Monte Carlo simulation. By representing the fields in reproducing kernel Hilbert spaces, the degree of overlap can be computed without expensive numerical integration. Superimposing entire fields rather than the configuration matrices of point coordinates thereby avoids the problem that there is usually no clear one-to-one correspondence between the points. In addition, mask parameters are introduced in the model, so that partial matching of the marked point sets can be carried out. We also propose an adaptation of the generalized Procrustes analysis algorithm for the simultaneous alignment of multiple point sets. The methodology is illustrated with a simulation study and then applied to a data set of 31 steroid molecules, where the relationship between shape and binding activity to the corticosteroid binding globulin receptor is explored. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/11-AOAS486 by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

Clustering Service Networks with Entity, Attribute, and Link Heterogeneity

Author: Bao X
Lee K
Liu L
Palanisamy B
Pu C
Yigitoglu E
Zhang Q
Zhou Y
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 26/06/2015

Many popular web service networks are content-rich in terms of heterogeneous types of entities and links, associated with incomplete attributes. Clustering such heterogeneous service networks demands new clustering techniques that can handle two heterogeneity challenges: (1) multiple types of entities co-exist in the same service network with multiple attributes, and (2) links between entities have diverse types and carry different semantics. Existing heterogeneous graph clustering techniques tend to pick initial centroids uniformly at random, specify the number k of clusters in advance, and fix k during the clustering process. In this paper, we propose Service Cluster, a novel heterogeneous service network clustering algorithm with four unique features. First, we incorporate various types of entity, attribute, and link information into a unified distance measure. Second, we design a Discrete Steepest Descent method to naturally produce the initial k and the initial centroids simultaneously. Third, we propose a dynamic learning method to automatically adjust the link weights towards clustering convergence. Fourth, we develop an effective optimization strategy to identify a new suitable k and k well-chosen centroids at each clustering iteration. Extensive evaluation on real datasets demonstrates that Service Cluster outperforms existing representative methods in terms of both effectiveness and efficiency.

CiteSeerX

Crossref

D-Scholarship@Pitt

Correspondence driven saliency transfer

Author: Porikli Fatih
Shao Ling
Shen Jianbing
Wang Wenguan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 19/08/2016

In this paper, we show that large annotated data sets have great potential to provide strong priors for saliency estimation rather than merely serving for benchmark evaluations. To this end, we present a novel image saliency detection method called saliency transfer. Given an input image, we first retrieve a support set of best matches from a large database of saliency-annotated images. Then, we assign the transitional saliency scores by warping the support set annotations onto the input image according to computed dense correspondences. To incorporate context, we employ two complementary correspondence strategies: a global matching scheme based on scene-level analysis and a local matching scheme based on patch-level inference. We then introduce two refinement measures to further refine the saliency maps and apply random walk with restart, exploring the global saliency structure to estimate the affinity between foreground and background assignments. Extensive experimental results on four publicly available benchmark data sets demonstrate that the proposed saliency algorithm consistently outperforms the current state-of-the-art methods.

Northumbria Research Link

Crossref

University of East Anglia digital repository

Simple and Efficient Local Codes for Distributed Stable Network Construction

Author: Michail Othon
Spirakis Paul G.
Publication date: 01/01/2014

In this work, we study protocols so that populations of distributed processes can construct networks. In order to highlight the basic principles of distributed network construction, we keep the model minimal in all respects. In particular, we assume finite-state processes that all begin from the same initial state and all execute the same protocol (i.e., the system is homogeneous). Moreover, we assume pairwise interactions between the processes that are scheduled by an adversary. The only constraint on the adversary scheduler is that it must be fair. In order to allow processes to construct networks, we let them activate and deactivate their pairwise connections. When two processes interact, the protocol takes as input the states of the processes and the state of their connection and updates all of them. Initially all connections are inactive, and the goal is for the processes, after interacting and activating/deactivating connections for a while, to end up with a desired stable network. We give protocols (optimal in some cases) and lower bounds for several basic network construction problems such as spanning line, spanning ring, spanning star, and regular network. We provide proofs of correctness for all of our protocols and analyze the expected time to convergence of most of them under a uniform random scheduler that selects the next pair of interacting processes uniformly at random from all such pairs. Finally, we prove several universality results by presenting generic protocols that are capable of simulating a Turing Machine (TM) and exploiting it in order to construct a large class of networks. Comment: 43 pages, 7 figures
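The interaction model described above (identical finite-state processes, pairwise interactions that update both states and the connection between them, a uniform random scheduler) can be sketched in a short simulation. The abstract does not list any protocol's rules, so the three-rule spanning-star rule set below is an illustrative assumption, as are the state names `c` and `p`, the function names, and the stopping check.

```python
import random

# Assumed rule set: (state_a, state_b, edge) -> (new_a, new_b, new_edge).
# "c" = centre candidate, "p" = peripheral; every process starts as "c".
RULES = {
    ("c", "c", 0): ("c", "p", 1),  # two centres meet: one yields and connects
    ("p", "p", 1): ("p", "p", 0),  # peripherals drop a leftover connection
    ("c", "p", 0): ("c", "p", 1),  # the centre connects to a peripheral
}


def is_spanning_star(states, edges):
    """True when one centre is linked to all others and nothing else is active."""
    centres = [i for i, s in enumerate(states) if s == "c"]
    if len(centres) != 1:
        return False
    centre = centres[0]
    return all(active == (1 if centre in pair else 0)
               for pair, active in edges.items())


def simulate(n, seed=0, max_steps=1_000_000):
    """Run the rules under a uniform random scheduler until the star is built."""
    rng = random.Random(seed)
    states = ["c"] * n
    edges = {(u, v): 0 for u in range(n) for v in range(u + 1, n)}
    for step in range(max_steps):
        if is_spanning_star(states, edges):
            return states, edges, step
        u, v = rng.sample(range(n), 2)      # next interacting pair
        key = (min(u, v), max(u, v))
        for a, b in ((u, v), (v, u)):       # rules apply in either order
            rule = RULES.get((states[a], states[b], edges[key]))
            if rule is not None:
                states[a], states[b], edges[key] = rule
                break
    raise RuntimeError("no stable star within max_steps")
```

For example, `simulate(6, seed=1)` returns the final states, the connection map, and the number of interactions used. The spanning-star configuration is stable under these rules because no rule applies to a centre-peripheral pair with an active connection or to a peripheral pair with an inactive one.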

arXiv.org e-Print Archive

CiteSeerX

Crossref