Massively Parallel Single-Source SimRanks in Rounds
SimRank is one of the most fundamental measures for evaluating the structural
similarity between two nodes in a graph, and it has been applied in a plethora
of data management tasks. These tasks often involve single-source SimRank
computation, which evaluates the SimRank values between a source node and all
other nodes. Due to its high computational complexity, single-source SimRank
computation for large graphs is notoriously challenging, and hence recent
studies resort to distributed processing. Surprisingly, although SimRank has
been widely adopted for two decades, the theoretical aspects of distributed
SimRank computation with provable results have rarely been studied.
In this paper, we conduct a theoretical study of single-source SimRank
computation in the Massively Parallel Computation (MPC) model, the standard
theoretical framework for modeling distributed systems such as MapReduce,
Hadoop, and Spark. Existing distributed SimRank algorithms incur either a high
communication round complexity or a large per-machine space requirement for a
graph of n nodes. We overcome this barrier. In particular, given a graph of n
nodes, for any query node and any constant error bound, we show that a small
number of rounds of communication among the machines is enough to compute
single-source SimRank values within the given absolute error, while each
machine only needs space sub-linear in n. To the best of our knowledge, this is
the first single-source SimRank algorithm in MPC that can overcome the
round-complexity barrier with provable result accuracy.
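The Monte-Carlo view of single-source SimRank that such methods build on can be sketched in a few lines: s(u, v) is the expected value of c^t, where t is the first step at which two reverse random walks started at u and v meet. The toy graph, decay factor, and sample counts below are illustrative choices, not the paper's algorithm or parameters.

```python
import random

# Toy directed graph as in-neighbour lists (hypothetical example).
IN = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1],
    3: [2],
}

C = 0.6        # SimRank decay factor
WALK_LEN = 10  # truncation depth of each reverse walk
R = 2000       # Monte-Carlo walk pairs per target node

def reverse_walk(v, length):
    """One backward random walk of up to `length` steps; stops at sources."""
    path = [v]
    for _ in range(length):
        preds = IN.get(path[-1], [])
        if not preds:
            break
        path.append(random.choice(preds))
    return path

def single_source_simrank(u, rng_seed=7):
    """Estimate s(u, v) for every v as the average of C**t over walk pairs,
    where t is the first step at which the two reverse walks coincide."""
    random.seed(rng_seed)
    sims = {}
    for v in IN:
        if v == u:
            sims[v] = 1.0  # a node is maximally similar to itself
            continue
        total = 0.0
        for _ in range(R):
            pu = reverse_walk(u, WALK_LEN)
            pv = reverse_walk(v, WALK_LEN)
            for t in range(1, min(len(pu), len(pv))):
                if pu[t] == pv[t]:   # first meeting at step t
                    total += C ** t
                    break
        sims[v] = total / R
    return sims

sims = single_source_simrank(0)
```

Distributed variants parallelise exactly this sampling step, with the round and space budgets governing how many walks each machine can extend per round.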
Learning-Based Approaches for Graph Problems: A Survey
Over the years, many graph problems, in particular NP-complete ones, have been
studied by a wide range of researchers. Famous examples include graph
colouring, the travelling salesman problem, and subgraph isomorphism. Most of
these problems are typically addressed by exact algorithms, approximation
algorithms, and heuristics, each of which has its drawbacks. Recent studies
have employed learning-based frameworks, such as machine learning techniques,
to solve these problems, given that they are useful in discovering new patterns
in structured data that can be represented using graphs. This research
direction has attracted considerable attention. In this survey, we provide a
systematic review, mainly of classic graph problems for which learning-based
approaches have been proposed. We discuss an overview of each framework and
provide analyses based on its design and performance. Some potential research
questions are also suggested. Ultimately, this survey gives a clearer insight
and can serve as a stepping stone for the research community in studying
problems in this field.
Comment: v1: 41 pages; v2: 40 pages
Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads
LSM-trees are widely adopted as the storage backend of key-value stores.
However, optimizing the system performance under dynamic workloads has not been
sufficiently studied or evaluated in previous work. To fill the gap, we present
RusKey, a key-value store with the following new features: (1) RusKey is the
first attempt to orchestrate LSM-tree structures online to enable robust
performance under dynamic workloads; (2) RusKey is the first
study to use Reinforcement Learning (RL) to guide LSM-tree transformations; (3)
RusKey includes a new LSM-tree design, named FLSM-tree, for an efficient
transition between different compaction policies -- the bottleneck of dynamic
key-value stores. We justify the superiority of the new design with theoretical
analysis; (4) RusKey requires no prior workload knowledge for system
adjustment, in contrast to state-of-the-art techniques. Experiments show that
RusKey exhibits strong performance robustness in diverse workloads, achieving
up to 4x better end-to-end performance than the RocksDB system under various
settings.
Comment: 25 pages, 13 figures
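RusKey's actual RL formulation is not reproduced here; as a loose illustration of RL-guided policy selection, the sketch below uses a simple epsilon-greedy bandit that picks between two hypothetical compaction policies ("leveling" and "tiering") based on observed latencies. All names and numbers are invented for the example.

```python
import random

# Hypothetical mean request latencies (ms) of each compaction policy
# under the current workload; a real system would measure these online.
TRUE_LATENCY = {"leveling": 1.8, "tiering": 1.2}

EPSILON = 0.1  # exploration rate
random.seed(1)

counts = {p: 0 for p in TRUE_LATENCY}
avg_reward = {p: 0.0 for p in TRUE_LATENCY}  # reward = negative latency

def observe_latency(policy):
    """Noisy latency measurement for the chosen compaction policy."""
    return TRUE_LATENCY[policy] + random.gauss(0.0, 0.2)

def choose_policy():
    """Epsilon-greedy: mostly exploit the best policy seen so far."""
    if random.random() < EPSILON:
        return random.choice(list(TRUE_LATENCY))
    return max(avg_reward, key=avg_reward.get)

for _ in range(500):
    p = choose_policy()
    r = -observe_latency(p)
    counts[p] += 1
    avg_reward[p] += (r - avg_reward[p]) / counts[p]  # incremental mean

best = max(avg_reward, key=avg_reward.get)
```

The point of the sketch is the feedback loop (act, observe latency, update value estimates), which requires no prior workload knowledge; transformation cost between policies, which the FLSM-tree design targets, is what a bandit like this cannot capture.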
ESSAYS ON SOVEREIGN DEFAULT AND HOUSEHOLD PORTFOLIO CHOICE
This dissertation analyzes portfolio choice problems in different contexts. In the first chapter, “Nominal Exchange Rate Volatility, Default Risk and Reserve Accumulation,” I investigate how nominal exchange rate volatility affects a sovereign's portfolio choice over how much debt to issue and how many reserves to accumulate. First, I document a positive correlation between nominal exchange rate volatility and sovereign default risk and show that this relationship becomes stronger when more of the external debt is denominated in foreign currency. Then, I build a sovereign default model to rationalize these findings and to quantify the channels that contribute to the large reserve holdings among emerging countries.
In the second chapter, “Household Portfolio Accounting,” we document and analyze the substantial heterogeneity in household portfolio composition in the United States. We consider a standard life-cycle model with labor income risk and portfolio choice, augmented with a savings wedge that lowers the return on saving and a risky wedge that lowers the relative return on risky assets. Using U.S. survey data (2004-2016), we compute the household-level wedges that rationalize the data. The chapter has two main contributions: first, it uses the wedges to identify plausible frictions that researchers should consider in their models; second, it analyzes the extent to which household characteristics can account for the wedges.
DMCS : Density Modularity based Community Search
Community Search, or finding a connected subgraph (known as a community)
containing the given query nodes in a social network, is a fundamental problem.
Most of the existing community search models only focus on the internal
cohesiveness of a community. However, a high-quality community often has high
modularity, which means dense connections inside communities and sparse
connections to the nodes outside the community. In this paper, we conduct a
pioneering study on searching for a community with high modularity. We point
out that while modularity has been widely used in community detection (without
query nodes), surprisingly, it has not been adopted for community search, and
its application in community search (which involves query nodes) brings new
challenges. We address these challenges by designing a new graph modularity
function named Density Modularity. To the best of our knowledge, this is the
first work on the community search problem using graph modularity. The
community search based on density modularity, termed DMCS, is to find a
community in a social network that contains all the query nodes and has high
density modularity. We prove that the DMCS problem is NP-hard. To efficiently
address DMCS, we present new algorithms that run in time log-linear in the
graph size. We conduct extensive experimental studies on real-world and
synthetic networks, which offer insights into the efficiency and effectiveness
of our algorithms. In particular, our algorithm achieves up to 8.5 times higher
accuracy in terms of NMI than baseline algorithms.
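The paper's density modularity function is defined in the paper itself; as a rough stand-in, the sketch below scores a single candidate community with the classic modularity contribution Q(S) = e_in/m - (vol(S)/2m)^2, which captures the same dense-inside/sparse-outside intuition that the abstract describes. The toy graph is hypothetical.

```python
# Undirected toy graph as an edge list (hypothetical example):
# a triangle {0,1,2} loosely attached to a triangle {3,4,5}.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

def community_modularity(edges, community):
    """Classic modularity contribution Q(S) = e_in/m - (vol(S)/2m)^2 of a
    single node set S -- a stand-in for the paper's density modularity."""
    S = set(community)
    m = len(edges)
    e_in = sum(1 for u, v in edges if u in S and v in S)
    vol = sum((u in S) + (v in S) for u, v in edges)  # total degree inside S
    return e_in / m - (vol / (2 * m)) ** 2

q_good = community_modularity(EDGES, [0, 1, 2])            # dense triangle
q_all = community_modularity(EDGES, [0, 1, 2, 3, 4, 5])    # whole graph: 0
```

A community search would maximise a score of this kind over connected subgraphs containing the query nodes; the whole graph scores exactly zero, so the objective rewards genuinely separated dense groups.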
SCARA: Scalable Graph Neural Networks with Feature-Oriented Optimization
Recent advances in data processing have stimulated the demand for learning on
graphs of very large scale. Graph Neural Networks (GNNs), an emerging and
powerful approach to graph learning tasks, are known to be difficult to scale
up. Most scalable models apply node-based techniques to simplify the expensive
graph message-passing propagation procedure of GNNs.
However, we find such acceleration insufficient when applied to million- or
even billion-scale graphs. In this work, we propose SCARA, a scalable GNN with
feature-oriented optimization for graph computation. SCARA efficiently computes
graph embedding from node features, and further selects and reuses feature
computation results to reduce overhead. Theoretical analysis indicates that our
model achieves sub-linear time complexity with guaranteed precision in the
propagation process as well as in GNN training and inference. We conduct extensive
experiments on various datasets to evaluate the efficacy and efficiency of
SCARA. Comparison with baselines shows that SCARA achieves up to 100x faster
graph propagation than current state-of-the-art methods, with fast convergence
and comparable accuracy. Most notably, SCARA efficiently completes
precomputation on the largest available billion-scale GNN dataset, Papers100M
(111M nodes, 1.6B edges), in 100 seconds.
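The decoupled propagation that scalable GNNs of this kind precompute can be sketched as Personalized-PageRank-style feature smoothing, z = sum_t alpha*(1-alpha)^t * P^t x, truncated at depth T. The graph, scalar feature, and parameters below are illustrative; SCARA's actual feature-push algorithm and error control are in the paper.

```python
# Toy undirected graph as adjacency lists (hypothetical example).
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
X = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}  # one scalar feature per node

ALPHA = 0.15  # teleport probability
T = 20        # truncation depth

def propagate(adj, x, alpha, steps):
    """Truncated PPR smoothing z = sum_t alpha*(1-alpha)^t * P^t x,
    where P is the row-normalised (random-walk) adjacency matrix."""
    z = {v: alpha * x[v] for v in adj}   # t = 0 term
    cur = dict(x)
    for t in range(1, steps + 1):
        nxt = {v: 0.0 for v in adj}
        for v, nbrs in adj.items():
            share = cur[v] / len(nbrs)
            for u in nbrs:
                nxt[u] += share          # push mass along each edge
        cur = nxt
        w = alpha * (1 - alpha) ** t
        for v in adj:
            z[v] += w * cur[v]
    return z

z = propagate(ADJ, X, ALPHA, T)
```

Because this smoothing depends only on the graph and raw features, it can be precomputed once and reused across training and inference, which is the structural reason feature-oriented optimisation pays off at billion scale.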
Unsupervised detection of botnet activities using frequent pattern tree mining
A botnet is a network of remotely-controlled infected computers that can send spam, spread viruses, or stage denial-of-service attacks without the consent of the computer owners. Since the beginning of the 21st century, botnet activities have steadily increased, becoming one of the major concerns for Internet security. In fact, botnet activities are becoming more and more difficult to detect, because they make use of Peer-to-Peer protocols (eMule, Torrent, Frostwire, Vuze, Skype and many others). To improve the detectability of botnet activities, this paper introduces the idea of association analysis from the field of data mining, and proposes a system to detect botnets based on the FP-growth (Frequent Pattern Tree) frequent item mining algorithm. The detection system is composed of three parts: packet collection and processing, rule mining, and statistical analysis of rules. Its characteristic feature is the rule-based classification of different botnet behaviors in a fast and unsupervised fashion. The effectiveness of the approach is validated in a scenario with 11 Peer-to-Peer host PCs, 42063 non-Peer-to-Peer host PCs, and 17 host PCs with three different botnet activities (Storm, Waledac and Zeus). The recognition accuracy of the proposed architecture is shown to be above 94%. The proposed method is shown to improve upon the results reported in the literature.
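FP-growth itself builds a compact prefix tree to avoid candidate generation; the sketch below instead uses a brute-force Apriori-style enumeration, which yields the same frequent itemsets on small inputs and illustrates the rule-mining stage of such a detector. The per-host "transactions" of traffic features are invented for the example.

```python
from itertools import combinations

# Hypothetical transactions: sets of traffic features observed per host.
TRANSACTIONS = [
    {"udp", "high_fanout", "dns_burst"},
    {"udp", "high_fanout", "dns_burst"},
    {"udp", "high_fanout"},
    {"tcp", "http"},
    {"tcp", "http", "dns_burst"},
]

MIN_SUPPORT = 3  # minimum number of transactions containing the itemset

def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets meeting min_support (brute force; FP-growth
    reaches the same result without explicit candidate enumeration)."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= min_support:
                frequent[cand] = support
                found = True
        if not found:  # Apriori property: no larger itemset can qualify
            break
    return frequent

freq = frequent_itemsets(TRANSACTIONS, MIN_SUPPORT)
```

Frequent itemsets such as ("high_fanout", "udp") become candidate behaviour rules; the statistical-analysis stage then decides which rules separate botnet hosts from benign Peer-to-Peer traffic.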