120 research outputs found

    Massively Parallel Single-Source SimRanks in o(log n) Rounds

    SimRank is one of the most fundamental measures for evaluating the structural similarity between two nodes in a graph and has been applied in a plethora of data management tasks. These tasks often involve single-source SimRank computation, which evaluates the SimRank values between a source node s and all other nodes. Due to its high computational complexity, single-source SimRank computation for large graphs is notoriously challenging, and hence recent studies resort to distributed processing. To our surprise, although SimRank has been widely adopted for two decades, theoretical aspects of distributed SimRank with provable results have rarely been studied. In this paper, we conduct a theoretical study of single-source SimRank computation in the Massive Parallel Computation (MPC) model, which is the standard theoretical framework modeling distributed systems such as MapReduce, Hadoop, or Spark. Existing distributed SimRank algorithms require either Ω(log n) communication round complexity or Ω(n) machine space for a graph of n nodes. We overcome this barrier. In particular, given a graph of n nodes, for any query node v and constant error ϵ > 3/n, we show that O(log² log n) rounds of communication among machines are almost enough to compute single-source SimRank values with at most ϵ absolute error, while each machine only needs space sub-linear in n. To the best of our knowledge, this is the first single-source SimRank algorithm in MPC that overcomes the Θ(log n) round-complexity barrier with provable result accuracy.
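The single-source SimRank values the abstract discusses have a well-known random-surfer interpretation: s(s, v) is the expected value of c^t, where t is the first step at which two independent reverse random walks from s and v meet. A minimal sequential Monte Carlo sketch of that baseline follows (function and parameter names are made up; the paper's contribution is achieving this in the MPC model in O(log² log n) rounds, which this single-machine sketch does not attempt):

```python
import random

def single_source_simrank(in_nbrs, s, c=0.6, walks=2000, max_len=10, seed=0):
    """Monte Carlo single-source SimRank: estimate s(s, v) for every v as
    the average of c**t over trials, where t is the first step at which
    two coupled reverse random walks from s and v meet (walks are
    truncated at max_len, so this is an approximation)."""
    rng = random.Random(seed)
    est = {}
    for v in in_nbrs:
        if v == s:
            est[v] = 1.0  # SimRank of a node with itself is 1 by definition
            continue
        hits = 0.0
        for _ in range(walks):
            a, b = s, v
            for t in range(1, max_len + 1):
                if not in_nbrs[a] or not in_nbrs[b]:
                    break  # a walk reached a node with no in-neighbors
                a = rng.choice(in_nbrs[a])
                b = rng.choice(in_nbrs[b])
                if a == b:
                    hits += c ** t  # the walks met after t steps
                    break
        est[v] = hits / walks
    return est
```

Here `in_nbrs` maps each node to its list of in-neighbors; on the tiny graph `{0: [], 1: [0], 2: [0]}`, both reverse walks from 1 and 2 meet at node 0 after one step, so the estimate for s(1, 2) is exactly c.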

    Learning-Based Approaches for Graph Problems: A Survey

    Over the years, many graph problems, particularly NP-complete ones, have been studied by a wide range of researchers. Famous examples include graph colouring, the travelling salesman problem, and subgraph isomorphism. Most of these problems are typically addressed by exact algorithms, approximation algorithms, and heuristics, each of which has its drawbacks. Recent studies have employed learning-based frameworks, such as machine learning techniques, to solve these problems, given that such techniques are useful in discovering new patterns in structured data that can be represented using graphs. This research direction has attracted a considerable amount of attention. In this survey, we provide a systematic review of classic graph problems for which learning-based approaches have been proposed. We give an overview of each framework, and provide analyses based on its design and performance. Some potential research questions are also suggested. Ultimately, this survey offers clearer insight and can serve as a stepping stone for the research community in studying problems in this field. Comment: v1: 41 pages; v2: 40 pages

    Analysis of 50 accidents in atmospheric storage tanks for hazardous chemicals


    Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads

    LSM-trees are widely adopted as the storage backend of key-value stores. However, optimizing system performance under dynamic workloads has not been sufficiently studied or evaluated in previous work. To fill this gap, we present RusKey, a key-value store with the following new features: (1) RusKey is a first attempt to orchestrate LSM-tree structures online so as to enable robust performance under dynamic workloads; (2) RusKey is the first study to use Reinforcement Learning (RL) to guide LSM-tree transformations; (3) RusKey includes a new LSM-tree design, named FLSM-tree, for efficient transitions between different compaction policies -- the bottleneck of dynamic key-value stores. We justify the superiority of the new design with theoretical analysis; (4) RusKey requires no prior workload knowledge for system adjustment, in contrast to state-of-the-art techniques. Experiments show that RusKey exhibits strong performance robustness across diverse workloads, achieving up to 4x better end-to-end performance than the RocksDB system under various settings. Comment: 25 pages, 13 figures
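The core idea of using RL to pick a compaction policy per workload window can be illustrated with a toy epsilon-greedy bandit (class and policy names are invented for illustration; RusKey's actual controller and its FLSM-tree transitions are far richer than this sketch):

```python
import random

class CompactionPolicyBandit:
    """Toy epsilon-greedy bandit that picks a compaction policy for each
    workload window based on observed throughput rewards. A stand-in for
    the RL-guided-transformation idea, not RusKey's actual design."""

    def __init__(self, policies=("leveling", "tiering"), eps=0.1, seed=0):
        self.rng = random.Random(seed)
        self.eps = eps
        self.counts = {p: 0 for p in policies}
        self.values = {p: 0.0 for p in policies}  # running mean reward

    def choose(self):
        # explore with probability eps, otherwise exploit the best policy so far
        if self.rng.random() < self.eps:
            return self.rng.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, policy, reward):
        # incremental mean: v += (r - v) / n
        self.counts[policy] += 1
        self.values[policy] += (reward - self.values[policy]) / self.counts[policy]
```

After observing, say, higher throughput under tiering for a write-heavy window, the bandit will keep choosing tiering until the rewards shift, which mirrors the abstract's point that no prior workload knowledge is needed.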

    ESSAYS ON SOVEREIGN DEFAULT AND HOUSEHOLD PORTFOLIO CHOICE

    This dissertation analyzes portfolio choice problems in different contexts. In the first chapter, “Nominal Exchange Rate Volatility, Default Risk and Reserve Accumulation,” I investigate how nominal exchange rate volatility affects a sovereign's portfolio choice between how much debt to acquire and how much reserves to accumulate. First, I document a positive correlation between nominal exchange rate volatility and sovereign default risk and show that this relationship becomes stronger when more of the external debt is denominated in foreign currency. Then, I build a sovereign default model to rationalize these findings and to quantify the channels that contribute to the large reserve holdings among emerging countries. In the second chapter, “Household Portfolio Accounting,” we document and analyze the substantial heterogeneity in household portfolio composition in the United States. We consider a standard life-cycle model with labor income risk and portfolio choice, augmented with a savings wedge that lowers the return on saving, and a risky wedge that lowers the relative return on risky assets. Using U.S. survey data (2004-2016), we compute the household-level wedges that rationalize the data. The chapter has two main contributions: first, it uses the wedges to guide plausible frictions that researchers should consider in their models. Second, it analyzes the extent to which household characteristics can account for the wedges.

    DMCS : Density Modularity based Community Search

    Community Search, or finding a connected subgraph (known as a community) containing the given query nodes in a social network, is a fundamental problem. Most existing community search models focus only on the internal cohesiveness of a community. However, a high-quality community often has high modularity, which means dense connections inside the community and sparse connections to nodes outside it. In this paper, we conduct a pioneering study on searching for a community with high modularity. We point out that while modularity has been popular in community detection (without query nodes), it has, surprisingly, not been adopted for community search, and its application to community search (related to query nodes) brings new challenges. We address these challenges by designing a new graph modularity function named Density Modularity. To the best of our knowledge, this is the first work on the community search problem using graph modularity. The community search problem based on density modularity, termed DMCS, is to find a community in a social network that contains all the query nodes and has high density modularity. We prove that the DMCS problem is NP-hard. To address DMCS efficiently, we present new algorithms that run in time log-linear in the graph size. We conduct extensive experimental studies on real-world and synthetic networks, which offer insights into the efficiency and effectiveness of our algorithms. In particular, our algorithm achieves up to 8.5 times higher accuracy in terms of NMI than baseline algorithms.
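The classic (Newman) modularity notion the abstract builds on can be scored for a single candidate community with a few lines of code. A minimal sketch, assuming an undirected edge list (the paper's Density Modularity normalizes differently from this baseline; the function name is made up):

```python
def community_modularity(edges, community):
    """Classic modularity contribution of one candidate community C:
    e_C / m - (d_C / (2m))**2, where m is the total number of edges,
    e_C the number of intra-community edges, and d_C the sum of degrees
    of nodes in C. High values mean dense inside, sparse outside."""
    m = len(edges)
    inside = sum(1 for u, v in edges if u in community and v in community)
    # each edge contributes 1 to the degree of each endpoint it touches in C
    degree_sum = sum((u in community) + (v in community) for u, v in edges)
    return inside / m - (degree_sum / (2 * m)) ** 2
```

On two triangles joined by a bridge edge, one whole triangle scores well above a community that straddles the bridge, which is exactly the dense-inside/sparse-outside intuition the abstract describes.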

    SCARA: Scalable Graph Neural Networks with Feature-Oriented Optimization

    Recent advances in data processing have stimulated the demand for learning on graphs of very large scale. Graph Neural Networks (GNNs), an emerging and powerful approach to graph learning tasks, are known to be difficult to scale up. Most scalable models apply node-based techniques to simplify the expensive message-passing propagation procedure of GNNs. However, we find such acceleration insufficient when applied to million- or even billion-scale graphs. In this work, we propose SCARA, a scalable GNN with feature-oriented optimization for graph computation. SCARA efficiently computes graph embeddings from node features, and further selects and reuses feature computation results to reduce overhead. Theoretical analysis indicates that our model achieves sub-linear time complexity with guaranteed precision in the propagation process as well as in GNN training and inference. We conduct extensive experiments on various datasets to evaluate the efficacy and efficiency of SCARA. Performance comparison with baselines shows that SCARA achieves up to 100x faster graph propagation than current state-of-the-art methods, with fast convergence and comparable accuracy. Most notably, it completes precomputation on the largest available billion-scale GNN dataset, Papers100M (111M nodes, 1.6B edges), in 100 seconds.
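The propagation precomputation that scalable, decoupled GNNs rely on can be written as Z = Σ_l α(1-α)^l (Â^l X), with Â the row-normalized adjacency matrix. A plain dense sketch of that generic scheme follows (function name and the dense loop are illustrative; SCARA's actual feature-oriented push with result reuse is what makes this scale to billions of edges):

```python
def propagate_features(adj, features, alpha=0.15, hops=3):
    """PPR-weighted feature propagation: z = sum over l of
    alpha*(1-alpha)**l times the l-hop averaged features, where each hop
    replaces a node's features with the mean of its neighbors' features
    (row-normalized adjacency). adj[u] is u's neighbor list."""
    n, d = len(adj), len(features[0])
    z = [[alpha * x for x in row] for row in features]  # l = 0 term
    cur = [row[:] for row in features]
    weight = alpha
    for _ in range(hops):
        nxt = [[0.0] * d for _ in range(n)]
        for u in range(n):
            deg = max(1, len(adj[u]))  # guard against isolated nodes
            for v in adj[u]:
                for k in range(d):
                    nxt[u][k] += cur[v][k] / deg
        cur = nxt
        weight *= 1 - alpha
        for u in range(n):
            for k in range(d):
                z[u][k] += weight * cur[u][k]
    return z
```

Because the propagation depends only on the graph and the raw features, it can be precomputed once and the downstream model trained on Z alone, which is the decoupling the abstract refers to.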

    Unsupervised detection of botnet activities using frequent pattern tree mining

    A botnet is a network of remotely-controlled infected computers that can send spam, spread viruses, or stage denial-of-service attacks, without the consent of the computer owners. Since the beginning of the 21st century, botnet activities have steadily increased, becoming one of the major concerns for Internet security. In fact, botnet activities are becoming more and more difficult to detect, because they make use of Peer-to-Peer protocols (eMule, Torrent, Frostwire, Vuze, Skype and many others). To improve the detectability of botnet activities, this paper introduces the idea of association analysis from the field of data mining, and proposes a system to detect botnets based on the FP-growth (Frequent Pattern Tree) frequent item mining algorithm. The detection system is composed of three parts: packet collection and processing, rule mining, and statistical analysis of rules. Its characteristic feature is the rule-based classification of different botnet behaviors in a fast and unsupervised fashion. The effectiveness of the approach is validated in a scenario with 11 Peer-to-Peer host PCs, 42063 Non-Peer-to-Peer host PCs, and 17 host PCs with three different botnet activities (Storm, Waledac and Zeus). The recognition accuracy of the proposed architecture is shown to be above 94%. The proposed method is shown to improve on the results reported in the literature.
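The rule-mining stage rests on frequent itemset mining over per-host traffic feature sets. A full FP-tree implementation is lengthy, so here is a minimal Apriori-style routine that finds the same frequent itemsets on small inputs (the paper uses FP-growth, which avoids this candidate-generation step; the item names in the example are made up):

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Apriori-style frequent itemset mining: level by level, count each
    candidate's support and keep only candidates all of whose subsets
    were frequent at the previous level. transactions is a list of sets;
    returns {frozenset: support_count}."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    candidates = [frozenset([i]) for i in items]
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # join step: unions of frequent (k-1)-itemsets whose every
        # (k-1)-subset is itself frequent (the Apriori pruning rule)
        candidates, seen = [], set()
        for a, b in combinations(list(level), 2):
            u = a | b
            if len(u) == k and u not in seen and all(
                frozenset(s) in level for s in combinations(u, k - 1)
            ):
                seen.add(u)
                candidates.append(u)
    return frequent
```

In the botnet setting each "transaction" would be the feature set extracted from one host's traffic window; itemsets that recur across many infected hosts become the detection rules fed to the statistical-analysis stage.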