
915 research outputs found

Exact Single-Source SimRank Computation on Large Graphs

Author: Du Xiaoyong
Wang Hanzhi
Wei Zhewei
Wen Ji-Rong
Yuan Ye
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/06/2020

SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-k SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than 10^6 nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-k SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision up to 7 decimal places within a reasonable query time.

Comment: ACM SIGMOD 202
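To make the cost of the Power Method concrete, here is a minimal sketch of the classic all-pairs SimRank iteration. It is not ExactSim. The function name, the decay factor `c=0.6`, and the dictionary-of-in-neighbours input format are assumptions chosen for illustration. The sketch stores a score for every pair of nodes, which is why this style of computation stops being practical as the node count grows.

```python
def simrank_power_method(in_neighbors, c=0.6, iterations=10):
    """Naive all-pairs SimRank iteration (illustrative sketch, not ExactSim).

    in_neighbors: dict mapping every node to a list of its in-neighbours.
    c: decay factor (assumed value, not taken from the abstract).
    """
    nodes = list(in_neighbors)
    # Start from the identity: each node is fully similar to itself only.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
    for _ in range(iterations):
        new = {u: {v: 0.0 for v in nodes} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[u][v] = 1.0
                    continue
                iu, iv = in_neighbors[u], in_neighbors[v]
                if not iu or not iv:
                    continue  # a node without in-neighbours has similarity 0
                total = sum(sim[a][b] for a in iu for b in iv)
                new[u][v] = c * total / (len(iu) * len(iv))
        sim = new
    return sim


# Tiny example: node "a" points to both "b" and "c".
graph = {"a": [], "b": ["a"], "c": ["a"]}
scores = simrank_power_method(graph)
```

In this example "b" and "c" share their only in-neighbour, so their score equals the decay factor, while "a" has no in-neighbours and scores 0 against every other node.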

arXiv.org e-Print Archive

Crossref

GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU

Author: Buluc Aydin
Owens John D.
Yang Carl
Publication date: 14/11/2020

High-performance graph algorithms are difficult to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel hardware, and (3) graph problems having low arithmetic intensity. To address some of these challenges, GraphBLAS is an innovative, ongoing effort by the graph analytics community to propose building blocks based on sparse linear algebra, which will allow graph algorithms to be expressed in a performant, succinct, composable and portable manner. In this paper, we examine the performance challenges of a linear-algebra-based approach to building graph frameworks and describe new design principles for overcoming these bottlenecks. Among the new design principles is exploiting input sparsity, which allows users to write graph algorithms without specifying push and pull direction. Exploiting output sparsity allows users to tell the backend which values of the output in a single vectorized computation they do not want computed. Load-balancing is an important feature for balancing work amongst parallel workers. We describe the important load-balancing features for handling graphs with different characteristics. The design principles described in this paper have been implemented in "GraphBLAST", the first high-performance linear algebra-based graph framework on NVIDIA GPUs that is open-source. The results show that on a single GPU, GraphBLAST has on average at least an order of magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL, comparable performance to the fastest GPU hardwired primitives and shared-memory graph frameworks Ligra and Gunrock, and better performance than any other GPU graph framework, while offering a simpler and more concise programming model.

Comment: 50 pages, 14 figures, 14 table
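As a rough illustration of the linear-algebra view described in this abstract, the sketch below writes breadth-first search as a repeated vector-matrix product over a Boolean semiring with a mask. It is plain Python, not GraphBLAST or any GraphBLAS API. The names `masked_vxm` and `bfs_levels` are hypothetical. Iterating only over the non-empty frontier entries stands in for input sparsity, and skipping already-visited vertices stands in for output sparsity.

```python
def masked_vxm(frontier, adj, visited):
    """One Boolean vector-matrix product with a complement mask (sketch).

    frontier: set of active vertices (the sparse input vector).
    adj: dict mapping a vertex to a list of its out-neighbours.
    visited: container of vertices whose output entries are not wanted.
    """
    result = set()
    for u in frontier:                # input sparsity: touch only non-zeros
        for v in adj.get(u, ()):
            if v not in visited:      # output sparsity: skip masked entries
                result.add(v)
    return result


def bfs_levels(adj, source):
    """Return a dict mapping each reachable vertex to its BFS depth."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        frontier = masked_vxm(frontier, adj, levels)
        for v in frontier:
            levels[v] = depth
    return levels


example_adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
example_levels = bfs_levels(example_adj, 0)
```

The caller never states whether the traversal pushes or pulls; that choice is left to whatever implements the product, which is the separation of concerns the abstract describes.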

arXiv.org e-Print Archive

eScholarship - University of California

Learning-assisted Theorem Proving with Millions of Lemmas

Author: Kaliszyk Cezary
Urban Josef
Publication date: 10/02/2014

Large formal mathematical libraries consist of millions of atomic inference steps that give rise to a corresponding number of proved statements (lemmas). Analogously to the informal mathematical practice, only a tiny fraction of such statements is named and re-used in later proofs by formal mathematicians. In this work, we suggest and implement criteria defining the estimated usefulness of the HOL Light lemmas for proving further theorems. We use these criteria to mine the large inference graph of the lemmas in the HOL Light and Flyspeck libraries, adding up to millions of the best lemmas to the pool of statements that can be re-used in later proofs. We show that in combination with learning-based relevance filtering, such methods significantly strengthen automated theorem proving of new conjectures over large formal mathematical libraries such as Flyspeck.

Comment: journal version of arXiv:1310.2797 (which was submitted to LPAR conference

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

PubMed Central

Radboud Repository

Adaptive image retrieval using a graph model for semantic feature integration

Author: Jose J.M.
Urban J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006

The variety of features available to represent multimedia data constitutes a rich pool of information. However, the plethora of data poses a challenge in terms of feature selection and integration for effective retrieval. Moreover, to further improve effectiveness, the retrieval model should ideally incorporate context-dependent feature representations to allow for retrieval on a higher semantic level. In this paper we present a retrieval model and learning framework for the purpose of interactive information retrieval. We describe how semantic relations between multimedia objects based on user interaction can be learnt and then integrated with visual and textual features into a unified framework. The framework models both feature similarities and semantic relations in a single graph. Querying in this model is implemented using the theory of random walks. In addition, we present ideas to implement short-term learning from relevance feedback. Systematic experimental results validate the effectiveness of the proposed approach for image retrieval. However, the model is not restricted to the image domain and could easily be employed for retrieving multimedia data (and even a combination of different domains, e.g. images, audio and text documents).
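The abstract does not say which random-walk formulation the authors use. As one plausible reading, the sketch below scores nodes of a small mixed graph (images plus feature terms) with a random walk with restart from the query node. The function name, the restart probability of 0.15, and the toy graph are all assumptions for illustration.

```python
def random_walk_with_restart(weights, query, restart=0.15, iterations=100):
    """Score every node by a random walk that restarts at `query` (sketch).

    weights: dict node -> dict of neighbour -> non-negative edge weight.
             Every neighbour must also appear as a key of `weights`.
    restart: probability of jumping back to the query node (assumed value).
    """
    nodes = list(weights)
    scores = {n: 1.0 if n == query else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            out = weights[u]
            total = sum(out.values())
            if total == 0:
                # Dangling node: send its walking mass back to the query.
                nxt[query] += (1 - restart) * scores[u]
                continue
            for v, w in out.items():
                nxt[v] += (1 - restart) * scores[u] * w / total
        nxt[query] += restart
        scores = nxt
    return scores


# Hypothetical graph: images linked to the feature terms they share.
toy_graph = {
    "img1": {"beach": 1.0},
    "img2": {"beach": 1.0, "city": 1.0},
    "img3": {"city": 1.0},
    "beach": {"img1": 1.0, "img2": 1.0},
    "city": {"img2": 1.0, "img3": 1.0},
}
rwr_scores = random_walk_with_restart(toy_graph, "img1")
```

Here "img2" shares a feature with the query image and so ranks above "img3", which is reachable only through "img2". Edges learnt from user interaction could be added to the same dictionary alongside the feature edges.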

CiteSeerX

Crossref

Enlighten

Gravity-Inspired Graph Autoencoders for Directed Link Prediction

Author: Hennequin Romain
Limnios Stratis
Salha Guillaume
Tran Viet Anh
Vazirgiannis Michalis
Publication date: 05/12/2019

Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged as powerful node embedding methods. In particular, graph AE and VAE were successfully leveraged to tackle the challenging link prediction problem, aiming at figuring out whether some pairs of nodes from a graph are connected by unobserved edges. However, these models focus on undirected graphs and therefore ignore the potential direction of the link, which is limiting for numerous real-life applications. In this paper, we extend the graph AE and VAE frameworks to address link prediction in directed graphs. We present a new gravity-inspired decoder scheme that can effectively reconstruct directed graphs from a node embedding. We empirically evaluate our method on three different directed link prediction tasks, for which standard graph AE and VAE perform poorly. We achieve competitive results on three real-world graphs, outperforming several popular baselines.

Comment: ACM International Conference on Information and Knowledge Management (CIKM 2019

arXiv.org e-Print Archive

Efficient External-Memory Algorithms for Graph Mining

Author: Cui Yi
Publication date: 16/01/2019

The explosion of big data in areas like the web and social networks has posed big challenges to research activities, including data mining, information retrieval, and security. This dissertation focuses on a particular area, graph mining, and specifically proposes several novel algorithms to solve the problems of triangle listing and computation of neighborhood function in large-scale graphs. We first study the classic problem of triangle listing. We generalize the existing in-memory algorithms into a single framework of 18 triangle-search techniques. We then develop a novel external-memory approach, which we call Pruned Companion Files (PCF), that supports disk operation of all 18 algorithms. When compared to state-of-the-art available implementations MGT and PDTL, PCF runs 5-10 times faster and exhibits orders of magnitude less I/O. We next focus on I/O complexity of triangle listing. Recent work by Pagh et al. provides an appealing theoretical I/O complexity for triangle listing via graph partitioning by random coloring of nodes. Since no implementation of Pagh is available and little is known about the comparison between Pagh and PCF, we carefully implement Pagh, undertake an investigation into the properties of these algorithms, model their I/O cost, understand their shortcomings, and shed light on the conditions under which each method defeats the other. This insight leads us to develop a novel framework we call Trigon that surpasses the I/O performance of both techniques in all graphs and under all RAM conditions. We finally turn our attention to neighborhood function. Exact computation of neighborhood function is expensive in terms of CPU and I/O cost. Previous work mostly focuses on approximations. We show that our novel techniques developed for triangle listing can also be applied to this problem. We next study an application of neighborhood function to ranking of Internet hosts. Our method computes neighborhood functions for each host as an indication of its reputation. The evaluation shows that our method is robust to ranking manipulation and brings less spam to its top ranking list compared to PageRank and TrustRank.
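For readers unfamiliar with triangle listing, the sketch below shows one standard in-memory approach: orient each edge from the lower-ranked to the higher-ranked endpoint (ranking by degree, then by node id) and intersect out-neighbour sets. It is not PCF, Trigon, or any of the 18 techniques the dissertation catalogues. The function name is hypothetical and the code assumes node ids are mutually comparable.

```python
def list_triangles(edges):
    """List each triangle of an undirected graph exactly once (sketch).

    edges: iterable of (u, v) pairs; self-loops and duplicates are ignored.
    Returns a sorted list of triangles, each as a sorted 3-tuple.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Total order on nodes: degree first, node id as tie-breaker.
    rank = {n: (len(adj[n]), n) for n in adj}
    # Keep only edges pointing from lower rank to higher rank.
    out = {n: {m for m in adj[n] if rank[m] > rank[n]} for n in adj}
    triangles = []
    for u in adj:
        for v in out[u]:
            # A common higher-ranked neighbour of u and v closes a triangle.
            for w in out[u] & out[v]:
                triangles.append(tuple(sorted((u, v, w))))
    return sorted(triangles)


small_graph = [(1, 2), (2, 3), (1, 3), (3, 4)]
found = list_triangles(small_graph)
```

Because every triangle is reported only from its lowest-ranked vertex, no deduplication pass is needed. The whole adjacency structure must fit in memory here, which is the limitation that external-memory methods are designed to remove.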

Texas A&M Repository


Unconventional computing platforms and nature-inspired methods for solving hard optimisation problems

Author: Kalinin Kirill
Publication venue: University of Cambridge
Publication date: 04/07/2021

The search for novel hardware beyond the traditional von Neumann architecture has given rise to a modern area of unconventional computing requiring the efforts of mathematicians, physicists and engineers. Many analogue physical systems, including networks of nonlinear oscillators, lasers, condensates, and superconducting qubits, are proposed and realised to address challenging computational problems from various areas of social and physical sciences and technology. Understanding the underlying physical process by which the system finds the solutions to such problems often leads to new optimisation algorithms. This thesis focuses on studying gain-dissipative systems and nature-inspired algorithms that form a hybrid architecture that may soon rival classical hardware. Chapter 1 lays the necessary foundation and explains various interdisciplinary terms that are used throughout the dissertation. In particular, connections between the optimisation problems and spin Hamiltonians are established, their computational complexity classes are explained, and the most prominent physical platforms for spin Hamiltonian implementation are reviewed. Chapter 2 demonstrates a large variety of behaviours encapsulated in networks of polariton condensates, which are a vivid example of a gain-dissipative system we use throughout the thesis. We explain how the variations of experimentally tunable parameters allow the networks of polariton condensates to represent different oscillator models. We derive analytic expressions for the interactions between two spatially separated polariton condensates and show various synchronisation regimes for periodic chains of condensates. An odd number of condensates at the vertices of a regular polygon leads to a spontaneous formation of a giant multiply-quantised vortex at the centre of a polygon. 
Numerical simulations of all studied configurations of polariton condensates are performed with a mean-field approach with some theoretically proposed physical phenomena supported by the relevant experiments. Chapter 3 examines the potential of polariton graphs to find the low-energy minima of the spin Hamiltonians. By associating a spin with a condensate phase, the minima of the XY model are achieved for simple configurations of spatially-interacting polariton condensates. We argue that such implementation of gain-dissipative simulators limits their applicability to the classes of easily solvable problems since the parameters of a particular Hamiltonian depend on the node occupancies that are not known a priori. To overcome this difficulty, we propose to adjust pumping intensities and coupling strengths dynamically. We further theoretically suggest how the discrete Ising and n-state planar Potts models with or without external fields can be simulated using gain-dissipative platforms. The underlying operational principle originates from a combination of resonant and non-resonant pumping. Spatial anisotropy of pump and dissipation profiles enables an effective control of the sign and intensity of the coupling strength between any two neighbouring sites, which we demonstrate with a two-dimensional square lattice of polariton condensates. For an accurate minimisation of discrete and continuous spin Hamiltonians, we propose a fully controllable polaritonic XY-Ising machine based on a network of geometrically isolated polariton condensates. In Chapter 4, we look at classical computing rivals and study nature-inspired methods for optimising spin Hamiltonians. Based on the operational principles of gain-dissipative machines, we develop a novel class of gain-dissipative algorithms for the optimisation of discrete and continuous problems and show its performance in comparison with traditional optimisation techniques. Besides looking at traditional heuristic methods for Ising minimisation, such as the Hopfield-Tank neural networks and parallel tempering, we consider a recent physics-inspired algorithm, namely chaotic amplitude control, and an exact commercial solver, Gurobi. For a proper evaluation of physical simulators, we further discuss the importance of detecting easy instances of hard combinatorial optimisation problems. The Ising model for certain interaction matrices that are commonly used for evaluating the performance of unconventional computing machines and assumed to be exponentially hard, including the Möbius ladder graphs and Mattis spin glasses, is shown to be solvable in polynomial time. In Chapter 5 we discuss possible future applications of unconventional computing platforms including emulation of search algorithms such as PageRank, realisation of a proof-of-work protocol for blockchain technology, and reservoir computing.
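The remark about easy instances can be illustrated with a Mattis spin glass, where the couplings are built from a hidden pattern. The sketch below is not taken from the thesis. It assumes the common convention H(s) = -sum over i<j of J_ij s_i s_j with J_ij = xi_i * xi_j, and the function names are hypothetical. Under that convention every term is minimised simultaneously by s = xi, so the ground state can be read off without any search, and a brute-force scan over all spin configurations confirms this for a small example.

```python
from itertools import product


def ising_energy(J, s):
    """Ising energy H(s) = -sum_{i<j} J[i][j] * s[i] * s[j] (assumed convention)."""
    n = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))


def mattis_couplings(xi):
    """Couplings J[i][j] = xi[i] * xi[j] built from a hidden +/-1 pattern xi."""
    n = len(xi)
    return [[xi[i] * xi[j] for j in range(n)] for i in range(n)]


def brute_force_minimum(J):
    """Exhaustive minimum energy over all 2^n spin configurations (small n only)."""
    n = len(J)
    return min(ising_energy(J, s) for s in product((-1, 1), repeat=n))


pattern = (1, -1, -1, 1, 1)
J = mattis_couplings(pattern)
energy_of_pattern = ising_energy(J, pattern)
global_minimum = brute_force_minimum(J)
```

With five spins there are ten pairs, each contributing -1 when s equals the pattern, so both values come out to -10. An instance like this looks disordered but says little about how a solver handles genuinely hard problems.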

Apollo (Cambridge)