915 research outputs found
Exact Single-Source SimRank Computation on Large Graphs
SimRank is a popular measurement for evaluating the node-to-node similarities
based on the graph topology. In recent years, single-source and top- SimRank
queries have received increasing attention due to their applications in web
mining, social network analysis, and spam detection. However, a fundamental
obstacle in studying SimRank has been the lack of ground truths. The only exact
algorithm, Power Method, is computationally infeasible on graphs with more than
nodes. Consequently, no existing work has evaluated the actual
trade-offs between query time and accuracy on large real-world graphs. In this
paper, we present ExactSim, the first algorithm that computes the exact
single-source and top- SimRank results on large graphs. With high
probability, this algorithm produces ground truths with a rigorous theoretical
guarantee. We conduct extensive experiments on real-world datasets to
demonstrate the efficiency of ExactSim. The results show that ExactSim provides
the ground truth for any single-source SimRank query with a precision up to 7
decimal places within a reasonable query time.Comment: ACM SIGMOD 202
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance implementations of graph algorithms are challenging to
implement on new parallel hardware such as GPUs because of three challenges:
(1) the difficulty of coming up with graph building blocks, (2) load imbalance
on parallel hardware, and (3) graph problems having low arithmetic intensity.
To address some of these challenges, GraphBLAS is an innovative, on-going
effort by the graph analytics community to propose building blocks based on
sparse linear algebra, which will allow graph algorithms to be expressed in a
performant, succinct, composable and portable manner. In this paper, we examine
the performance challenges of a linear-algebra-based approach to building graph
frameworks and describe new design principles for overcoming these bottlenecks.
Among the new design principles is exploiting input sparsity, which allows
users to write graph algorithms without specifying push and pull direction.
Exploiting output sparsity allows users to tell the backend which values of the
output in a single vectorized computation they do not want computed.
Load-balancing is an important feature for balancing work amongst parallel
workers. We describe the important load-balancing features for handling graphs
with different characteristics. The design principles described in this paper
have been implemented in "GraphBLAST", the first high-performance linear
algebra-based graph framework on NVIDIA GPUs that is open-source. The results
show that on a single GPU, GraphBLAST has on average at least an order of
magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL,
comparable performance to the fastest GPU hardwired primitives and
shared-memory graph frameworks Ligra and Gunrock, and better performance than
any other GPU graph framework, while offering a simpler and more concise
programming model.Comment: 50 pages, 14 figures, 14 table
Learning-assisted Theorem Proving with Millions of Lemmas
Large formal mathematical libraries consist of millions of atomic inference
steps that give rise to a corresponding number of proved statements (lemmas).
Analogously to the informal mathematical practice, only a tiny fraction of such
statements is named and re-used in later proofs by formal mathematicians. In
this work, we suggest and implement criteria defining the estimated usefulness
of the HOL Light lemmas for proving further theorems. We use these criteria to
mine the large inference graph of the lemmas in the HOL Light and Flyspeck
libraries, adding up to millions of the best lemmas to the pool of statements
that can be re-used in later proofs. We show that in combination with
learning-based relevance filtering, such methods significantly strengthen
automated theorem proving of new conjectures over large formal mathematical
libraries such as Flyspeck.Comment: journal version of arXiv:1310.2797 (which was submitted to LPAR
conference
Adaptive image retrieval using a graph model for semantic feature integration
The variety of features available to represent multimedia data constitutes a rich pool of information. However, the plethora of data poses a challenge in terms of feature selection and integration for effective retrieval. Moreover, to further improve effectiveness, the
retrieval model should ideally incorporate context-dependent feature representations to allow for retrieval on a higher semantic level. In this paper we present a retrieval model and learning framework for the purpose of interactive information retrieval. We describe
how semantic relations between multimedia objects based on user interaction can be learnt and then integrated with visual and textual features into a unified framework. The framework models both feature similarities and semantic relations in a single graph. Querying in this model is implemented using the theory of random walks. In addition, we present ideas to implement short-term learning from relevance feedback. Systematic experimental results validate the effectiveness of the proposed approach for image retrieval. However, the model is not restricted to the image domain and could easily be employed for retrieving multimedia data (and even a combination of different domains, eg images, audio and text documents)
Gravity-Inspired Graph Autoencoders for Directed Link Prediction
Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged
as powerful node embedding methods. In particular, graph AE and VAE were
successfully leveraged to tackle the challenging link prediction problem,
aiming at figuring out whether some pairs of nodes from a graph are connected
by unobserved edges. However, these models focus on undirected graphs and
therefore ignore the potential direction of the link, which is limiting for
numerous real-life applications. In this paper, we extend the graph AE and VAE
frameworks to address link prediction in directed graphs. We present a new
gravity-inspired decoder scheme that can effectively reconstruct directed
graphs from a node embedding. We empirically evaluate our method on three
different directed link prediction tasks, for which standard graph AE and VAE
perform poorly. We achieve competitive results on three real-world graphs,
outperforming several popular baselines.Comment: ACM International Conference on Information and Knowledge Management
(CIKM 2019
Efficient External-Memory Algorithms for Graph Mining
The explosion of big data in areas like the web and social networks has posed big challenges to research activities, including data mining, information retrieval, security etc. This dissertation focuses on a particular area, graph mining, and specifically proposes several novel algorithms to solve the problems of triangle listing and computation of neighborhood function in large-scale graphs.
We first study the classic problem of triangle listing. We generalize the existing in-memory algorithms into a single framework of 18 triangle-search techniques. We then develop a novel external-memory approach, which we call Pruned Companion Files (PCF), that supports disk operation of all 18 algorithms. When compared to state-of-the-art available implementations MGT and PDTL, PCF runs 5-10 times faster and exhibits orders of magnitude less I/O.
We next focus on I/O complexity of triangle listing. Recent work by Pagh etc. provides an appealing theoretical I/O complexity for triangle listing via graph partitioning by random coloring of nodes. Since no implementation of Pagh is available and little is known about the comparison between Pagh and PCF, we carefully implement Pagh, undertake an investigation into the properties of these algorithms, model their I/O cost, understand their shortcomings, and shed light on the conditions under which each method defeats the other. This insight leads us to develop a novel framework we call Trigon that surpasses the I/O performance of both techniques in all graphs and under all RAM conditions.
We finally turn our attention to neighborhood function. Exact computation of neighborhood function is expensive in terms of CPU and I/O cost. Previous work mostly focuses on approximations. We show that our novel techniques developed for triangle listing can also be applied to this problem. We next study an application of neighborhood function to ranking of Internet hosts. Our method computes neighborhood functions for each host as an indication of its reputation. The evaluation shows that our method is robust to ranking manipulation and brings less spam to its top ranking list compared to PageRank and TrustRank
Recommended from our members
Unconventional computing platforms and nature-inspired methods for solving hard optimisation problems
The search for novel hardware beyond the traditional von Neumann architecture has given rise to a modern area of unconventional computing requiring the efforts of mathematicians, physicists and engineers. Many analogue physical systems, including networks of nonlinear oscillators, lasers, condensates, and superconducting qubits, are proposed and realised to address challenging computational problems from various areas of social and physical sciences and technology. Understanding the underlying physical process by which the system finds the solutions to such problems often leads to new optimisation algorithms. This thesis focuses on studying gain-dissipative systems and nature-inspired algorithms that form a hybrid architecture that may soon rival classical hardware.
Chapter 1 lays the necessary foundation and explains various interdisciplinary terms that are used throughout the dissertation. In particular, connections between the optimisation problems and spin Hamiltonians are established, their computational complexity classes are explained, and the most prominent physical platforms for spin Hamiltonian implementation are reviewed.
Chapter 2 demonstrates a large variety of behaviours encapsulated in networks of polariton condensates, which are a vivid example of a gain-dissipative system we use throughout the thesis. We explain how the variations of experimentally tunable parameters allow the networks of polariton condensates to represent different oscillator models. We derive analytic expressions for the interactions between two spatially separated polariton condensates and show various synchronisation regimes for periodic chains of condensates. An odd number of condensates at the vertices of a regular polygon leads to a spontaneous formation of a giant multiply-quantised vortex at the centre of a polygon. Numerical simulations of all studied configurations of polariton condensates are performed with a mean-field approach with some theoretically proposed physical phenomena supported by the relevant experiments.
Chapter 3 examines the potential of polariton graphs to find the low-energy minima of the spin Hamiltonians. By associating a spin with a condensate phase, the minima of the XY model are achieved for simple configurations of spatially-interacting polariton condensates. We argue that such implementation of gain-dissipative simulators limits their applicability to the classes of easily solvable problems since the parameters of a particular Hamiltonian depend on the node occupancies that are not known a priori. To overcome this difficulty, we propose to adjust pumping intensities and coupling strengths dynamically. We further theoretically suggest how the discrete Ising and -state planar Potts models with or without external fields can be simulated using gain-dissipative platforms. The underlying operational principle originates from a combination of resonant and non-resonant pumping. Spatial anisotropy of pump and dissipation profiles enables an effective control of the sign and intensity of the coupling strength between any two neighbouring sites, which we demonstrate with a two dimensional square lattice of polariton condensates. For an accurate minimisation of discrete and continuous spin Hamiltonians, we propose a fully controllable polaritonic XY-Ising machine based on a network of geometrically isolated polariton condensates.
In Chapter 4, we look at classical computing rivals and study nature-inspired methods for optimising spin Hamiltonians. Based on the operational principles of gain-dissipative machines, we develop a novel class of gain-dissipative algorithms for the optimisation of discrete and continuous problems and show its performance in comparison with traditional optimisation techniques. Besides looking at traditional heuristic methods for Ising minimisation, such as the Hopfield-Tank neural networks and parallel tempering, we consider a recent physics-inspired algorithm, namely chaotic amplitude control, and exact commercial solver, Gurobi. For a proper evaluation of physical simulators, we further discuss the importance of detecting easy instances of hard combinatorial optimisation problems. The Ising model for certain interaction matrices, that are commonly used for evaluating the performance of unconventional computing machines and assumed to be exponentially hard, is shown to be solvable in polynomial time including the Mobius ladder graphs and Mattis spin glasses.
In Chapter 5 we discuss possible future applications of unconventional computing platforms including emulation of search algorithms such as PageRank, realisation of a proof-of-work protocol for blockchain technology, and reservoir computing
- …