6,650 research outputs found

Technical Report: Accelerating Dynamic Graph Analytics on GPUs

Author: He Bingsheng
Li Yuchen
Sha Mo
Tan Kian-Lee
Publication date: 27/06/2018

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, the representative graphs evolve frequently, and one has to rebuild the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme to support existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three streaming applications on large-scale real and synthetic datasets demonstrate the superior performance of our proposed approach. Comment: 34 pages, 18 figures

arXiv.org e-Print Archive

Scalable Analytics over Distributed Time-series Graphs using GoFFish

Author: Frincu Marc
Kumbhare Alok
Nagarkar Soonil
Prasanna Viktor
Raghavendra Cauligi
Ravi Santosh
Simmhan Yogesh
Wickramaarachchi Charith
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/06/2014

Graphs are a key form of Big Data, and performing scalable analytics over them is invaluable to many domains. As our ability to collect data grows, there is an emerging class of inter-connected data which accumulates or varies over time, and on which novel analytics - both over the network structure and across the time-variant attribute values - is necessary. We introduce the notion of time-series graph analytics and propose Gopher, a scalable programming abstraction to develop algorithms and analytics on such datasets. Our abstraction leverages a sub-graph centric programming model and extends it to the temporal dimension using an iterative BSP (Bulk Synchronous Parallel) approach. Gopher is co-designed with GoFS, a distributed storage specialized for time-series graphs, as part of the GoFFish distributed analytics platform. We examine storage optimizations for GoFS, design patterns in Gopher to leverage the distributed data layout, and evaluate the GoFFish platform using time-series graph data and applications on a commodity cluster.

arXiv.org e-Print Archive

Exploring the use of time-varying graphs for modelling transit networks

Author: Cavalheri Emerson
Maduako Ikechukwu
Wachowicz Monica
Publication date: 16/02/2018

The study of the dynamic relationship between the topological structure of a transit network and the mobility patterns of transit vehicles on this network is critical towards devising smart and time-aware solutions to transit management and recommendation systems. This paper proposes a time-varying graph (TVG) to model this relationship. The effectiveness of this proposed model has been explored by implementing the model in the Neo4j graph database using transit feeds generated by the bus transit network of the City of Moncton, New Brunswick, Canada. Dynamics in this relationship have also been detected using network metrics such as temporal shortest paths, degree, betweenness and PageRank centralities as well as temporal network diameter and density. Keywords: Transit Networks, Mobility Pattern, Time-Varying Graph model, Graph Database and Graph Analytics

arXiv.org e-Print Archive

Design and Implementation of Small Multiples Matrix-based Visualisation to Monitor and Compare Email Socio-organisational Relationships

Author: Fadahunsi O
Sathiyanarayanan M.
Turkay C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/04/2018

One of the fundamental organisational questions is how organisations identify anomalies, and monitor and compare email communications in staff-staff, staff-client or staff-customer relationships on a daily basis. Tenacious and substantial relationships are built by the combination of timely replies, frequent engagement and deep interaction between individuals. To monitor this periodically, we need an interactive visualisation tool that can help organisational analysts to reconnect lost relationships, strengthen existing relationships, or in some cases identify inside persons (anomalies). From our point of view, Social Intelligence (SI) in an organisation is a combination of self-, social- and organisational-awareness that will help in managing complex socio-organisational changes and can be interpreted in terms of socio-organisational communication efficacy (that is, one's confidence in one's ability to deal with social and organisational information). We considered a case study, the Enron organisation email scandal, to understand the relationships of staff during various parts of the years, and we conducted a workshop study with legal experts to gain insights on how they carry out investigation and analysis with respect to email relationships. The outcomes of the workshop helped us develop a novel small multiples matrix-based visualisation in collaboration with our industrial partner, Red Sift UK, to find anomalies, and to monitor and compare how email relationships change over time and how this defines the meaning of socio-organisational communication efficacy.

City Research Online

Scalable Positional Analysis for Studying Evolution of Nodes in Networks

Author: Gupte Pratik Vinay
Ravindran Balaraman
Publication date: 20/01/2015

In social network analysis, the fundamental idea behind the notion of position is to discover actors who have similar structural signatures. Positional analysis of social networks involves partitioning the actors into disjoint sets using a notion of equivalence which captures the structure of relationships among actors. Classical approaches to Positional Analysis, such as Regular equivalence and Equitable Partitions, are too strict in grouping actors and often lead to trivial partitioning of actors in real world networks. An Epsilon Equitable Partition (EEP) of a graph, which is similar in spirit to Stochastic Blockmodels, is a useful relaxation to the notion of structural equivalence which results in meaningful partitioning of actors. In this paper we propose and implement a new scalable distributed algorithm based on MapReduce methodology to find EEP of a graph. Empirical studies on random power-law graphs show that our algorithm is highly scalable for sparse graphs, thereby giving us the ability to study positional analysis on very large scale networks. We also present the results of our algorithm on time evolving snapshots of the Facebook and Flickr social graphs. Results show the importance of positional analysis on large dynamic networks. Comment: Presented at the workshop on Mining Networks and Graphs: A Big Data Analytic Challenge, held in conjunction with the SIAM Data Mining (SDM) Conference in April 2014. 13 pages

arXiv.org e-Print Archive
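
To make the Epsilon Equitable Partition idea in the record above concrete, here is a minimal sketch of a checker. It assumes the commonly used definition, which the abstract itself does not spell out: a partition is epsilon-equitable when any two nodes in the same cell have neighbour counts into every cell that differ by at most epsilon. The function name `is_epsilon_equitable` and its inputs are hypothetical, and this only verifies a given partition; it is not the paper's MapReduce algorithm for finding one.

```python
def is_epsilon_equitable(adj, cells, eps):
    """Check a partition against the (assumed) epsilon-equitable condition.

    adj   : dict mapping each node to the set of its neighbours
    cells : list of disjoint sets of nodes covering the graph
    eps   : allowed difference in per-cell neighbour counts
    """
    for source_cell in cells:
        for target_cell in cells:
            # Number of neighbours each node of source_cell has in target_cell.
            counts = [len(adj[node] & target_cell) for node in source_cell]
            if counts and max(counts) - min(counts) > eps:
                return False
    return True


# Path graph 0 - 1 - 2 - 3 as an illustrative input.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(is_epsilon_equitable(path, [{0, 3}, {1, 2}], eps=0))
print(is_epsilon_equitable(path, [{0, 1}, {2, 3}], eps=0))
```

Setting `eps=0` recovers the strict equitable partition that the abstract describes as too restrictive; larger values relax the grouping.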

Graphlet Decomposition: Framework, Algorithms, and Applications

Author: Ahmed Nesreen K.
Duffield Nick
Neville Jennifer
Rossi Ryan A.
Willke Theodore L.
Publication date: 15/02/2016

From social science to biology, numerous applications often rely on graphlets for intuitive and meaningful characterization of networks at both the global macro-level and the local micro-level. While graphlets have witnessed tremendous success and impact in a variety of domains, there has yet to be a fast and efficient approach for computing the frequencies of these subgraph patterns. Existing methods are not scalable to large networks with millions of nodes and edges, which impedes the application of graphlets to new problems that require large-scale network analysis. To address these problems, we propose a fast, efficient, and parallel algorithm for counting graphlets of size k={3,4} nodes that takes only a fraction of the time to compute when compared with the current methods used. The proposed graphlet counting algorithms leverage a number of proven combinatorial arguments for different graphlets. For each edge, we count a few graphlets, and with these counts along with the combinatorial arguments, we obtain the exact counts of others in constant time. On a large collection of 300+ networks from a variety of domains, our graphlet counting strategies are on average 460x faster than current methods. This brings new opportunities to investigate the use of graphlets on much larger networks and newer applications as we show in the experiments. To the best of our knowledge, this paper provides the largest graphlet computations to date as well as the largest systematic investigation on 300+ networks from a variety of domains.

arXiv.org e-Print Archive
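
The per-edge idea in the graphlet record above (count a few graphlets directly, then derive the others from combinatorial identities) can be illustrated for the simplest case of 3-node graphlets. The sketch below is an assumption-laden illustration, not the paper's algorithm: the function name `count_3node_graphlets` is hypothetical, it is sequential rather than parallel, and it covers only triangles and open wedges (induced 2-paths). For an edge (u, v), triangles are counted directly as common neighbours, and the open wedges containing that edge follow in constant time from deg(u) + deg(v) - 2 - 2 * triangles.

```python
def count_3node_graphlets(edges):
    """Count triangles and open wedges in an undirected simple graph."""
    adj = {}
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        unique_edges.add(frozenset((u, v)))

    triangle_sum = 0
    wedge_sum = 0
    for edge in unique_edges:
        u, v = tuple(edge)
        # Counted directly: triangles that contain the edge (u, v).
        tri = len(adj[u] & adj[v])
        # Derived in constant time: the remaining neighbours of u and v
        # each form an open wedge with this edge.
        wedges = (len(adj[u]) - 1) + (len(adj[v]) - 1) - 2 * tri
        triangle_sum += tri
        wedge_sum += wedges

    # Each triangle is seen from its 3 edges, each open wedge from its 2 edges.
    return {"triangles": triangle_sum // 3, "open_wedges": wedge_sum // 2}


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
print(count_3node_graphlets([(0, 1), (1, 2), (0, 2), (2, 3)]))
```

Only the set intersection touches neighbour lists; the wedge count reuses it together with the node degrees, which is the spirit of deriving some counts from others.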

Using big data for customer centric marketing

Author: Camilleri Mark Anthony
Evans Chris
Publication venue: IGI Global Publishers
Publication date: 01/01/2016

This chapter deliberates on “big data” and provides a short overview of business intelligence and emerging analytics. It underlines the importance of data for customer-centricity in marketing. This contribution contends that businesses ought to engage in marketing automation tools and apply them to create relevant, targeted customer experiences. Today’s businesses increasingly rely on digital media and mobile technologies as on-demand, real-time marketing has become more personalised than ever. Therefore, companies and brands are striving to nurture fruitful and long-lasting relationships with customers. In a nutshell, this chapter explains why companies should recognise the value of data analysis and mobile applications as tools that drive consumer insights and engagement. It suggests that a strategic approach to big data could drive consumer preferences and may also help to improve organisational performance. Peer-reviewed

StreamWorks - A system for Dynamic Graph Search

Author: Beus Sherman
Chin George
Choudhury Sutanay
Feo John
Holder Lawrence
Ray Abhik
Publication date: 11/06/2013

Acting on time-critical events by processing ever-growing social media, news or cyber data streams is a major technical challenge. Many of these data sources can be modeled as multi-relational graphs. Mining and searching for subgraph patterns in a continuous setting requires an efficient approach to incremental graph search. The goal of our work is to enable real-time search capabilities for graph databases. This demonstration will present a dynamic graph query system that leverages the structural and semantic characteristics of the underlying multi-relational graph. Comment: SIGMOD 2013: International Conference on Management of Data

arXiv.org e-Print Archive

Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

Author: Alam Mansaf
Ali Syed Arshad
Khan Samiya
Liu Xiufeng
Publication date: 01/01/2019

Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies into an optimized solution for a specific real-world problem, big data systems are no exception. As far as the storage aspect of any big data system is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the right technology to fulfil its requirements. However, every big data application has variable data characteristics, and thus the corresponding data fits into a different data model. This paper presents a feature and use case analysis and comparison of the four main data models, namely document-oriented, key-value, graph and wide-column. Moreover, a feature analysis of 80 NoSQL solutions has been provided, elaborating on the criteria and points that a developer must consider while making a possible choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the research paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects have also been discussed.

arXiv.org e-Print Archive

A Comparative Study of Different Approaches for Tracking Communities in Evolving Social Networks

Author: Bouguessa Mohamed
He Ziwei
Tajeuna Etienne Gael
Wang Shengrui
Publication date: 18/03/2019

In real-world social networks, there is an increasing interest in tracking the evolution of groups of users and detecting the various changes they are liable to undergo. Several approaches have been proposed for this. In studying these approaches, we observed that most of them use a two-stage process. In the first stage, they run an algorithm to identify groups of users at each timestamp. In the second stage, a pair-wise comparison based on a similarity measure is employed to track groups of users and detect changes they may undergo. While the majority of existing approaches use a two-stage process, they all run different algorithms to identify communities and rely on different similarity measures to track groups of users over time. Noting that the different approaches may perform differently depending on the dynamic social network under investigation, we decided to make a high level survey of some existing tracking approaches and then do a comparative analysis of some of them. In our analysis, we compared the algorithms in two main situations: (1) when groups of users do not overlap and (2) when the groups are overlapping. The study was done on three different testbeds extracted from the DBLP, Autonomous System (AS) and Yelp datasets.

arXiv.org e-Print Archive
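
The second stage of the two-stage process described in the community-tracking record above (pair-wise comparison of groups across timestamps using a similarity measure) might look like the following minimal sketch. The choice of Jaccard similarity, the 0.5 threshold, and the names `jaccard` and `track_communities` are all assumptions made for illustration; the abstract notes that the surveyed approaches rely on different measures and does not single one out. The first stage (detecting groups at each timestamp) is taken as given.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of users."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def track_communities(previous, current, threshold=0.5):
    """Match each group at time t to its most similar group at time t+1.

    previous, current : dicts mapping a group id to a set of user ids
    threshold         : assumed minimum similarity for a match
    Returns a dict from each previous group id to a current group id,
    or None when no group is similar enough (the group may have dissolved).
    """
    matches = {}
    for prev_id, prev_members in previous.items():
        best_id, best_sim = None, 0.0
        for curr_id, curr_members in current.items():
            sim = jaccard(prev_members, curr_members)
            if sim > best_sim:
                best_id, best_sim = curr_id, sim
        matches[prev_id] = best_id if best_sim >= threshold else None
    return matches


groups_t0 = {"A": {1, 2, 3}, "B": {4, 5}}
groups_t1 = {"X": {1, 2, 3, 6}, "Y": {7, 8}}
print(track_communities(groups_t0, groups_t1))
```

Because groups are plain sets, the same comparison applies whether or not a user belongs to several groups, which loosely mirrors the non-overlapping and overlapping situations the abstract distinguishes.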