Technical Report: Accelerating Dynamic Graph Analytics on GPUs
As graph analytics often involves compute-intensive operations, GPUs have
been extensively used to accelerate the processing. However, in many
applications such as social networks, cyber security, and fraud detection,
their representative graphs evolve frequently and one has to perform a rebuild
of the graph structure on GPUs to incorporate the updates. Hence, rebuilding
the graphs becomes the bottleneck of processing high-speed graph streams. In
this paper, we propose a GPU-based dynamic graph storage scheme to support
existing graph algorithms easily. Furthermore, we propose parallel update
algorithms to support efficient stream updates so that the maintained graph is
immediately available for high-speed analytic processing on GPUs. Our extensive
experiments with three streaming applications on large-scale real and synthetic
datasets demonstrate the superior performance of our proposed approach. Comment: 34 pages, 18 figures
Scalable Analytics over Distributed Time-series Graphs using GoFFish
Graphs are a key form of Big Data, and performing scalable analytics over
them is invaluable to many domains. As our ability to collect data grows, there
is an emerging class of inter-connected data which accumulates or varies over
time, and on which novel analytics - both over the network structure and across
the time-variant attribute values - is necessary. We introduce the notion of
time-series graph analytics and propose Gopher, a scalable programming
abstraction to develop algorithms and analytics on such datasets. Our
abstraction leverages a sub-graph centric programming model and extends it to
the temporal dimension using an iterative BSP (Bulk Synchronous Parallel)
approach. Gopher is co-designed with GoFS, a distributed storage specialized
for time-series graphs, as part of the GoFFish distributed analytics platform.
We examine storage optimizations for GoFS, design patterns in Gopher to
leverage the distributed data layout, and evaluate the GoFFish platform using
time-series graph data and applications on a commodity cluster.
Exploring the use of time-varying graphs for modelling transit networks
The study of the dynamic relationship between topological structure of a
transit network and the mobility patterns of transit vehicles on this network
is critical towards devising smart and time-aware solutions to transit
management and recommendation systems. This paper proposes a time-varying graph
(TVG) to model this relationship. The effectiveness of this proposed model has
been explored by implementing the model in Neo4j graph database using transit
feeds generated by bus transit network of the City of Moncton, New Brunswick,
Canada. Dynamics in this relationship have also been detected using network
metrics such as temporal shortest paths, degree, betweenness and PageRank
centralities as well as temporal network diameter and density. Keywords:
Transit Networks, Mobility Pattern, Time-Varying Graph model, Graph Database and
Graph Analytics
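To make the "temporal shortest paths" metric mentioned in this abstract concrete, here is a minimal sketch of one common interpretation, the earliest-arrival path. It is not the paper's Neo4j implementation. The function name `earliest_arrival`, the `(from, to, depart, arrive)` edge tuples and the stop names are all assumptions made for illustration, and the sketch assumes no trip arrives before it departs.

```python
import heapq


def earliest_arrival(temporal_edges, source, start_time):
    """Earliest arrival time at every stop reachable from `source`.

    temporal_edges: iterable of (from_stop, to_stop, depart, arrive) tuples
    (a hypothetical layout) with arrive >= depart.
    """
    # Group trips by the stop they depart from.
    outgoing = {}
    for u, v, depart, arrive in temporal_edges:
        outgoing.setdefault(u, []).append((depart, arrive, v))

    best = {source: start_time}
    heap = [(start_time, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > best.get(u, float("inf")):
            continue  # stale entry: a better arrival at u was found already
        for depart, arrive, v in outgoing.get(u, []):
            # A trip is usable only if it leaves no earlier than we reach u.
            if depart >= t and arrive < best.get(v, float("inf")):
                best[v] = arrive
                heapq.heappush(heap, (arrive, v))
    return best


# Hypothetical feed: A->B is direct but slow; going via C arrives earlier.
trips = [("A", "B", 0, 5), ("A", "C", 1, 2), ("C", "B", 3, 4), ("B", "D", 4, 6)]
print(earliest_arrival(trips, "A", 0))
```

Unlike a static shortest path, a connection here is usable only when its departure time is not earlier than the arrival time at the stop, which is why the same pair of stops can have different "distances" at different times of day.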
Design and Implementation of Small Multiples Matrix-based Visualisation to Monitor and Compare Email Socio-organisational Relationships
One of the fundamental organisational questions is how organisations identify anomalies and monitor and compare email communications in staff-staff, staff-client or staff-customer relationships on a daily basis. Tenacious and substantial relationships are built by the combination of timely replies, frequent engagement and deep interaction between individuals. To monitor this periodically, we need an interactive visualisation tool that can help organisational analysts reconnect lost relationships, strengthen existing relationships or, in some cases, identify inside persons (anomalies). From our point of view, Social Intelligence (SI) in an organisation is a combination of self-, social- and organisational-awareness that helps in managing complex socio-organisational changes and can be interpreted in terms of socio-organisational communication efficacy (that is, one's confidence in one's ability to deal with social and organisational information). We considered a case study, the Enron email scandal, to understand the relationships of staff during different periods across the years, and we conducted a workshop study with legal experts to gain insights into how they carry out investigation and analysis of email relationships. The outcomes of the workshop helped us develop a novel small multiples matrix-based visualisation, in collaboration with our industrial partner, Red Sift UK, to find anomalies and to monitor and compare how email relationships change over time and how these changes define the meaning of socio-organisational communication efficacy.
Scalable Positional Analysis for Studying Evolution of Nodes in Networks
In social network analysis, the fundamental idea behind the notion of
position is to discover actors who have similar structural signatures.
Positional analysis of social networks involves partitioning the actors into
disjoint sets using a notion of equivalence which captures the structure of
relationships among actors. Classical approaches to Positional Analysis, such
as Regular equivalence and Equitable Partitions, are too strict in grouping
actors and often lead to trivial partitioning of actors in real world networks.
An Epsilon Equitable Partition (EEP) of a graph, which is similar in spirit to
Stochastic Blockmodels, is a useful relaxation to the notion of structural
equivalence which results in meaningful partitioning of actors. In this paper
we propose and implement a new scalable distributed algorithm based on
MapReduce methodology to find EEP of a graph. Empirical studies on random
power-law graphs show that our algorithm is highly scalable for sparse graphs,
thereby giving us the ability to study positional analysis on very large scale
networks. We also present the results of our algorithm on time evolving
snapshots of the Facebook and Flickr social graphs. Results show the importance
of positional analysis on large dynamic networks. Comment: Presented at the workshop on Mining Networks and Graphs: A Big Data
Analytic Challenge, held in conjunction with the SIAM Data Mining (SDM)
Conference in April 2014. 13 pages
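The abstract does not spell out the epsilon-equitable condition, so the following sketch assumes the commonly used definition: within each cell, the number of neighbours that any two nodes have in any given cell differs by at most epsilon. It only checks a given partition on a single machine. It is not the distributed MapReduce algorithm the paper proposes, and the names `is_epsilon_equitable`, `adj` and `cells` are hypothetical.

```python
def degree_vector(adj, cell_of, node, num_cells):
    """Count how many neighbours of `node` fall into each cell."""
    counts = [0] * num_cells
    for neighbour in adj[node]:
        counts[cell_of[neighbour]] += 1
    return counts


def is_epsilon_equitable(adj, cells, eps):
    """Check the assumed epsilon-equitable condition for a partition.

    adj: dict mapping node -> iterable of neighbours (undirected graph).
    cells: list of non-empty, disjoint node sets covering all nodes.
    """
    cell_of = {node: i for i, cell in enumerate(cells) for node in cell}
    k = len(cells)
    for cell in cells:
        vectors = [degree_vector(adj, cell_of, node, k) for node in cell]
        for j in range(k):
            column = [vec[j] for vec in vectors]
            # Nodes in one cell may differ by at most eps in their number
            # of neighbours inside cell j.
            if max(column) - min(column) > eps:
                return False
    return True


# Path graph a-b-c-d: endpoints and middle nodes form an equitable partition.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(is_epsilon_equitable(path, [{"a", "d"}, {"b", "c"}], eps=0))
print(is_epsilon_equitable(path, [{"a", "b"}, {"c", "d"}], eps=0))
```

Setting `eps=0` recovers the strict equitable partition that the abstract describes as too restrictive for real-world networks; larger values tolerate small differences between actors placed in the same position.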
Graphlet Decomposition: Framework, Algorithms, and Applications
From social science to biology, numerous applications often rely on graphlets
for intuitive and meaningful characterization of networks at both the global
macro-level as well as the local micro-level. While graphlets have witnessed
tremendous success and impact in a variety of domains, there has yet to be a
fast and efficient approach for computing the frequencies of these subgraph
patterns. Existing methods are not scalable to large networks with
millions of nodes and edges, which impedes the application of graphlets to new
problems that require large-scale network analysis. To address these problems,
we propose a fast, efficient, and parallel algorithm for counting graphlets of
size k={3,4}-nodes that takes only a fraction of the time to compute when
compared with the current methods used. The proposed graphlet counting
algorithms leverage a number of proven combinatorial arguments for different
graphlets. For each edge, we count a few graphlets, and with these counts along
with the combinatorial arguments, we obtain the exact counts of others in
constant time. On a large collection of 300+ networks from a variety of
domains, our graphlet counting strategies are on average 460x faster than
current methods. This brings new opportunities to investigate the use of
graphlets on much larger networks and newer applications as we show in the
experiments. To the best of our knowledge, this paper provides the largest
graphlet computations to date as well as the largest systematic investigation
on 300+ networks from a variety of domains.
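The edge-centric idea described in this abstract (count a few graphlets per edge directly, then derive others combinatorially in constant time) can be illustrated on the smallest case, connected 3-node graphlets. This is a simplified sequential sketch, not the authors' parallel algorithm, and it does not cover 4-node graphlets. The function name `count_3_node_graphlets` and the edge-list input format are assumptions.

```python
def count_3_node_graphlets(edges):
    """Count triangles and induced 2-paths (wedges) in an undirected graph.

    edges: iterable of (u, v) pairs; self-loops and duplicates are ignored.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}

    triangle_sum = 0
    wedge_sum = 0
    for edge in unique_edges:
        u, v = tuple(edge)
        # Counted directly: triangles that contain this edge.
        tri = len(adj[u] & adj[v])
        triangle_sum += tri
        # Derived in constant time: every other neighbour of u or v that
        # does not close a triangle forms an induced 2-path with this edge.
        wedge_sum += (len(adj[u]) - 1 - tri) + (len(adj[v]) - 1 - tri)

    # Each triangle is seen from its 3 edges, each wedge from its 2 edges.
    return {"triangles": triangle_sum // 3, "wedges": wedge_sum // 2}


# Triangle a-b-c with a pendant node d attached to c.
print(count_3_node_graphlets([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))
```

Only the triangle count needs a set intersection per edge. The wedge count follows from the node degrees and the triangle count, which is the kind of combinatorial shortcut the abstract refers to.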
Using big data for customer centric marketing
This chapter deliberates on “big data” and provides a short overview of business intelligence and emerging analytics. It underlines the importance of data for customer-centricity in marketing. This contribution contends that businesses ought to engage in marketing automation tools and apply them to create relevant, targeted customer experiences. Today’s businesses increasingly rely on digital media and mobile technologies as on-demand, real-time marketing has become more personalised than ever. Therefore, companies and brands are striving to nurture fruitful and long-lasting relationships with customers. In a nutshell, this chapter explains why companies should recognise the value of data analysis and mobile applications as tools that drive consumer insights and engagement. It suggests that a strategic approach to big data could drive consumer preferences and may also help to improve the organisational performance. peer-reviewed
StreamWorks - A system for Dynamic Graph Search
Acting on time-critical events by processing ever-growing social media, news
or cyber data streams is a major technical challenge. Many of these data
sources can be modeled as multi-relational graphs. Mining and searching for
subgraph patterns in a continuous setting requires an efficient approach to
incremental graph search. The goal of our work is to enable real-time search
capabilities for graph databases. This demonstration will present a dynamic
graph query system that leverages the structural and semantic characteristics
of the underlying multi-relational graph. Comment: SIGMOD 2013: International Conference on Management of Data
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for an optimized
solution to a specific real-world problem, big data systems are no exception.
As far as the storage aspect of any big data system is
concerned, the primary facet is the storage infrastructure, and
NoSQL seems to be the right technology to fulfill its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents a
feature and use case analysis and comparison of the four main data models,
namely document-oriented, key-value, graph and wide-column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects have
also been discussed.
A Comparative Study of Different Approaches for Tracking Communities in Evolving Social Networks
In real-world social networks, there is an increasing interest in tracking
the evolution of groups of users and detecting the various changes they are
liable to undergo. Several approaches have been proposed for this. In studying
these approaches, we observed that most of them use a two-stage process. In the
first stage, they run an algorithm to identify groups of users at each
timestamp. In the second stage, a pair-wise comparison based on a similarity
measure is employed to track groups of users and detect changes they may
undergo. While the majority of existing approaches use a two-stage process,
they all run different algorithms to identify communities and rely on different
similarity measures to track groups of users over time. Noting that the
different approaches may perform differently depending on the dynamic social
network under investigation, we decided to make a high-level survey of some
existing tracking approaches and then do a comparative analysis of some of
them. In our analysis, we compared the algorithms in two main situations:
when groups of users do not overlap and when the groups are overlapping.
The study was done on three different testbeds extracted from the DBLP,
Autonomous System (AS) and Yelp datasets.
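The second stage of the two-stage process described in this abstract can be sketched as follows. The sketch assumes the communities at each timestamp have already been produced by some detection algorithm (stage one) and uses Jaccard similarity with a fixed threshold as one possible similarity measure. The surveyed approaches differ in both choices, so `track_communities`, `jaccard` and the `threshold` parameter are illustrative assumptions rather than any specific method from the study.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of user ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def track_communities(snapshots, threshold=0.5):
    """Match communities between consecutive timestamps.

    snapshots: list (one entry per timestamp) of lists of sets of user ids,
    i.e. the output of stage one.
    Returns (t, i, j, similarity) tuples meaning community i at timestamp t
    is matched to community j at timestamp t + 1.
    """
    matches = []
    for t in range(len(snapshots) - 1):
        for i, previous in enumerate(snapshots[t]):
            for j, current in enumerate(snapshots[t + 1]):
                similarity = jaccard(previous, current)
                if similarity >= threshold:
                    matches.append((t, i, j, similarity))
    return matches


# Two timestamps: the first community gains one member, the second dissolves.
snapshots = [
    [{1, 2, 3}, {4, 5}],
    [{1, 2, 3, 6}, {7, 8}],
]
print(track_communities(snapshots))
```

A community with no match at the next timestamp can be read as dissolved, and one matched to several successors as split, which is how pairwise comparison supports detecting the changes groups undergo. Because the comparison is done per pair of sets, the same code applies whether or not the groups overlap.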