
    Technical Report: Accelerating Dynamic Graph Analytics on GPUs

    As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, the graphs that represent them evolve frequently, and the graph structure must be rebuilt on the GPU to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme to support existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three streaming applications on large-scale real and synthetic datasets demonstrate the superior performance of our proposed approach. Comment: 34 pages, 18 figures

    Scalable Analytics over Distributed Time-series Graphs using GoFFish

    Graphs are a key form of Big Data, and performing scalable analytics over them is invaluable to many domains. As our ability to collect data grows, there is an emerging class of inter-connected data which accumulates or varies over time, and on which novel analytics - both over the network structure and across the time-variant attribute values - is necessary. We introduce the notion of time-series graph analytics and propose Gopher, a scalable programming abstraction to develop algorithms and analytics on such datasets. Our abstraction leverages a sub-graph centric programming model and extends it to the temporal dimension using an iterative BSP (Bulk Synchronous Parallel) approach. Gopher is co-designed with GoFS, a distributed storage specialized for time-series graphs, as part of the GoFFish distributed analytics platform. We examine storage optimizations for GoFS, design patterns in Gopher to leverage the distributed data layout, and evaluate the GoFFish platform using time-series graph data and applications on a commodity cluster.

    Exploring the use of time-varying graphs for modelling transit networks

    The study of the dynamic relationship between the topological structure of a transit network and the mobility patterns of transit vehicles on this network is critical to devising smart and time-aware solutions for transit management and recommendation systems. This paper proposes a time-varying graph (TVG) to model this relationship. The effectiveness of the proposed model has been explored by implementing it in the Neo4j graph database using transit feeds generated by the bus transit network of the City of Moncton, New Brunswick, Canada. Dynamics in this relationship have also been detected using network metrics such as temporal shortest paths, degree, betweenness and PageRank centralities, as well as temporal network diameter and density. Keywords: Transit Networks, Mobility Pattern, Time-Varying Graph model, Graph Database, and Graph Analytics
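
    One of the metrics named above, the temporal shortest path, can be illustrated with an earliest-arrival computation. The sketch below is not the paper's Neo4j implementation. It assumes a time-varying graph given as a list of contacts `(u, v, departure, arrival)` with strictly positive travel times, and the function name `earliest_arrival` and the stop labels are hypothetical.

```python
import math


def earliest_arrival(contacts, source, start_time):
    """Earliest arrival time at every stop reachable from `source`.

    `contacts` is an iterable of (u, v, departure, arrival) tuples, each
    meaning a vehicle leaves u at `departure` and reaches v at `arrival`.
    Assumes arrival > departure for every contact.
    """
    arrival = {source: start_time}
    # Scanning contacts in order of departure time guarantees that the
    # earliest arrival at u is already final when a contact leaving u
    # is examined (any contact reaching u in time departed earlier).
    for u, v, dep, arr in sorted(contacts, key=lambda c: c[2]):
        can_board = arrival.get(u, math.inf) <= dep
        if can_board and arr < arrival.get(v, math.inf):
            arrival[v] = arr
    return arrival


contacts = [
    ("A", "B", 0, 5),
    ("B", "C", 6, 9),
    ("A", "C", 1, 12),
    ("B", "C", 4, 7),  # leaves B before the rider can get there
]
times = earliest_arrival(contacts, "A", 0)
```

    Here the direct trip from A to C arrives at time 12, while changing at B arrives at time 9, so the temporal shortest path depends on departure times and not only on topology.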

    Scalable Positional Analysis for Studying Evolution of Nodes in Networks

    In social network analysis, the fundamental idea behind the notion of position is to discover actors who have similar structural signatures. Positional analysis of social networks involves partitioning the actors into disjoint sets using a notion of equivalence which captures the structure of relationships among actors. Classical approaches to Positional Analysis, such as Regular equivalence and Equitable Partitions, are too strict in grouping actors and often lead to trivial partitioning of actors in real world networks. An Epsilon Equitable Partition (EEP) of a graph, which is similar in spirit to Stochastic Blockmodels, is a useful relaxation to the notion of structural equivalence which results in meaningful partitioning of actors. In this paper we propose and implement a new scalable distributed algorithm based on MapReduce methodology to find EEP of a graph. Empirical studies on random power-law graphs show that our algorithm is highly scalable for sparse graphs, thereby giving us the ability to study positional analysis on very large scale networks. We also present the results of our algorithm on time evolving snapshots of the Facebook and Flickr social graphs. Results show the importance of positional analysis on large dynamic networks. Comment: Presented at the workshop on Mining Networks and Graphs: A Big Data Analytic Challenge, held in conjunction with the SIAM Data Mining (SDM) Conference in April 2014. 13 pages
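
    To make the relaxation concrete, the sketch below checks whether a given partition is epsilon equitable. It assumes the usual definition (any two actors in the same cell differ by at most epsilon in their number of neighbors in each cell), which the abstract does not spell out. It is a serial checker only, not the distributed MapReduce algorithm the paper proposes, and the name `is_epsilon_equitable` is hypothetical.

```python
def is_epsilon_equitable(adj, partition, eps):
    """Return True if `partition` is an eps-equitable partition of the graph.

    `adj` maps each node to an iterable of its neighbors (undirected graph).
    `partition` is a list of disjoint sets covering all nodes.
    """
    cell_of = {v: i for i, cell in enumerate(partition) for v in cell}
    k = len(partition)
    for cell in partition:
        lowest = [None] * k   # smallest neighbor count into each cell
        highest = [None] * k  # largest neighbor count into each cell
        for v in cell:
            # Degree vector of v: number of neighbors in every cell.
            counts = [0] * k
            for w in adj[v]:
                counts[cell_of[w]] += 1
            for j, c in enumerate(counts):
                lowest[j] = c if lowest[j] is None else min(lowest[j], c)
                highest[j] = c if highest[j] is None else max(highest[j], c)
        # Members of one cell may differ by at most eps toward any cell.
        for j in range(k):
            if lowest[j] is not None and highest[j] - lowest[j] > eps:
                return False
    return True


# Path graph 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

    With `eps = 0` the check reduces to an ordinary equitable partition, so the two endpoints and the two inner nodes of the path form a valid partition, while placing all four nodes in one cell requires `eps >= 1`.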

    Graphlet Decomposition: Framework, Algorithms, and Applications

    From social science to biology, numerous applications often rely on graphlets for intuitive and meaningful characterization of networks at both the global macro-level as well as the local micro-level. While graphlets have witnessed a tremendous success and impact in a variety of domains, there has yet to be a fast and efficient approach for computing the frequencies of these subgraph patterns. Existing methods are not scalable to large networks with millions of nodes and edges, which impedes the application of graphlets to new problems that require large-scale network analysis. To address these problems, we propose a fast, efficient, and parallel algorithm for counting graphlets of size k = {3, 4} nodes that takes only a fraction of the time to compute when compared with the current methods used. The proposed graphlet counting algorithms leverage a number of proven combinatorial arguments for different graphlets. For each edge, we count a few graphlets, and with these counts along with the combinatorial arguments, we obtain the exact counts of others in constant time. On a large collection of 300+ networks from a variety of domains, our graphlet counting strategies are on average 460x faster than current methods. This brings new opportunities to investigate the use of graphlets on much larger networks and newer applications as we show in the experiments. To the best of our knowledge, this paper provides the largest graphlet computations to date as well as the largest systematic investigation on 300+ networks from a variety of domains.
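
    The idea of counting a few patterns per edge and deriving the rest combinatorially can be illustrated for connected 3-node graphlets. The sketch below is not the paper's algorithm. It is a minimal serial example that assumes an undirected simple graph with comparable node labels, and the name `count_3_node_graphlets` is hypothetical. For each edge it counts triangles by intersecting neighbor sets, then derives the number of induced 2-paths (wedges) on that edge from the endpoint degrees in constant time.

```python
def count_3_node_graphlets(edges):
    """Count triangles and induced 2-paths (wedges) in an undirected graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    triangle_sum = 0
    wedge_sum = 0
    for u in adj:
        for v in adj[u]:
            if not u < v:
                continue  # visit each undirected edge once
            # Explicit count: triangles that contain edge (u, v).
            tri = len(adj[u] & adj[v])
            # Derived count: every other neighbor of u or v that does not
            # close a triangle forms a wedge containing edge (u, v).
            wedges = (len(adj[u]) - 1 - tri) + (len(adj[v]) - 1 - tri)
            triangle_sum += tri
            wedge_sum += wedges

    # Each triangle is seen from its 3 edges, each wedge from its 2 edges.
    return {"triangle": triangle_sum // 3, "wedge": wedge_sum // 2}


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
counts = count_3_node_graphlets([(0, 1), (1, 2), (0, 2), (2, 3)])
```

    Only the neighbor-set intersection touches the graph; the wedge count follows from arithmetic on degrees, which is the kind of saving the abstract attributes to its combinatorial arguments.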

    Using big data for customer centric marketing

    This chapter deliberates on “big data” and provides a short overview of business intelligence and emerging analytics. It underlines the importance of data for customer-centricity in marketing. This contribution contends that businesses ought to engage in marketing automation tools and apply them to create relevant, targeted customer experiences. Today’s businesses increasingly rely on digital media and mobile technologies as on-demand, real-time marketing has become more personalised than ever. Therefore, companies and brands are striving to nurture fruitful and long lasting relationships with customers. In a nutshell, this chapter explains why companies should recognise the value of data analysis and mobile applications as tools that drive consumer insights and engagement. It suggests that a strategic approach to big data could drive consumer preferences and may also help to improve the organisational performance. Peer-reviewed

    StreamWorks - A system for Dynamic Graph Search

    Acting on time-critical events by processing ever growing social media, news or cyber data streams is a major technical challenge. Many of these data sources can be modeled as multi-relational graphs. Mining and searching for subgraph patterns in a continuous setting requires an efficient approach to incremental graph search. The goal of our work is to enable real-time search capabilities for graph databases. This demonstration will present a dynamic graph query system that leverages the structural and semantic characteristics of the underlying multi-relational graph. Comment: SIGMOD 2013: International Conference on Management of Data

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for an optimized solution to a specific real world problem, big data systems are not an exception to any such rule. As far as the storage aspect of any big data system is concerned, the primary facet in this regard is a storage infrastructure and NoSQL seems to be the right technology that fulfills its requirements. However, every big data application has variable data characteristics and thus, the corresponding data fits into a different data model. This paper presents feature and use case analysis and comparison of the four main data models namely document oriented, key value, graph and wide column. Moreover, a feature analysis of 80 NoSQL solutions has been provided, elaborating on the criteria and points that a developer must consider while making a possible choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings forth the second facet of big data storage, big data file formats, into the picture. The second half of the research paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects have also been discussed.

    A Comparative Study of Different Approaches for Tracking Communities in Evolving Social Networks

    In real-world social networks, there is an increasing interest in tracking the evolution of groups of users and detecting the various changes they are liable to undergo. Several approaches have been proposed for this. In studying these approaches, we observed that most of them use a two-stage process. In the first stage, they run an algorithm to identify groups of users at each timestamp. In the second stage, a pair-wise comparison based on a similarity measure is employed to track groups of users and detect changes they may undergo. While the majority of existing approaches use a two-stage process, they all run different algorithms to identify communities and rely on different similarity measures to track groups of users over time. Noting that the different approaches may perform differently depending on the dynamic social network under investigation, we decided to make a high-level survey of some existing tracking approaches and then do a comparative analysis of some of them. In our analysis, we compared the algorithms in two main situations: (1) when groups of users do not overlap and (2) when the groups are overlapping. The study was done on three different testbeds extracted from the DBLP, Autonomous System (AS) and Yelp datasets.
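
    The second stage of the two-stage process described above can be sketched as a pair-wise comparison between the groups found at consecutive timestamps. The example assumes the first stage has already produced the groups as sets of user IDs, and it uses Jaccard similarity with a threshold of 0.5. Both are illustrative choices rather than the measure of any particular surveyed approach, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of user IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_communities(previous, current, threshold=0.5):
    """Pair each group at time t with similar groups at time t + 1.

    Returns (index_in_previous, index_in_current, similarity) triples for
    every pair whose similarity reaches `threshold`.
    """
    matches = []
    for i, old_group in enumerate(previous):
        for j, new_group in enumerate(current):
            score = jaccard(old_group, new_group)
            if score >= threshold:
                matches.append((i, j, score))
    return matches


groups_t0 = [{1, 2, 3}, {4, 5}]
groups_t1 = [{1, 2, 3, 6}, {7, 8}]
matches = match_communities(groups_t0, groups_t1)
```

    In this toy input the first group survives and grows by one member, while the second has no counterpart at the next timestamp. Because a group may match several successors, the same comparison also applies when groups overlap.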