
    Scaling Up Network Analysis and Mining: Statistical Sampling, Estimation, and Pattern Discovery

    Network analysis and graph mining play a prominent role in providing insights and studying phenomena across various domains, including social, behavioral, biological, transportation, communication, and financial domains. Across all these domains, networks arise as a natural and rich representation for data. Studying these real-world networks is crucial for solving numerous problems that lead to high-impact applications, for example identifying the behavior and interests of users in online social networks (e.g., viral marketing), monitoring and detecting virus outbreaks in human contact networks, predicting protein functions in biological networks, and detecting anomalous behavior in computer networks. A key characteristic of these networks is that their complex structure is massive and continuously evolving over time, which makes it challenging and computationally intensive to analyze, query, and model them in their entirety. In this dissertation, we propose sampling as well as fast, efficient, and scalable methods for network analysis and mining in both static and streaming graphs.
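    The abstract does not describe the dissertation's specific estimators, so the following is only a minimal sketch of the general idea of statistical sampling and estimation on a graph: estimate a global property (here, the average degree) from a uniform sample of nodes. The toy graph and all names are illustrative assumptions, not the dissertation's own code.

```python
import random
from collections import defaultdict

# Toy graph as an adjacency list; everything here is purely illustrative.
def load_toy_graph():
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 0)]
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def estimate_average_degree(adj, sample_size, seed=42):
    """Estimate the average degree from a uniform sample of nodes.
    The sample mean of node degrees is an unbiased estimator of the
    true average degree."""
    rng = random.Random(seed)
    nodes = list(adj)
    sample = [rng.choice(nodes) for _ in range(sample_size)]
    return sum(len(adj[v]) for v in sample) / sample_size

if __name__ == "__main__":
    g = load_toy_graph()
    print(estimate_average_degree(g, sample_size=100))
```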

    CLEF 2017 NewsREEL Overview: Offline and Online Evaluation of Stream-based News Recommender Systems

    The CLEF NewsREEL challenge allows researchers to evaluate news recommendation algorithms both online (NewsREEL Live) and offline (NewsREEL Replay). Compared with the previous year, NewsREEL challenged participants with a higher volume of messages and new news portals. In the 2017 edition of the CLEF NewsREEL challenge, a wide variety of new approaches were implemented, ranging from the use of existing machine learning frameworks and ensemble methods to deep neural networks. This paper gives an overview of the implemented approaches and discusses the evaluation results. In addition, the main results of the Living Lab and the Replay task are explained.

    Gunrock: GPU Graph Analytics

    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.
    Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of the PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU".
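    Gunrock itself is a CUDA/C++ library, and its actual operator API is not reproduced in this abstract. As a loose illustration of the frontier-centric, bulk-synchronous style it describes, here is a minimal CPU-side Python sketch of a BFS written as per-iteration "advance" (expand the frontier's edges) and "filter" (drop already-visited vertices) steps; all names are assumptions made only for the example.

```python
from collections import defaultdict

def bfs_frontier(adj, source):
    """Frontier-centric, bulk-synchronous BFS. Each iteration performs an
    "advance" (expand every edge leaving the frontier) followed by a
    "filter" (keep only unvisited vertices). CPU sketch for illustration
    only, not Gunrock's actual API."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        expanded = [w for v in frontier for w in adj[v]]   # advance
        next_frontier = []
        for w in expanded:                                 # filter
            if w not in depth:
                depth[w] = level + 1
                next_frontier.append(w)
        frontier = next_frontier
        level += 1
    return depth

# Example: a small undirected graph stored as an adjacency dict.
adj = defaultdict(list, {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]})
print(bfs_frontier(adj, 0))   # {0: 0, 1: 1, 2: 1, 3: 2}
```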

    Network-aware recommendations in online social networks

    With the rapid increase in the use of social network sites such as Twitter, a massive number of tweets are published every day, which affects users' decisions about which information to forward and leaves them feeling overwhelmed. It is therefore important for these services to help users stay focused on what is close to their interests and to find potentially interesting tweets. The problem that can occur in this case is called information overload, where an individual encounters too much information in a short period of time. For instance, on Twitter, a user can see a large number of tweets posted by her followees. To address this issue, recommender systems are used to provide content that matches the user's needs. This thesis presents a tweet-recommendation approach that aims to propose novel tweets to users and to improve over a baseline. To this end, we propose to exploit network, content, and retweet analyses for recommending tweets. The main objective of this research is to recommend tweets that are unseen by the user (i.e., they do not appear in the user's timeline) because nobody in her social circles published or retweeted them. To achieve this goal, we create the user's ego-network up to depth two and apply the transitivity property of the friends-of-friends relationship to determine interesting recommendations. After this step, we apply cosine similarity and Jaccard distance as similarity measures, computed over bigrams, for the candidate tweets obtained from the network analysis. We also count the mutual retweets between the ego user and candidate users as a measure of shared tastes. The values of these features are compared for each of the candidate tweets using pairwise comparisons in order to determine interesting recommendations, which are then ranked to best match the user's interests. Experimental results from a real user study demonstrate that our approach improves on a state-of-the-art technique. In addition to its effectiveness in finding relevant content, our approach also provides novel tweets, which addresses the over-specialization (serendipity) problem that arises when content-based recommender systems are used as a stand-alone recommendation approach.
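    The thesis's exact preprocessing and feature-combination steps are not given in this abstract; the following is a minimal sketch, under an assumed tokenization, of the two similarity measures it names: cosine similarity and Jaccard distance over bigrams. The helper names are hypothetical.

```python
import math
from collections import Counter

def bigrams(text):
    """Word bigrams from a tweet; the thesis's exact preprocessing
    (stop-word removal, stemming, etc.) is not specified here."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def cosine_similarity(a, b):
    """Cosine similarity between the bigram count vectors of two texts."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(ca[k] * cb[k] for k in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def jaccard_distance(a, b):
    """Jaccard distance between the bigram sets of two texts."""
    sa, sb = set(bigrams(a)), set(bigrams(b))
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

print(cosine_similarity("graph mining on twitter", "graph mining on large networks"))
print(jaccard_distance("graph mining on twitter", "graph mining on large networks"))
```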

    Graphlet based network analysis

    The majority of existing works on network analysis study properties that are related to the global topology of a network. Examples of such properties include the diameter, the power-law exponent, and spectra of graph Laplacians. Such works enhance our understanding of real-life networks, or enable us to generate synthetic graphs with real-life graph properties. However, many existing problems on networks require the study of local topological structures of a network. Graphlets, which are small induced subgraphs, capture the local topological structure of a network effectively, and in recent years they have become increasingly popular for characterizing large networks. Graphlet-based network analysis can vary based on the types of topological structures considered and the kinds of analysis tasks. For example, one of the most popular and earliest graphlet analyses is based on triples (triangles or paths of length two). Graphlet analysis based on cycles and cliques is also explored in several recent works. Another, more comprehensive class of graphlet analysis methods works with graphlets of specific sizes; graphlets with three, four, or five nodes ({3, 4, 5}-Graphlets) are particularly popular. For all of the above analysis tasks, excessive computational cost is a major challenge, which becomes severe when analyzing large networks with millions of vertices. Effective methodologies are therefore urgently needed, and the existence of efficient methods for graphlet analysis will encourage further work broadening the scope of graphlet analysis.

    For graphlet counting, we propose edge-iteration-based methods (ExactTC and ExactGC) for efficiently computing triple and graphlet counts. The proposed methods compute local graphlet statistics in the neighborhood of each edge in the network and then aggregate the local statistics to give a global characterization (transitivity, graphlet frequency distribution (GFD), etc.) of the network. Scalability of the proposed methods is further improved by iterating over a sampled set of edges and estimating the triangle count (ApproxTC) and graphlet count (Graft) by approximate rescaling of the aggregated statistics. The independence of the local feature-vector construction for each edge makes the methods embarrassingly parallelizable; we show this by giving a parallel edge-iteration method, ParApproxTC, for triangle counting.

    For graphlet sampling, we propose Markov Chain Monte Carlo (MCMC) sampling-based methods for triple and graphlet analysis. The proposed triple analysis methods, Vertex-MCMC and Triple-MCMC, estimate the triangle count and network transitivity. Vertex-MCMC samples triples in two steps: first, it selects a node (using the MCMC method) with probability proportional to the number of triples of which the node is the center, and then it samples uniformly from the triples centered at the selected node. The method Triple-MCMC samples triples by performing an MCMC walk in a triple sample space, which consists of all the possible triples in a network; the walk moves from one triple to one of its neighboring triples in this space. We design the triple space in such a way that two triples are neighbors only if they share exactly two nodes. The proposed triple sampling algorithms Vertex-MCMC and Triple-MCMC are able to sample triples from any arbitrary distribution, as long as the weight of each triple is locally computable. The proposed methods are able to sample triples without knowledge of the complete network structure; information about only the local neighborhood of the currently observed node or triple is enough to walk to the next node or triple. This gives the proposed methods a significant advantage: the capability to sample triples from networks with restricted access, on which a direct sampling-based method is simply not applicable. The proposed methods are also suitable for dynamic and large networks. Similar to the concept of Triple-MCMC, we propose Guise for sampling graphlets of sizes three, four, and five ({3, 4, 5}-Graphlets). Guise samples graphlets by performing an MCMC walk on a graphlet sample space containing all the graphlets of sizes three, four, and five in the network.

    Despite the proven utility of graphlets in static network analysis, works harnessing graphlets for dynamic network analysis are yet to come. Dynamic networks carry additional time information for their edges; over time, the topological structure of a dynamic network changes, as edges can appear, disappear, and reappear. In this direction, predicting the link state of a network at a future time, given a collection of link states at earlier times, is an important task with many real-life applications; in the existing literature, this task is known as link prediction in dynamic networks. Performing this task is more difficult than its counterpart in static networks because an effective feature representation of node-pair instances for a dynamic network is hard to obtain. We design a novel graphlet-transition-based feature embedding for node-pair instances of a dynamic network. Our proposed method, GraTFEL, uses automatic feature learning on such graphlet-transition features to give a low-dimensional feature embedding of unlabeled node-pair instances. The feature learning task is modeled as an optimal coding task whose objective is to minimize the reconstruction error, and GraTFEL solves this optimization task using a gradient descent method. We validate the effectiveness of the learned feature embedding by using it for link prediction in real-life dynamic networks. Specifically, we show that GraTFEL, which uses the extracted feature embedding of graphlet transition events, outperforms existing methods that use well-known link prediction features.
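    The pseudocode of ExactTC/ApproxTC is not given in this abstract; the sketch below only illustrates the general edge-iteration-plus-sampling idea it describes: for each (sampled) edge, count the triangles in that edge's neighborhood, aggregate, and rescale. The graph representation and function names are assumptions for the example.

```python
import random
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list (illustrative helper)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def approx_triangle_count(edges, sample_fraction=0.5, seed=7):
    """Edge-iteration triangle counting with edge sampling and rescaling.
    For each sampled edge (u, v), the local statistic |N(u) & N(v)| counts
    the triangles containing that edge; the sum is rescaled by the sampling
    fraction and divided by 3, since each triangle is seen once per edge."""
    adj = build_adjacency(edges)
    rng = random.Random(seed)
    sampled = [e for e in edges if rng.random() < sample_fraction]
    local_sum = sum(len(adj[u] & adj[v]) for u, v in sampled)
    return (local_sum / sample_fraction) / 3.0 if sampled else 0.0

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
print(approx_triangle_count(edges, sample_fraction=1.0))   # exact: 2 triangles
```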

    Towards Performance Portable Graph Algorithms

    In today's data-driven world, our computational resources have become heterogeneous, making it crucial to process large-scale graphs in an architecture-agnostic manner. Traditionally, hand-optimized high-performance computing (HPC) solutions have been studied and used to implement highly efficient and scalable graph algorithms. In recent years, several graph processing and management systems have also been proposed. Hand-optimized HPC approaches require high levels of expertise, while graph processing frameworks suffer from limited expressibility and performance; portability is a major concern for both. The main thesis of this work is that block-based graph algorithms offer a compromise between efficient parallelism and architecture-agnostic algorithm design for a wide class of graph problems. This dissertation seeks to prove this thesis by focusing on three pillars: data/computation partitioning, block-based algorithm design, and performance portability. We first show how to partition the computation and the data to design efficient block-based algorithms for solving graph merging and triangle counting problems. Then, generalizing from our experiences, we propose PGAbB, an algorithmic framework for implementing block-based graph algorithms on shared-memory heterogeneous machines. PGAbB aims to maximally leverage different architectures by implementing task-based execution on top of a block-based programming model. We discuss PGAbB's programming model, algorithmic optimizations for scheduling, and load-balancing strategies for graph problems on real-world and synthetic inputs.
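    PGAbB's actual programming interface is not shown in this abstract. As a rough, hedged illustration of the block-based idea described above, the sketch below partitions edges into 2D blocks and processes each block as an independent task (sequentially here; a real runtime would schedule blocks across CPU cores and GPUs). Partitioning by node id modulo the number of parts, and all names, are assumptions made only for the example.

```python
from collections import defaultdict

def partition_into_blocks(edges, num_parts):
    """Group edges into 2D blocks keyed by (source part, destination part).
    Modulo partitioning is an illustrative assumption, not PGAbB's partitioner."""
    blocks = defaultdict(list)
    for u, v in edges:
        blocks[(u % num_parts, v % num_parts)].append((u, v))
    return blocks

def blockwise_out_degrees(edges, num_parts=4):
    """Process each block as an independent task, then merge the per-block
    partial results into a global answer (here: out-degrees)."""
    partials = []
    for block_edges in partition_into_blocks(edges, num_parts).values():
        counts = defaultdict(int)
        for u, _ in block_edges:
            counts[u] += 1
        partials.append(counts)
    degrees = defaultdict(int)
    for counts in partials:
        for u, c in counts.items():
            degrees[u] += c
    return dict(degrees)

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (5, 0)]
print(blockwise_out_degrees(edges))   # {0: 2, 1: 1, 2: 1, 5: 1}
```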

    Interactive Video Game Content Authoring using Procedural Methods

    This thesis explores avenues for improving the quality and detail of game graphics, in the context of constraints that are common to most game development studios. The research begins by identifying two dominant constraints: limitations in the capacity of target gaming hardware/platforms, and processes that hinder the productivity of game art/content creation. From these constraints, themes were derived which directed the research's focus. These include the use of algorithmic or 'procedural' methods in the creation of graphics content for games, and the use of an 'interactive' content creation strategy to better facilitate artists' production workflow. Interactive workflow represents an emerging paradigm shift in the content creation processes used by the industry, which directly integrates game rendering technology into the content authoring process. The primary motivation for this is to provide 'high frequency' visual feedback that enables artists to see game content in context during the authoring process. By merging these themes, this research develops a production strategy that takes advantage of high-frequency feedback in an interactive workflow to directly expose procedural methods to artists for use in the content creation process. Procedural methods have a characteristically small memory footprint and are capable of generating massive volumes of data. Their small size-to-data-volume ratio makes them particularly well suited for use in game rendering situations where capacity constraints are an issue. In addition, an interactive authoring environment is well suited to the task of setting parameters for procedural methods, reducing a major barrier to their acceptance by artists. An interactive content authoring environment was developed during this research, and two algorithms were designed and implemented. These algorithms provide artists with abstract mechanisms which accelerate common game content development processes, namely object placement in game environments and the delivery of variation between similar game objects. In keeping with the theme of this research, the core functionality of these algorithms is delivered via procedural methods. Through this, production overhead associated with these content development processes is essentially offloaded from artists onto the processing capability of modern gaming hardware. This research shows how procedurally based content authoring algorithms not only cope well with hardware capacity constraints, but also make the authoring of larger and more detailed volumes of game content more feasible in the game production process. Algorithms and ideas developed during this research demonstrate the use of procedurally based, interactive content creation towards improving detail and complexity in the graphics of games.
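    The thesis's actual placement and variation algorithms are not described in detail in this abstract. As a small, hedged illustration of why procedural methods have such a favourable size-to-data-volume ratio, the sketch below deterministically regenerates many object positions from a handful of parameters (seed, count, region, minimum spacing); all names and parameters are assumptions for the example, not the thesis's algorithm.

```python
import random

def procedural_placement(seed, count, region=(0.0, 0.0, 100.0, 100.0), min_spacing=2.0):
    """Deterministically regenerate object positions from a small parameter set.
    Rejection sampling keeps objects from overlapping; this is an illustrative
    sketch, not the thesis's placement algorithm."""
    rng = random.Random(seed)
    x0, y0, x1, y1 = region
    placed = []
    attempts = 0
    while len(placed) < count and attempts < count * 50:
        attempts += 1
        p = (rng.uniform(x0, x1), rng.uniform(y0, y1))
        if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 >= min_spacing ** 2 for q in placed):
            placed.append(p)
    return placed

# The same seed always reproduces the same layout, so only the parameters
# need to be stored with the game, not the generated positions themselves.
print(len(procedural_placement(seed=1337, count=500)))
```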

    Big Data Analytics in Static and Streaming Provenance

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2016. With recent technological and computational advances, scientists increasingly integrate sensors and model simulations to understand spatial, temporal, social, and ecological relationships at unprecedented scale. Data provenance traces relationships of entities over time, thus providing a unique view of the over-time behavior under study. However, provenance can be overwhelming in both volume and complexity, and the forecasting potential of provenance creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance, and it presents a stream processing framework for online processing of provenance data at high receiving rates. While the former is sufficient for answering queries that are given prior to the application start (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that reduces the high dimensionality while effectively supporting mining tasks such as clustering, classification, and association rule mining; this temporal representation can also be applied to streaming provenance. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making.
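    The dissertation's actual query framework is not shown in this abstract; the following is only a minimal sketch of the forward-query idea on streaming provenance: records are filtered in a single pass, keeping derivation edges reachable from a set of seed entities. It assumes a simplified (derived, source) record format and that records arrive in derivation order; all names are hypothetical.

```python
def forward_query_stream(provenance_stream, seed_entities):
    """Single-pass forward query over streaming provenance: maintain the set
    of entities known to derive from the seeds and emit each incoming
    (derived, source) edge whose source is already in that set. Assumes
    records arrive in derivation order; the record format is hypothetical."""
    reachable = set(seed_entities)
    for derived, source in provenance_stream:
        if source in reachable:
            reachable.add(derived)
            yield (derived, source)

# Toy stream of "derived <- source" records.
stream = [("b", "a"), ("c", "b"), ("d", "x"), ("e", "c")]
print(list(forward_query_stream(stream, {"a"})))   # [('b', 'a'), ('c', 'b'), ('e', 'c')]
```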