
    From coincidence to purposeful flow? Properties of transcendental information cascades

    In this paper, we investigate a method for constructing cascades of information co-occurrence, suitable for tracing emergent structures in information in scenarios where rich contextual features are unavailable. Our method relies only on the temporal order of content-sharing activities and on intrinsic properties of the shared content itself. We apply this method to analyse information dissemination patterns across the active online citizen science project Planet Hunters, part of the Zooniverse platform. Our results lend insight into both the structural and the informational properties of the different types of identifiers that can be used and combined to construct cascades. In particular, significant differences are found in the structural properties of information cascades when hashtags are used as cascade identifiers, compared with other content features. We also explain apparent local information losses in cascades in terms of information obsolescence and cascade divergence, e.g., when a cascade branches into multiple divergent cascades with combined capacity equal to the original.
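As a rough illustration of the idea (a sketch, not the authors' exact method), one can build co-occurrence cascades by linking each content-sharing event to the most recent earlier event that contains the same identifier, such as a hashtag; the event stream below is invented:

```python
def build_cascades(events):
    """Link each content-sharing event to the most recent earlier
    event that shares an identifier (e.g. a hashtag).

    `events` is a list of (timestamp, event_id, identifiers) tuples,
    already sorted by timestamp; `identifiers` is a set of strings.
    Returns cascade edges as (earlier_event, later_event, identifier).
    """
    last_seen = {}  # identifier -> event_id of its latest occurrence
    edges = []
    for _, event_id, identifiers in events:
        for ident in identifiers:
            if ident in last_seen:
                edges.append((last_seen[ident], event_id, ident))
            last_seen[ident] = event_id
    return edges

# Toy stream of Planet Hunters-style annotations (hypothetical data):
events = [
    (1, "c1", {"#transit"}),
    (2, "c2", {"#transit", "#eclipsingbinary"}),
    (3, "c3", {"#eclipsingbinary"}),
]
print(build_cascades(events))
# [('c1', 'c2', '#transit'), ('c2', 'c3', '#eclipsingbinary')]
```

Branching into divergent cascades then corresponds to one identifier's chain splitting when several later events each pick up the same earlier occurrence.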

    Learning and Controlling Network Diffusion in Dependent Cascade Models

    Diffusion processes have increasingly been used to represent the flow of ideas, traffic, and diseases in networks. Learning and controlling the diffusion dynamics through management actions has been studied extensively in the context of independent cascade models, where diffusion on the outgoing edges of a node is independent across edges. Our work, in contrast, addresses (a) learning diffusion dynamics parameters and (b) taking management actions to alter the diffusion dynamics to achieve a desired outcome in dependent cascade models. A key characteristic of such dependent cascade models is flow preservation at all nodes in the network; for example, the flow of traffic and people is preserved at each network node. As a case study, we address learning visitor mobility patterns at a theme park from observed historical wait times at individual attractions, and we use the learned model to plan management actions that reduce wait times. We test on real-world data from a theme park in Singapore and show that our learning approach can achieve an accuracy close to 80% for popular attractions, and that the decision support algorithm can provide about a 10-20% reduction in wait time.
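The flow-preservation constraint that distinguishes dependent cascade models can be checked directly: at every internal node, inflow must equal outflow. A minimal sketch (the node names and flows are invented for illustration):

```python
from collections import defaultdict

def violates_conservation(edge_flows, sources, sinks, tol=1e-9):
    """Return the set of nodes where inflow != outflow, i.e. nodes
    violating the flow preservation that dependent cascade models
    assume. `edge_flows` maps (u, v) -> flow on that edge;
    `sources`/`sinks` are exempt (e.g. park entrance and exit).
    """
    inflow, outflow = defaultdict(float), defaultdict(float)
    for (u, v), f in edge_flows.items():
        outflow[u] += f
        inflow[v] += f
    nodes = set(inflow) | set(outflow)
    return {n for n in nodes - set(sources) - set(sinks)
            if abs(inflow[n] - outflow[n]) > tol}

# 100 visitors enter, split between two attractions, then exit:
flows = {("entry", "rideA"): 60, ("entry", "rideB"): 40,
         ("rideA", "exit"): 60, ("rideB", "exit"): 40}
print(violates_conservation(flows, sources={"entry"}, sinks={"exit"}))
# set() -> flow is preserved at every attraction
```

In a learning setting, this constraint shrinks the feasible parameter space: estimated visitor flows that break conservation at any attraction can be rejected outright.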

    Efficient Online Summarization of Large-Scale Dynamic Networks


    Low latency data retrieval solutions for big data

    As applications move towards peta- and exascale data sets, it has become increasingly important to develop more efficient data retrieval and storage mechanisms that reduce network traffic and server load while minimizing user-perceived retrieval delays. We propose an Intelligent Caching technique and a Graph Summarization technique to achieve low-latency data retrieval for big data applications. Our caching approach is built on top of HDFS to optimize its read latency. HDFS is primarily suited to Write Once Read Many (WORM) applications, where reads significantly outnumber writes. In our Intelligent Caching approach, we analyze real-world MapReduce traces from Facebook and Yahoo in terms of file size and access pattern distribution, and combine this with existing analyses from the literature to develop a new caching algorithm built on the recently released HDFS caching API. Based on the finding that a majority of accesses in a MapReduce cluster occur within the first two hours of file creation, our caching algorithm uses a sliding-window approach to ensure that the most popular files remain in cache at the appropriate times. It uses file characteristics within a particular window to determine a file's popularity, which is calculated from file access patterns, file age, and workload characteristics. We use a simulator-based technique to evaluate our algorithm on various performance metrics using real-world and synthetic traces, and we compare it against existing variants of LRU/LFU. Separately, the recent rapid growth of real-world social networks has incentivized researchers to explore optimizations that can provide quick insights about a network, making graph summarization and approximation an important research problem. Most work in this area has focused on concise and informative representations of large graphs.
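The sliding-window popularity idea can be sketched roughly as follows; the scoring formula and weights here are illustrative assumptions, not the thesis's actual algorithm:

```python
import heapq

def select_cache_contents(accesses, now, window, capacity, age_weight=0.5):
    """Pick files to keep in cache using a sliding-window popularity
    score: the count of accesses inside the window, discounted by
    file age (older files are less likely to be read again).

    `accesses` maps file -> list of access timestamps,
    `now` is the current time, `window` the look-back horizon,
    `capacity` the number of files the cache holds.
    """
    scores = {}
    for f, times in accesses.items():
        recent = [t for t in times if now - window <= t <= now]
        if not recent:
            continue  # no accesses in the window: never a candidate
        age = now - min(times)  # time since first access (age proxy)
        scores[f] = len(recent) / (1.0 + age_weight * age)
    return set(heapq.nlargest(capacity, scores, key=scores.get))

# Hypothetical trace: "a" is old but recently hit, "c" is young and hot.
accesses = {"a": [0, 9, 10], "b": [1], "c": [9, 10]}
print(select_cache_contents(accesses, now=10, window=2, capacity=1))
# {'c'} -- the young, recently popular file wins
```

Re-evaluating the scores as the window slides captures the observation that most reads happen shortly after file creation.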
These graphs contain billions of nodes and edges and require a distributed storage/processing system for any operation on them. Our work primarily focuses on task-based summarization of large graphs stored in a distributed fashion, answering queries that are computationally expensive on the original graph but tolerant of minor errors in the exact results; semantically, such queries provide the same amount of information even with approximate answers. Our contribution is a distributed framework that answers queries probabilistically and highly efficiently using compact representations of the original graph, stored as summary graphs across a cluster of multiple nodes. These summary graphs are also optimized for space complexity and grow only in the number of attributes used to answer the query. One can then combine these graphs to answer complex queries extremely efficiently. Our results are promising and show that significant runtime gains can be achieved with our framework without sacrificing much accuracy; in fact, we observe a decreasing trend in error as the graph size increases.
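One common form of attribute-based summarization, shown here as a loose single-machine sketch (the grouping scheme and query are assumptions, not the thesis's framework), collapses nodes sharing an attribute value into supernodes and answers queries from the much smaller summary:

```python
from collections import defaultdict

def summarize(nodes, edges):
    """Collapse a graph into an attribute-based summary: one supernode
    per attribute value, with superedges carrying edge counts.

    `nodes` maps node -> attribute value; `edges` is a list of (u, v).
    Returns (supernode sizes, superedge counts).
    """
    sizes = defaultdict(int)
    superedges = defaultdict(int)
    for _, attr in nodes.items():
        sizes[attr] += 1
    for u, v in edges:
        superedges[(nodes[u], nodes[v])] += 1
    return dict(sizes), dict(superedges)

def approx_edge_probability(sizes, superedges, a, b):
    """Estimated probability that a random (a-attribute, b-attribute)
    node pair is connected -- answered from the summary alone,
    without touching the original graph."""
    pairs = sizes[a] * sizes[b]
    return superedges.get((a, b), 0) / pairs if pairs else 0.0

# Tiny invented graph: two "x" nodes, two "y" nodes, three x->y edges.
nodes = {1: "x", 2: "x", 3: "y", 4: "y"}
edges = [(1, 3), (2, 3), (2, 4)]
sizes, supe = summarize(nodes, edges)
print(approx_edge_probability(sizes, supe, "x", "y"))  # 0.75
```

The summary's size depends only on the number of attribute values, not the number of nodes, which is consistent with summaries that grow only in the attributes needed to answer a query.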