
    Streaming Similarity Self-Join

    We introduce and study the problem of computing the similarity self-join in a streaming context (SSSJ), where the input is an unbounded stream of items arriving continuously. The goal is to find all pairs of items in the stream whose similarity is greater than a given threshold. The simplest formulation of the problem requires unbounded memory and is therefore intractable. To make the problem feasible, we introduce the notion of time-dependent similarity: the similarity of two items decreases with the difference in their arrival times. By leveraging the properties of this time-dependent similarity function, we design two algorithmic frameworks to solve the SSSJ problem. The first, MiniBatch (MB), takes existing index-based filtering techniques for the static version of the problem and combines them in a pipeline. The second, Streaming (STR), adds time filtering to the existing indexes and integrates new time-based bounds deeply into the workings of the algorithms. We also introduce a new indexing technique (L2), which is based on an existing state-of-the-art indexing technique (L2AP) but is optimized for the streaming case. Extensive experiments show that the STR algorithm, when instantiated with the L2 index, is the most scalable option across a wide array of datasets and parameters.
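    The idea of time-dependent similarity can be illustrated with a small brute-force sketch. It rests on three assumptions chosen only for illustration: cosine similarity on sparse vectors, an exponential decay factor exp(-lam * |t_x - t_y|), and the hypothetical function names streaming_self_join and time_decayed_sim. Cosine similarity never exceeds 1, so no pair whose arrival times differ by more than tau = ln(1/theta) / lam can pass the threshold. Items older than that horizon can therefore be evicted, which keeps memory bounded. The sketch is a naive baseline. It is not the MB or STR frameworks, and it does not use the L2 index.

```python
import math
from collections import deque

def cosine(x, y):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(v * y.get(k, 0.0) for k, v in x.items())
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    if nx == 0 or ny == 0:
        return 0.0
    return dot / (nx * ny)

def time_decayed_sim(x, tx, y, ty, lam):
    # Assumed decay form: similarity shrinks exponentially with the arrival-time gap.
    return cosine(x, y) * math.exp(-lam * abs(tx - ty))

def streaming_self_join(stream, theta, lam):
    """Yield (earlier_id, later_id, sim) for pairs whose decayed similarity exceeds theta.

    stream yields (id, t, vector) with nondecreasing t; assumes 0 < theta < 1, lam > 0.
    """
    # Time horizon: beyond tau the decay factor alone is below theta.
    tau = math.log(1.0 / theta) / lam
    window = deque()  # items still within the horizon, oldest first
    for item_id, t, vec in stream:
        # Evict items that can no longer match anything arriving from now on.
        while window and t - window[0][1] > tau:
            window.popleft()
        # Brute-force comparison against every item left in the window.
        for other_id, t_o, vec_o in window:
            s = time_decayed_sim(vec, t, vec_o, t_o, lam)
            if s > theta:
                yield (other_id, item_id, s)
        window.append((item_id, t, vec))
```

    With theta = 0.5 and lam = 0.1, the horizon is about 6.9 time units. Two identical items that arrive one unit apart match with similarity exp(-0.1), roughly 0.905. An item that arrives at t = 10 finds the window already emptied.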

    Scalable Online Betweenness Centrality in Evolving Graphs

    Betweenness centrality is a classic measure that quantifies the importance of a graph element (vertex or edge) according to the fraction of shortest paths passing through it. This measure is notoriously expensive to compute: the best known algorithm runs in O(nm) time. The problems of efficiency and scalability are exacerbated in a dynamic setting, where the input is an evolving graph seen edge by edge and the goal is to keep the betweenness centrality up to date. In this paper we propose the first truly scalable algorithm for online computation of the betweenness centrality of both vertices and edges in an evolving graph where new edges are added and existing edges are removed. Our algorithm is carefully engineered with out-of-core techniques and tailored for modern parallel stream processing engines that run on clusters of shared-nothing commodity hardware. Hence, it is amenable to real-world deployment. We experiment on graphs that are two orders of magnitude larger than those in previous studies. Our method is able to keep the betweenness centrality measures up to date online, i.e., the time to update the measures is smaller than the inter-arrival time between two consecutive updates. Comment: 15 pages, 9 figures, accepted for publication in IEEE Transactions on Knowledge and Data Engineering
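    The O(nm) static baseline mentioned above is Brandes' algorithm. A minimal sketch for unweighted, undirected graphs makes the cost visible. It runs one breadth-first search per source vertex, counts shortest paths, and then accumulates dependencies in reverse BFS order. It covers vertex betweenness only, and the adjacency-dict input format and the function name are assumptions for illustration. This is not the paper's online, out-of-core algorithm. It is the kind of full recomputation that such an algorithm aims to avoid after every edge update.

```python
from collections import deque

def brandes_betweenness(adj):
    """Vertex betweenness of an unweighted, undirected graph {v: [neighbors]} (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:  # one BFS per source: O(m) each, O(nm) overall
        stack = []                      # vertices in order of discovery
        preds = {v: [] for v in adj}    # shortest-path predecessors
        sigma = {v: 0 for v in adj}     # number of shortest s-v paths
        dist = {v: -1 for v in adj}     # -1 marks "not yet reached"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest vertices back toward s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted once from each endpoint.
    return {v: c / 2.0 for v, c in bc.items()}
```

    On the path a-b-c, only b lies between another pair, so the result is {"a": 0.0, "b": 1.0, "c": 0.0}.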

    Scalable Facility Location for Massive Graphs on Pregel-like Systems

    We propose a new scalable algorithm for facility location. Facility location is a classic problem where the goal is to select a subset of facilities to open, from a set of candidate facilities F, in order to serve a set of clients C. The objective is to minimize the total cost of opening facilities plus the cost of serving each client from the facility it is assigned to. In this work, we are interested in the graph setting, where the cost of serving a client from a facility is the shortest-path distance on the graph. This setting allows us to model natural problems arising in the Web and in social media applications. It also allows us to leverage the inherent sparsity of such graphs, as the input is much smaller than the full pairwise distances between all vertices. To obtain truly scalable performance, we design a parallel algorithm that operates on clusters of shared-nothing machines. In particular, we target modern Pregel-like architectures, and we implement our algorithm on Apache Giraph. Our solution uses, as building blocks, a recent result for building sketches of massive graphs and a fast parallel algorithm for finding maximal independent sets. In so doing, we show how these problems can be solved on a Pregel-like architecture, and we investigate the properties of these algorithms. Extensive experimental results show that our algorithm scales gracefully to graphs with billions of edges, while obtaining values of the objective function that are competitive with a state-of-the-art sequential algorithm.
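    The objective above can be made concrete with a short sketch that evaluates a candidate solution. It assumes an unweighted, undirected graph, so shortest-path distances are hop counts. It also assumes that each client is served by its nearest open facility, which is the cheapest possible assignment. The function name and the input formats are hypothetical. A single multi-source breadth-first search from all open facilities yields every client's serving distance in O(n + m) time. The sketch only scores a given solution. It does not reproduce the paper's sketch-based, maximal-independent-set algorithm on Giraph.

```python
import math
from collections import deque

def facility_location_cost(adj, open_facilities, opening_cost, clients):
    """Opening costs plus each client's hop distance to its nearest open facility.

    adj: unweighted, undirected graph {v: [neighbors]}; opening_cost: {facility: cost}.
    Returns math.inf if no facility is open or some client cannot be reached.
    """
    if not open_facilities:
        return math.inf
    # Multi-source BFS: distance from each vertex to the closest open facility.
    dist = {f: 0 for f in open_facilities}
    queue = deque(open_facilities)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    open_total = sum(opening_cost[f] for f in open_facilities)
    serve_total = sum(dist.get(c, math.inf) for c in clients)
    return open_total + serve_total
```

    Take the path 1-2-3-4-5 with every opening cost equal to 2 and all vertices as clients. Opening only vertex 1 costs 2 + (0+1+2+3+4) = 12. Opening vertices 1 and 5 costs 4 + 4 = 8. This shows how the objective trades opening costs against serving distances.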

    The Olynthus mill in the Alps: New hypotheses from two unidentified millstones discovered in Veneto region (Italy)

    The archaeological collection at the Museum of Feltre (province of Belluno, Veneto region, Italy) includes fragments of two ancient millstones of the type known as the "Olynthus mill" or "hopper rubber". The first (from San Donato, in the municipality of Lamón) is mentioned in a number of published and unpublished works. The other, whose findspot is recorded only generically as Feltre, is new to the archaeological literature. Until now, neither had been identified as a specimen of the Olynthus mill. The paper opens with a brief introduction to this type of device (its technical features, origin and geographic distribution) and the main hypotheses concerning its diffusion in the Alps. The first part then describes the two stones from Feltre: their dimensions, morphological features, raw material, etc. The article then focuses on the topographical areas where the stones were found and on their importance for understanding the diffusion of the Olynthus mill model in the Alpine region characterised by Raetic culture, which is still an unresolved problem. The sites of discovery of the two Olynthus mills, along with the places of origin of the other hopper rubbers found in the Veneto region and in the eastern part of the province of Trento, could suggest new working hypotheses about the provenance of this type of millstone and its introduction into the Raetic territory between the 5th and 4th centuries BCE. More specifically, the Olynthus mill model might have been introduced into the Alps through the Piave and Brenta valleys rather than through the Adige valley, as previously thought. The Olynthian-type mills from the Veneto region could therefore mark the stages of this south-north path rather than being mere outlying specimens of the Raetic area or items exported from there.

    The Effect of Collective Attention on Controversial Debates on Social Media

    We study the evolution of long-lived controversial debates as manifested on Twitter from 2011 to 2016. Specifically, we explore how the structure of interactions and the content of discussion vary with the level of collective attention, as evidenced by the number of users discussing a topic. Spikes in the volume of users typically correspond to external events that increase public attention on the topic; for instance, discussions about 'gun control' often erupt after a mass shooting. This work is the first to study the dynamic evolution of polarized online debates at such a scale. By employing a wide array of network and content analysis measures, we find consistent evidence that increased collective attention is associated with increased network polarization, increased network concentration within each side of the debate, and more uniform lexicon usage across all users. Comment: accepted at ACM WebScience 201

    The Ebb and Flow of Controversial Debates on Social Media

    We explore how polarization around controversial topics evolves on Twitter, both over a long period of time (2011 to 2016) and in response to major external events that lead to increased related activity. We find that increased activity is typically associated with increased polarization; however, we find no consistent long-term trend in polarization over time among the topics we study. Comment: Accepted as a short paper at ICWSM 2017. Please cite the ICWSM version and not the arXiv version