1,191 research outputs found

    Graph Summarization

    The continuous and rapid growth of highly interconnected datasets, which are both voluminous and complex, calls for the development of adequate processing and analytical techniques. One method for condensing and simplifying such datasets is graph summarization. It denotes a family of application-specific algorithms designed to transform graphs into more compact representations while preserving structural patterns, query answers, or specific property distributions. As this problem is common to several areas studying graph topologies, different approaches, such as clustering, compression, sampling, or influence detection, have been proposed, primarily based on statistical and optimization methods. The focus of our chapter is to pinpoint the main graph summarization methods, and especially to cover the most recent approaches and novel research trends on this topic not yet addressed by previous surveys.
    Comment: To appear in the Encyclopedia of Big Data Technologies
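    One simple instance of the structural summarization this chapter surveys is collapsing nodes with identical neighbourhoods into supernodes. The sketch below is our own minimal illustration of that idea, not an algorithm from the chapter:

    ```python
    from collections import defaultdict

    # Adjacency sets of a small undirected toy graph.
    graph = {
        "a": {"c", "d"},
        "b": {"c", "d"},   # a and b have identical neighbourhoods
        "c": {"a", "b"},
        "d": {"a", "b"},
    }

    # Group nodes whose neighbour sets are identical; each group
    # becomes one supernode in the summary graph.
    groups = defaultdict(list)
    for node, nbrs in graph.items():
        groups[frozenset(nbrs)].append(node)

    supernodes = [sorted(members) for members in groups.values()]
    print(supernodes)   # [['a', 'b'], ['c', 'd']]
    ```

    Real summarization methods relax exact neighbourhood equality into similarity, trading compression ratio against fidelity of the preserved structure.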

    Pairwise Quantization

    We consider the task of lossy compression of high-dimensional vectors through quantization. We propose an approach that learns quantization parameters by minimizing the distortion of scalar products and squared distances between pairs of points. This is in contrast to previous works that obtain these parameters through the minimization of the reconstruction error of individual points. The proposed approach proceeds by finding a linear transformation of the data that effectively reduces the minimization of the pairwise distortions to the minimization of individual reconstruction errors. After such a transformation, any of the previously proposed quantization approaches can be used. Despite the simplicity of this transformation, the experiments demonstrate that it achieves a considerable reduction of the pairwise distortions compared to applying quantization directly to the untransformed data.
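    The two kinds of objective the abstract contrasts can be computed directly. The sketch below is our own illustration using a plain uniform scalar quantizer as a stand-in (not the paper's learned method): it measures the individual reconstruction error alongside the pairwise scalar-product and squared-distance distortions:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))  # toy "high-dimensional" vectors

    def uniform_quantize(x, step=0.5):
        """Uniform scalar quantizer: a stand-in for any learned quantizer."""
        return np.round(x / step) * step

    Q = uniform_quantize(X)

    # Objective of earlier approaches: per-point reconstruction error.
    recon = np.mean(np.sum((X - Q) ** 2, axis=1))

    # Objectives advocated in the abstract: distortion of scalar products
    # and of squared distances between pairs of points.
    dot_distortion = np.mean((X @ X.T - Q @ Q.T) ** 2)

    d_true = np.sum((X[:, None] - X[None, :]) ** 2, axis=2)
    d_quant = np.sum((Q[:, None] - Q[None, :]) ** 2, axis=2)
    dist_distortion = np.mean((d_true - d_quant) ** 2)

    print(recon, dot_distortion, dist_distortion)
    ```

    The paper's contribution is a linear transformation applied before such a quantizer so that minimizing the per-point error (as above) also minimizes the pairwise distortions.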

    TopSig: Topology Preserving Document Signatures

    Performance comparisons between file signatures and inverted files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted-file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature-file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings, and from a theoretical perspective it positions the file-signature model in the class of Vector Space retrieval models.
    Comment: 12 pages, 8 figures, CIKM 201
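    TopSig's exact signature construction is not described here; the sketch below illustrates only the general family of techniques the abstract invokes (our assumption, not the paper's algorithm): hashing term vectors onto random hyperplanes to get fixed-width bit signatures, then ranking documents by Hamming distance:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def signature(vecs, planes):
        """Project onto random hyperplanes, keep only the signs (1 bit/plane)."""
        return (vecs @ planes) >= 0

    # Toy term-frequency vectors for three documents over a 6-term vocabulary.
    docs = np.array([
        [3, 0, 1, 0, 0, 2],
        [2, 1, 1, 0, 0, 3],   # similar to doc 0
        [0, 4, 0, 5, 2, 0],   # dissimilar to doc 0
    ], dtype=float)

    planes = rng.normal(size=(6, 64))   # 64-bit signatures
    sigs = signature(docs, planes)

    def hamming(a, b):
        """Number of differing bits between two signatures."""
        return int(np.count_nonzero(a != b))

    # Rank documents 1 and 2 against document 0 by Hamming distance;
    # the topically similar document gets the smaller distance.
    print(hamming(sigs[0], sigs[1]), hamming(sigs[0], sigs[2]))
    ```

    Because Hamming distance over such signatures approximates angular distance between the original vectors, this places signature retrieval squarely among vector-space models, as the abstract argues.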

    Dynamic Agent Compression

    We introduce a new method for processing agents in agent-based models that significantly improves the efficiency of certain models. Dynamic Agent Compression allows agents to shift in and out of a compressed state based on their changing levels of heterogeneity. Sets of homogeneous agents are stored in compact bins, making the model more efficient in its use of memory and computational cycles. Modelers can use this increased efficiency to speed up execution times, to conserve memory, or to scale up the complexity or number of agents in their simulations. We describe in detail an implementation of Dynamic Agent Compression that is lossless, i.e., no model detail is discarded during the compression process. We also contrast lossless compression with lossy compression, which promises greater efficiency gains yet may introduce artifacts in model behavior. The advantages outweigh the overhead of Dynamic Agent Compression in models where agents are unevenly heterogeneous: where a set of highly heterogeneous agents is intermixed with numerous other agents that fall into broad, internally homogeneous categories. Dynamic Agent Compression is not appropriate in models with few, exclusively complex agents.
    Keywords: Agent-Based Modeling, Scaling, Homogeneity, Compression
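    The lossless binning described above can be sketched as a population store that keeps identical agents as a single state plus a count, and pulls an agent out of its bin the moment it diverges. This is a hypothetical minimal sketch of the idea (the class and method names are ours, not the authors' implementation):

    ```python
    from collections import Counter

    class CompressedPopulation:
        """Lossless bin-based agent storage: identical agents share one
        bin entry plus a count; divergent agents are stored individually."""

        def __init__(self):
            self.bins = Counter()   # hashable state -> count of identical agents
            self.individuals = []   # agents too heterogeneous to bin

        def add(self, state):
            self.bins[state] += 1

        def decompress_one(self, state, new_state):
            """Move one agent out of its bin when its state diverges."""
            self.bins[state] -= 1
            if self.bins[state] == 0:
                del self.bins[state]
            self.individuals.append(new_state)

        def size(self):
            return sum(self.bins.values()) + len(self.individuals)

    pop = CompressedPopulation()
    for _ in range(1000):
        pop.add(("susceptible", 0))          # 1000 identical agents, one bin
    pop.decompress_one(("susceptible", 0), ("infected", 3))
    print(pop.size(), len(pop.bins))         # 1000 agents held in 1 bin + 1 individual
    ```

    The memory win is visible in the final line: a thousand agents occupy one bin entry and one individual record, and no model detail is lost because counts and states are preserved exactly.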