2 research outputs found

    A novel parallel distance metric-based approach for diversified ranking on large graphs

    No full text
    Diversified ranking on graphs (DRG) is an important and challenging issue in researching graph data mining. Traditionally, this problem is modeled by a submodular optimization objective, and solved by applying a cardinality constrained monotone submodular maximization. However, the existing submodular objectives do not directly capture the dis-similarity over pairs of nodes, while most of algorithms cannot easily take full advantage of the power of a distributed cluster computing platform, such as Spark, to significantly promote the efficiency of algorithms. To overcome the deficiencies of existing approaches, in this paper, a generalized distance metric based on a subadditive set function over the symmetry difference of neighbors of pairs of nodes is introduced to capture the pairwise dis-similarity over pairs of nodes. In our approach, DRG is formulated as a Max-Sum k-dispersion problem with metrical edge weights, which is NP-hard, in association with the proposed distance metric, a centralized linear time 2-approximation algorithm GA is then developed to significantly solve the problem of DRG. Moreover, we develop a highly parallelizable algorithm for DRG, which can be easily implemented in MapReduce style parallel computation models using GA as a basic reducer. Finally, extensive experiments are conducted on real network datasets to verify the effectiveness and efficiency of our proposed approaches
    corecore