
    GPSP: Graph Partition and Space Projection based Approach for Heterogeneous Network Embedding

    In this paper, we propose GPSP, a novel Graph Partition and Space Projection based approach, to learn the representation of a heterogeneous network that consists of multiple types of nodes and links. Concretely, we first partition the heterogeneous network into homogeneous and bipartite subnetworks. Then, the projective relations hidden in the bipartite subnetworks are extracted by learning projective embedding vectors. Finally, we concatenate the projective vectors from the bipartite subnetworks with those learned from the homogeneous subnetworks to form the final representation of the heterogeneous network. Extensive experiments are conducted on a real-life dataset. The results demonstrate that GPSP outperforms state-of-the-art baselines in two key network mining tasks: node classification and clustering. Comment: WWW 2018 Poster
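    The first GPSP step, splitting the network by link type, can be pictured as grouping edges by the types of their two endpoints: same-type edges go to a homogeneous subnetwork, mixed-type edges to a bipartite one, and the final step concatenates per-subnetwork vectors. This is a minimal sketch under assumptions, not the authors' code: the function names `partition_heterogeneous` and `concatenate`, the edge-list and type-dictionary inputs, and the zero vectors used for nodes absent from a subnetwork are all hypothetical, and the embedding learners themselves are not shown.

```python
from collections import defaultdict


def partition_heterogeneous(edges, node_type):
    """Split a heterogeneous edge list into homogeneous and bipartite subnetworks.

    edges: iterable of (u, v) pairs; node_type: dict mapping node -> type label.
    Returns (homogeneous, bipartite): homogeneous is keyed by node type,
    bipartite by the sorted pair of endpoint types.
    """
    homogeneous = defaultdict(list)
    bipartite = defaultdict(list)
    for u, v in edges:
        tu, tv = node_type[u], node_type[v]
        if tu == tv:
            homogeneous[tu].append((u, v))
        else:
            # Orient every mixed edge the same way so each type pair has one key.
            if tu > tv:
                u, v, tu, tv = v, u, tv, tu
            bipartite[(tu, tv)].append((u, v))
    return dict(homogeneous), dict(bipartite)


def concatenate(node, embedding_tables):
    """Join a node's vectors from several subnetwork embeddings (hypothetical helper).

    embedding_tables: list of (dim, {node: vector}) pairs, e.g. one homogeneous
    table followed by projective tables; a node missing from a table gets zeros.
    """
    vector = []
    for dim, table in embedding_tables:
        vector.extend(table.get(node, [0.0] * dim))
    return vector


if __name__ == "__main__":
    node_type = {"a1": "author", "a2": "author", "p1": "paper", "v1": "venue"}
    edges = [("a1", "a2"), ("a1", "p1"), ("p1", "a2"), ("p1", "v1")]
    homogeneous, bipartite = partition_heterogeneous(edges, node_type)
    print(homogeneous)  # one author-author subnetwork
    print(bipartite)    # author-paper and paper-venue subnetworks
```

    Keying bipartite subnetworks by a sorted type pair keeps "paper-author" and "author-paper" links in the same subnetwork, which is what a per-pair projection would need.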

    A Universal Similarity Model for Transactional Data Clustering

    Data mining methods are used to extract hidden knowledge from large databases. Data partitioning methods are used to group related data values, so that similar values fall into the same cluster. The K-means and Partitioning Around Medoids (PAM) clustering algorithms are used to cluster numerical data. Distance measures are used to estimate transaction similarity. Data partitioning solutions are identified using cluster ensemble models. The ensemble information matrix presents only cluster-to-data-point relations, so ensemble-based clustering techniques produce the final data partition from incomplete information. The link-based approach improves the conventional matrix by discovering unknown entries through the similarity between clusters in an ensemble. A link-based algorithm is used for the underlying similarity assessment. Pairwise similarity and binary cluster-association matrices summarize the underlying ensemble information. A weighted bipartite graph is formulated from the refined matrix, and a graph partitioning technique is applied to it. The Particle Swarm Optimization (PSO) clustering algorithm, an optimization-based clustering scheme, is integrated with the cluster ensemble model. The system supports clustering of binary, categorical and continuous data. The attribute connectivity analysis is optimized for all attributes. The refined cluster-association matrix (RM) is updated with all attribute relationships.
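    The two ensemble summaries named above, and the bipartite graph built from them, can be illustrated on label vectors from several base clusterings. The sketch below is hypothetical: it builds the binary cluster-association matrix, the pairwise (co-association) similarity matrix and a weighted point-to-cluster edge list from the unrefined binary matrix. It omits the link-based refinement that fills in unknown entries, the graph partitioning step and PSO; the function names and the input layout (one label list per base clustering) are assumptions.

```python
def binary_association_matrix(labelings):
    """Binary cluster-association matrix of a cluster ensemble.

    labelings: list of base clusterings, each a list giving a cluster label per point.
    Returns (matrix, columns): matrix[i][j] is 1 when point i belongs to the
    cluster columns[j] = (clustering index, label), else 0.
    """
    columns = [(m, c) for m, labels in enumerate(labelings) for c in sorted(set(labels))]
    col_index = {col: j for j, col in enumerate(columns)}
    n = len(labelings[0])
    matrix = [[0] * len(columns) for _ in range(n)]
    for m, labels in enumerate(labelings):
        for i, c in enumerate(labels):
            matrix[i][col_index[(m, c)]] = 1
    return matrix, columns


def co_association_matrix(labelings):
    """Pairwise similarity: fraction of base clusterings that group points i and j together."""
    n, m = len(labelings[0]), len(labelings)
    return [[sum(labels[i] == labels[j] for labels in labelings) / m for j in range(n)]
            for i in range(n)]


def bipartite_edges(matrix, columns):
    """Weighted bipartite graph: point i -- cluster columns[j] with weight matrix[i][j]; zeros dropped."""
    return [(i, columns[j], w) for i, row in enumerate(matrix) for j, w in enumerate(row) if w]


if __name__ == "__main__":
    ensemble = [[0, 0, 1, 1], [0, 1, 1, 1]]  # two base clusterings of four points
    bm, cols = binary_association_matrix(ensemble)
    print(cols)
    print(bm)
    print(co_association_matrix(ensemble))
    print(bipartite_edges(bm, cols))
```

    With a refined matrix in place of the binary one, the same edge list would simply carry fractional weights.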

    A maximal clique based multiobjective evolutionary algorithm for overlapping community detection

    Detecting community structure has become an important technique for studying complex networks. Although many community detection algorithms have been proposed, most of them focus on separated communities, where each node can belong to only one community. However, in many real-world networks, communities often overlap with each other. Developing overlapping community detection algorithms thus becomes necessary. Along this avenue, this paper proposes a maximal clique based multiobjective evolutionary algorithm for overlapping community detection. In this algorithm, a new representation scheme based on the introduced maximal-clique graph is presented. Since the maximal-clique graph uses a set of maximal cliques of the original graph as its nodes, and two maximal cliques are allowed to share nodes of the original graph, overlap is an intrinsic property of the maximal-clique graph. Owing to this property, the new representation scheme allows multiobjective evolutionary algorithms to handle the overlapping community detection problem in a way similar to separated community detection, which simplifies the optimization problems. As a result, the proposed algorithm can detect overlapping community structure with higher partition accuracy and lower computational cost than existing algorithms. Experiments on both synthetic and real-world networks validate the effectiveness and efficiency of the proposed algorithm.
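    A maximal-clique graph can be built by enumerating the maximal cliques of the original graph (here with the Bron-Kerbosch algorithm with pivoting) and treating each clique as a node. The abstract does not say when two cliques are linked, so this sketch assumes, hypothetically, that they are adjacent whenever they share at least one original node; the paper's rule may differ. The sketch also shows why overlap comes for free: any non-overlapping labelling of cliques maps back to node communities in which a node lying in two cliques inherits both labels. Function names are illustrative only, and the evolutionary search is not shown.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the Bron-Kerbosch algorithm (with pivoting).

    adj: dict node -> set of neighbours (undirected graph, no self-loops).
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex covering most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def maximal_clique_graph(adj):
    """Nodes are maximal cliques; ASSUMED rule: link two cliques that share an original node."""
    cliques = maximal_cliques(adj)
    links = {(i, j) for i in range(len(cliques)) for j in range(i + 1, len(cliques))
             if cliques[i] & cliques[j]}
    return cliques, links


def to_node_communities(cliques, clique_labels):
    """Map a non-overlapping labelling of cliques back to (possibly overlapping) node sets."""
    communities = {}
    for clique, label in zip(cliques, clique_labels):
        communities.setdefault(label, set()).update(clique)
    return communities


if __name__ == "__main__":
    # Two triangles sharing node 3.
    edges = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques, links = maximal_clique_graph(adj)
    # Give each clique its own label: node 3 ends up in both communities.
    print(to_node_communities(cliques, range(len(cliques))))
```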

    Adaptive Partitioning for Large-Scale Dynamic Graphs

    In recent years, large-scale graph processing has gained increasing attention, with most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph by minimizing the number of edges that connect vertices assigned to different machines, while keeping the load balanced. However, real-world graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid introducing an excessive number of cut edges, which would gradually worsen computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by continuously adapting the graph partitions as the graph changes. We present a novel, highly scalable adaptive partitioning strategy and show a number of refinements that make it work under the constraints of a large-scale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three real-world scenarios how adapting graph partitioning reduces execution time by over 50% compared with commonly used hash partitioning.
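    Iterative vertex migration can be sketched as a greedy, label-propagation-style pass: each vertex inspects only its neighbours' partitions and moves to the one holding most of them, provided this strictly reduces its cut edges and the target partition is below a capacity limit. This is a simplified single-machine illustration, not the authors' strategy: the function names, the sequential update order and the assumption that every vertex knows the current partition sizes are all hypothetical, and the paper's refinements for distributed execution are not modelled.

```python
from collections import Counter


def cut_edges(adj, part):
    """Count undirected edges whose endpoints lie in different partitions."""
    return sum(1 for v in adj for u in adj[v] if u < v and part[u] != part[v])


def migrate_once(adj, part, num_parts, capacity):
    """One sequential pass of greedy vertex migration (hypothetical sketch).

    adj: dict vertex -> set of neighbours (undirected); part: dict vertex -> partition id,
    updated in place. Returns the number of vertices moved.
    """
    sizes = Counter(part.values())  # assumed globally known partition sizes
    moved = 0
    for v in sorted(adj):  # fixed order keeps the sketch deterministic
        # Local information only: how many neighbours sit in each partition.
        counts = Counter(part[u] for u in adj[v])
        current = part[v]
        # Prefer the partition with most neighbours; break ties in favour of staying.
        best = max(range(num_parts), key=lambda p: (counts[p], p == current))
        if best != current and counts[best] > counts[current] and sizes[best] < capacity:
            sizes[current] -= 1
            sizes[best] += 1
            part[v] = best
            moved += 1
    return moved


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the bridge 2-3.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    adj = {v: set() for v in range(6)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    part = {v: v % 2 for v in adj}  # hash-style initial placement
    print("cut before:", cut_edges(adj, part))
    while migrate_once(adj, part, num_parts=2, capacity=4):
        pass
    print("cut after:", cut_edges(adj, part), part)
```

    Starting from the hash-style placement `v % 2`, one pass reduces the cut from five edges to one while respecting the capacity of four. Because every accepted move strictly lowers the total cut, repeated passes terminate; when edges are added or removed, the same pass can be rerun on the updated adjacency.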