3 research outputs found

    A mathematical programming approach to overlapping community detection

    We propose a new optimization model to detect overlapping communities in networks. The model elaborates on suggestions in Zhang et al. (2007), in which overlapping communities were identified through a fuzzy membership function, calculated as the outcome of a mathematical programming problem. In our approach, we retain the idea of using both mathematical programming and fuzzy membership to detect overlapping communities, but we replace the fuzzy objective function proposed there with another one, based on Newman and Girvan's definition of modularity. Next, we formulate a new mixed-integer linear programming model to calculate optimal overlapping communities. Computational tests provide some evidence that our new proposal can fix a bias of the previous model, namely its tendency to calculate communities composed of almost all nodes. In addition, our new model can reveal other structural properties, such as nodes or communities acting as bridges between communities. Finally, as mathematical programming can be used only for moderate-size networks due to its computation time, we propose two heuristic algorithms to solve the largest instances, which compare favourably to other methodologies. (c) 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
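
    As a point of reference for the objective mentioned in this abstract, the following is a minimal sketch of the standard Newman and Girvan modularity for a hard (non-overlapping) partition of an undirected, unweighted graph. It is not the paper's fuzzy, overlapping formulation or its mixed-integer model, which the abstract does not specify; the function name `modularity` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a hard partition.

    edges:     list of (u, v) pairs, one per undirected edge (no duplicates).
    community: dict mapping each node to its community label.

    Assumption: this is the classic non-overlapping definition,
    Q = sum_c [ e_c / m - (d_c / 2m)^2 ], where e_c is the number of
    edges inside community c and d_c is the total degree of its nodes.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    total_degree = defaultdict(int)  # sum of node degrees per community
    for node, k in degree.items():
        total_degree[community[node]] += k

    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}

print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

    Placing every node in a single community gives a modularity of zero, which illustrates why a modularity-based objective does not reward communities that contain almost all nodes.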

    Clustering sequence graphs

    In application domains ranging from social networks to e-commerce, it is important to cluster users with respect to both their relationships (e.g., friendship or trust) and their actions (e.g., visited locations or rated products). Motivated by these applications, we introduce here the task of clustering the nodes of a sequence graph, i.e., a graph whose nodes are labeled with strings (e.g., sequences of users’ visited locations or rated products). Neither string clustering algorithms nor graph clustering algorithms are appropriate for this task, as they do not consider the structure of the strings and of the graph simultaneously. Moreover, attributed graph clustering algorithms generally construct poor solutions because they need to represent a string as a vector of attributes, which inevitably loses information and may harm clustering quality. We thus introduce the problem of clustering a sequence graph. We first propose two pairwise distance measures for sequence graphs, one based on edit distance and shortest path distance and another one based on SimRank. We then formalize the problem under each measure and show that it is NP-hard. In addition, we design a polynomial-time 2-approximation algorithm, as well as a heuristic for the problem. Experiments using real datasets and a case study demonstrate the effectiveness and efficiency of our methods.
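
    To make the first distance measure concrete, here is a minimal sketch of a pairwise node distance that mixes the edit distance between string labels with the shortest path (hop) distance in the graph. The abstract does not say how the two are combined, so the weighted sum, the parameter `alpha`, and all function names below are assumptions for illustration, not the authors' definition.

```python
from collections import deque


def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def hop_distance(adj, source, target):
    """Number of edges on a shortest path, found by breadth-first search.

    adj maps each node to an iterable of neighbours.
    Returns float('inf') when target is unreachable from source.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return float("inf")


def combined_distance(adj, labels, u, v, alpha=0.5):
    """Hypothetical mix: alpha * edit distance + (1 - alpha) * hop distance."""
    return (alpha * edit_distance(labels[u], labels[v])
            + (1 - alpha) * hop_distance(adj, u, v))


# Path graph a - b - c with a string label on each node.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
labels = {"a": "abc", "b": "xyz", "c": "abd"}
print(combined_distance(adj, labels, "a", "c"))  # 0.5 * 1 + 0.5 * 2 = 1.5
```

    In practice the two terms would likely need to be normalized to comparable ranges before being combined; that step is omitted here because the abstract gives no detail on it.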