16 research outputs found

    Covariance and Correlation Kernels on a Graph in the Generalized Bag-of-Paths Formalism

    This work derives closed-form expressions for the expectation of the co-presence and of the number of co-occurrences of nodes on paths sampled from a network according to general path weights (a bag of paths). The underlying idea is that two nodes are considered similar when they often appear together on (preferably short) paths of the network. The expressions are obtained for both regular and hitting paths and serve as a basis for computing new covariance and correlation measures between nodes, which are valid positive semi-definite kernels on a graph. Experiments on semi-supervised classification problems show that the introduced similarity measures are competitive with other state-of-the-art distance and similarity measures between nodes.
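    The bag-of-paths machinery underlying these kernels rests on a fundamental matrix Z = (I - W)^-1, where W combines a reference random walk with Gibbs-Boltzmann edge weighting. A minimal NumPy sketch of that construction (the toy graph, unit costs, and temperature are illustrative assumptions, not the paper's experimental setup):

```python
import numpy as np

# Assumed toy graph: 3-node path graph with unit edge costs
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
theta = 1.0                                 # inverse temperature: large theta favors short paths

P_ref = A / A.sum(axis=1, keepdims=True)    # reference random-walk transition matrix
C = np.ones_like(A)                         # unit cost on every edge (assumption)
W = P_ref * np.exp(-theta * C)              # Gibbs-Boltzmann weighted transitions

Z = np.linalg.inv(np.eye(3) - W)            # fundamental matrix: sums the weights of all paths
Pi = Z / Z.sum()                            # bag-of-paths probability of sampling a path s -> t
```

    The covariance and correlation kernels of the paper are then derived from expectations of node co-occurrence under Pi; only the construction of Z and Pi is sketched here.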

    Sparse Randomized Shortest Paths Routing with Tsallis Divergence Regularization

    This work elaborates on the important problem of (1) designing optimal randomized routing policies for reaching a target node t from a source node s on a weighted directed graph G, and (2) defining distance measures between nodes interpolating between the least cost (based on optimal movements) and the commute cost (based on a random walk on G), depending on a temperature parameter T. To this end, the randomized shortest path formalism (RSP, [2,99,124]) is rephrased in terms of Tsallis divergence regularization instead of Kullback-Leibler divergence regularization. The main consequence of this change is that the resulting routing policy (the local transition probabilities) becomes sparser as T decreases, inducing a sparse random walk on G that converges to the least-cost directed acyclic graph as T tends to 0. Experimental comparisons on node clustering and semi-supervised classification tasks show that the derived dissimilarity measures based on expected routing costs provide state-of-the-art results. The sparse RSP is therefore a promising model of movement on a graph, balancing sparse exploitation and exploration in an optimal way.
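    The sparsity mechanism behind Tsallis-divergence regularization can be illustrated with the closed-form sparsemax projection, a related and well-known construction; this is a sketch of the mechanism only, not the paper's exact routing policy:

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (Martins & Astudillo, 2016).

    Unlike softmax, low-scoring entries receive exactly zero probability.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum           # entries kept in the support
    k_max = k[support][-1]
    tau = (cumsum[support][-1] - 1.0) / k_max     # threshold
    return np.maximum(z - tau, 0.0)

costs = np.array([1.0, 2.0, 3.0])       # assumed edge costs out of a node
p_warm = sparsemax(-costs / 2.0)        # high temperature: several edges keep mass
p_cold = sparsemax(-costs / 0.1)        # low temperature: all mass on the cheapest edge
```

    As the temperature shrinks, the support of the resulting policy shrinks with it, mirroring the convergence to the least-cost subgraph described in the abstract.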

    Randomized Shortest Paths with Net Flows and Capacity Constraints

    This work extends the randomized shortest paths (RSP) model by investigating the net flow RSP and by adding capacity constraints on edge flows. The standard RSP is a model of movement, or spread, through a network interpolating between a random-walk and a shortest-path behavior [30, 42, 49]. The framework assumes a unit flow injected into a source node and collected at a target node, with flows minimizing the expected transportation cost together with a relative entropy regularization term. In this context, the present work first develops the net flow RSP model, in which edge flows in opposite directions neutralize each other (as in electrical networks), and proposes an algorithm for computing the expected routing costs between all pairs of nodes. This quantity is called the net flow RSP dissimilarity measure between nodes. Experimental comparisons on node clustering tasks indicate that the net flow RSP dissimilarity is competitive with other state-of-the-art dissimilarities. In the second part of the paper, it is shown how to introduce capacity constraints on edge flows, and a procedure solving this constrained problem through Lagrangian duality is developed. These two extensions should significantly broaden the scope of applications of the RSP framework.
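    For reference, the standard RSP machinery that this paper extends can be sketched with the free-energy dissimilarity of the classical framework (toy graph, unit costs, and theta are assumptions; the net-flow variant and the capacity constraints of the paper are not reproduced here):

```python
import numpy as np

# Assumed toy graph: 3-node path graph, unit edge costs
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
theta = 1.0

P_ref = A / A.sum(axis=1, keepdims=True)
W = P_ref * np.exp(-theta * np.ones_like(A))    # cost-weighted transitions
Z = np.linalg.inv(np.eye(3) - W)                # fundamental matrix

# Directed free energy: phi(s, t) = -(1/theta) * log(z_st / z_tt)
Phi = -np.log(Z / np.diag(Z)[None, :]) / theta
Delta = 0.5 * (Phi + Phi.T)                     # symmetrized RSP dissimilarity
```

    Delta interpolates, via theta, between shortest-path and commute-cost behavior; the paper's contribution is to replace the raw flows here with net flows and to bound them by capacities.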

    Essays on network data analysis through the bag-of-paths framework

    Since the rapid growth of the Internet and the advent of social networks in the 2000s, the amount of available network data has been increasing quickly, leading to the development of new network analysis methods. These methods have spread to various fields, including, among others, marketing, supply chain, finance, and biology, as essential analysis and prediction tools. This thesis focuses on the development of one of these methods, the bag-of-paths framework. The specificity of this framework is that it defines a family of dissimilarity measures between nodes of the network that interpolate between an optimal exploitation of the graph structure (optimal behavior - shortest path distance) and a random exploration of the graph (random behavior - commute time distance) via a parameter controlling the desired degree of randomness/exploration. Throughout this thesis, we propose several theoretical and practical extensions of the bag-of-paths framework. Regarding the theoretical contributions, we incorporate into the framework capacity constraints on edges, marginal constraints on input and output flows, and a Poisson distribution weighting and constraining path lengths. Furthermore, we demonstrate the applicability of the framework on graph-based semi-supervised classification tasks and a real-life fraud detection case. (ECGE - Sciences économiques et de gestion) -- UCL, 202

    Community detection in networks by soft modularity maximization : A new approach and empirical comparisons

    Community detection in networks is one of the fundamental problems of network science, an emerging discipline within the computing sciences that studies network data and, in particular, the links and interconnections within these networks. It nevertheless attracted little interest until the rapid growth of the Internet in the early 2000s, after which it became increasingly popular and spread to diverse scientific areas such as physics, biology, ecology, and marketing. In general terms, a graph is a mathematical object composed of elements called "nodes", two of which are connected by an edge whenever there is a relation between them. As network science spreads to more and more sectors, networks appear in a growing number of contexts. Among the best-known is the World Wide Web, in which web pages are interconnected by hyperlinks. A more recent example is Facebook, the well-known social medium through which people connect on the basis of friendship or other characteristics they are likely to share.
    The aim of this thesis is to examine a characteristic feature of networks, community structure, and in particular the detection of these communities using clustering methods. Clustering groups nodes into communities according to their similarities or differences, without knowing beforehand the class labels underlying the graph. Clustering algorithms therefore generally produce a partition assigning every node to one of the communities. In this classic view of clustering, each node is thus assigned to a single community. This view was recently challenged by the concept of fuzzy communities, in which a node may belong to more than one community at a time. Communities may then overlap, and the community structure of the graph becomes more complex to analyze. For this reason, this thesis introduces two new clustering algorithms for finding a fuzzy partition of communities in a network. These algorithms are based on a closeness measure called modularity, introduced by the physicist M. E. J. Newman, which we modified into a fuzzy version meeting the new expectations in community detection.
    The purpose of this thesis is to study the performance of the two new algorithms in detecting communities by comparing them with clustering methods that are already well established in network science. Two research questions direct the study:
    • Are the entropy-based soft modularity and the deterministic annealing entropy-based soft modularity algorithms competitive with the kernel k-means algorithms when the natural number of clusters is used?
    • Are the entropy-based soft modularity and the deterministic annealing entropy-based soft modularity algorithms competitive with the kernel k-means algorithms and the Louvain method when the number of clusters has not been determined in advance?
    To answer these questions, two experiments are conducted. In the first, the algorithms are compared with four kernel k-means methods (Sigmoid Commute Time, Sigmoid Corrected Commute Time, Log Forest, and Free Energy), using the natural number of clusters for each dataset. In the second experiment, the algorithms are again compared with the four kernel k-means methods and additionally with the Louvain method, this time without fixing the number of clusters beforehand: it is determined empirically for each dataset and each algorithm, except for the Louvain method, which itself returns a number of clusters. Master [120] in Business Engineering, Université catholique de Louvain, 201
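    The quantity being maximized can be sketched as follows: with a membership matrix U whose row i holds node i's (possibly fractional) community memberships, a natural soft extension of Newman's modularity replaces the hard same-community indicator with the product of memberships. This is a minimal sketch of the idea only; the thesis' exact entropy-regularized objectives are not reproduced:

```python
import numpy as np

def soft_modularity(A, U):
    """Soft modularity Q = (1/2m) * sum_ij (A_ij - k_i*k_j/2m) * (U U^T)_ij.

    A : symmetric adjacency matrix; U : n x c membership matrix, rows summing to 1.
    With one-hot (hard) memberships this reduces to Newman's classic modularity.
    """
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m      # modularity matrix
    return np.trace(U.T @ B @ U) / two_m

# Two disconnected triangles, hard 2-community partition
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
U_hard = np.repeat(np.eye(2), 3, axis=0)   # nodes 0-2 in community 0, nodes 3-5 in community 1
q_hard = soft_modularity(A, U_hard)
```

    For this perfectly separable graph the hard partition attains the textbook value Q = 0.5, while completely uniform fuzzy memberships score lower, which is what a soft-modularity maximizer exploits.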

    A Simple Extension of the Bag-of-Paths Model Weighting Path Lengths by a Poisson Distribution

    This work extends the bag-of-paths model by introducing a weighting of the lengths of the paths in the network, provided by a Poisson probability distribution. The main advantage of this approach is that it allows the mean path length to be tuned to the value most relevant for the application at hand. Various quantities of interest, such as the probability of drawing a path from the bag of paths, or the joint probability of sampling a path connecting two nodes of interest, can easily be computed in closed form from this model. In this context, a new distance measure between nodes of a network, incorporating a weighting factor on the lengths of the paths, is defined. Experiments on semi-supervised classification tasks show that the introduced distance measure provides competitive results compared to other state-of-the-art methods. Moreover, a new interpretation of the logarithmic communicability similarity measure is proposed in terms of the new model.
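    The Poisson weighting has a convenient matrix form: summing powers W^l over path lengths l with Poisson(lambda) weights gives e^(-lambda) * exp(lambda*W), which a truncated series computes directly. A sketch under assumed W and lambda:

```python
import numpy as np

def poisson_weighted_sum(W, lam, n_terms=60):
    """Compute sum_{l>=0} e^{-lam} * lam^l / l! * W^l (= e^{-lam} * expm(lam * W))."""
    n = W.shape[0]
    term = np.exp(-lam) * np.eye(n)     # l = 0 term of the series
    total = term.copy()
    for l in range(1, n_terms):
        term = (lam / l) * (term @ W)   # fold in the next Poisson factor and one power of W
        total += term
    return total

# With W = A and lam = 1 this is e^{-1} * expm(A), a rescaled communicability matrix,
# which is where the logarithmic-communicability interpretation comes from.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
S = poisson_weighted_sum(A, lam=1.0)
```

    Tuning lam shifts the Poisson mode, i.e. the path lengths that contribute most mass, which is the "mean path length" knob described in the abstract.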

    Design of Biased Random Walks on a Graph with Application to Collaborative Recommendation

    This work investigates a paths-based statistical physics formalism for the design of random walks on a graph in which the transition probabilities (the policy) are optimally biased in favor of some node features. More precisely, given a weighted directed graph G and a nonnegative cost assigned to each edge, the biased random walk is defined as the policy minimizing the expected cost rate along the walks while maintaining a constant relative entropy rate. The model is formulated by assigning a Gibbs-Boltzmann distribution to the set of infinite walks and allows some known results from the literature, derived from a different perspective, to be recovered. Examples of quantities of interest are the partition function of the system, the optimal transition probabilities, and the cost rate. In addition, the same formalism allows capacity constraints on the expected visit rates of the nodes to be introduced, and an algorithm for computing the optimal policy subject to these capacity constraints is developed. Simulation results indicate that the proposed procedure can effectively be used to define a Markov chain driving the walk towards nodes having specific properties, such as seniority, education level, or low node degree (hub-avoiding walks). An application relying on this last property is proposed as a tool for improving serendipity in collaborative recommendation and is tested on the MovieLens data.
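    One construction in this spirit, simplified, derives a biased policy from the dominant eigenvector of the cost-weighted transition matrix, a standard device in entropy-rate-constrained walks. The graph, costs, and theta below are illustrative assumptions, not the paper's exact derivation:

```python
import numpy as np

# Assumed toy graph: triangle (0, 1, 2) plus a pendant node 3 attached to node 2
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
C = np.where(A > 0, 1.0, 0.0)
C[2, 3] = C[3, 2] = 3.0             # make the pendant edge expensive (assumption)
theta = 1.0

P_ref = A / A.sum(axis=1, keepdims=True)
W = P_ref * np.exp(-theta * C)      # cost-biased, sub-stochastic weights

# Dominant right eigenvector of W by power iteration (W is nonnegative and irreducible)
v = np.ones(A.shape[0])
for _ in range(500):
    v = W @ v
    v /= np.linalg.norm(v)
lam = v @ W @ v                     # dominant eigenvalue (v is unit norm)

# Biased policy: a proper Markov chain steering the walk away from expensive edges
P = W * v[None, :] / (lam * v[:, None])
```

    The resulting P is row-stochastic by the eigenvector relation, and it shifts probability mass away from the costly pendant edge relative to the unbiased walk P_ref, illustrating the feature-biasing effect the abstract describes.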
