226 research outputs found

    Constraint-based Subspace Clustering

    In high dimensional data, the general performance of traditional clustering algorithms decreases. This is partly because the similarity criterion used by these algorithms becomes inadequate in high dimensional space. Another reason is that some dimensions are likely to be irrelevant or contain noisy data, thus hiding a possible clustering. To overcome these problems, subspace clustering techniques, which can automatically find clusters in relevant subsets of dimensions, have been developed. However, due to the huge number of subspaces to consider, these techniques often lack efficiency. In this paper we propose to extend the framework of bottom-up subspace clustering algorithms by integrating background knowledge and, in particular, instance-level constraints to speed up the enumeration of subspaces. We show how this new framework can be applied to both density-based and distance-based bottom-up subspace clustering techniques. Our experiments on real datasets show that instance-level constraints can increase not only the efficiency of the clustering process but also the accuracy of the resulting clustering.

    Graph space: using both geometric and probabilistic structure to evaluate statistical graph models

    Statistical graph models aim at modeling graphs as random realizations among a set of possible graphs. One issue is to evaluate whether or not a graph is likely to have been generated by one particular model. In this paper we introduce the edit distance expected value (EDEV) and compare it with other methods such as entropy and distance to the barycenter. We show that, contrary to them, EDEV is able to distinguish between graphs that have a typical structure with respect to a model and those that do not. Finally, we introduce a statistical hypothesis testing methodology based on this distance to evaluate the relevance of a candidate model with respect to an observed graph.
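
The abstract above does not define EDEV precisely, so the following is only a sketch under strong assumptions that are not taken from the paper: nodes are labeled, the edit distance counts edge insertions and deletions only, and the model is an Erdős–Rényi graph with independent edge probability `p`. Under those assumptions the expected edit distance has a closed form. The function name `expected_edit_distance` is hypothetical.

```python
def expected_edit_distance(n_nodes, edges, p):
    """Expected edge-edit distance between an observed graph and a random graph.

    Assumptions (not from the paper): labeled nodes, simple undirected graph,
    edit distance = number of node pairs whose edge status differs, and a
    model in which every node pair is an edge independently with probability p.
    """
    n_pairs = n_nodes * (n_nodes - 1) // 2
    # Deduplicate edges so that (u, v) and (v, u) count once.
    n_edges = len({frozenset(edge) for edge in edges})
    # An observed edge differs from the sample with probability 1 - p,
    # an observed non-edge differs with probability p.
    return n_edges * (1 - p) + (n_pairs - n_edges) * p


observed = [(0, 1), (2, 3)]
value = expected_edit_distance(4, observed, 0.5)
```

For models without independent edges, or for unlabeled graphs, the expectation would generally have to be estimated by sampling rather than computed in closed form.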

    Minimum entropy stochastic block models neglect edge distribution heterogeneity

    The statistical inference of stochastic block models has emerged as a mathematically principled method for identifying communities inside networks. Its objective is to find the node partition and the block-to-block adjacency matrix of maximum likelihood, i.e. the one which has most probably generated the observed network. In practice, in the so-called microcanonical ensemble, it is frequently assumed that when comparing two models which have the same number and sizes of communities, the best one is the one of minimum entropy, i.e. the one which can generate the fewest different networks. In this paper, we show that there are situations in which the minimum entropy model does not identify the most significant communities in terms of edge distribution, even though it generates the observed graph with a higher probability.
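
To make the entropy comparison described above concrete, here is a minimal sketch that counts the simple undirected graphs compatible with a given partition and its block-to-block edge counts, and returns the logarithm of that count. It assumes a simple graph without self-loops and the standard microcanonical counting; the function name `microcanonical_entropy` and the toy graph are hypothetical and not taken from the paper.

```python
from itertools import combinations
from math import comb, log


def microcanonical_entropy(edges, partition):
    """Log of the number of simple undirected graphs that share the given
    partition and the same number of edges between every pair of blocks."""
    # Number of nodes in each block.
    sizes = {}
    for block in partition.values():
        sizes[block] = sizes.get(block, 0) + 1
    # Number of observed edges for each unordered pair of blocks.
    counts = {}
    for u, v in edges:
        key = tuple(sorted((partition[u], partition[v])))
        counts[key] = counts.get(key, 0) + 1
    blocks = sorted(sizes)
    entropy = 0.0
    # Edges inside a block: choose e_rr pairs among n_r (n_r - 1) / 2.
    for r in blocks:
        pairs = sizes[r] * (sizes[r] - 1) // 2
        entropy += log(comb(pairs, counts.get((r, r), 0)))
    # Edges between two blocks: choose e_rs pairs among n_r * n_s.
    for r, s in combinations(blocks, 2):
        entropy += log(comb(sizes[r] * sizes[s], counts.get((r, s), 0)))
    return entropy


edges = [(0, 1), (2, 3)]
aligned = {0: "a", 1: "a", 2: "b", 3: "b"}   # each edge inside a block
crossed = {0: "a", 2: "a", 1: "b", 3: "b"}   # each edge between blocks
h_aligned = microcanonical_entropy(edges, aligned)
h_crossed = microcanonical_entropy(edges, crossed)
```

Both partitions have two blocks of two nodes, so they are comparable in the sense used in the abstract; the aligned partition admits a single compatible graph, while the crossed one admits six.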

    Nonnegative matrix factorization to find features in temporal networks

    Temporal networks describe a large variety of systems that evolve over time. Characterizing and visualizing their evolution is often difficult, especially when the amount of data becomes huge. We propose here an approach based on the duality between graphs and signals. Temporal networks are represented at each time instant by a collection of signals, whose spectral analysis reveals connections between frequency features and the structure of the network. We use nonnegative matrix factorization (NMF) to find these frequency features and track them over time. Transforming these features back into subgraphs reveals the underlying structures, which form a decomposition of the temporal network.

    Tracking of a dynamic graph using a signal theory approach : application to the study of a bike sharing system

    Dynamic graphs are useful objects to describe a network which evolves over time. We propose a signal theory approach to analyze them, which consists of transforming the graph at each time step into a collection of signals and analyzing these signals using spectral decomposition. An inverse transformation is also proposed, which makes it possible to reduce the dimension of the graph and select the most significant edges. The method is applied to a real dynamic graph based on data from the bike sharing system Vélo'v in Lyon. The analysis of the signals representing the graph highlights the weekly cycle of rentals, and the inverse transformation enables us to obtain sparser graphs.

    Extraction of Temporal Network Structures from Graph-based Signals

    A new framework to track the structure of temporal networks with a signal processing approach is introduced. The method is based on the duality between static networks and signals, obtained using a multidimensional scaling technique, which makes it possible to study the network structure from the frequency patterns of the corresponding signals. In this paper, we propose an approach to identify structures in temporal networks by extracting the most significant frequency patterns and their activation coefficients over time, using nonnegative matrix factorization of the temporal spectra. The framework, inspired by audio decomposition, allows these frequency patterns to be transformed back into networks, highlighting the evolution of the underlying structure of the network over time. The effectiveness of the method is first demonstrated on a synthetic example, before being used to study a temporal network of face-to-face contacts. The extracted sub-networks highlight significant structures decomposed over time intervals, which validates the relevance of the approach on real-world data.