226 research outputs found
Constraint-based Subspace Clustering
In high-dimensional data, the general performance of traditional clustering algorithms decreases. This is partly because the similarity criterion used by these algorithms becomes inadequate in high-dimensional space. Another reason is that some dimensions are likely to be irrelevant or contain noisy data, thus hiding a possible clustering. To overcome these problems, subspace clustering techniques, which can automatically find clusters in relevant subsets of dimensions, have been developed. However, due to the huge number of subspaces to consider, these techniques often lack efficiency. In this paper we propose to extend the framework of bottom-up subspace clustering algorithms by integrating background knowledge and, in particular, instance-level constraints to speed up the enumeration of subspaces. We show how this new framework can be applied to both density-based and distance-based bottom-up subspace clustering techniques. Our experiments on real datasets show that instance-level constraints can increase not only the efficiency of the clustering process but also the accuracy of the resultant clustering.
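To make the idea of constraint-driven pruning concrete, here is a minimal sketch. It is not the algorithm from the paper: the pruning rule (discard a subspace as soon as a must-link pair is farther apart than a threshold `eps`), the function names, and the toy data are all assumptions chosen for illustration. The rule is monotone because Euclidean distance can only grow when dimensions are added, so an Apriori-style bottom-up enumeration can skip every superset of a discarded subspace.

```python
from itertools import combinations
from math import sqrt

def dist(p, q, dims):
    """Euclidean distance between p and q restricted to the dimensions in dims."""
    return sqrt(sum((p[d] - q[d]) ** 2 for d in dims))

def violates(points, must_link, dims, eps):
    """Hypothetical rule: a subspace is rejected if any must-link pair is farther than eps."""
    return any(dist(points[i], points[j], dims) > eps for i, j in must_link)

def enumerate_subspaces(points, must_link, eps):
    """Bottom-up enumeration of subspaces that satisfy all must-link constraints."""
    n_dims = len(points[0])
    level = [(d,) for d in range(n_dims)
             if not violates(points, must_link, (d,), eps)]
    kept = list(level)
    while level:
        alive = set(level)
        candidates = set()
        for base in level:
            for d in range(base[-1] + 1, n_dims):
                cand = base + (d,)
                # Apriori step: every (k-1)-dimensional subset must have survived.
                if all(s in alive for s in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
        level = sorted(c for c in candidates
                       if not violates(points, must_link, c, eps))
        kept.extend(level)
    return kept

# Toy data: points 0 and 1 are close in dimensions 0 and 1 but far apart in dimension 2.
points = [(0.0, 0.0, 0.0), (0.1, 0.1, 5.0), (3.0, 3.0, 0.1)]
must_link = [(0, 1)]
subspaces = enumerate_subspaces(points, must_link, eps=0.5)
print(subspaces)
```

In this toy run, dimension 2 is rejected at the first level, so no subspace containing it is ever generated or tested.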
Graph space: using both geometric and probabilistic structure to evaluate statistical graph models
Statistical graph models aim at modeling graphs as random realizations among a
set of possible graphs. One issue is to evaluate whether or not a graph is
likely to have been generated by one particular model. In this paper we
introduce the edit distance expected value (EDEV) and compare it with other
methods such as entropy and distance to the barycenter. We show that contrary
to them, EDEV is able to distinguish between graphs that have a typical
structure with respect to a model, and those that do not. Finally we introduce
a statistical hypothesis testing methodology based on this distance to evaluate
the relevance of a candidate model with respect to an observed graph.
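As a rough illustration of an expected edit distance, the sketch below makes several assumptions that the abstract does not state: nodes are labeled, the edit distance between two graphs is the number of node pairs whose edge status differs, and the model is an Erdős–Rényi model with edge probability `p`. Under those assumptions the expectation has a closed form, since each node pair contributes `1 - p` if it is an edge in the observed graph and `p` otherwise. The definition of EDEV used in the paper may differ.

```python
from itertools import combinations

def expected_edit_distance(n, edges, p):
    """Expected edit distance from an observed graph to a G(n, p) random graph.

    Assumes labeled nodes 0..n-1 and an edit distance that counts node pairs
    whose edge status differs (an assumption, not the paper's definition).
    """
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for u, v in combinations(range(n), 2):
        # An observed edge is missing with probability 1 - p;
        # an observed non-edge is present with probability p.
        total += (1 - p) if frozenset((u, v)) in edge_set else p
    return total

# Observed graph: 4 nodes, 2 edges, hence 4 non-adjacent pairs.
observed_edges = [(0, 1), (1, 2)]
edev = expected_edit_distance(4, observed_edges, p=0.25)
print(edev)  # 2 * 0.75 + 4 * 0.25 = 2.5
```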
Minimum entropy stochastic block models neglect edge distribution heterogeneity
The statistical inference of stochastic block models has emerged as a
mathematically principled method for identifying communities inside networks.
Its objective is to find the node partition and the block-to-block adjacency
matrix of maximum likelihood i.e. the one which has most probably generated the
observed network. In practice, in the so-called microcanonical ensemble, it is
frequently assumed that when comparing two models which have the same number
and sizes of communities, the best one is the one of minimum entropy i.e. the
one which can generate the fewest distinct networks. In this paper, we show that
there are situations in which the minimum entropy model does not identify the
most significant communities in terms of edge distribution, even though it
generates the observed graph with a higher probability.
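The entropy comparison described above can be sketched as follows, assuming a simple undirected graph without self-loops and the plain (not degree-corrected) microcanonical model. In that setting the entropy of a partition is the logarithm of the number of graphs sharing its block-to-block edge counts, i.e. a product of binomial coefficients over block pairs. The function name and the toy graph are illustrative assumptions, and the example only shows how two partitions with equal block sizes are compared; it does not reproduce the situations studied in the paper.

```python
from collections import Counter
from math import comb, log

def sbm_entropy(edges, partition):
    """Microcanonical SBM entropy (natural log) of a node partition.

    partition[u] is the block of node u. Assumes a simple undirected graph.
    """
    sizes = Counter(partition)
    counts = Counter()
    for u, v in edges:
        r, s = sorted((partition[u], partition[v]))
        counts[(r, s)] += 1
    blocks = sorted(sizes)
    entropy = 0.0
    for i, r in enumerate(blocks):
        for s in blocks[i:]:
            # Number of node pairs available between (or inside) the blocks.
            pairs = comb(sizes[r], 2) if r == s else sizes[r] * sizes[s]
            entropy += log(comb(pairs, counts[(r, s)]))
    return entropy

# Two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
by_triangle = [0, 0, 0, 1, 1, 1]   # one block per triangle
interleaved = [0, 1, 0, 1, 0, 1]   # same block sizes, mixed membership
h_triangle = sbm_entropy(edges, by_triangle)
h_interleaved = sbm_entropy(edges, interleaved)
print(h_triangle, h_interleaved)
```

Here the triangle partition can generate 9 graphs and the interleaved one 1134, so the former has the lower entropy.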
Nonnegative matrix factorization to find features in temporal networks
Temporal networks describe a large variety of systems having a temporal evolution. Characterization and visualization of their evolution are often an issue, especially when the amount of data becomes huge. We propose here an approach based on the duality between graphs and signals. Temporal networks are represented at each time instant by a collection of signals, whose spectral analysis reveals connections between frequency features and the structure of the network. We use nonnegative matrix factorization (NMF) to find these frequency features and track them over time. Transforming these features back into subgraphs reveals the underlying structures which form a decomposition of the temporal network.
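The factorization step can be sketched as below. This is a generic NMF using the Lee and Seung multiplicative updates for the Frobenius norm, which may not be the variant used by the authors. The toy matrix `v` stands in for a spectrum (rows as frequency bins, columns as time steps) and is an assumption made for illustration, as are all function names.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of v - w h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate v by w h with nonnegative factors; returns w, h and the error history."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iterations):
        # Update the activations h while w is fixed.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # Update the patterns w while h is fixed.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors

# Toy "spectrum": two frequency patterns, active at different time steps.
v = [[1, 1, 0, 0, 1],
     [1, 1, 0, 0, 1],
     [0, 0, 1, 1, 1],
     [0, 0, 1, 1, 1]]
w, h, errors = nmf(v, rank=2)
print(errors[0], errors[-1])
```

The columns of `w` play the role of frequency patterns and the rows of `h` their activation over time; mapping patterns back to subgraphs, as the abstract describes, is outside this sketch.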
Tracking of a dynamic graph using a signal theory approach : application to the study of a bike sharing system
Dynamic graphs are useful objects to describe a network which evolves over time. We propose a signal theory approach to analyze them, which consists of transforming the graph at each time step into a collection of signals and analyzing these signals using spectral decomposition. An inverse transformation is also proposed and makes it possible to reduce the dimension of the graph and select the most significant edges. The method is applied to a real dynamic graph based on data about the bike sharing system Vélo'v in Lyon. The analysis of signals representing the graph highlights the weekly cycle of rentals and the inverse transformation enables us to obtain sparser graphs.
Extraction of Temporal Network Structures from Graph-based Signals
A new framework to track the structure of temporal networks with a signal processing approach is introduced. The method is based on the duality between static networks and signals, obtained using a multidimensional scaling technique, which makes it possible to study the network structure from frequency patterns of the corresponding signals. In this paper, we propose an approach to identify structures in temporal networks by extracting the most significant frequency patterns and their activation coefficients over time, using nonnegative matrix factorization of the temporal spectra. The framework, inspired by audio decomposition, allows transforming these frequency patterns back into networks, to highlight the evolution of the underlying structure of the network over time. The effectiveness of the method is first demonstrated on a synthetic example, before being used to study a temporal network of face-to-face contacts. The extracted sub-networks highlight significant structures decomposed on time intervals, which validates the relevance of the approach on real-world data.
- …