291 research outputs found

Interpreting communities based on the evolution of a dynamic attributed network

Author: Boulicaut Jean-François
Labatut Vincent
Orman Günce
Plantevit Marc
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2015

Many methods have been proposed to detect communities, not only in plain, but also in attributed, directed or even dynamic complex networks. From the modeling point of view, to be of some utility, the community structure must be characterized relative to the properties of the studied system. However, most of the existing works focus on the detection of communities, and only very few try to tackle this interpretation problem. Moreover, the existing approaches are limited either by the type of data they handle, or by the nature of the results they output. In this work, we see the interpretation of communities as a problem independent from the detection process, consisting in identifying the most characteristic features of communities. We give a formal definition of this problem and propose a method to solve it. To this aim, we first define a sequence-based representation of networks, combining temporal information, community structure, topological measures, and nodal attributes. We then describe how to identify the most emerging sequential patterns of this dataset, and use them to characterize the communities. We study the performance of our method on artificially generated dynamic attributed networks. We also empirically validate our framework on real-world systems: a DBLP network of scientific collaborations, and a LastFM network of social and musical interactions.

What did I do Wrong in my MOBA Game?: Mining Patterns Discriminating Deviant Behaviours

Author: Boulicaut Jean-François
Cavadenti Olivier
Codocedo Victor
Kaytoue Mehdi
Publication venue: HAL CCSD
Publication date: 17/10/2016

The success of electronic sports (eSports), where professional gamers participate in competitive leagues and tournaments, brings new challenges for the video game industry. Other than fun, games must be difficult and challenging for eSports professionals but still easy and enjoyable for amateurs. In this article, we consider Multi-player Online Battle Arena games (MOBA) and particularly "Defense of the Ancients 2", commonly known simply as DOTA2. In this context, a challenge is to propose data analysis methods and metrics that help players to improve their skills. We design a data mining-based method that discovers strategic patterns from historical behavioral traces: given a model encoding an expected way of playing (the norm), we are interested in patterns deviating from the norm that may explain a game outcome and from which players can learn more efficient ways of playing. The method is formally introduced and shown to be adaptable to different scenarios. Finally, we provide an experimental evaluation over a dataset of 10,000 behavioral game traces.

Crossref

SeqScout: Using a Bandit Model to Discover Interesting Subgroups in Labeled Sequences

Author: Boulicaut Jean-François
Kaytoue Mehdi
Mathonat Romain
Nurbakova Diana
Publication venue: HAL CCSD
Publication date: 05/10/2019

It is extremely useful to exploit labeled datasets not only to learn models but also to improve our understanding of a domain and its available targeted classes. The so-called subgroup discovery task has been considered for a long time. It concerns the discovery of patterns or descriptions, the set of supporting objects of which have interesting properties, e.g., they characterize or discriminate a given target class. Though many subgroup discovery algorithms have been proposed for transactional data, discovering subgroups within labeled sequential data, and thus searching for descriptions as sequential patterns, has been much less studied. In that context, exhaustive exploration strategies cannot be used for real-life applications and we have to look for heuristic approaches. We propose the algorithm SeqScout to discover interesting subgroups (w.r.t. a chosen quality measure) from labeled sequences of itemsets. This is a new sampling algorithm that mines discriminant sequential patterns using a multi-armed bandit model. It is an anytime algorithm that, for a given budget, finds a collection of local optima in the search space of descriptions and thus subgroups. It requires a light configuration and it is independent from the quality measure used for pattern scoring. Furthermore, it is fairly simple to implement. We provide qualitative and quantitative experiments on several datasets to illustrate its added value.
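
As a rough illustration of what a multi-armed bandit model does in a sampling loop, the sketch below shows the generic UCB1 selection rule. It is not the SeqScout algorithm: the abstract does not state which bandit policy is used or what an arm is, so the function names `ucb1_select` and `update`, the constant `c`, and the idea of treating each candidate to sample from as an arm are all assumptions made for this example.

```python
import math


def ucb1_select(counts, means, c=math.sqrt(2)):
    """Return the index of the arm to pull next under the UCB1 rule.

    counts[i] is how many times arm i was pulled, means[i] its mean reward.
    (Hypothetical helper; rewards are assumed to lie in [0, 1].)
    """
    # Pull every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    # Mean reward plus an exploration bonus that shrinks as an arm is pulled more.
    scores = [m + c * math.sqrt(math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def update(counts, means, arm, reward):
    """Record one observed reward for `arm` using an incremental mean."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]
```

In a pattern-sampling setting, the reward passed to `update` could be the quality of the pattern obtained from the chosen arm, which keeps the selection rule independent of the particular quality measure.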

Crossref

HAL

Hal-Diderot

A Method for Characterizing Communities in Dynamic Attributed Complex Networks

Author: Boulicaut Jean-François
Labatut Vincent
Orman Günce Keziban
Plantevit Marc
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/06/2014

Many methods have been proposed to detect communities, not only in plain, but also in attributed, directed or even dynamic complex networks. In its simplest form, a community structure takes the form of a partition of the node set. From the modeling point of view, to be of some utility, this partition must then be characterized relative to the properties of the studied system. However, while most of the existing works focus on defining methods for the detection of communities, only very few try to tackle this interpretation problem. Moreover, the existing approaches are limited either by the type of data they handle, or by the nature of the results they output. In this work, we propose a method to efficiently support such a characterization task. We first define a sequence-based representation of networks, combining temporal information, topological measures, and nodal attributes. We then describe how to identify the most emerging sequential patterns of this dataset, and use them to characterize the communities. We also show how to detect unusual behavior in a community, and highlight outliers. Finally, as an illustration, we apply our method to a network of scientific collaborations. Comment: IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM), Beijing, China (2014)

arXiv.org e-Print Archive

Crossref

HAL

Hal-Diderot

From local pattern mining to relevant bi-cluster characterization

Author: Jean-François Boulicaut
Ruggero G Pensa
Publication date: 23/04/2020

Clustering or bi-clustering techniques have proved quite useful in many application domains. A weakness of these techniques remains the poor support for grouping characterization. We consider possibly large Boolean data sets which record properties of objects and we assume that a bi-partition is available. We introduce a generic cluster characterization technique which is based on collections of bi-sets (i.e., sets of objects associated to sets of properties) which satisfy some user-defined constraints, and a measure of the accuracy of a given bi-set as a bi-cluster characterization pattern. The method is illustrated on both formal concepts (i.e., "maximal rectangles of true values") and the new type of δ-bi-sets (i.e., "rectangles of true values with a bounded number of exceptions per column"). The added value is illustrated on benchmark data and two real data sets which are intrinsically noisy: medical data about meningitis and Plasmodium falciparum gene expression data.
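
The two informal definitions quoted in this abstract can be made concrete on a tiny Boolean matrix. The sketch below is only an illustration under stated assumptions: the data set is a list of rows of 0/1 values, objects and properties are row and column indices, and all function names are hypothetical. It checks the "maximal rectangle of true values" condition and counts exceptions per column; the full δ-bi-set definition in the paper may impose further conditions that are not given in the abstract.

```python
def common_properties(data, objects):
    """Columns that are true for every object (row index) in `objects`."""
    return {j for j in range(len(data[0])) if all(data[i][j] for i in objects)}


def common_objects(data, properties):
    """Rows that are true for every property (column index) in `properties`."""
    return {i for i in range(len(data)) if all(data[i][j] for j in properties)}


def is_formal_concept(data, objects, properties):
    """True if (objects, properties) is a maximal rectangle of true values."""
    return (common_properties(data, objects) == set(properties)
            and common_objects(data, properties) == set(objects))


def max_exceptions_per_column(data, objects, properties):
    """Largest number of false values found in any single column of the bi-set."""
    return max((sum(not data[i][j] for i in objects) for j in properties),
               default=0)


# Toy Boolean data set: 3 objects (rows) described by 3 properties (columns).
data = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
]
```

On this toy matrix, the bi-set ({0, 1}, {0, 1}) is a formal concept, while ({0, 1, 2}, {0, 1}) is not a rectangle of true values but has at most one exception per column, which is the kind of tolerance the bound δ is meant to express.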

CiteSeerX

Extraction sous Contraintes d'Ensembles de Cliques Homogènes

Author: Boulicaut Jean-François
Gandrillon Olivier
Mougel Pierre-Nicolas
Plantevit Marc
Rigotti Christophe
Publication venue: HAL CCSD
Publication date: 01/01/2011

Document on the LIRIS site: http://liris.cnrs.fr/Documents/Liris-4915.pdf

We propose a data mining method for graphs in which a set of labels is associated with each vertex. One application, for example, is the analysis of a social network of co-authoring researchers in which the labels specify the conferences where they publish. We define the constraint-based extraction of sets of cliques such that every vertex of the cliques involved shares enough labels. We propose a method to compute all Maximal Homogeneous Clique Sets that satisfy a conjunction of constraints set by the analyst, concerning the number of separate cliques, the size of the cliques, and the number of shared labels. Experiments show that the approach works on large graphs built from real data and brings interesting structures to light.

HAL-UJM

INRIA a CCSD electronic archive server

Hal-Diderot

Découverte de sous-groupes avec les arbres de recherche de Monte Carlo

Author: Bosc Guillaume
Boulicaut Jean-François
Raïssi Chedy
Kaytoue Mehdi
Publication venue: HAL CCSD
Publication date: 01/01/2016

Discovering rules that clearly distinguish one class from another remains a difficult problem. Such patterns suggest hypotheses that may explain a class. Subgroup discovery (SD), a framework that formally defines this pattern mining task, still faces two major problems: (i) defining appropriate quality measures that characterize the singularity of a pattern, and (ii) choosing a suitable heuristic for exploring the search space when complete enumeration is infeasible. To date, the most efficient SD algorithms are based on beam search (BS). The collection of extracted patterns nevertheless lacks diversity because of the greedy nature of the exploration. We propose here to use a recent exploration technique, Monte Carlo tree search (MCTS). The trade-off between exploitation and exploration, together with the power of random search, yields a solution that is available at any time and that generally outperforms BS-type approaches. Our empirical study, with several quality measures, on various benchmark and real-world datasets demonstrates the quality of our approach.

INRIA a CCSD electronic archive server

Archivio istituzionale della ricerca - Università di Palermo

Découverte de sous-groupes avec les arbres de recherche de Monte Carlo

Author: Bosc Guillaume
Boulicaut Jean-François
Kaytoue Mehdi
Raïssi Chedy
Publication venue: HAL CCSD
Publication date: 24/01/2017

INRIA a CCSD electronic archive server

Hal-Diderot

Anytime Discovery of a Diverse Set of Patterns with Monte Carlo Tree Search

Author: Bosc Guillaume
Boulicaut Jean-François
Kaytoue Mehdi
Raïssi Chedy
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

The discovery of patterns that accurately discriminate one class label from another remains a challenging data mining task. Subgroup discovery (SD) is one of the frameworks that makes it possible to elicit such interesting patterns from labeled data. A question remains fairly open: how to select an accurate heuristic search technique when exhaustive enumeration of the pattern space is infeasible? Existing approaches make use of beam search, sampling, and genetic algorithms for discovering a pattern set that is non-redundant and of high quality w.r.t. a pattern quality measure. We argue that such approaches produce pattern sets that lack diversity: only a few patterns of high quality, and different enough, are discovered. Our main contribution is then to formally define pattern mining as a game and to solve it with Monte Carlo tree search (MCTS). It can be seen as an exhaustive search guided by random simulations which can be stopped early (limited budget) by virtue of its best-first search property. We show through a comprehensive set of experiments how MCTS enables the anytime discovery of a diverse pattern set of high quality. It outperforms other approaches when dealing with a large pattern search space and for different quality measures. Thanks to its genericity, our MCTS approach can be used for SD but also for many other pattern mining tasks.

INRIA a CCSD electronic archive server