49 research outputs found

Modèle des blocs latents avec une classe de bruit

Author: Brault Vincent
Laclau Charlotte
Publication venue: HAL CCSD
Publication date: 28/05/2018
Field of study

Co-clustering is a powerful and efficient approach to unsupervised learning because it partitions data along both modes of a dataset, that is, its rows and its columns. In high-dimensional settings, however, co-clustering methods may fail to produce meaningful results because of noisy or irrelevant features. In this talk, we address this issue with a novel co-clustering model, based on the latent block model, that assumes the existence of a noise cluster containing all irrelevant features. Experimental results on synthetic datasets show the effectiveness of our model on high-dimensional noisy data. Finally, we highlight the value of the approach on two real datasets originally proposed for studying genetic diversity across the world.

INRIA a CCSD electronic archive server

North Korea and the Politics of Visual Representation

Author: Aletta J Norval
Alex Danchev
Anca Pusca
B Myers
Barbara Demick
Chang W Lee
Charlotte Epstein
Christopher Morris
David Campbell
David Chandler
David Howard
David Howarth
David Shim
Debbie Lisle
Dirk Nabers
Ernesto Laclau
Evan Ramstad
Gillian Rose
Guardian
Hyungwon Kang
Jacques Derrida
Jacques Lacan
Jae-Jung Suh
Joao Biehl
John Berger
Jonathan Benthall
Judith Butler
Jung-Min Kang
Karen Fragala
Korea Herald
Lars Bech
Mark Wenman
Martin Jay
Michael J Shapiro
Neil Postman
Nicholas Righetti
Observer
Ole Waever
Peter Hamilton
Petra Kolonko
Ralph Hassig
Roland Barthes
Roland Bleiker
Rodolphe Gasché
Sheila Mcnulty
Simon Critchley
Stuart Hall
Susan D Moeller
Susan Sontag
Thomas Van Houtryve
Us Dod
Won-Sup Yoon
Yonhap
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011
Field of study

Crossref

How Signifying Practices Constitute Food (In)Security: The Case of the Democratic People’s Republic of Korea

Author: Anca Pusca
Andrew Lankov
Andrew S Natsios
Bronwen Dalton
Charlotte Epstein
Choel Choi
Daniel A Pinkston
Daniel Schwekendiek
David Campbell
David Morton
David Shim
Debbie Lisle
Dirk Reber
Edward P Reed
Edward W Said
Ernesto Laclau
Hazel Smith
Jenny Edkins
John Feffer
Jon Halliday
Judith Butler
Lee
Lola Nathanail
Marcus Noland
Mark Nord
Michael J Shapiro
Michael Schloms
Michel Foucault
Naeem Inayatullah
Neil Postman
Nicholas Eberstadt
R R Krishnan
Relief Rrn
Robert Scalapino
Roland Bleiker
Roxanne Doty
Samuel Kim
Scott Snyder
Stephan Haggard
Stuart Hall
Suchan Chae
Sue Lautze
Susan D Moeller
Tae-Jin Kwon
Tomas Van Houtryve
Vaughan-Williams
Woon-Keun Kim
Young-Hoon Lee
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010
Field of study

Crossref

Algorithmes de block-clustering dur et flou pour les données en grande dimension

Author: Laclau Charlotte
Publication venue
Publication date: 14/04/2016
Field of study

With the growing amount of available data, unsupervised learning has become an important tool for discovering underlying structures and patterns without the need to label instances manually. Among the approaches proposed to tackle this problem, clustering is arguably the most popular. Clustering usually assumes that each group, also called a cluster, is distributed around a center defined in terms of all features; in some real-world applications, particularly those involving high-dimensional data, this assumption may not hold. Co-clustering algorithms were proposed to address this: they describe clusters of instances by the subsets of features that are most relevant to them. The resulting latent structure of the data is composed of blocks, usually called co-clusters. In the first two chapters of this thesis, we present two co-clustering methods that distinguish relevant features from noise according to their ability to reveal the latent structure of the data, one in a probabilistic framework and one in a distance-based framework. The probabilistic approach builds on mixture models and assumes that irrelevant features follow a probability distribution whose parameters are independent of the co-clustering structure. The distance-based (also called metric-based) approach relies on an adaptive metric in which each variable is assigned a weight that defines its contribution to the resulting co-clustering. From a theoretical point of view, we show the global convergence of the proposed algorithms using Zangwill's convergence theorem. In the last two chapters, we consider a special case of co-clustering in which, contrary to the original setting, each subset of instances is described by a unique subset of features. Reordering the original data matrix according to the partitions obtained under this assumption reveals a structure of homogeneous diagonal blocks. As with the first two contributions, we consider both probabilistic and metric-based approaches. The main idea of the proposed methods is to impose two kinds of constraints: (1) we set the number of row clusters equal to the number of column clusters; (2) we seek a structure of the original data matrix that has its maximum values on the diagonal (for instance, for binary data, we look for diagonal blocks composed mostly of ones, with zeros outside the diagonal). The proposed approaches enjoy the convergence guarantees derived from the results of the previous chapters. Finally, for each chapter, we derive both hard and fuzzy versions of the proposed algorithms. We evaluate our contributions on a wide variety of synthetic and real-world binary and continuous datasets, including text mining applications, and analyze the advantages and drawbacks of each approach. To conclude, we believe that this thesis explicitly covers the vast majority of scenarios arising in hard and fuzzy co-clustering and can be seen as a generalization of some popular biclustering approaches.
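
The diagonal structure described in this abstract can be illustrated with a small sketch. It is not the thesis's algorithm: the row and column labels are assumed to be given already, and the function names `reorder_by_clusters` and `diagonal_agreement` are hypothetical helpers introduced here only to show what "ones inside diagonal blocks, zeros outside" means for binary data.

```python
def reorder_by_clusters(matrix, row_labels, col_labels):
    """Permute rows and columns so members of the same cluster are adjacent."""
    row_order = sorted(range(len(row_labels)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(col_labels)), key=lambda j: col_labels[j])
    return [[matrix[i][j] for j in col_order] for i in row_order]


def diagonal_agreement(matrix, row_labels, col_labels):
    """Fraction of cells that match the ideal block-diagonal pattern:
    1 when the row cluster equals the column cluster, 0 otherwise."""
    hits = 0
    total = 0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            expected = 1 if row_labels[i] == col_labels[j] else 0
            hits += int(value == expected)
            total += 1
    return hits / total


# Toy binary matrix (illustrative data, not from the thesis).
data = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
# Assumed partitions with the same number of row and column clusters.
row_labels = [0, 1, 0, 1]
col_labels = [0, 1, 0, 1]

reordered = reorder_by_clusters(data, row_labels, col_labels)
score = diagonal_agreement(data, row_labels, col_labels)
for row in reordered:
    print(row)
print(score)
```

After reordering, the ones gather in two blocks along the diagonal, and the agreement score measures how far the data are from the ideal pattern. A diagonal co-clustering method would search for the labels that make such a criterion as large as possible, rather than take them as input.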

Theses.fr

Modèle des blocs latents avec une classe de bruit

Author: Brault Vincent
Laclau Charlotte
Publication venue: HAL CCSD
Publication date: 28/05/2018
Field of study

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Noise-free Latent Block Model for High Dimensional Data

Author: Brault Vincent
Laclau Charlotte
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2019
Field of study

Co-clustering is a powerful and efficient approach to unsupervised learning because it partitions data based on both the observations and the variables of a given dataset. In high-dimensional settings, however, co-clustering methods may fail to produce meaningful results because of noisy or irrelevant features. In this paper, we tackle this issue by proposing a novel co-clustering model that assumes the existence of a noise cluster containing all irrelevant features. A variational expectation-maximization (VEM)-based algorithm is derived for this task, in which automatic variable selection and the joint clustering of objects and variables are achieved via a Bayesian framework. Experimental results on synthetic datasets show the effectiveness of our model on high-dimensional noisy data. Finally, we highlight the value of the approach on two real datasets whose goal is to study genetic diversity across the world.

HAL-UJM

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Fast Simultaneous Clustering and Feature Selection for Binary Data

Author: Laclau Charlotte
Nadif Mohamed
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

This paper addresses the problem of clustering binary data with feature selection within the context of the maximum likelihood (ML) and classification maximum likelihood (CML) approaches. To perform clustering with feature selection efficiently, we propose the use of an appropriate Bernoulli model. We derive two algorithms: Expectation-Maximization (EM) and Classification EM (CEM) with feature selection. Without requiring knowledge of the number of clusters, both algorithms optimize two approximations of the minimum message length (MML) criterion. To exploit the advantages of EM for clustering and of CEM for fast convergence, we combine the two algorithms. With Monte Carlo simulations, and by varying the parameters of the model, we rigorously validate the approach. We also illustrate our contribution using real datasets commonly used in document clustering.
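
As background for the EM algorithm named in this abstract, here is a minimal sketch of plain EM for a mixture of independent Bernoulli features. It is a textbook baseline under stated assumptions, not the paper's method: it has no feature selection and no MML criterion, the number of clusters is fixed by the caller, and the function name `em_bernoulli_mixture`, the clamping constant `eps`, and the toy data are all introduced here for illustration.

```python
import math


def em_bernoulli_mixture(data, init_means, n_iter=50, eps=1e-6):
    """Plain EM for a Bernoulli mixture; returns weights, means, responsibilities."""
    k = len(init_means)
    d = len(data[0])
    weights = [1.0 / k] * k
    means = [list(m) for m in init_means]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each row,
        # computed in log space for numerical stability.
        resp = []
        for x in data:
            log_p = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    # Clamp to avoid log(0) when a mean reaches 0 or 1.
                    p = min(max(means[c][j], eps), 1.0 - eps)
                    lp += math.log(p) if x[j] else math.log(1.0 - p)
                log_p.append(lp)
            top = max(log_p)
            unnorm = [math.exp(lp - top) for lp in log_p]
            z = sum(unnorm)
            resp.append([u / z for u in unnorm])
        # M-step: re-estimate mixing weights and per-feature Bernoulli means.
        for c in range(k):
            mass = sum(r[c] for r in resp)
            weights[c] = max(mass / len(data), eps)
            means[c] = [
                sum(r[c] * x[j] for r, x in zip(resp, data)) / max(mass, eps)
                for j in range(d)
            ]
    return weights, means, resp


# Toy binary data with two visible groups (illustrative, not from the paper).
data = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
init_means = [[0.6, 0.6, 0.4, 0.4], [0.4, 0.4, 0.6, 0.6]]
weights, means, resp = em_bernoulli_mixture(data, init_means)
labels = [max(range(len(r)), key=lambda c: r[c]) for r in resp]
print(labels)
```

A classification EM (CEM) variant would insert a hard-assignment step between the E-step and the M-step, replacing each row's responsibilities with a one-hot vector for its most probable cluster. That usually converges in fewer iterations at the cost of a cruder estimate.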

Crossref

HAL Descartes

Hal-Diderot

On Fair Cost Sharing Games in Machine Learning

Author: Laclau Charlotte
Redko Ievgen
Publication venue: HAL CCSD
Publication date: 27/01/2019
Field of study

Machine learning and game theory are known to be strongly linked, as each provides the other with solutions and models for studying and analyzing the optimal behaviour of a set of agents. In this paper, we take a closer look at a special class of games, known as fair cost sharing games, from a machine learning perspective. We show that this kind of game, in which agents can choose between selfish behaviour and cooperation with shared costs, has a natural link to several machine learning scenarios, including collaborative learning with homogeneous and heterogeneous sources of data. We further demonstrate how the game-theoretical results bounding the ratio between the best Nash equilibrium (or its approximate counterpart) and the optimal solution of a given game can be used to provide an upper bound on the gain achievable by collaborative learning, expressed as the expected risk and the sample complexity for the homogeneous and heterogeneous cases, respectively. We believe that the established link can have many future implications for other learning scenarios as well, with privacy-aware learning among the most notable examples.

HAL-UJM

Deep Neural Networks Are Congestion Games: From Loss Landscape to Wardrop Equilibrium and Beyond

Author: Laclau Charlotte
Redko Ievgen
Vesseron Nina
Publication venue: HAL CCSD
Publication date: 01/01/2021
Field of study

The theoretical analysis of deep neural networks (DNNs) is arguably among the most challenging research directions in machine learning (ML) right now, as it requires scientists to lay novel statistical learning foundations to explain their behaviour in practice. While some success has been achieved recently in this endeavour, the question of whether DNNs can be analyzed using tools from scientific fields outside the ML community has not received the attention it deserves. In this paper, we explore the interplay between DNNs and game theory (GT), and show how one can benefit from classic, readily available results from the latter when analyzing the former. In particular, we consider the widely studied class of congestion games, and illustrate their intrinsic relatedness to both linear and non-linear DNNs and to the properties of their loss surface. Beyond retrieving state-of-the-art results from the literature, we argue that our work provides a very promising novel tool for analyzing DNNs, and we support this claim by proposing concrete open problems that, when solved, can significantly advance our understanding of DNNs.

HAL-ENS-LYON

HAL-UJM