87 research outputs found
A mixture model for the co-clustering of a continuous data table
In contrast to standard clustering methods, co-clustering methods process the set of rows and the set of columns of a data table simultaneously, seeking homogeneous blocks. In this article, we address co-clustering when the data table describes a set of individuals by quantitative variables and, to this end, we propose a mixture model suited to co-clustering, leading to original criteria that can handle more complex situations than the criteria usually used in this context. The parameters are then estimated by a generalized EM algorithm (GEM) maximizing the likelihood of the observed data. We further propose a new expression of the Bayesian information criterion, called BIC_B, adapted to our setting for assessing the number of blocks. Numerical experiments on synthetic data assess the performance of GEM and BIC_B and demonstrate the value of this approach.
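As a point of reference for the BIC_B criterion mentioned above, here is a minimal sketch of the standard Bayesian information criterion that BIC_B adapts; the block-specific penalty of BIC_B is not given in the abstract, so only the generic form is shown:

```python
import numpy as np

# Generic BIC: -2 * log-likelihood + (number of free parameters) * log(sample size).
# Lower is better; the penalty term discourages overly complex models.
def bic(log_likelihood, n_params, n_samples):
    return -2.0 * log_likelihood + n_params * np.log(n_samples)

# Example: a fitted model with log-likelihood -120, 10 parameters, 100 samples.
print(bic(-120.0, 10, 100))
```

BIC_B replaces the naive parameter count and sample size with quantities suited to a block (row x column) partition, which is what makes it appropriate for choosing the number of blocks.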
Graph Cuts with Arbitrary Size Constraints Through Optimal Transport
A common way of partitioning graphs is through minimum cuts. One drawback of
classical minimum cut methods is that they tend to produce small groups, which
is why more balanced variants such as normalized and ratio cuts have seen more
success. However, we believe that with these variants, the balance constraints
can be too restrictive for some applications, such as clustering imbalanced
datasets, while not being restrictive enough when searching for perfectly
balanced partitions. Here, we propose a new graph cut algorithm for
partitioning graphs under arbitrary size constraints. We formulate the graph
cut problem as a regularized Gromov-Wasserstein problem, and we propose to
solve it using an accelerated proximal gradient descent algorithm that has
global convergence guarantees and yields sparse solutions. It only incurs an
additional cost ratio compared to the classical spectral clustering algorithm,
while being observed to be more efficient in practice.
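The spectral clustering baseline that this abstract compares against can be sketched in a few lines. This is a generic normalized-cut illustration on a toy graph (Fiedler-vector sign split), not the authors' Gromov-Wasserstein method:

```python
import numpy as np

# Toy graph: two 3-node cliques joined by one weak bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1          # weak bridge between the two groups

d = A.sum(axis=1)
L = np.diag(d) - A                                # unnormalized graph Laplacian
Lsym = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)  # normalized Laplacian

# Fiedler vector: eigenvector of the second-smallest eigenvalue.
w, v = np.linalg.eigh(Lsym)       # eigh returns eigenvalues in ascending order
fiedler = v[:, 1]
labels = (fiedler > 0).astype(int)  # sign split approximates the normalized cut
```

The sign split recovers the two cliques; methods like the one above replace this fixed balance behavior with explicitly chosen group sizes.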
More Discriminative Sentence Embeddings via Semantic Graph Smoothing
This paper explores an empirical approach to learning more discriminative
sentence representations in an unsupervised fashion. Leveraging semantic graph
smoothing, we enhance sentence embeddings obtained from pretrained models to
improve results on text clustering and classification tasks. Our method,
validated on eight benchmarks, demonstrates consistent improvements, showcasing
the potential of semantic graph smoothing for improving sentence embeddings in
supervised and unsupervised document categorization.
Comment: Accepted in EACL 202
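The abstract does not spell out the smoothing operator, so here is a generic sketch of graph-based embedding smoothing of the kind such methods build on: a k-NN similarity graph over the embeddings, followed by neighborhood averaging. The graph construction, `k`, `alpha`, and the number of propagation steps are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))              # toy sentence embeddings (8 sentences, dim 4)

# Build a symmetric k-NN graph from cosine similarities.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
S = Xn @ Xn.T
np.fill_diagonal(S, -np.inf)             # exclude self-similarity
k = 2
A = np.zeros_like(S)
for i in range(len(X)):
    A[i, np.argsort(S[i])[-k:]] = 1.0    # connect each node to its k nearest neighbors
A = np.maximum(A, A.T)                   # symmetrize

# Row-normalized propagation: each embedding mixes with its graph neighbors.
P = A / A.sum(axis=1, keepdims=True)
alpha, steps = 0.5, 2
Xs = X.copy()
for _ in range(steps):
    Xs = (1 - alpha) * Xs + alpha * (P @ Xs)
```

After smoothing, semantically related sentences have more similar vectors, which is what improves downstream clustering and classification.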
Exploring Topic Variants Through an Hybrid Biclustering Approach
In large text corpora, analytic journalists need to identify facts, verify them by locating corroborating documents and survey all related viewpoints. This requires them to make sense of document relationships at two levels of granularity: high-level topics and low-level topic variants. We propose a visual analytics software allowing analytic journalists to verify and refine hypotheses without having to read all documents. Our system relies on a hybrid biclustering approach. A new Topic Weighted Map visualization conveys all top-level topics reflecting their importance and their relative similarity. Then, coordinated multiple views allow to drill down into topic variants through an interactive term hierarchy visualization. Hence, the analyst can select, compare and filter out the subtle co-occurrences of terms shared by multiple documents to find interesting facts or stories. The usefulness of the tool is shown through a usage scenario and further assessed through a qualitative evaluation by an expert user.Dans des corpus textuels volumineux, les journalistes analytiques cherchent des documents et des rĂ©cits qui corroborent des faits, en les examinant sous tous les angles. Nous prĂ©sentons un outil de visualisation analytique leur permettant de vĂ©rifier, dâaffiner et de gĂ©nĂ©rer des hypothĂšses sans avoir Ă lire la totalitĂ© des contenus. Notre systĂšme repose sur une approche hybride de biclustering. Les sujets de haut niveau sont prĂ©sentĂ©s via une carte pondĂ©rĂ©e de sujets, reflĂ©tant Ă la fois leur importance et leur similaritĂ© relative. Pour chaque sujet, une vue hiĂ©rarchique et interactive dresse un aperçu de toutes ses variantes, de maniĂšre Ă identifier les documents traitĂ©s sous un mĂȘme angle ou partageant des faits communs. Des vues multiples et coordonnĂ©es permettent une analyse plus fine, en filtrant, sĂ©lectionnant et comparant les variantes de sujet, au regard des motifs de co-occurrence de termes les plus intĂ©ressants. 
LâutilitĂ© de lâoutil est montrĂ©e par un scĂ©nario dâusage, puis Ă©valuĂ©e qualitativement par un journaliste analytique
Scalable Multi-view Clustering via Explicit Kernel Features Maps
Multi-view learning has grown into an important component of data science and
machine learning, as a consequence of the increasing prevalence of multiple
views in real-world applications, especially in the context of networks. In
this paper we introduce a new scalability framework for multi-view
subspace clustering. An efficient optimization strategy is proposed, leveraging
kernel feature maps to reduce the computational burden while maintaining good
clustering performance. The scalability of the algorithm means that it can be
applied to large-scale datasets, including those with millions of data points,
using a standard machine, in a few minutes. We conduct extensive experiments on
real-world benchmark networks of various sizes in order to evaluate the
performance of our algorithm against state-of-the-art multi-view subspace
clustering methods and attributed-network multi-view approaches.
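The abstract does not specify which explicit kernel feature maps are used; random Fourier features are one standard explicit approximation of the RBF kernel and give the flavor of the approach: map each view explicitly, then work in the concatenated feature space at linear cost. All sizes and parameters here are illustrative:

```python
import numpy as np

def rff(X, n_features=64, gamma=1.0, seed=0):
    """Random Fourier features approximating the RBF kernel exp(-gamma*||x-y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Two "views" of the same 6 data points; map each view, then concatenate.
rng = np.random.default_rng(1)
view1 = rng.normal(size=(6, 3))
view2 = rng.normal(size=(6, 5))
Z = np.hstack([rff(view1, 32), rff(view2, 32)])  # joint explicit feature space

# Inner products in Z approximate a sum of per-view RBF kernels,
# without ever forming an n x n kernel matrix over the full dataset.
K_approx = Z @ Z.T
```

Because `Z` has a fixed, small width regardless of the number of points, any linear-space clustering routine applied to it scales to millions of samples, which is the essence of the claimed scalability.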
A survey on recent advances in named entity recognition
Named Entity Recognition seeks to extract substrings within a text that name
real-world objects and to determine their type (for example, whether they refer
to persons or organizations). In this survey, we first present an overview of
recent popular approaches, but we also look at graph- and transformer-based
methods including Large Language Models (LLMs) that have not had much coverage
in other surveys. Second, we focus on methods designed for datasets with scarce
annotations. Third, we evaluate the performance of the main NER implementations
on a variety of datasets with differing characteristics (as regards their
domain, their size, and their number of classes). We thus provide a deep
comparison of algorithms that are never considered together. Our experiments
shed some light on how the characteristics of datasets affect the behavior of
the methods that we compare.
Comment: 30 page
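The core extraction task the survey addresses, turning token-level tags into typed entity spans, can be illustrated with a small BIO decoder. This is a generic sketch, not any surveyed system's implementation:

```python
def bio_to_spans(tokens, tags):
    """Decode a BIO tag sequence into (text, type, start, end) entity spans."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):   # trailing "O" flushes the last entity
        ends = tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != etype)
        if ends and start is not None:
            spans.append((" ".join(tokens[start:i]), etype, start, i))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]        # stray "I-" treated leniently as "B-"
    return spans

tokens = ["Marie", "Curie", "worked", "in", "Paris"]
tags   = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
# -> [('Marie Curie', 'PER', 0, 2), ('Paris', 'LOC', 4, 5)]
```

Every NER method compared in such a survey, whether CRF-, graph-, transformer-, or LLM-based, ultimately produces output reducible to spans of this form, which is what makes cross-method evaluation possible.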
Generalized topographic block model
Co-clustering leads to parsimony in data visualisation, with the number of parameters dramatically reduced in comparison to the dimensions of the data sample. Herein, we propose a new generalized approach for nonlinear mapping via a re-parameterization of the latent block mixture model. The densities modeling the blocks belong to an exponential family, of which the Gaussian, Bernoulli and Poisson laws are particular cases. Inference of the parameters is derived from the block expectation-maximization algorithm with a Newton-Raphson procedure at the maximization step. Empirical experiments with textual data demonstrate the value of our generalized model.
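As a toy illustration of the Newton-Raphson maximization used at the M-step, here is the update for a plain Poisson log-likelihood with log link; the actual block-model updates are more involved, and this sketch only shows the mechanics:

```python
import numpy as np

# M-step sketch: maximize the Poisson log-likelihood sum_i [x_i*eta - exp(eta)]
# over the natural parameter eta = log(lambda), via Newton-Raphson.
x = np.array([2.0, 4.0, 3.0, 5.0, 1.0])
eta = 0.0                                   # initial guess
for _ in range(20):
    grad = x.sum() - len(x) * np.exp(eta)   # first derivative w.r.t. eta
    hess = -len(x) * np.exp(eta)            # second derivative (always negative)
    eta -= grad / hess                      # Newton-Raphson update
lam = np.exp(eta)                           # the closed form would give x.mean()
```

For exponential-family block densities, the same pattern applies per block, with the gradient and Hessian taken over the cells assigned to that block.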
Block Mixture Model for the Biclustering of Microarray Data
This publication is a representation of what appears in the IEEE Digital Libraries. An attractive way to perform biclustering of genes and conditions is to adopt a Block Mixture Model (BMM). Approaches based on a BMM operate through a Block Expectation Maximization (BEM) algorithm and/or a Block Classification Expectation Maximization (BCEM) one. The drawback of these approaches is the difficulty of choosing a good initialization strategy for the BEM and BCEM algorithms. This paper reviews existing biclustering approaches that adopt a BMM and proposes a new fuzzy biclustering one. Our approach makes it possible to choose a good initialization strategy for the BEM and BCEM algorithms.
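A minimal sketch of the kind of fuzzy (soft) row memberships such an initialization produces, using a toy two-component Gaussian posterior on row means; the centers and unit variance are illustrative assumptions, not the paper's procedure:

```python
import numpy as np

# Toy data matrix: the first two rows lie near 1, the last two near 4.
rows = np.array([[1.0, 1.2], [0.9, 1.1], [4.0, 4.2], [3.8, 4.1]])
centers = np.array([1.0, 4.0])            # assumed component centers (illustrative)

m = rows.mean(axis=1, keepdims=True)      # summarize each row by its mean
logp = -0.5 * (m - centers) ** 2          # unnormalized unit-variance Gaussian log-densities
resp = np.exp(logp - logp.max(axis=1, keepdims=True))
resp /= resp.sum(axis=1, keepdims=True)   # fuzzy memberships: each row sums to 1
```

Unlike a hard assignment, these soft memberships give BEM/BCEM a smoother starting point and reduce sensitivity to a poor initial partition.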
A merchant website's privacy policy as a means of strengthening online consumer trust
Whether regarded as a medium or as a place of purchase, the internet keeps raising trust issues for online consumers with respect to commercial transactions. For this reason, earning the trust of online consumers has become essential for companies in the e-commerce sector. In an attempt to build consumer trust in the internet, and in merchant websites in particular, many tools have been put in place, such as trust labels, but also privacy policies for the protection of personal data and respect for privacy. The objective of this article is to identify, on the basis of a qualitative study, the means by which a merchant website's privacy policy can have an impact on consumer trust.
- …