189 research outputs found

Link Prediction via Community Detection in Bipartite Multi-Layer Graphs

Author: Crémilleux Bruno
Koptelov Maksim
Soualmia Lina F.
Zimmermann Albrecht
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/03/2020

The growing number of multi-relational networks poses new challenges for developing methods that solve classical graph problems, such as link prediction, in a multi-layer framework. In this work, we combine an existing bipartite local models method with approaches for link prediction from communities to address the link prediction problem in multi-layer graphs. To this end, we extend existing community detection-based link prediction measures to the bipartite multi-layer network setting. We obtain a new generic framework for link prediction in bipartite multi-layer graphs, which can integrate any community detection approach, can handle an arbitrary number of networks, is rather inexpensive (depending on the community detection technique), and can automatically tune its parameters. We test our framework using two of the most common community detection methods, the Louvain algorithm and spectral partitioning, which can be easily applied to bipartite multi-layer graphs. We evaluate our approach on benchmark data sets for a common drug-target interaction prediction task in computational drug design and demonstrate experimentally that it is competitive with the state of the art.

HAL - Normandie Université

Crossref

Discovering Knowledge using a Constraint-based Language

Author: Boizumault Patrice
Crémilleux Bruno
Khiari Mehdi
Loudni Samir
Métivier Jean-Philippe
Publication venue
Publication date: 15/05/2011

Discovering pattern sets or global patterns is an attractive issue for the pattern mining community because it provides useful information. By combining local patterns that satisfy a joint meaning, this approach produces higher-level patterns that are more useful to the data analyst than the usual local patterns, while reducing the number of patterns. In parallel, recent work investigating the relationships between data mining and constraint programming (CP) shows that the CP paradigm is a well-suited framework for modeling and mining such patterns in a declarative and generic way. We present a constraint-based language that enables us to define queries addressing pattern sets and global patterns. The usefulness of such a declarative approach is highlighted by several examples drawn from clustering based on associations. This language has been implemented in the CP framework. Comment: 12 pages

arXiv.org e-Print Archive

HAL - Normandie Université

Sequence Classification Based on Delta-Free Sequential Pattern

Author: Charnois Thierry
Crémilleux Bruno
Holat Pierre
Plantevit Marc
Raïssi Chedy
Tomeh Nadi
Publication venue: HAL CCSD
Publication date: 14/12/2014

Sequential pattern mining is one of the most studied and challenging tasks in data mining. However, extending well-known methods from other classical pattern types to sequences is not a trivial task. In this paper we study the notion of δ-freeness for sequences. While this notion has been discussed extensively for itemsets, this work is the first to extend it to sequences. We define an efficient algorithm for extracting δ-free sequential patterns. Furthermore, we show the advantage of δ-free sequences and highlight their importance when building sequence classifiers: we show how they can be used to address the feature selection problem in statistical classifiers, as well as to build symbolic classifiers that optimize both the accuracy and the earliness of predictions.

HAL - Normandie Université

INRIA a CCSD electronic archive server

HAL

HAL-Paris 13

Hal-Diderot

Fouille de motifs séquentiels pour la découverte de relations entre gènes et maladies rares

Author: Béchet Nicolas
Cellier Peggy
Charnois Thierry
Crémilleux Bruno
Publication venue: HAL CCSD
Publication date: 25/06/2012

Orphanet is an organization whose aims include gathering collections of articles about rare diseases. However, new knowledge in this field is currently acquired manually, so obtaining new information about rare diseases is a time-consuming process. Making this information available automatically is therefore an important challenge. In this context, we propose to address the extraction of relations between genes and rare diseases using data mining approaches, more specifically sequential pattern mining under constraints. Our experiments show the value of our approach for extracting relations between genes and rare diseases from PubMed article abstracts.

HAL - Normandie Université

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Fouille de données séquentielles pour l'extraction d'information dans les textes

Author: Charnois Thierry
Crémilleux Bruno
Plantevit Marc
Rigotti Christophe
Publication venue: 'Associacio catalana de Salut Laboral'
Publication date: 01/01/2009

This article shows the value of using patterns produced by data mining methods in natural language processing applied to medical biology and genetics, particularly in information extraction tasks. We propose an approach for learning linguistic patterns with a data mining method based on sequential patterns and on a so-called recursive mining of the patterns themselves. An original aspect of our approach is that it dispenses with syntactic parsing while still producing symbolic results that are intelligible to the user, in contrast to numerical methods, which remain difficult to interpret. It requires no linguistic resources other than the training corpus. For biological named entity recognition, we propose a method based on a new type of pattern that integrates a sequence and its context. This paper shows the benefit of using data mining methods for Biological Natural Language Processing. A method for discovering linguistic patterns based on recursive sequential pattern mining is proposed. It requires neither sentence parsing nor any resource other than a training data set. It produces understandable results, and we show its value for extracting relations between named entities. For the named entity recognition problem, we propose a method based on a new kind of pattern that takes into account the sequence and its context.

HAL - Normandie Université

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts

Author: Cellier Peggy
Charnois Thierry
Crémilleux Bruno
Gandrillon Olivier
Klema Jiri
Manguin Jean-Luc
Plantevit Marc
Rigotti Christophe
Publication venue: BioMed Central
Publication date: 01/01/2015

Background: Discovering gene interactions and their characterizations from biological text collections is a crucial issue in bioinformatics. Indeed, text collections are large, and it is very difficult for biologists to take full advantage of this amount of knowledge. Natural Language Processing (NLP) methods have been applied to extract background knowledge from biomedical texts. Some existing NLP approaches are based on handcrafted rules and are thus time consuming and often devoted to a specific corpus. Machine-learning-based NLP methods give good results but generate outcomes that are not really understandable by a user. Results: We take advantage of a hybridization of data mining and natural language processing to propose an original symbolic method that automatically produces patterns conveying gene interactions and their characterizations. Therefore, our method detects not only gene interactions but also semantic information on the extracted interactions (e.g., modalities, biological contexts, interaction types). Only a limited resource is required: the text collection that is used as a training corpus. Our approach gives results comparable to those of state-of-the-art methods and is even better for gene interaction detection in AIMed. Conclusions: Experiments show how our approach enables the discovery of interactions and their characterizations. To the best of our knowledge, there are few methods that automatically extract the interactions together with the associated semantic information. The extracted gene interactions from PubMed are available through a simple web interface at https://bingotexte.greyc.fr/. The software is available at https://bingo2.greyc.fr/?q=node/22

HAL - Normandie Université

HAL-CentraleSupelec

HAL-UJM

Crossref

INRIA a CCSD electronic archive server

Étude Expérimentale d'Extraction d'Information dans des Retranscriptions de Réunions

Author: Alizadeh Pegah
Cellier Peggy
Charnois Thierry
Crémilleux Bruno
Zimmermann Albrecht
Publication venue: HAL CCSD
Publication date: 14/05/2018

An Experimental Approach For Information Extraction in Multi-Party Dialogue Discourse. In this paper, we address the task of information extraction for meeting transcripts. Meeting documents are usually not well structured and lack formatting and punctuation, while the information is distributed over multiple sentences. We investigate the use of numerical statistics or topic modeling methods on a real dataset containing multi-party dialogue texts. We evaluate our experiments with respect to the summaries provided in the dataset. In this article we address topic extraction from textual meeting transcripts. This type of corpus is noisy, lacks formatting, and is loosely structured, with several speakers taking part and information often scattered. We present an experimental study using methods based on the tf-idf measure and topic extraction on a real reference corpus for the study of meetings (the AMI corpus). We compare our results with the summaries provided with the corpus.

INRIA a CCSD electronic archive server

Discovering useful information in data mining

Author: Crémilleux Bruno
Publication venue: HAL CCSD
Publication date: 01/01/2011

HAL - Normandie Université

Extraction de connaissances dans les bases de données, quelques spécificités des données médicales

Author: Crémilleux Bruno
Publication venue: Elsevier Masson
Publication date: 01/01/2006

@article{RN-CREMILLEUX-2006-2,
  author = {Crémilleux, B.},
  title = {Extraction de connaissances dans les bases de données, quelques spécificités des données médicales},
  journal = {Neurophysiologie clinique},
  volume = {36},
  number = {2},
  pages = {46},
  year = {2006},
  note = {Elsevier. Résumé étendu}
}

HAL Descartes

Extraction de connaissances dans les bases de données, quelques spécificités des données médicales

Author: Crémilleux Bruno
Publication venue: Elsevier Masson
Publication date: 01/01/2006

HAL - Normandie Université

HAL Descartes