189 research outputs found

    Link Prediction via Community Detection in Bipartite Multi-Layer Graphs

    International audience. The growing number of multi-relational networks poses new challenges for the development of methods that solve classical graph problems, such as link prediction, in a multi-layer framework. In this work, we combine an existing bipartite local models method with approaches for link prediction from communities to address the link prediction problem in multi-layer graphs. To this end, we extend existing community detection-based link prediction measures to the bipartite multi-layer network setting. We obtain a new generic framework for link prediction in bipartite multi-layer graphs that can integrate any community detection approach, can handle an arbitrary number of networks, is rather inexpensive (depending on the community detection technique), and can automatically tune its parameters. We test our framework using two of the most common community detection methods, the Louvain algorithm and spectral partitioning, which can be easily applied to bipartite multi-layer graphs. We evaluate our approach on benchmark data sets for a common drug-target interaction prediction task in computational drug design and demonstrate experimentally that it is competitive with the state of the art.

    Discovering Knowledge using a Constraint-based Language

    Discovering pattern sets or global patterns is an attractive topic for the pattern mining community because such patterns provide useful information. By combining local patterns that satisfy a joint meaning, this approach produces higher-level patterns that are more useful to the data analyst than the usual local patterns, while reducing the number of patterns. In parallel, recent work investigating the relationship between data mining and constraint programming (CP) shows that the CP paradigm is a suitable framework for modeling and mining such patterns in a declarative and generic way. We present a constraint-based language that enables us to define queries addressing pattern sets and global patterns. The usefulness of such a declarative approach is highlighted by several examples drawn from clustering based on associations. This language has been implemented in the CP framework. Comment: 12 pages

    Sequence Classification Based on Delta-Free Sequential Pattern

    International audience. Sequential pattern mining is one of the most studied and challenging tasks in data mining. However, extending well-known methods from other classical patterns to sequences is not a trivial task. In this paper we study the notion of δ-freeness for sequences. While this notion has been discussed extensively for itemsets, this work is the first to extend it to sequences. We define an efficient algorithm for extracting δ-free sequential patterns. Furthermore, we show the advantage of δ-free sequences and highlight their importance when building sequence classifiers: they can be used to address the feature selection problem in statistical classifiers, as well as to build symbolic classifiers that optimize both the accuracy and the earliness of predictions.
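
    For readers unfamiliar with δ-freeness, the following is a minimal sketch of the classical itemset notion that the abstract says has been discussed extensively. It is not the paper's definition for sequences and not its extraction algorithm. It assumes the usual itemset definition: an itemset is δ-free if no rule between its items holds with at most δ exceptions. The names `support` and `is_delta_free` and the toy transactions are hypothetical.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_delta_free(itemset, transactions, delta):
    """Check delta-freeness of an itemset (itemset notion only, not sequences).

    Assumed definition: X is delta-free if no rule Y -> a with Y and a inside X
    has at most `delta` exceptions. Testing the rules (X - {a}) -> a is enough,
    because enlarging the antecedent can only reduce the number of exceptions.
    """
    itemset = frozenset(itemset)
    full = support(itemset, transactions)
    return all(
        support(itemset - {item}, transactions) - full > delta
        for item in itemset
    )


# Toy data, invented for illustration.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

free_at_0 = is_delta_free({"a", "b"}, transactions, delta=0)
free_at_1 = is_delta_free({"a", "b"}, transactions, delta=1)
```

    With δ = 0 this reduces to the ordinary notion of a free (generator) itemset; larger values of δ tolerate a few exceptions and therefore accept fewer itemsets as free.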

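    Sequential Pattern Mining for Discovering Relations between Genes and Rare Diseases (original title: Fouille de motifs séquentiels pour la découverte de relations entre gènes et maladies rares)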

    National audience. Orphanet is an organization whose aims include gathering collections of articles on rare diseases. However, new knowledge in this field is currently acquired manually, so obtaining new information about rare diseases is a time-consuming process. Making this information available automatically is therefore an important challenge. In this context, we propose to address the extraction of relations between genes and rare diseases using data mining approaches, more specifically constraint-based sequential pattern mining. Our experiments show the value of our approach for extracting relations between genes and rare diseases from PubMed article abstracts.

    Sequential Data Mining for Information Extraction from Texts (original title: Fouille de données séquentielles pour l'extraction d'information dans les textes)

    International audience. This paper shows the benefit of using patterns produced by data mining methods for natural language processing (NLP) applied to medical and genetic biology, particularly for information extraction tasks. We propose an approach for learning linguistic patterns with a data mining method based on sequential patterns and on a recursive mining of the patterns themselves. A distinctive feature of the approach is that it dispenses with syntactic parsing while still producing symbolic results that are understandable to the user, in contrast to numerical methods, which remain hard to interpret. It requires no linguistic resource other than the training corpus, and we show its value for extracting relations between named entities. For the recognition of biological named entities, we propose a method based on a new kind of pattern that takes into account a sequence and its context.

    Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts

    International audience. Background: Discovering gene interactions and their characterizations from biological text collections is a crucial issue in bioinformatics. Text collections are large, and it is very difficult for biologists to take full advantage of this amount of knowledge. Natural Language Processing (NLP) methods have been applied to extract background knowledge from biomedical texts. Some existing NLP approaches are based on handcrafted rules and are thus time-consuming and often tied to a specific corpus. Machine-learning-based NLP methods give good results but generate outcomes that are not really understandable by a user. Results: We take advantage of a hybridization of data mining and natural language processing to propose an original symbolic method that automatically produces patterns conveying gene interactions and their characterizations. Our method therefore detects not only gene interactions but also semantic information about the extracted interactions (e.g., modalities, biological contexts, interaction types). Only a limited resource is required: the text collection used as a training corpus. Our approach gives results comparable to those of state-of-the-art methods and is even better for gene interaction detection in AIMed. Conclusions: Experiments show how our approach makes it possible to discover interactions and their characterizations. To the best of our knowledge, there are few methods that automatically extract both the interactions and the associated semantic information. The gene interactions extracted from PubMed are available through a simple web interface at https://bingotexte.greyc.fr/. The software is available at https://bingo2.greyc.fr/?q=node/22.

    An Experimental Approach for Information Extraction in Multi-Party Dialogue Discourse (original title: Étude Expérimentale d'Extraction d'Information dans des Retranscriptions de Réunions)

    National audience. In this paper, we address information extraction, and more specifically the extraction of themes, from meeting transcripts. This type of corpus is noisy: it is usually poorly structured, lacks formatting and punctuation, involves several speakers, and scatters information across multiple sentences. We present an experimental study that uses methods based on the tf-idf measure and on topic modeling, applied to a real reference dataset of multi-party dialogue (the AMI corpus). We compare our results with the summaries provided with the corpus.
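
    As an illustration of the tf-idf measure mentioned in this abstract, here is a minimal sketch of one common variant (relative term frequency multiplied by the natural log of the inverse document frequency). The abstract does not say which weighting or preprocessing the authors used, so the formula, the function names `tf_idf` and `top_terms`, and the toy speaker turns are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    `docs` is a list of token lists. The weight is (count / document length)
    * log(number of documents / number of documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(doc_weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    return sorted(doc_weights, key=lambda t: (-doc_weights[t], t))[:k]


# Toy speaker turns, invented for illustration (not taken from the AMI corpus).
turns = [
    "the remote needs a new battery".split(),
    "the budget for the remote is small".split(),
    "the battery cost fits the budget".split(),
]
weights = tf_idf(turns)
best_for_first_turn = top_terms(weights[0])
```

    A term that occurs in every turn, such as "the", receives weight zero, while terms confined to a single turn score highest. Without stop-word filtering, short function words that happen to be rare can still rank highly, which is one reason noisy transcripts are hard for this kind of measure.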

    Discovering useful information in data mining

    International audience

    Knowledge Discovery in Databases: Some Specific Features of Medical Data (original title: Extraction de connaissances dans les bases de données, quelques spécificités des données médicales)

    @article{RN-CREMILLEUX-2006-2, author = {Crémilleux, B.}, title = {Extraction de connaissances dans les bases de données, quelques spécificités des données médicales}, journal = {Neurophysiologie clinique}, volume = {36}, number = {2}, pages = {46}, year = {2006}, note = {Elsevier. Résumé étendu} }
    National audience
