109 research outputs found

Fouille de données pour associer des noms de sessions aux articles scientifiques

Author: Cellier Peggy
Charnois Thierry
Quiniou Solen
Publication venue: HAL CCSD
Publication date: 01/07/2014

In this paper, we present a data mining approach to tackle the DEFT 2014 challenge. We focus on task 4, which consists of identifying the right conference session for scientific papers. The proposed approach combines two data mining techniques. Sequence mining extracts frequent phrases in scientific papers in order to build paper and session descriptions. Those descriptions of papers and sessions are then used to create a graph that represents shared descriptions. A graph mining technique is applied to the graph in order to extract a collection of homogeneous sub-graphs corresponding to sets of papers associated with sessions.

In this article, we describe our participation in the 2014 edition of DEFT. We address the task of associating session names with the papers of a conference. To do so, we propose an original, symbolic, and unsupervised knowledge discovery approach. The approach combines sequential data mining and graph mining methods. Sequence mining extracts frequent patterns in order to build descriptions of the papers and sessions. These descriptions are then represented as a graph. A graph mining technique applied to this graph yields collections of homogeneous sub-graphs, corresponding to collections of papers and session names.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Paris 13

Hal-Diderot

HAL-Rennes 1

KG-MDL: Mining Graph Patterns in Knowledge Graphs with the MDL Principle

Author: Bariatti Francesco
Cellier Peggy
Ferré Sébastien
Publication venue
Publication date: 22/09/2023

Nowadays, more and more data are available as knowledge graphs (KGs). While this data model supports advanced reasoning and querying, KGs remain difficult to mine due to their size and complexity. Graph mining approaches can be used to extract patterns from KGs. However, this presents two main issues. First, graph mining approaches tend to extract too many patterns for a human analyst to interpret (pattern explosion). Second, real-life KGs tend to differ from the graphs usually treated in graph mining: they are multigraphs, their vertex degrees tend to follow a power law, and the way in which they model knowledge can produce spurious patterns. Recently, a graph mining approach named GraphMDL+ has been proposed to tackle the problem of pattern explosion, using the Minimum Description Length (MDL) principle. However, GraphMDL+, like other graph mining approaches, is not suited for KGs without adaptations. In this paper we propose KG-MDL, a graph pattern mining approach based on the MDL principle that, given a KG, generates a human-sized and descriptive set of graph patterns, and does so in a parameter-less and anytime way. We report on experiments on medium-sized KGs showing that our approach generates sets of patterns that are both small enough to be interpreted by humans and descriptive of the KG. We show that the extracted patterns highlight relevant characteristics of the data: both of the schema used to create the data, and of the concrete facts it contains. We also discuss the issues related to mining graph patterns on knowledge graphs, as opposed to other types of graph data.

arXiv.org e-Print Archive

Graph Mining under Linguistic Constraints to Explore Large Texts

Author: Cellier Peggy
Charnois Thierry
Legallois Dominique
Quiniou Solen
Publication venue: HAL CCSD
Publication date: 24/03/2013

https://www.cys.cic.ipn.mx/ojs/index.php/CyS/article/view/1529

In this paper, we propose an approach to explore large texts by highlighting coherent sub-parts. The exploration method relies on a graph representation of the text according to Hoey's linguistic model, which allows the selection and the binding of adjacent and non-adjacent sentences. The main contribution of our work is a method based on both Hoey's linguistic model and a special graph mining technique, called CoHoP mining, to extract coherent sub-parts of the graph representation of the text. We have conducted experiments on several English texts showing the value of the proposed approach.

HAL - Normandie Université

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Modeling Complex Structures in Graph-FCA: Illustration on Natural Language Syntax

Author: Cellier Peggy
Ferré Sébastien
Publication venue: HAL CCSD
Publication date: 20/06/2022

Graph-FCA is an extension of formal concept analysis for multi-relational data. In this paper, we discuss the freedom of representation offered by Graph-FCA, in particular by its support of n-ary relations, considering natural language syntax as a use case.

INRIA a CCSD electronic archive server

GraphMDL Visualizer: Interactive Visualization of Graph Patterns

Author: Bariatti Francesco
Cellier Peggy
Ferré Sébastien
Publication venue: HAL CCSD
Publication date: 18/09/2020

Pattern mining algorithms make it possible to extract structures from data to highlight interesting and useful knowledge. However, those approaches can only be truly helpful if the users can actually understand their outputs. Thus, visualization techniques play a great role in pattern mining, bridging the gap between the algorithms and the users. In this demo paper we propose GraphMDL Visualizer, a tool for the interactive visualization of the graph patterns extracted with GraphMDL, a graph mining approach based on the MDL principle. GraphMDL Visualizer is structured according to the behavior and needs of users when they analyze GraphMDL results. The tool has different views, ranging from more general (distribution of pattern characteristics) to more specific (visualization of specific patterns). It is also highly interactive, allowing the users to customize the different views, and navigate between them, through simple mouse clicks. GraphMDL Visualizer is freely available online.

INRIA a CCSD electronic archive server

Building up Shared Knowledge with Logical Information Systems

Author: Cellier Peggy
Ducassé Mireille
Ferré Sébastien
Publication venue: HAL CCSD
Publication date: 01/01/2011

Logical Information Systems (LIS) are based on Logical Concept Analysis, an extension of Formal Concept Analysis. This paper describes an application of LIS to support group decision making. A case study brought together a research team. The objective was to decide on a set of potential conferences to which to send submissions. People individually used Abilis, a LIS web server, to preselect a set of conferences. Starting from 1041 calls for papers, the individual participants preselected 63 conferences. They then met and collectively used Abilis to select a shared set of 42 target conferences. The team could then sketch a publication plan. The case study provides evidence that LIS cover at least three of the collaboration patterns identified by Kolfschoten, de Vreede and Briggs. Abilis helped the team to build a more complete and relevant set of information (Generate/Gathering pattern); to build a shared understanding of the relevant information (Clarify/Building Shared Understanding pattern); and to quickly reduce the number of target conferences (Reduce/Filtering pattern).

INRIA a CCSD electronic archive server

A Two-Step Approach for Explainable Relation Extraction

Author: Ayats Hugo
Cellier Peggy
Ferré Sébastien
Publication venue: HAL CCSD
Publication date: 20/04/2022

Knowledge Graphs (KG) offer easy-to-process information. An important issue in building a KG from texts is the Relation Extraction (RE) task, which identifies and labels relationships between entity mentions. In this paper, to address the RE problem, we propose to combine a deep learning approach for relation detection with a symbolic method for relation classification. This combination offers both the performance of deep learning methods and the interpretability of symbolic methods. The method has been evaluated and compared with state-of-the-art methods on TACRED, a relation extraction benchmark, and has shown interesting quantitative and qualitative results.

INRIA a CCSD electronic archive server

Extracting Relations in Texts with Concepts of Neighbours

Author: Ayats Hugo
Cellier Peggy
Ferré Sébastien
Publication venue: HAL CCSD
Publication date: 29/06/2021

During the last decade, the need for reliable and massive Knowledge Graphs (KG) has increased. KGs can be created in several ways: manually with forms or automatically with Information Extraction (IE), a natural language processing task for extracting knowledge from text. Relation Extraction is the part of IE that focuses on identifying relations between named entities in texts, which amounts to finding new edges in a KG. Most recent approaches rely on deep learning, achieving state-of-the-art performance. However, that performance is still too low to fully automate the construction of reliable KGs, and human interaction remains necessary. This is made difficult by the statistical nature of deep learning methods, which makes their predictions hard to interpret. In this paper, we present a new symbolic and interpretable approach for Relation Extraction in texts. It is based on a modeling of the lexical and syntactic structure of text as a knowledge graph, and it exploits Concepts of Neighbours, a method based on Graph-FCA for computing similarities in knowledge graphs. An evaluation has been performed on a subset of TACRED (a relation extraction benchmark), showing promising results.

INRIA a CCSD electronic archive server

Calcul de réseaux phrastiques pour l'analyse et la navigation textuelle

Author: Cellier Peggy
Charnois Thierry
Legallois Dominique
Publication venue: HAL CCSD
Publication date: 27/06/2011

In this paper, we present an automatic process based on lexical repetition, as introduced by Hoey. Applying this kind of approach to large texts by hand is difficult. We therefore propose an automatic process to treat large texts. We have conducted experiments on different kinds of texts (narrative, expository) to show the benefits of the approach.

This work presents a method for navigating texts based on lexical repetition. The chosen method is the one developed by the linguist Hoey. Applying it manually to texts of substantial size is problematic. In this article, we propose an automatic process that makes it possible to analyze large texts with this method; experiments were carried out applying the process to different types of texts (narrative, expository), showing the value of the approach.
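The abstract above does not detail the process, so the following is only a minimal sketch of the general idea of linking sentences by lexical repetition. Everything in it is an assumption for illustration: the function names `lexical_items` and `build_sentence_network`, the tiny stop-word list, the use of raw word forms instead of lemmas, and the threshold of three shared items are hypothetical choices, not the authors' implementation.

```python
import re
from itertools import combinations

# Assumption: a tiny illustrative stop-word list; a real system would
# rely on lemmatization and a proper lexical resource.
STOP_WORDS = {"the", "a", "an", "it", "was", "by", "from", "of", "in", "to", "and"}


def lexical_items(sentence):
    """Return the set of lowercase content words of a sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOP_WORDS}


def build_sentence_network(sentences, min_shared=3):
    """Link every pair of sentences (adjacent or not) that share at least
    `min_shared` lexical items. Returns {(i, j): shared_items} with i < j."""
    items = [lexical_items(s) for s in sentences]
    edges = {}
    for i, j in combinations(range(len(sentences)), 2):
        shared = items[i] & items[j]
        if len(shared) >= min_shared:
            edges[(i, j)] = shared
    return edges


if __name__ == "__main__":
    text = [
        "The graph mining method extracts patterns from text.",
        "It was raining all day.",
        "Patterns extracted by graph mining describe the text.",
    ]
    for (i, j), shared in build_sentence_network(text).items():
        print(i, j, sorted(shared))
```

In this toy input, only the first and third sentences are linked, even though they are not adjacent, which is the kind of binding a repetition-based sentence network is meant to surface.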

HAL - Normandie Université

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

CONNOR: Exploring Similarities in Graphs with Concepts of Neighbors

Author: Ayats Hugo
Cellier Peggy
Ferré Sébastien
Publication venue: HAL CCSD
Publication date: 20/06/2022

Since its first formalization, the Formal Concept Analysis (FCA) field has shown diverse extensions of the FCA paradigm. A recent example is Graph-FCA, an extension of FCA to graphs. In the context of Graph-FCA, a notion of concept of neighbors has been introduced to support a form of nearest neighbor search over the nodes of a graph. Concepts of neighbors have been used for diverse tasks, such as knowledge graph completion and relation classification in texts. In this paper, we present CONNOR, a Java library for the computation of concepts of neighbors on RDF graphs.

INRIA a CCSD electronic archive server