7 research outputs found

    FCA2VEC: Embedding Techniques for Formal Concept Analysis

    Full text link
    Embedding large and high dimensional data into low dimensional vector spaces is a necessary task to computationally cope with contemporary data sets. Superseding latent semantic analysis recent approaches like word2vec or node2vec are well established tools in this realm. In the present paper we add to this line of research by introducing fca2vec, a family of embedding techniques for formal concept analysis (FCA). Our investigation contributes to two distinct lines of research. First, we enable the application of FCA notions to large data sets. In particular, we demonstrate how the cover relation of a concept lattice can be retrieved from a computational feasible embedding. Secondly, we show an enhancement for the classical node2vec approach in low dimension. For both directions the overall constraint of FCA of explainable results is preserved. We evaluate our novel procedures by computing fca2vec on different data sets like, wiki44 (a dense part of the Wikidata knowledge graph), the Mushroom data set and a publication network derived from the FCA community.Comment: 25 page

    Découverte de définitions dans le web des données

    Get PDF
    In this thesis, we are interested in the web of data and knowledge units that can be possibly discovered inside. The web of data can be considered as a very large graph consisting of connected RDF triple databases. An RDF triple, denoted as (subject, predicate, object), represents a relation (i.e. the predicate) existing between two resources (i.e. the subject and the object). Resources can belong to one or more classes, where a class aggregates resources sharing common characteristics. Thus, these RDF triple databases can be seen as interconnected knowledge bases. Most of the time, these knowledge bases are collaboratively built thanks to human users. This is particularly the case of DBpedia, a central knowledge base within the web of data, which encodes Wikipedia content in RDF format. DBpedia is built from two types of Wikipedia data: on the one hand, (semi-)structured data such as infoboxes, and, on the other hand, categories, which are thematic clusters of manually generated pages. However, the semantics of categories in DBpedia, that is, the reason a human agent has bundled resources, is rarely made explicit. In fact, considering a class, a software agent has access to the resources that are regrouped together, i.e. the class extension, but it generally does not have access to the ``reasons'' underlying such a cluster, i.e. it does not have the class intension. Considering a category as a class of resources, we aim at discovering an intensional description of the category. More precisely, given a class extension, we are searching for the related intension. The pair (extension, intension) which is produced provides the final definition and the implementation of classification-based reasoning for software agents. This can be expressed in terms of necessary and sufficient conditions: if x belongs to the class C, then x has the property P (necessary condition), and if x has the property P, then it belongs to the class C (sufficient condition). Two complementary data mining methods allow us to materialize the discovery of definitions, the search for association rules and the search for redescriptions. In this thesis, we first present a state of the art about association rules and redescriptions. Next, we propose an adaptation of each data mining method for the task of definition discovery. Then we detail a set of experiments applied to DBpedia, and we qualitatively and quantitatively compare the two approaches. Finally, we discuss how discovered definitions can be added to DBpedia to improve its quality in terms of consistency and completeness.Dans cette thèse, nous nous intéressons au web des données et aux ``connaissances'' que potentiellement il renferme. Le web des données se présente comme un très grand graphe constitué de bases de triplets RDF connectées entre elles. Un triplet RDF, dénoté (sujet, prédicat, objet), représente une relation (le prédicat) qui existe entre deux ressources (le sujet et l'objet). Les ressources peuvent appartenir à une ou plusieurs classes, où une classe regroupe des ressources partageant des caractéristiques communes. Ainsi, ces bases de triplets RDF peuvent être vues comme des bases de connaissances interconnectées. La plupart du temps ces bases de connaissances sont construites de manière collaborative par des utilisateurs. C'est notamment le cas de DBpedia, une base de connaissances centrale dans le web des données, qui encode le contenu de Wikipédia au format RDF. DBpedia est construite à partir de deux types de données de Wikipédia : d'une part, des données (semi-)structurées telles que les infoboxes et d'autre part les catégories, qui sont des regroupements thématiques de pages générés manuellement. Cependant, la sémantique des catégories dans DBpedia, c'est-à-dire la raison pour laquelle un agent humain a regroupé des ressources, n'est pas explicite. De fait, en considérant une classe, un agent logiciel a accès aux ressources qui y sont regroupées --- il dispose de la définition dite en extension --- mais il n'a généralement pas accès aux ``motifs'' de ce regroupement --- il ne dispose pas de la définition dite en intension. Dans cette thèse, nous cherchons à associer une définition à une catégorie en l'assimilant à une classe de ressources. Plus précisément, nous cherchons à associer une intension à une classe donnée en extension. La paire (extension, intension) produite va fournir la définition recherchée et va autoriser la mise en œuvre d'un raisonnement par classification pour un agent logiciel. Cela peut s'exprimer en termes de conditions nécessaires et suffisantes : si x appartient à la classe C, alors x a la propriété P (condition nécessaire), et si x a la propriété P, alors il appartient à la classe C (condition suffisante). Deux méthodes de fouille de données complémentaires nous permettent de matérialiser la découverte de définitions, la fouille de règles d'association et la fouille de redescriptions. Dans le mémoire, nous présentons d'abord un état de l'art sur les règles d'association et les redescriptions. Ensuite, nous proposons une adaptation de chacune des méthodes pour finaliser la tâche de découverte de définitions. Puis nous détaillons un ensemble d'expérimentations menées sur DBpedia, où nous comparons qualitativement et quantitativement les deux approches. Enfin les définitions découvertes peuvent potentiellement être ajoutées à DBpedia pour améliorer sa qualité en termes de cohérence et de complétud

    Eighth International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at ECAI 2020)

    Get PDF
    International audienceProceedings of the 8th International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI 2020)co-located with 24th European Conference on Artificial Intelligence (ECAI 2020), Santiago de Compostela, Spain, August 29, 202

    Decisioning 2022 : Collaboration in knowledge discovery and decision making: Applications to sustainable agriculture

    Get PDF
    Sustainable agriculture is one of the Sustainable Development Goals (SDG) proposed by UN (United Nations), but little systematic work on Knowledge Discovery and Decision Making has been applied to it. Knowledge discovery and decision making are becoming active research areas in the last years. The era of FAIR (Findable, Accessible, Interoperable, Reusable) data science, in which linked data with a high degree of variety and different degrees of veracity can be easily correlated and put in perspective to have an empirical and scientific perception of best practices in sustainable agricultural domain. This requires combining multiple methods such as elicitation, specification, validation, technologies from semantic web, information retrieval, formal concept analysis, collaborative work, semantic interoperability, ontological matching, specification, smart contracts, and multiple decision making. Decisioning 2022 is the first workshop on Collaboration in knowledge discovery and decision making: Applications to sustainable agriculture. It has been organized by six research teams from France, Argentina, Colombia and Chile, to explore the current frontier of knowledge and applications in different areas related to knowledge discovery and decision making. The format of this workshop aims at the discussion and knowledge exchange between the academy and industry members.Laboratorio de Investigación y Formación en Informática Avanzad

    Models and Modelling between Digital and Humanities: A Multidisciplinary Perspective

    Get PDF
    This Supplement of Historical Social Research stems from the contributions on the topic of modelling presented at the workshop “Thinking in Practice”, held at Wahn Manor House in Cologne on January 19-20, 2017. With Digital Humanities as starting point, practical examples of model building from different disciplines are considered, with the aim of contributing to the dialogue on modelling from several perspectives. Combined with theoretical considerations, this collection illustrates how the process of modelling is one of coming to know, in which the purpose of each modelling activity and the form in which models are expressed has to be taken into consideration in tandem. The modelling processes presented in this volume belong to specific traditions of scholarly and practical thinking as well as to specific contexts of production and use of models. The claim that supported the project workshop was indeed that establishing connections between different traditions of and approaches toward modelling is vital, whether these connections are complementary or intersectional. The workshop proceedings address an underpinning goal of the research project itself, namely that of examining the nature of the epistemological questions in the different traditions and how they relate to the nature of the modelled objects and the models being created. This collection is an attempt to move beyond simple representational views on modelling in order to understand modelling processes as scholarly and cultural phenomena as such
    corecore