
39 research outputs found

Combining Clustering techniques and Formal Concept Analysis to characterize Interestingness Measures

Author: Grissa Dhouha
Guillaume Sylvie
Nguifo Engelbert Mephu
Publication date: 01/01/2010

Formal Concept Analysis (FCA) is a data analysis method that enables the discovery of hidden knowledge in data. One kind of hidden knowledge extracted from data is association rules. Different quality measures have been reported in the literature for extracting only relevant association rules. Given a dataset, the choice of a good quality measure remains a challenging task for a user. Given an evaluation matrix of quality measures according to semantic properties, this paper describes how FCA can highlight quality measures with similar behavior in order to help users make their choice. The aim of this article is to discover clusters of Interestingness Measures (IM) that can validate those found by the hierarchical and partitioning clustering methods AHC and k-means. Then, based on the theoretical study of sixty-one interestingness measures according to nineteen properties, proposed in a recent study, FCA describes several groups of measures. Comment: 13 pages, 2 figures
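To make the idea of grouping measures by shared properties concrete, here is a minimal sketch of how formal concepts can be enumerated from a small binary context. The measure names (`m1`, `m2`, `m3`), the property names (`p1`, `p2`, `p3`) and the helper functions are hypothetical and purely illustrative; they are not the sixty-one measures or nineteen properties studied in the paper, and the brute-force enumeration is only suitable for tiny examples.

```python
from itertools import combinations

# Hypothetical toy context: measure -> set of properties it satisfies.
# Names and values are illustrative only, not taken from the paper.
context = {
    "m1": {"p1", "p2"},
    "m2": {"p1", "p2"},
    "m3": {"p2", "p3"},
}

def common_properties(measures):
    """Properties shared by every measure in the given collection (the intent)."""
    result = set().union(*context.values())
    for m in measures:
        result &= context[m]
    return result

def measures_having(props):
    """Measures that satisfy every property in the given set (the extent)."""
    return {m for m, ps in context.items() if props <= ps}

def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of measures."""
    concepts = set()
    names = sorted(context)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            intent = common_properties(subset)
            extent = measures_having(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

In this toy context, `m1` and `m2` end up in the same concept because they satisfy exactly the same properties, which is the kind of grouping the abstract describes.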

arXiv.org e-Print Archive

CiteSeerX

HAL Clermont Université

Hal-Diderot

Discovering Implicational Knowledge in Wikidata

Author: B Ganter
D Borchmann
D Vrandečić
Fariz Darari
G Stumme
L Galárraga
M Luxenburger
S Rudolph
T Pellissier Tanon
VT Ho
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/02/2019

Knowledge graphs have recently become the state-of-the-art tool for representing the diverse and complex knowledge of the world. Examples include the proprietary knowledge graphs of companies such as Google, Facebook, IBM, or Microsoft, but also freely available ones such as YAGO, DBpedia, and Wikidata. A distinguishing feature of Wikidata is that the knowledge is collaboratively edited and curated. While this greatly enhances the scope of Wikidata, it also makes it impossible for a single individual to grasp complex connections between properties or understand the global impact of edits in the graph. We apply Formal Concept Analysis to efficiently identify comprehensible implications that are implicitly present in the data. Although the complex structure of data modelling in Wikidata is not amenable to a direct approach, we overcome this limitation by extracting contextual representations of parts of Wikidata in a systematic fashion. We demonstrate the practical feasibility of our approach through several experiments and show that the results may lead to the discovery of interesting implicational knowledge. Besides providing a method for obtaining large real-world data sets for FCA, we sketch potential applications in offering semantic assistance for editing and curating Wikidata
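As a rough illustration of what an implication in a formal context means, the sketch below checks whether a rule "premise → conclusion" holds in a small item-by-property table. The item names, property labels and function names are assumptions made for this example; they are not the context extraction or the algorithm used by the authors.

```python
# Hypothetical toy context: item -> set of properties used on that item.
# Labels are illustrative only and not taken from the paper's experiments.
context = {
    "item1": {"date_of_birth", "place_of_birth"},
    "item2": {"date_of_birth", "place_of_birth", "occupation"},
    "item3": {"occupation"},
}

def holds(premise, conclusion):
    """True if every item carrying all premise properties also carries all conclusion properties."""
    return all(conclusion <= props for props in context.values() if premise <= props)

def support(premise):
    """Number of items that carry every property in the premise."""
    return sum(1 for props in context.values() if premise <= props)

if __name__ == "__main__":
    print(holds({"date_of_birth"}, {"place_of_birth"}))
    print(holds({"occupation"}, {"date_of_birth"}))
    print(support({"date_of_birth"}))
```

Reporting the support alongside each implication helps separate rules backed by many items from rules that hold only because very few items match the premise.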

arXiv.org e-Print Archive

Crossref

An approach for social interest detection

Author: Amous Ikram
Mezghani Manel
Sèdes Florence
Publication venue: Université de Nantes
Publication date: 01/01/2013

In this article, we propose a new technique for interest detection that analyzes the accuracy of each user's tagging behaviour in order to identify the tags that actually reflect the content of the resources. Our approach has been tested and evaluated on the Delicious social database

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL Descartes

Hal-Diderot

Data Management in the APPA System

Author: Akbarinia Reza
Martins Vidal
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

Combining Grid and P2P technologies can be exploited to provide high-level data sharing in large-scale distributed environments. However, this combination must deal with two hard problems: the scale of the network and the dynamic behavior of the nodes. In this paper, we present our solution in APPA (Atlas Peer-to-Peer Architecture), a data management system with high-level services for building large-scale distributed applications. We focus on data availability and data discovery which are two main requirements for implementing large-scale Grids. We have validated APPA's services through a combination of experimentation over Grid5000, which is a very large Grid experimental platform, and simulation using SimJava. The results show very good performance in terms of communication cost and response time

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

GIS Databases: From Multiscale to MultiRepresentation

Author: Parent Christine
Spaccapietra Stefano
Vangenot Christelle
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/01/2007

Cartography is one of the major application areas using geographical databases. Whether it is for the business of producing paper maps for sale, or whether it is for displaying maps on a screen to visualize the result of a query, we need computer systems that know how to represent the same geographical area at different scales. The concept of multiscale database has become popular in the GIS domain as a way to enforce consistency between representations and reduce the global update load. Scaling, however, is just one of the facets that may lead to keeping several representations for the same real-world object. Viewpoint and classification are two major abstractions in the design process that also generate multiple representations. This paper investigates the generic issues and solutions to achieve flexible support of multiple representation in a GIS database

Infoscience - École polytechnique fédérale de Lausanne

Semantics and composition of adaptation rules in a case-based reasoning system - Towards the construction of an adaptation rule base

Author: Tixier Matthieu
Publication venue: HAL CCSD
Publication date: 26/06/2007

Adaptation knowledge acquisition (AKA), particularly in its automatic form, opens up significant prospects for adaptation, the critical step in case-based reasoning (CBR). CabamakA relies on symbolic data mining methods to extract generalizations about variations in properties between cases. This research proposes to study the structure of the set of adaptation rules with a view to constructing a base. Our base is a minimal set that closes the set of extracted rules under the composition operation. Reducing the number of adaptation rules submitted to the analyst for validation is strategic for reducing the development cost of CBR systems. Human expertise is essential for exploiting this adaptation knowledge within the Kasimir project for the management of decision knowledge in oncology. This work presents the semantics of adaptation rules and their composition with a view to constructing a base. Several perspectives for implementing and improving this base are proposed

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Contributions to information retrieval in distributed, open systems integrating autonomous participants

Author: Lamarre Philippe
Publication venue: HAL CCSD
Publication date: 27/11/2009

The work we present concerns information retrieval in systems characterized by very large-scale distribution, openness, and the autonomy of participants. We were particularly interested in solutions that facilitate the integration of participants and adapt dynamically to their expectations. Our work is organized around three axes: the definition of a distributed architecture, query allocation, and the handling of semantic heterogeneity. We first proposed a fully distributed architecture organized into thematic communities. This semantic view of the organization, combined with a policy of relying not only on participants' resources but also on their skills, makes it possible to route queries and answers through the system without maintaining a general index of the kind used by search engines. A system distributed in this way quickly raises the problem of query allocation. Indeed, not all information providers have the resources to process the very large number of queries issued. Letting participants choose the queries they process meets the providers' expectations. However, it means that some queries go unprocessed for individual reasons, which does not match the behavior users expect. We therefore explored an approach that takes participants' intentions into account while imposing query allocation when necessary. We first proposed a flexible mediation using monetary aspects. We then conducted a study of participant satisfaction, from which we identified a number of notions: satisfaction, satisfaction with respect to the allocation system, adequacy of a participant with respect to the system, adequacy of the system with respect to a participant, etc. We then proposed a second allocation technique, SbQA, based directly on the notion of satisfaction. Finally, by their nature, open distributed systems integrate participants from different backgrounds, which is conducive to semantic heterogeneity. In the context of information retrieval and semantic vectors, we proposed a method that uses not only alignments between ontologies but also a mechanism of "explanation" and "interpretation" to improve semantic interoperability