On the Usability of Probably Approximately Correct Implication Bases
We revisit the notion of probably approximately correct implication bases
from the literature and present a first formulation in the language of formal
concept analysis, with the goal to investigate whether such bases represent a
suitable substitute for exact implication bases in practical use-cases. To this
end, we quantitatively examine the behavior of probably approximately correct
implication bases on artificial and real-world data sets and compare their
precision and recall with respect to their corresponding exact implication
bases. Using a small example, we also provide qualitative insight that
implications from probably approximately correct bases can still represent
meaningful knowledge from a given data set.
Comment: 17 pages, 8 figures; typos added, corrected x-label on graph
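The precision and recall comparison described above can be illustrated with a minimal sketch (not the paper's implementation): an implication A → B is counted toward precision if it actually holds in the data, and an exact implication is counted toward recall if it is entailed by the approximate base, i.e., its conclusion lies in the closure of its premise under that base. All names below are illustrative.

```python
def holds(imp, rows):
    """Check whether implication A -> B holds in a dataset.

    Each row is a set of attributes; A -> B holds iff every row
    containing all of A also contains all of B.
    """
    a, b = imp
    return all(b <= row for row in rows if a <= row)

def closure(attrs, base):
    """Close an attribute set under a set of implications (naive fixpoint)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for a, b in base:
            if a <= closed and not b <= closed:
                closed |= b
                changed = True
    return closed

def precision_recall(approx, exact, rows):
    """Precision: fraction of approximate implications valid in the data.
    Recall: fraction of exact implications entailed by the approximate base."""
    precision = sum(holds(i, rows) for i in approx) / len(approx)
    recall = sum(b <= closure(a, approx) for a, b in exact) / len(exact)
    return precision, recall
```

The closure-based entailment test is the standard semantic criterion for implications; the fixpoint loop is quadratic but sufficient for small examples.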
Quantitative Redundancy in Partial Implications
We survey the different properties of an intuitive notion of redundancy, as a
function of the precise semantics given to the notion of partial implication.
The final version of this survey will appear in the Proceedings of the Int.
Conf. Formal Concept Analysis, 2015.
Comment: Int. Conf. Formal Concept Analysis, 2015
Characterizing covers of functional dependencies using FCA
Functional dependencies (FDs) can be used for various important operations
on data, for instance, checking the consistency and the quality of a
database (including databases that contain complex data). Consequently, a generic framework that allows mining a sound, complete, non-redundant and yet compact set of FDs is an important tool for many different applications. There are different definitions of such sets of FDs (usually called cover).
In this paper, we present the characterization of two different kinds of covers for FDs in terms of pattern structures. The convenience of such a characterization is that it allows an easy implementation of efficient mining algorithms, which can later be adapted to other kinds of similar dependencies. Finally, we present empirical evidence that the proposed approach can perform better than state-of-the-art FD mining algorithms on large databases.
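The consistency check mentioned above reduces to a simple test: a functional dependency X → Y holds in a table iff any two rows agreeing on X also agree on Y. A minimal sketch (illustrative names, not the paper's miner):

```python
def fd_holds(rows, lhs, rhs):
    """Check FD lhs -> rhs on a table given as a list of dict rows.

    Holds iff no two rows share the same lhs values but differ on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault records the first rhs value seen for this lhs key;
        # a later mismatch is a violation of the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True
```

This single-pass check is linear in the table size; FD mining then amounts to searching the space of candidate left-hand sides efficiently, which is where the pattern-structure characterization pays off.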
Discovery of the D-basis in binary tables based on hypergraph dualization
Discovery of (strong) association rules, or implications, is an important
task in data management, and it finds application in artificial intelligence,
data mining and the semantic web. We introduce a novel approach
for the discovery of a specific set of implications, called the D-basis, that provides
a representation for a reduced binary table, based on the structure of
its Galois lattice. At the core of the method are the D-relation defined in
the lattice theory framework, and the hypergraph dualization algorithm that
allows us to effectively produce the set of transversals for a given Sperner hypergraph.
The latter algorithm, first developed by specialists from the Rutgers
Center for Operations Research, has already found numerous applications in
solving optimization problems in database theory, artificial intelligence and
game theory. One application of the method is the analysis of gene expression
data related to a particular phenotypic variable, and some initial testing is
done for the data provided by the University of Hawaii Cancer Center.
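The transversals mentioned in this abstract are the minimal hitting sets of a hypergraph: vertex sets that meet every hyperedge and lose that property when any vertex is removed. A brute-force sketch for small inputs (illustrative only; the dualization algorithm referenced in the abstract, and methods such as Fredman–Khachiyan, scale far better):

```python
from itertools import combinations

def minimal_transversals(edges):
    """Enumerate all minimal transversals (hitting sets) of a hypergraph.

    Candidates are generated in order of increasing size, so any candidate
    containing a previously found transversal cannot be minimal.
    """
    vertices = sorted(set().union(*edges))
    found = []
    for k in range(1, len(vertices) + 1):
        for cand in combinations(vertices, k):
            s = set(cand)
            if all(s & e for e in edges) and not any(f <= s for f in found):
                found.append(s)
    return found
```

For a Sperner hypergraph (no edge contains another), the family of minimal transversals is exactly the dual hypergraph, which is what the D-basis construction exploits.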
Experimental Study of Concise Representations of Concepts and Dependencies
In this paper we are interested in studying concise representations of
concepts and dependencies, i.e., implications and association rules. Such
representations are based on equivalence classes and their elements, i.e.,
minimal generators, minimum generators including keys and passkeys, proper
premises, and pseudo-intents. All these sets of attributes are significant and
well studied from the computational point of view, while their statistical
properties remain to be studied. The purpose of this paper is to study
these singular attribute sets and, in parallel, to study how to evaluate the
complexity of a dataset from an FCA point of view. In the paper we analyze the
empirical distributions and the sizes of these particular attribute sets. In
addition, we propose several measures of data complexity, such as
distributivity, linearity, size of concepts, and size of minimum generators, for
the analysis of real-world and synthetic datasets.
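The minimal generators studied above are the inclusion-minimal attribute sets within each equivalence class of the Galois closure. A brute-force sketch for a small formal context (rows are object intents; names are illustrative):

```python
from itertools import combinations

def closure(attrs, rows):
    """Galois closure: all attributes shared by every object having `attrs`."""
    extent = [r for r in rows if attrs <= r]
    if not extent:
        # Empty extent: the closure is the full attribute set.
        return set().union(*rows)
    return set.intersection(*map(set, extent))

def minimal_generators(rows):
    """Map each closed attribute set to its minimal generators.

    Candidates are enumerated by increasing size, so a candidate that
    contains an already-found generator of the same closure is not minimal.
    """
    attrs = sorted(set().union(*rows))
    gens = {}
    for k in range(len(attrs) + 1):
        for cand in combinations(attrs, k):
            s = set(cand)
            c = frozenset(closure(s, rows))
            mins = gens.setdefault(c, [])
            if not any(m <= s for m in mins):
                mins.append(s)
    return gens
```

Counting and sizing the entries of this map is the kind of empirical distribution the paper examines, here at toy scale.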
Learning Terminological Knowledge with High Confidence from Erroneous Data
Description logic knowledge bases are a popular approach to representing terminological and assertional knowledge in a form suitable for computers to work with. Despite this, the practicality of description logics is impaired by the difficulties one has to overcome to construct such knowledge bases. Previous work has addressed this issue by providing methods to learn valid terminological knowledge from data, making use of ideas from formal concept analysis.
A basic assumption there is that the data is free of errors, an assumption that in general cannot be made for practical applications. This thesis presents extensions of these results that allow handling errors in the data. For this, knowledge that is "almost valid" in the data is retrieved, where the notion of "almost valid" is formalized using the notion of confidence from data mining. This thesis presents two algorithms that achieve this retrieval. The first algorithm simply extracts all almost valid knowledge from the data, while the second utilizes expert interaction to distinguish errors from rare but valid counterexamples.
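The confidence notion borrowed from data mining, which formalizes "almost valid" above, is the fraction of objects satisfying the premise that also satisfy the conclusion. A minimal sketch over set-valued rows (illustrative, not the thesis's algorithms):

```python
def confidence(premise, conclusion, rows):
    """conf(A -> B) = |rows containing A and B| / |rows containing A|.

    An implication with no supporting rows is vacuously valid (confidence 1).
    """
    support = [r for r in rows if premise <= r]
    if not support:
        return 1.0
    return sum(conclusion <= r for r in support) / len(support)
```

Retrieving "almost valid" knowledge then means keeping the implications whose confidence meets a chosen threshold below 1, so that a few erroneous counterexamples no longer discard otherwise valid terminological axioms.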