On the Usability of Probably Approximately Correct Implication Bases
We revisit the notion of probably approximately correct implication bases
from the literature and present a first formulation in the language of formal
concept analysis, with the goal to investigate whether such bases represent a
suitable substitute for exact implication bases in practical use-cases. To this
end, we quantitatively examine the behavior of probably approximately correct
implication bases on artificial and real-world data sets and compare their
precision and recall with respect to their corresponding exact implication
bases. Using a small example, we also provide qualitative insight that
implications from probably approximately correct bases can still represent
meaningful knowledge from a given data set.
Comment: 17 pages, 8 figures; typos added, corrected x-label on graph
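The precision and recall comparison described above can be illustrated with a minimal sketch (not the paper's implementation): an implication A → B is counted toward precision if it actually holds in the data, and an exact implication is counted toward recall if it is entailed by the approximate base, i.e., its conclusion lies in the closure of its premise under that base. All names below are illustrative.

```python
def holds(imp, rows):
    """Check whether implication A -> B holds in a dataset.

    Each row is a set of attributes; A -> B holds iff every row
    containing all of A also contains all of B.
    """
    a, b = imp
    return all(b <= row for row in rows if a <= row)

def closure(attrs, base):
    """Close an attribute set under a set of implications (naive fixpoint)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for a, b in base:
            if a <= closed and not b <= closed:
                closed |= b
                changed = True
    return closed

def precision_recall(approx, exact, rows):
    """Precision: fraction of approximate implications valid in the data.
    Recall: fraction of exact implications entailed by the approximate base."""
    precision = sum(holds(i, rows) for i in approx) / len(approx)
    recall = sum(b <= closure(a, approx) for a, b in exact) / len(exact)
    return precision, recall
```

The closure-based entailment test is the standard semantic criterion for implications; the fixpoint loop is quadratic but sufficient for small examples.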
Quantitative Redundancy in Partial Implications
We survey the different properties of an intuitive notion of redundancy, as a
function of the precise semantics given to the notion of partial implication.
The final version of this survey will appear in the Proceedings of the Int.
Conf. Formal Concept Analysis, 2015.
Comment: Int. Conf. Formal Concept Analysis, 2015
Characterizing covers of functional dependencies using FCA
Functional dependencies (FDs) can be used for various important operations
on data, for instance, checking the consistency and the quality of a
database (including databases that contain complex data). Consequently, a generic framework that allows mining a sound, complete, non-redundant and yet compact set of FDs is an important tool for many different applications. There are different definitions of such sets of FDs (usually called cover).
In this paper, we present the characterization of two different kinds of covers for FDs in terms of pattern structures. The convenience of such a characterization is that it allows an easy implementation of efficient mining algorithms, which can later be adapted to other kinds of similar dependencies. Finally, we present empirical evidence that the proposed approach can perform better than state-of-the-art FD mining algorithms on large databases.
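The consistency check mentioned above reduces to a simple test: a functional dependency X → Y holds in a table iff any two rows agreeing on X also agree on Y. A minimal sketch (illustrative names, not the paper's miner):

```python
def fd_holds(rows, lhs, rhs):
    """Check FD lhs -> rhs on a table given as a list of dict rows.

    Holds iff no two rows share the same lhs values but differ on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault records the first rhs value seen for this lhs key;
        # a later mismatch is a violation of the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True
```

This single-pass check is linear in the table size; FD mining then amounts to searching the space of candidate left-hand sides efficiently, which is where the pattern-structure characterization pays off.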
Discovery of the D-basis in binary tables based on hypergraph dualization
Discovery of (strong) association rules, or implications, is an important
task in data management, and it finds application in artificial intelligence,
data mining and the semantic web. We introduce a novel approach
for the discovery of a specific set of implications, called the D-basis, that provides
a representation for a reduced binary table, based on the structure of
its Galois lattice. At the core of the method are the D-relation defined in
the lattice theory framework, and the hypergraph dualization algorithm that
allows us to effectively produce the set of transversals for a given Sperner hypergraph.
The latter algorithm, first developed by specialists from the Rutgers
Center for Operations Research, has already found numerous applications in
solving optimization problems in database theory, artificial intelligence and
game theory. One application of the method is the analysis of gene expression
data related to a particular phenotypic variable, and some initial testing is
done for the data provided by the University of Hawaii Cancer Center.
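The transversals mentioned in this abstract are the minimal hitting sets of a hypergraph: vertex sets that meet every hyperedge and lose that property when any vertex is removed. A brute-force sketch for small inputs (illustrative only; the dualization algorithm referenced in the abstract, and methods such as Fredman–Khachiyan, scale far better):

```python
from itertools import combinations

def minimal_transversals(edges):
    """Enumerate all minimal transversals (hitting sets) of a hypergraph.

    Candidates are generated in order of increasing size, so any candidate
    containing a previously found transversal cannot be minimal.
    """
    vertices = sorted(set().union(*edges))
    found = []
    for k in range(1, len(vertices) + 1):
        for cand in combinations(vertices, k):
            s = set(cand)
            if all(s & e for e in edges) and not any(f <= s for f in found):
                found.append(s)
    return found
```

For a Sperner hypergraph (no edge contains another), the family of minimal transversals is exactly the dual hypergraph, which is what the D-basis construction exploits.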
Experimental Study of Concise Representations of Concepts and Dependencies
In this paper we are interested in studying concise representations of
concepts and dependencies, i.e., implications and association rules. Such
representations are based on equivalence classes and their elements, i.e.,
minimal generators, minimum generators including keys and passkeys, proper
premises, and pseudo-intents. All these sets of attributes are significant and
well studied from the computational point of view, while their statistical
properties remain to be studied. The purpose of this paper is to study
these singular attribute sets and, in parallel, to study how to evaluate the
complexity of a dataset from an FCA point of view. In the paper we analyze the
empirical distributions and the sizes of these particular attribute sets. In
addition, we propose several measures of data complexity, such as
distributivity, linearity, size of concepts, and size of minimum generators, for
the analysis of real-world and synthetic datasets.
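The minimal generators studied above are the inclusion-minimal attribute sets within each equivalence class of the Galois closure. A brute-force sketch for a small formal context (rows are object intents; names are illustrative):

```python
from itertools import combinations

def closure(attrs, rows):
    """Galois closure: all attributes shared by every object having `attrs`."""
    extent = [r for r in rows if attrs <= r]
    if not extent:
        # Empty extent: the closure is the full attribute set.
        return set().union(*rows)
    return set.intersection(*map(set, extent))

def minimal_generators(rows):
    """Map each closed attribute set to its minimal generators.

    Candidates are enumerated by increasing size, so a candidate that
    contains an already-found generator of the same closure is not minimal.
    """
    attrs = sorted(set().union(*rows))
    gens = {}
    for k in range(len(attrs) + 1):
        for cand in combinations(attrs, k):
            s = set(cand)
            c = frozenset(closure(s, rows))
            mins = gens.setdefault(c, [])
            if not any(m <= s for m in mins):
                mins.append(s)
    return gens
```

Counting and sizing the entries of this map is the kind of empirical distribution the paper examines, here at toy scale.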
Learning Terminological Knowledge with High Confidence from Erroneous Data
Description logic knowledge bases are a popular approach to representing terminological and assertional knowledge in a form suitable for computers to work with. Despite this, the practicality of description logics is impaired by the difficulties one has to overcome to construct such knowledge bases. Previous work has addressed this issue by providing methods to learn valid terminological knowledge from data, making use of ideas from formal concept analysis.
A basic assumption there is that the data is free of errors, an assumption that in general cannot be made for practical applications. This thesis presents extensions of these results that allow handling errors in the data. For this, knowledge that is "almost valid" in the data is retrieved, where the notion of "almost valid" is formalized using the notion of confidence from data mining. This thesis presents two algorithms that achieve this retrieval. The first algorithm simply extracts all almost valid knowledge from the data, while the second utilizes expert interaction to distinguish errors from rare but valid counterexamples.
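The confidence notion borrowed from data mining, which formalizes "almost valid" above, is the fraction of objects satisfying the premise that also satisfy the conclusion. A minimal sketch over set-valued rows (illustrative, not the thesis's algorithms):

```python
def confidence(premise, conclusion, rows):
    """conf(A -> B) = |rows containing A and B| / |rows containing A|.

    An implication with no supporting rows is vacuously valid (confidence 1).
    """
    support = [r for r in rows if premise <= r]
    if not support:
        return 1.0
    return sum(conclusion <= r for r in support) / len(support)
```

Retrieving "almost valid" knowledge then means keeping the implications whose confidence meets a chosen threshold below 1, so that a few erroneous counterexamples no longer discard otherwise valid terminological axioms.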