Rough matroids based on coverings
The introduction of covering-based rough sets has made a substantial
contribution to classical rough set theory. However, many vital problems in
rough sets, including attribute reduction, are NP-hard, so the algorithms
for solving them are usually greedy. A matroid, as a generalization of linear
independence in vector spaces, has a variety of applications in many fields
such as algorithm design and combinatorial optimization. An excellent
introduction to the topic of rough matroids is due to Zhu and Wang. On the
basis of their work, we study the rough matroids based on coverings in this
paper. First, we investigate some properties of the definable sets with respect
to a covering. Specifically, it is interesting that the set of all definable
sets with respect to a covering, equipped with the inclusion relation,
forms a lattice. Second, we propose the rough matroids based
on coverings, which are a generalization of the rough matroids based on
relations. Finally, some properties of rough matroids based on coverings are
explored. Moreover, an equivalent formulation of rough matroids based on
coverings is presented. These interesting and important results exhibit many
potential connections between rough sets and matroids.
Comment: 15 pages
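The lattice structure mentioned in the abstract can be illustrated concretely. The sketch below is a minimal illustration under one common assumption, not the paper's own construction: definable sets are taken to be arbitrary unions of covering blocks (plus the empty set). Such a family is closed under union, so ordering it by inclusion yields a lattice whose join is set union and whose meet is the largest definable subset of the intersection:

```python
from itertools import combinations

def definable_sets(covering):
    """All unions of covering blocks, plus the empty set (one common
    definition of definable sets with respect to a covering)."""
    blocks = [frozenset(b) for b in covering]
    family = {frozenset()}
    for r in range(1, len(blocks) + 1):
        for combo in combinations(blocks, r):
            family.add(frozenset().union(*combo))
    return family

def meet(a, b, family):
    """Meet in the inclusion order: the largest definable set contained
    in a & b, i.e. the union of all definable subsets of a & b."""
    return frozenset().union(*(s for s in family if s <= a & b))

covering = [{1, 2}, {2, 3}]
family = definable_sets(covering)
# Closed under union, so (family, subset-of) is a lattice with join = union.
assert all(a | b in family for a in family for b in family)
# The meet of {1,2} and {2,3} is the empty set, because {2} is not definable.
assert meet(frozenset({1, 2}), frozenset({2, 3}), family) == frozenset()
```

Note that the meet need not be plain set intersection: in the example, the intersection {2} is not definable, so the lattice meet drops to the empty set.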
Rough Set Based Rule Evaluations and Their Applications
Knowledge discovery is an important process in data analysis, data
mining and machine learning. Typically, knowledge is presented in the
form of rules. However, knowledge discovery systems often generate a
huge number of rules. One of the challenges we face is how to
automatically discover interesting and meaningful knowledge from such
rules, since it is infeasible for human beings to select important and
interesting rules manually. Our focus is therefore how to provide
measures that evaluate the quality of rules and thereby facilitate the
understanding of data mining results. In this
thesis, we present a series of rule evaluation techniques for the
purpose of facilitating the knowledge understanding process. These
evaluation techniques help not only to reduce the number of rules,
but also to extract higher quality rules. Empirical studies on both
artificial data sets and real world data sets demonstrate how such
techniques can contribute to practical systems such as ones for
medical diagnosis and web personalization.
In the first part of this thesis, we discuss several rule evaluation
techniques proposed for rule post-processing. We show
how properly defined rule templates can be used as a rule evaluation
approach. We propose two rough set based measures, a Rule Importance
Measure and a Rules-As-Attributes Measure,
to rank important and interesting rules. In the second part of
this thesis, we show how data preprocessing can help with rule
evaluation. Because well preprocessed data is essential for
important rule generation, we propose a new approach for processing
missing attribute values for enhancing the generated rules. In the
third part of this thesis, a rough set based rule evaluation system
is demonstrated to show the effectiveness of the measures proposed
in this thesis. Furthermore, a new user-centric web personalization
system is used as a case study to demonstrate how the proposed
evaluation measures can be used in an actual application.
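As a rough sketch of how a reduct-based rule importance measure of this kind can work (an assumption about the general idea, not necessarily the thesis's exact definition): generate a rule set from each reduct of the decision table, then score each rule by the fraction of reducts whose rule sets contain it, so that rules confirmed by many reducts rank higher:

```python
from collections import Counter

def rule_importance(rule_sets):
    """Score each rule by the fraction of reduct-generated rule sets
    in which it appears; rule_sets holds one collection of rules per reduct."""
    counts = Counter(rule for rules in rule_sets for rule in set(rules))
    n = len(rule_sets)
    return {rule: count / n for rule, count in counts.items()}

# Hypothetical rules generated from three different reducts.
sets_per_reduct = [
    {"blood_pressure=high -> risk=yes", "age=old -> risk=yes"},
    {"blood_pressure=high -> risk=yes"},
    {"blood_pressure=high -> risk=yes", "smoker=yes -> risk=yes"},
]
scores = rule_importance(sets_per_reduct)
# A rule that survives every reduct gets the maximum score of 1.0.
assert scores["blood_pressure=high -> risk=yes"] == 1.0
```

The rule names above are invented for illustration; the point is only the scoring scheme, which rewards rules that are stable across attribute reducts.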
Methods of applying domain knowledge to improve the quality of classifiers
The dissertation deals with methods that allow the use of domain knowledge to improve the quality of classifiers, where the improvement concerns feature extraction methods, classifier construction methods, and methods for predicting decision values for new objects. In particular, the following methods have been proposed to improve the quality of classifiers: expert features (attributes) defined using domain knowledge expressed in a language based on temporal logic; a new method of measuring the quality of cuts during supervised discretization, using a matrix of distances between decision attribute values defined by domain knowledge; a new decision tree that uses redundant cuts to verify the partition of a tree node; a new method for determining similarities between objects (e.g. patients) using an ontology defined by an expert, with its application to the construction of k-nearest-neighbors classifiers; and a new method for generating cross rules describing the effect of a perception-interfering factor, based on a classifier.
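To make the cut-quality idea concrete, here is a hedged sketch; the function name and the exact formula are illustrative assumptions, not the thesis's definition. A candidate cut on a numeric attribute is scored by summing, over all pairs of objects the cut separates, the expert-defined distance between their decision values, so that separating very dissimilar decisions counts more than separating similar ones:

```python
def cut_quality(values, decisions, cut, dist):
    """Score a candidate cut: sum the expert-defined decision distances
    dist[d1][d2] over all object pairs that the cut separates.
    Illustrative sketch only; the thesis defines its own measure."""
    left = [d for v, d in zip(values, decisions) if v <= cut]
    right = [d for v, d in zip(values, decisions) if v > cut]
    return sum(dist[d1][d2] for d1 in left for d2 in right)

# Expert distance matrix: confusing "healthy" with "severe" is worst.
dist = {
    "healthy": {"healthy": 0, "sick": 1, "severe": 2},
    "sick":    {"healthy": 1, "sick": 0, "severe": 1},
    "severe":  {"healthy": 2, "sick": 1, "severe": 0},
}
q = cut_quality([1.0, 2.0, 3.0, 4.0],
                ["healthy", "healthy", "sick", "severe"], 2.5, dist)
# Two "healthy" objects on the left, "sick" and "severe" on the right:
# 2 * (1 + 2) = 6.
assert q == 6
```

Under a uniform distance matrix this reduces to the classical count of separated pairs from different decision classes; the expert matrix simply re-weights which confusions matter most.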
All of the aforementioned methods have been implemented in the CommoDM software library, an extension of the RSES-lib library.
The implemented methods have been tested on real data sets: benchmark data sets known from the literature as well as the author's own medical data sets collected during the preparation of the dissertation. The latter data sets are associated with the medical aspect of the dissertation, which deals with supporting the treatment of patients with stable ischemic heart disease; the main medical problem considered in the thesis is predicting the presence of significant coronary artery stenosis from non-invasive heart monitoring by the Holter method.
The results of the experiments confirm the effectiveness of applying additional domain knowledge when creating and testing classifiers: after applying the new methods, the quality of the classifiers increased considerably. At the same time, the clinical interpretation of the results is more consistent with medical knowledge.
The research has been supported by the grants DEC-2013/09/B/ST6/01568 and DEC-2013/09/B/NZ5/00758, both from the National Science Centre of the Republic of Poland. Its results were published in 10 publications, including 3 in journals from the A list of the Polish Ministry of Science and Higher Education, 3 indexed in the Web of Science, one chapter in a monograph, and 3 post-conference publications.