Characteristic of partition-circuit matroid through approximation number
Rough set theory is a useful tool for dealing with uncertain, granular and
incomplete knowledge in information systems, and it is based on equivalence
relations or partitions. Matroid theory studies a structure that generalizes linear
independence in vector spaces, and has a variety of applications in many
fields. In this paper, we propose a new type of matroid, namely the
partition-circuit matroid, which is induced by a partition. First, we show that a
partition satisfies the circuit axioms of matroid theory, so it induces a
matroid, which we call a partition-circuit matroid. Because partitions and
equivalence relations on the same universe are in one-to-one correspondence,
some characteristics of partition-circuit matroids are studied through rough
sets. Second, similar to the upper approximation number proposed by
Wang and Zhu, we define the lower approximation number. Some characteristics of
partition-circuit matroids and their dual matroids are investigated
through the lower approximation number and the upper approximation number.
Comment: 12 pages
A Model-Based Frequency Constraint for Mining Associations from Transaction Data
Mining frequent itemsets is a popular method for finding associated items in
databases. For this method, support, the co-occurrence frequency of the items
which form an association, is used as the primary indicator of the
association's significance. A single user-specified support threshold is used
to decide if associations should be further investigated. Support has some
known problems with rare items, favors shorter itemsets and sometimes produces
misleading associations.
In this paper we develop a novel model-based frequency constraint as an
alternative to a single, user-specified minimum support. The constraint
utilizes knowledge of the process generating transaction data by applying a
simple stochastic mixture model (the NB model) which allows for transaction
data's typically highly skewed item frequency distribution. A user-specified
precision threshold is used together with the model to find local frequency
thresholds for groups of itemsets. Based on the constraint we develop the
notion of NB-frequent itemsets and adapt a mining algorithm to find all
NB-frequent itemsets in a database. In experiments with publicly available
transaction databases we show that the new constraint provides improvements
over a single minimum support threshold and that the precision threshold is
more robust and easier for the user to set and interpret.
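To make the support measure and the single-threshold baseline concrete, here is a minimal sketch. It does not implement the NB model or the NB-frequent itemset constraint from the abstract; it only shows plain minimum-support mining by naive enumeration. The function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Naive enumeration of itemsets that meet one global support threshold."""
    universe = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(universe, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


# Hypothetical toy data: "caviar" and "champagne" are rare but always co-occur.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"caviar", "champagne"}),
]

frequent = frequent_itemsets(transactions, min_support=0.5)
```

With a threshold of 0.5, the pair ("caviar", "champagne") is discarded even though the two items always appear together. This is the rare-item problem that motivates replacing a single global threshold.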
Problem-Solving Knowledge Mining from Users’ Actions in an Intelligent Tutoring System
In an intelligent tutoring system (ITS), the domain expert should provide
relevant domain knowledge to the tutor so that it will be able to guide the
learner during problem solving. However, in several domains, this knowledge is
not predetermined and should be captured or learned from expert users as well as
intermediate and novice users. Our hypothesis is that knowledge discovery (KD)
techniques can help to build this domain intelligence in an ITS. This paper proposes
a framework to capture problem-solving knowledge using a promising approach
of data and knowledge discovery based on a combination of sequential pattern
mining and association rule discovery techniques. The framework has been implemented
and is used to discover new meta-knowledge and rules in a given domain,
which then extend domain knowledge and serve as a problem space allowing
the intelligent tutoring system to guide learners in problem-solving situations.
Preliminary experiments have been conducted using the framework as an alternative
to a path-planning problem solver in CanadarmTutor.
Some characteristics of matroids through rough sets
At present, the practical application and theoretical study of rough sets
are two active topics in computer science. The core concepts of rough set theory
are upper and lower approximation operators based on equivalence relations.
A matroid is a structure that generalizes linear
independence in vector spaces. Further, matroid theory borrows extensively from
the terminology of linear algebra and graph theory. We can combine rough set
theory with matroid theory by using rough sets to study some
characteristics of matroids. In this paper, we apply rough sets to matroids
by defining a family of sets constructed from the upper
approximation operator with respect to an equivalence relation. First, we prove
that the family of sets satisfies the support set axioms of matroids, and thus we
obtain a matroid. We say that this matroid is induced by the equivalence relation, and
we call this type of matroid a support matroid. Second, through rough
sets, some characteristics of matroids such as independent sets, support sets,
bases, hyperplanes and closed sets are investigated.
Comment: 13 pages
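The upper and lower approximation operators mentioned in this abstract can be sketched as follows, using the standard rough set definitions over a partition (the equivalence classes of the relation). The lower approximation is the union of the blocks contained in the target set; the upper approximation is the union of the blocks that intersect it. The function names and example universe are hypothetical, and the sketch does not construct the support matroid itself.

```python
def lower_approximation(partition, target):
    """Union of the blocks that lie entirely inside `target`."""
    target = set(target)
    return set().union(*(block for block in partition if set(block) <= target))


def upper_approximation(partition, target):
    """Union of the blocks that share at least one element with `target`."""
    target = set(target)
    return set().union(*(block for block in partition if set(block) & target))


# Hypothetical universe {1, ..., 6} partitioned into three equivalence classes.
partition = [{1, 2}, {3, 4}, {5, 6}]
target = {1, 2, 3}

lower = lower_approximation(partition, target)
upper = upper_approximation(partition, target)
```

For this example the lower approximation is {1, 2} and the upper approximation is {1, 2, 3, 4}. The target set is "rough" because the two differ.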
Evaluation and optimization of frequent association rule based classification
Deriving useful and interesting rules from a data mining system is an essential and important task. Problems
such as the discovery of random and coincidental patterns or patterns with no significant value, and the
generation of a large volume of rules from a database, commonly occur. Work on sustaining the interestingness
of rules generated by data mining algorithms is actively being pursued. In this
paper, we present a systematic way to evaluate the association rules discovered by frequent itemset mining algorithms,
combining common data mining and statistical interestingness measures, and outline an appropriate sequence of usage.
The experiments are performed using a number of real-world datasets that represent diverse characteristics of
data/items, and a detailed evaluation of the rule sets is provided. Empirical results show that, with a proper
combination of data mining and statistical analysis, the framework is capable of eliminating a large number of
non-significant, redundant and contradictory rules while preserving rules with relatively high accuracy and
coverage when used in the classification problem. Moreover, the results reveal important characteristics of
mining frequent itemsets, and the impact of the confidence measure on the classification task.
Assessing and Remedying Coverage for a Given Dataset
Data analysis impacts virtually every aspect of our society today. Often,
this analysis is performed on an existing dataset, possibly collected through a
process that the data scientists had limited control over. The existing data
analyzed may not include the complete universe, but it is expected to cover the
diversity of items in the universe. Lack of adequate coverage in the dataset
can result in undesirable outcomes such as biased decisions and algorithmic
racism, as well as creating vulnerabilities such as opening up room for
adversarial attacks.
In this paper, we assess the coverage of a given dataset over multiple
categorical attributes. We first provide efficient techniques for traversing
the combinatorial explosion of value combinations to identify any regions of
attribute space not adequately covered by the data. Then, we determine the
least amount of additional data that must be obtained to resolve this lack of
adequate coverage. We confirm the value of our proposal through both
theoretical analyses and comprehensive experiments on real data.
Comment: in ICDE 201
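As a rough illustration of what "coverage over multiple categorical attributes" can mean, the sketch below counts how often each full combination of attribute values occurs and reports the combinations that fall below a threshold. This is a brute-force enumeration under assumed definitions, not the efficient traversal or the minimum-additional-data method of the paper. The function name, attribute names, data and threshold are all hypothetical.

```python
from collections import Counter
from itertools import product


def coverage_shortfall(rows, domains, threshold):
    """Map each under-covered value combination to the number of rows it lacks.

    `domains` maps attribute name -> list of possible values.
    A combination is treated as covered when at least `threshold` rows match it.
    """
    attrs = list(domains)
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    shortfall = {}
    for combo in product(*(domains[a] for a in attrs)):
        if counts[combo] < threshold:
            shortfall[combo] = threshold - counts[combo]
    return shortfall


# Hypothetical dataset with two categorical attributes.
domains = {"color": ["red", "blue"], "size": ["small", "large"]}
rows = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "small"},
    {"color": "blue", "size": "large"},
    {"color": "blue", "size": "large"},
]

gaps = coverage_shortfall(rows, domains, threshold=2)
```

Here ("red", "large") never occurs and ("blue", "small") occurs only once, so both are flagged. Exhaustive enumeration grows with the product of the domain sizes, which is the combinatorial explosion the abstract refers to.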