Characteristic of partition-circuit matroid through approximation number

Authors: Liu Yanfang, Zhu William
Publication date: 23/10/2012

Rough set theory is a useful tool for dealing with uncertain, granular and incomplete knowledge in information systems, and it is based on equivalence relations or partitions. A matroid is a structure that generalizes linear independence in vector spaces, and matroid theory has a variety of applications in many fields. In this paper, we propose a new type of matroid, namely the partition-circuit matroid, which is induced by a partition. First, a partition satisfies the circuit axioms of matroid theory, so it induces a matroid, which is called a partition-circuit matroid. Since partitions and equivalence relations on the same universe are in one-to-one correspondence, some characteristics of partition-circuit matroids are studied through rough sets. Second, similar to the upper approximation number proposed by Wang and Zhu, we define the lower approximation number. Some characteristics of partition-circuit matroids and their dual matroids are investigated through the lower and upper approximation numbers. Comment: 12 pages
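The two standard ingredients named in this abstract can be illustrated with a minimal Python sketch. It assumes the usual textbook definitions: the lower approximation of X is the union of the partition blocks contained in X, the upper approximation is the union of the blocks that meet X, and a family of sets is a circuit family if it excludes the empty set, contains no set that is a proper subset of another, and satisfies circuit elimination. The function names and the toy partition are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def lower_approximation(partition, x):
    """Union of the blocks that are entirely contained in x."""
    x = set(x)
    return set().union(*(set(b) for b in partition if set(b) <= x))


def upper_approximation(partition, x):
    """Union of the blocks that share at least one element with x."""
    x = set(x)
    return set().union(*(set(b) for b in partition if set(b) & x))


def satisfies_circuit_axioms(family):
    """Check the three circuit axioms for a finite family of finite sets."""
    family = {frozenset(c) for c in family}
    # (C1) the empty set is not a circuit
    if frozenset() in family:
        return False
    for c1, c2 in combinations(family, 2):
        # (C2) no circuit is a proper subset of another
        if c1 < c2 or c2 < c1:
            return False
        # (C3) elimination: for each shared element e, some circuit
        # must fit inside (c1 | c2) - {e}
        for e in c1 & c2:
            target = (c1 | c2) - {e}
            if not any(c3 <= target for c3 in family):
                return False
    return True


# Hypothetical toy partition of the universe {1, ..., 6}
partition = [{1, 2}, {3}, {4, 5, 6}]
print(lower_approximation(partition, {1, 2, 4}))
print(upper_approximation(partition, {1, 2, 4}))
print(satisfies_circuit_axioms(partition))
```

Because the blocks of a partition are nonempty and pairwise disjoint, the elimination axiom holds vacuously, which is why any partition passes the check.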

arXiv.org e-Print Archive

Crossref

A Model-Based Frequency Constraint for Mining Associations from Transaction Data

Author: Hahsler Michael
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Mining frequent itemsets is a popular method for finding associated items in databases. For this method, support, the co-occurrence frequency of the items which form an association, is used as the primary indicator of the association's significance. A single user-specified support threshold is used to decide whether associations should be further investigated. Support has some known problems with rare items, favors shorter itemsets and sometimes produces misleading associations. In this paper we develop a novel model-based frequency constraint as an alternative to a single, user-specified minimum support. The constraint utilizes knowledge of the process generating transaction data by applying a simple stochastic mixture model (the NB model) which allows for transaction data's typically highly skewed item frequency distribution. A user-specified precision threshold is used together with the model to find local frequency thresholds for groups of itemsets. Based on the constraint we develop the notion of NB-frequent itemsets and adapt a mining algorithm to find all NB-frequent itemsets in a database. In experiments with publicly available transaction databases we show that the new constraint provides improvements over a single minimum support threshold and that the precision threshold is more robust and easier for the user to set and interpret.
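The baseline that this abstract argues against, a single minimum support threshold, can be sketched in a few lines of Python. This is a brute-force illustration only: it is neither the NB model nor the paper's mining algorithm, and the function names and toy transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Enumerate all itemsets up to max_size and keep those meeting min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


# Hypothetical toy data: "caviar" is a rare item
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "caviar"},
]
for itemset, s in frequent_itemsets(transactions, min_support=0.4).items():
    print(sorted(itemset), s)
```

With a threshold of 0.4, every itemset that contains the rare item is discarded regardless of how strongly it is associated with other items, which is the kind of problem with rare items that the abstract mentions.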

arXiv.org e-Print Archive

CiteSeerX

Problem-Solving Knowledge Mining from Users’ Actions in an Intelligent Tutoring System

Authors: Couturier Olivier, Fournier-Viger Philippe, Mephu Engelbert, Nkambou Roger
Publication venue: Springer-Verlag
Publication date: 01/05/2007

In an intelligent tutoring system (ITS), the domain expert should provide relevant domain knowledge to the tutor so that it will be able to guide the learner during problem solving. However, in several domains, this knowledge is not predetermined and should be captured or learned from expert users as well as intermediate and novice users. Our hypothesis is that knowledge discovery (KD) techniques can help to build this domain intelligence in ITS. This paper proposes a framework to capture problem-solving knowledge using a promising approach of data and knowledge discovery based on a combination of sequential pattern mining and association rules discovery techniques. The framework has been implemented and is used to discover new meta knowledge and rules in a given domain, which then extend domain knowledge and serve as a problem space allowing the intelligent tutoring system to guide learners in problem-solving situations. Preliminary experiments have been conducted using the framework as an alternative to a path-planning problem solver in CanadarmTutor.

Archipel - Université du Québec à Montréal

Some characteristics of matroids through rough sets

Authors: Su Lirun, Zhu William
Publication date: 24/09/2012

At present, the practical application and the theoretical discussion of rough sets are two hot topics in computer science. The core concepts of rough set theory are the upper and lower approximation operators based on equivalence relations. A matroid is a structure that generalizes linear independence in vector spaces. Further, matroid theory borrows extensively from the terminology of linear algebra and graph theory. We can combine rough set theory with matroid theory by using rough sets to study some characteristics of matroids. In this paper, we apply rough sets to matroids by defining a family of sets which are constructed from the upper approximation operator with respect to an equivalence relation. First, we prove that the family of sets satisfies the support set axioms of matroids, and thus we obtain a matroid. We say the matroid is induced by the equivalence relation, and in this way a type of matroid, namely the support matroid, is induced. Second, through rough sets, some characteristics of matroids such as independent sets, support sets, bases, hyperplanes and closed sets are investigated. Comment: 13 pages

arXiv.org e-Print Archive

CiteSeerX

Evaluation and optimization of frequent association rule based classification

Authors: Izwan Nizal Mohd Shaharanee, Jastini Jamil
Publication venue: Penerbit Universiti Kebangsaan Malaysia (UKM Press)
Publication date: 01/06/2014

Deriving useful and interesting rules from a data mining system is an essential and important task. Problems such as the discovery of random and coincidental patterns or patterns with no significant value, and the generation of a large volume of rules from a database, commonly occur. Work on sustaining the interestingness of rules generated by data mining algorithms is actively and constantly being examined and developed. In this paper, we present a systematic way to evaluate the association rules discovered by frequent itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriate sequence of usage. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data/items, and a detailed evaluation of the rule sets is provided. Empirical results show that, with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant and contradictory rules while preserving relatively valuable rules with high accuracy and coverage when used in the classification problem. Moreover, the results reveal the important characteristics of mining frequent itemsets and the impact of the confidence measure on the classification task.

UKM Journal Article Repository

Assessing and Remedying Coverage for a Given Dataset

Authors: Asudeh Abolfazl, Jagadish H. V., Jin Zhongjun
Publication date: 23/02/2019

Data analysis impacts virtually every aspect of our society today. Often, this analysis is performed on an existing dataset, possibly collected through a process over which the data scientists had limited control. The existing data analyzed may not include the complete universe, but it is expected to cover the diversity of items in the universe. Lack of adequate coverage in the dataset can result in undesirable outcomes such as biased decisions and algorithmic racism, and can create vulnerabilities such as opening up room for adversarial attacks. In this paper, we assess the coverage of a given dataset over multiple categorical attributes. We first provide efficient techniques for traversing the combinatorial explosion of value combinations to identify any regions of attribute space not adequately covered by the data. Then, we determine the least amount of additional data that must be obtained to resolve this lack of adequate coverage. We confirm the value of our proposal through both theoretical analyses and comprehensive experiments on real data. Comment: in ICDE 201
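The notion of coverage over categorical attributes can be made concrete with a naive Python sketch. It enumerates every full combination of attribute values and reports those matched by fewer than a threshold number of rows. This exhaustive enumeration is exactly the combinatorial explosion that the abstract says the paper's techniques are designed to traverse efficiently, so it is only an illustration of the problem statement, not of the authors' method. The function name, attribute names and data are hypothetical.

```python
from collections import Counter
from itertools import product


def uncovered_combinations(rows, domains, threshold):
    """Return value combinations matched by fewer than `threshold` rows.

    rows: list of dicts mapping attribute name -> value
    domains: dict mapping attribute name -> list of possible values
    """
    attrs = list(domains)
    # Count how many rows fall on each full combination of values
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return [
        combo
        for combo in product(*(domains[a] for a in attrs))
        if counts[combo] < threshold
    ]


# Hypothetical toy dataset with two categorical attributes
domains = {"gender": ["F", "M"], "age": ["young", "old"]}
rows = [
    {"gender": "F", "age": "young"},
    {"gender": "F", "age": "young"},
    {"gender": "M", "age": "old"},
    {"gender": "M", "age": "young"},
    {"gender": "M", "age": "young"},
]
print(uncovered_combinations(rows, domains, threshold=2))
```

The number of combinations grows as the product of the domain sizes, so this brute-force check becomes impractical as attributes are added.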

arXiv.org e-Print Archive