18,470 research outputs found

    Characteristic of partition-circuit matroid through approximation number

    Full text link
    Rough set theory is a useful tool to deal with uncertain, granular and incomplete knowledge in information systems. And it is based on equivalence relations or partitions. Matroid theory is a structure that generalizes linear independence in vector spaces, and has a variety of applications in many fields. In this paper, we propose a new type of matroids, namely, partition-circuit matroids, which are induced by partitions. Firstly, a partition satisfies circuit axioms in matroid theory, then it can induce a matroid which is called a partition-circuit matroid. A partition and an equivalence relation on the same universe are one-to-one corresponding, then some characteristics of partition-circuit matroids are studied through rough sets. Secondly, similar to the upper approximation number which is proposed by Wang and Zhu, we define the lower approximation number. Some characteristics of partition-circuit matroids and the dual matroids of them are investigated through the lower approximation number and the upper approximation number.Comment: 12 page

    A Model-Based Frequency Constraint for Mining Associations from Transaction Data

    Full text link
    Mining frequent itemsets is a popular method for finding associated items in databases. For this method, support, the co-occurrence frequency of the items which form an association, is used as the primary indicator of the associations's significance. A single user-specified support threshold is used to decided if associations should be further investigated. Support has some known problems with rare items, favors shorter itemsets and sometimes produces misleading associations. In this paper we develop a novel model-based frequency constraint as an alternative to a single, user-specified minimum support. The constraint utilizes knowledge of the process generating transaction data by applying a simple stochastic mixture model (the NB model) which allows for transaction data's typically highly skewed item frequency distribution. A user-specified precision threshold is used together with the model to find local frequency thresholds for groups of itemsets. Based on the constraint we develop the notion of NB-frequent itemsets and adapt a mining algorithm to find all NB-frequent itemsets in a database. In experiments with publicly available transaction databases we show that the new constraint provides improvements over a single minimum support threshold and that the precision threshold is more robust and easier to set and interpret by the user

    Problem-Solving Knowledge Mining from Users’\ud Actions in an Intelligent Tutoring System

    Get PDF
    In an intelligent tutoring system (ITS), the domain expert should provide\ud relevant domain knowledge to the tutor so that it will be able to guide the\ud learner during problem solving. However, in several domains, this knowledge is\ud not predetermined and should be captured or learned from expert users as well as\ud intermediate and novice users. Our hypothesis is that, knowledge discovery (KD)\ud techniques can help to build this domain intelligence in ITS. This paper proposes\ud a framework to capture problem-solving knowledge using a promising approach\ud of data and knowledge discovery based on a combination of sequential pattern\ud mining and association rules discovery techniques. The framework has been implemented\ud and is used to discover new meta knowledge and rules in a given domain\ud which then extend domain knowledge and serve as problem space allowing\ud the intelligent tutoring system to guide learners in problem-solving situations.\ud Preliminary experiments have been conducted using the framework as an alternative\ud to a path-planning problem solver in CanadarmTutor

    Some characteristics of matroids through rough sets

    Full text link
    At present, practical application and theoretical discussion of rough sets are two hot problems in computer science. The core concepts of rough set theory are upper and lower approximation operators based on equivalence relations. Matroid, as a branch of mathematics, is a structure that generalizes linear independence in vector spaces. Further, matroid theory borrows extensively from the terminology of linear algebra and graph theory. We can combine rough set theory with matroid theory through using rough sets to study some characteristics of matroids. In this paper, we apply rough sets to matroids through defining a family of sets which are constructed from the upper approximation operator with respect to an equivalence relation. First, we prove the family of sets satisfies the support set axioms of matroids, and then we obtain a matroid. We say the matroids induced by the equivalence relation and a type of matroid, namely support matroid, is induced. Second, through rough sets, some characteristics of matroids such as independent sets, support sets, bases, hyperplanes and closed sets are investigated.Comment: 13 page

    Evaluation and optimization of frequent association rule based classification

    Get PDF
    Deriving useful and interesting rules from a data mining system is an essential and important task. Problems such as the discovery of random and coincidental patterns or patterns with no significant values, and the generation of a large volume of rules from a database commonly occur. Works on sustaining the interestingness of rules generated by data mining algorithms are actively and constantly being examined and developed. In this paper, a systematic way to evaluate the association rules discovered from frequent itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriated sequence of usage is presented. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data/items, and detailed evaluation of rule sets is provided. Empirical results show that with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant and contradictive rules while preserving relatively valuable high accuracy and coverage rules when used in the classification problem. Moreover, the results reveal the important characteristics of mining frequent itemsets, and the impact of confidence measure for the classification task

    Assessing and Remedying Coverage for a Given Dataset

    Full text link
    Data analysis impacts virtually every aspect of our society today. Often, this analysis is performed on an existing dataset, possibly collected through a process that the data scientists had limited control over. The existing data analyzed may not include the complete universe, but it is expected to cover the diversity of items in the universe. Lack of adequate coverage in the dataset can result in undesirable outcomes such as biased decisions and algorithmic racism, as well as creating vulnerabilities such as opening up room for adversarial attacks. In this paper, we assess the coverage of a given dataset over multiple categorical attributes. We first provide efficient techniques for traversing the combinatorial explosion of value combinations to identify any regions of attribute space not adequately covered by the data. Then, we determine the least amount of additional data that must be obtained to resolve this lack of adequate coverage. We confirm the value of our proposal through both theoretical analyses and comprehensive experiments on real data.Comment: in ICDE 201
    • …
    corecore