
    Analysis of monotonicity properties of some rule interestingness measures

    One of the crucial problems in the field of knowledge discovery is the development of good interestingness measures for evaluating discovered patterns. In this paper, we consider quantitative, objective interestingness measures for "if..., then..." association rules. We focus on three popular interestingness measures, namely the rule interest function of Piatetsky-Shapiro, the gain measure of Fukuda et al., and the dependency factor used by Pawlak. We verify whether they satisfy the valuable property M of monotonic dependency on the number of objects satisfying, or not satisfying, the premise or the conclusion of a rule, and the property of hypothesis symmetry (HS). Moreover, analytically and through experiments, we show an interesting relationship between those measures and two other commonly used measures, rule support and anti-support.
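
    A minimal sketch of the measures the abstract names, using their common textbook forms computed from the four cells of a rule's 2x2 contingency table; the paper's exact definitions may differ, and the counts and threshold below are illustrative assumptions.

```python
# Common textbook forms of the discussed measures for a rule "if A then B",
# computed from the four cells of its 2x2 contingency table.
# This is an illustrative sketch, not the paper's code.

def rule_measures(n_ab, n_a_notb, n_nota_b, n_nota_notb, theta=0.5):
    """n_ab        -- objects satisfying both premise A and conclusion B
       n_a_notb    -- objects satisfying A but not B (anti-support count)
       n_nota_b    -- objects satisfying B but not A
       n_nota_notb -- objects satisfying neither
       theta       -- confidence threshold used by the gain measure"""
    n = n_ab + n_a_notb + n_nota_b + n_nota_notb   # total number of objects
    n_a = n_ab + n_a_notb                          # objects satisfying the premise
    n_b = n_ab + n_nota_b                          # objects satisfying the conclusion

    support = n_ab / n
    anti_support = n_a_notb                        # counterexamples to the rule

    # Piatetsky-Shapiro's rule interest: 0 when A and B are independent
    rule_interest = n_ab - n_a * n_b / n

    # Gain measure of Fukuda et al. (one common formulation)
    gain = n_ab - theta * n_a

    # Pawlak's dependency factor, expressed with probabilities
    p_b, p_b_given_a = n_b / n, n_ab / n_a
    dependency_factor = (p_b_given_a - p_b) / (p_b_given_a + p_b)

    return {"support": support, "anti-support": anti_support,
            "rule interest": rule_interest, "gain": gain,
            "dependency factor": dependency_factor}

print(rule_measures(n_ab=40, n_a_notb=10, n_nota_b=20, n_nota_notb=30))
```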

    A model of community spaces based on rough set theory in a hybrid filtering system

    Filtering systems aim to deliver information to users in a personalized way, while continuously adapting to each user's information needs. In a hybrid filtering system that relies on collaborative filtering, recommendations are produced from communities of users, which are generally formed according to the single criterion of proximity between users' ratings of past recommendations. Moreover, these communities usually remain implicit. We propose a model of explicit, multi-criteria community spaces, together with measures based on rough set theory for analyzing the dependency between the criteria used to form communities. The community-space model makes it possible to diversify recommendations, which can come from a variety of communities. The measures make it possible to compare criteria with one another in order to establish a priority among them when improving the positioning of users within the communities.
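
    A minimal sketch of the kind of dependency analysis the abstract alludes to, using the classical rough-set dependency degree between two partitions; this is not the authors' model, and the toy users and criteria below are assumptions for illustration only.

```python
# Classical rough-set dependency degree gamma(C, D): the fraction of objects
# whose C-equivalence class fits entirely inside one D-equivalence class,
# i.e. how well criteria C predict the grouping induced by D.
from collections import defaultdict

def partition(objects, attrs):
    """Group objects by their values on the given attributes."""
    blocks = defaultdict(set)
    for name, values in objects.items():
        blocks[tuple(values[a] for a in attrs)].add(name)
    return list(blocks.values())

def dependency_degree(objects, c_attrs, d_attrs):
    d_blocks = partition(objects, d_attrs)
    positive = set()
    for block in partition(objects, c_attrs):
        if any(block <= d_block for d_block in d_blocks):
            positive |= block
    return len(positive) / len(objects)

# Toy user table: two community-formation criteria and the community joined.
users = {
    "u1": {"ratings": "close", "interests": "sports", "community": "A"},
    "u2": {"ratings": "close", "interests": "sports", "community": "A"},
    "u3": {"ratings": "far",   "interests": "music",  "community": "B"},
    "u4": {"ratings": "far",   "interests": "sports", "community": "B"},
}
print(dependency_degree(users, ["ratings"], ["community"]))    # 1.0
print(dependency_degree(users, ["interests"], ["community"]))  # 0.25
```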

    Modeling mode choice behavior incorporating household and individual sociodemographics and travel attributes based on rough sets theory

    Most traditional mode choice models are based on the principle of random utility maximization derived from econometric theory. Alternatively, mode choice modeling can be regarded as a pattern recognition problem in which the explanatory variables determine the choice between alternatives. The paper applies the knowledge discovery technique of rough set theory to model travel mode choices, incorporating household and individual sociodemographics and travel information, and to identify the significance of each attribute. The study uses the detailed travel diary survey data of Changxing County, which contain information on both household and individual travel behavior, for model estimation and evaluation. The knowledge is presented in the form of easily understood IF-THEN statements, or rules, which reveal how each attribute influences mode choice behavior. These rules are then used to predict travel mode choices from information held about previously unseen individuals, and the classification performance is assessed. The rough set model shows high robustness and good predictive ability. The most significant condition attributes identified for determining travel mode choices are gender, distance, household annual income, and occupation. Comparative evaluation with the MNL model also shows that the rough set model gives superior prediction accuracy and coverage in travel mode choice modeling.
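
    A minimal sketch of the general idea of extracting IF-THEN rules from a decision table, not the paper's estimation code; the toy travel-diary attributes and values are assumptions for illustration.

```python
# Emit a certain IF-THEN rule for every combination of condition-attribute
# values that maps consistently to a single travel mode.
from collections import defaultdict

records = [
    {"gender": "F", "distance": "short", "income": "low",  "mode": "walk"},
    {"gender": "F", "distance": "short", "income": "low",  "mode": "walk"},
    {"gender": "M", "distance": "long",  "income": "high", "mode": "car"},
    {"gender": "M", "distance": "long",  "income": "low",  "mode": "bus"},
    {"gender": "F", "distance": "long",  "income": "high", "mode": "car"},
]
conditions = ["gender", "distance", "income"]

groups = defaultdict(set)
for r in records:
    groups[tuple((a, r[a]) for a in conditions)].add(r["mode"])

for cond, modes in groups.items():
    if len(modes) == 1:                      # consistent group -> certain rule
        premise = " AND ".join(f"{a}={v}" for a, v in cond)
        print(f"IF {premise} THEN mode={modes.pop()}")
```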

    Gene hunting of the Genetic Analysis Workshop 16 rheumatoid arthritis data using rough set theory

    We propose to use rough set theory to identify genes affecting rheumatoid arthritis risk from the data collected by the North American Rheumatoid Arthritis Consortium. For each gene, we employ generalized dynamic reducts in rough set theory to select a subset of single-nucleotide polymorphisms (SNPs) to represent the genetic information from this gene. We then group the study subjects into different clusters based on their genotype similarity at the selected markers. Statistical association between disease status and cluster membership is then studied to identify genes associated with rheumatoid arthritis. Based on our proposed approach, we are able to identify a number of statistically significant genes associated with rheumatoid arthritis. Aside from genes on chromosome 6, our identified genes include known disease-associated genes such as PTPN22 and TRAF1. In addition, our list contains other biologically plausible genes, such as ADAM15 and AGPAT2. Our findings suggest that ADAM15 and AGPAT2 may contribute to a genetic predisposition through abnormal angiogenesis and adipose tissue.
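
    A minimal sketch of the final association step only, not the authors' pipeline: a chi-square test of independence between genotype-based cluster membership and case/control status. The contingency counts below are made up for illustration.

```python
# Test whether disease status is independent of the clusters formed from
# genotype similarity at the reduct SNPs.
from scipy.stats import chi2_contingency

# rows: genotype clusters, columns: [cases, controls]
counts = [
    [120, 60],   # cluster 1
    [ 80, 95],   # cluster 2
    [ 40, 85],   # cluster 3
]
chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}")
```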

    Division Charts as Granules and Their Merging Algorithm for Rule Generation in Nondeterministic Data

    We have been developing a framework, rough nondeterministic information analysis, which applies granular computing concepts to tables with incomplete and nondeterministic information, as well as to rule generation. We have recently defined an expression named a division chart with respect to an implication and a subset of objects. Each division chart plays the role of the minimum granule for rule generation and of a contingency table in statistics. In this paper, we first define a division chart in deterministic information systems (DISs) and clarify the relation between a division chart and the corresponding implication. We also consider a merging algorithm for two division charts and extend the relation from DISs to nondeterministic information systems. The relation gives us the foundations of rule generation in tables with nondeterministic information.
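
    A minimal sketch of one plausible reading of the abstract, not the authors' definition: over a subset of objects, a division chart for an implication "premise => conclusion" is stored as the four counts of a contingency-style table, and two charts built over disjoint object subsets merge by cell-wise addition. All names and counts are illustrative assumptions.

```python
# Contingency-style counts for an implication over a subset of objects,
# with a merge that combines charts built over disjoint subsets.
from dataclasses import dataclass

@dataclass
class DivisionChart:
    both: int             # objects satisfying premise and conclusion
    premise_only: int     # objects satisfying only the premise
    conclusion_only: int  # objects satisfying only the conclusion
    neither: int          # objects satisfying neither

    def merge(self, other: "DivisionChart") -> "DivisionChart":
        """Combine charts built over disjoint subsets of objects."""
        return DivisionChart(self.both + other.both,
                             self.premise_only + other.premise_only,
                             self.conclusion_only + other.conclusion_only,
                             self.neither + other.neither)

    def support(self) -> float:
        n = self.both + self.premise_only + self.conclusion_only + self.neither
        return self.both / n

    def confidence(self) -> float:
        return self.both / (self.both + self.premise_only)

c1 = DivisionChart(both=5, premise_only=1, conclusion_only=2, neither=4)
c2 = DivisionChart(both=3, premise_only=0, conclusion_only=1, neither=2)
merged = c1.merge(c2)
print(merged, merged.support(), merged.confidence())
```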

    An Efficient Classification Model using Fuzzy Rough Set Theory and Random Weight Neural Network

    In the area of fuzzy rough set theory (FRST), researchers have shown much interest in handling high-dimensional data. Rough set theory (RST) is one of the important tools used to pre-process data and helps to obtain a better predictive model, but in RST the process of discretization may lose useful information. Fuzzy rough set theory therefore works well with real-valued data. In this paper, an efficient technique based on FRST is presented to pre-process large-scale data sets and increase the efficacy of the predictive model. A fuzzy rough set-based feature selection (FRSFS) technique is combined with a random weight neural network (RWNN) classifier to obtain better generalization ability. Results on different datasets show that the proposed technique performs well and provides better speed and accuracy than FRSFS associated with other machine learning classifiers (i.e., KNN, Naive Bayes, SVM, decision tree and backpropagation neural network).
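
    A minimal sketch of the classifier half only, under the common construction of a random weight neural network: random, fixed hidden weights with output weights fitted by least squares. The data, layer sizes and training loop below are assumptions, and the features are assumed to have already passed a fuzzy rough feature selection step; this is not the paper's code.

```python
# Random weight neural network: a frozen random nonlinear projection
# followed by a closed-form least-squares fit of the output layer.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 10 selected features, 3 classes (one-hot targets).
X = rng.normal(size=(200, 10))
y = rng.integers(0, 3, size=200)
T = np.eye(3)[y]

hidden = 50
W = rng.normal(size=(10, hidden))          # random input-to-hidden weights (frozen)
b = rng.normal(size=hidden)                # random hidden biases (frozen)

def hidden_layer(X):
    return np.tanh(X @ W + b)              # nonlinear random projection

H = hidden_layer(X)
beta, *_ = np.linalg.lstsq(H, T, rcond=None)   # output weights in closed form

pred = hidden_layer(X) @ beta
accuracy = np.mean(pred.argmax(axis=1) == y)
print(f"training accuracy: {accuracy:.2f}")
```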

    Rough sets approach to symbolic value partition

    In data mining, searching for simple representations of knowledge is a very important issue. Attribute reduction, continuous attribute discretization and symbolic value partition are three preprocessing techniques used in this regard. This paper investigates the symbolic value partition technique, which divides each attribute domain of a data table into a family of disjoint subsets and constructs a new data table with fewer attributes and smaller attribute domains. Specifically, we investigate the optimal symbolic value partition (OSVP) problem for supervised data, where the optimality metric is the cardinality sum of the new attribute domains. We propose the concept of partition reducts for this problem. An optimal partition reduct is the solution to the OSVP problem. We develop a greedy algorithm to search for a suboptimal partition reduct and analyze major properties of the proposed algorithm. Empirical studies on various datasets from the UCI library show that our algorithm effectively reduces the size of attribute domains. Furthermore, it helps to compute smaller rule sets with better coverage compared with the attribute reduction approach.
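
    A simplified, illustrative sketch of the flavor of such a greedy search, not the authors' algorithm: merge two values of an attribute whenever the merged table still assigns a single decision to every combination of condition values, i.e. stays consistent. The toy table and attribute names are assumptions.

```python
# Greedily merge attribute values while the decision table remains consistent.
from collections import defaultdict
from itertools import combinations

rows = [
    {"color": "red",   "size": "S", "dec": "yes"},
    {"color": "pink",  "size": "S", "dec": "yes"},
    {"color": "blue",  "size": "L", "dec": "no"},
    {"color": "green", "size": "L", "dec": "no"},
    {"color": "blue",  "size": "S", "dec": "yes"},
]
conditions = ["color", "size"]
mapping = {a: {r[a]: r[a] for r in rows} for a in conditions}  # value -> block label

def consistent():
    seen = {}
    for r in rows:
        key = tuple(mapping[a][r[a]] for a in conditions)
        if seen.setdefault(key, r["dec"]) != r["dec"]:
            return False
    return True

for a in conditions:
    changed = True
    while changed:
        changed = False
        for b1, b2 in combinations(sorted(set(mapping[a].values())), 2):
            backup = dict(mapping[a])
            for v, blk in mapping[a].items():      # tentatively merge b2 into b1
                if blk == b2:
                    mapping[a][v] = b1
            if consistent():
                changed = True
                break
            mapping[a] = backup                    # undo the merge

for a in conditions:
    groups = defaultdict(list)
    for v, blk in mapping[a].items():
        groups[blk].append(v)
    print(a, list(groups.values()))                # partition of each attribute domain
```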

    Rough sets, their extensions and applications

    Rough set theory provides a useful mathematical foundation for developing automated computational systems that can help understand and make use of imperfect knowledge. Despite its relative recency, the theory and its extensions have been widely applied to many problems, including decision analysis, data mining, intelligent control and pattern recognition. This paper presents an outline of the basic concepts of rough sets and their major extensions, covering variable precision, tolerance-based and fuzzy rough sets. It also shows the diversity of successful applications these theories have enabled, ranging from finance and business, through biology and medicine, to physics, art and meteorology.
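
    A minimal sketch of the core construct such surveys start from: the lower and upper approximations of a target set under the indiscernibility relation induced by a single attribute. The toy objects and attribute values are illustrative.

```python
# Lower and upper approximations of a set X under indiscernibility by one attribute.
from collections import defaultdict

objects = {"o1": "red", "o2": "red", "o3": "blue", "o4": "blue", "o5": "green"}
X = {"o1", "o3", "o4"}   # the set we want to approximate

# Equivalence classes: objects indiscernible by the attribute value.
classes = defaultdict(set)
for name, value in objects.items():
    classes[value].add(name)

lower, upper = set(), set()
for c in classes.values():
    if c <= X:           # class entirely inside X -> certainly in X
        lower |= c
    if c & X:            # class overlapping X -> possibly in X
        upper |= c

print("lower approximation:", lower)       # {'o3', 'o4'}
print("upper approximation:", upper)       # {'o1', 'o2', 'o3', 'o4'}
print("boundary region:", upper - lower)   # {'o1', 'o2'}
```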