929 research outputs found

    Spark solutions for discovering fuzzy association rules in Big Data

    Get PDF
    The research reported in this paper was partially supported the COPKIT project from the 8th Programme Framework (H2020) research and innovation programme (grant agreement No 786687) and from the BIGDATAMED projects with references B-TIC-145-UGR18 and P18-RT-2947.The high computational impact when mining fuzzy association rules grows significantly when managing very large data sets, triggering in many cases a memory overflow error and leading to the experiment failure without its conclusion. It is in these cases when the application of Big Data techniques can help to achieve the experiment completion. Therefore, in this paper several Spark algorithms are proposed to handle with massive fuzzy data and discover interesting association rules. For that, we based on a decomposition of interestingness measures in terms of α-cuts, and we experimentally demonstrate that it is sufficient to consider only 10equidistributed α-cuts in order to mine all significant fuzzy association rules. Additionally, all the proposals are compared and analysed in terms of efficiency and speed up, in several datasets, including a real dataset comprised of sensor measurements from an office building.COPKIT project from the 8th Programme Framework (H2020) research and innovation programme 786687BIGDATAMED projects B-TIC-145-UGR18 P18-RT-294

    Unexpected rules using a conceptual distance based on fuzzy ontology

    Get PDF
    AbstractOne of the major drawbacks of data mining methods is that they generate a notably large number of rules that are often obvious or useless or, occasionally, out of the user’s interest. To address such drawbacks, we propose in this paper an approach that detects a set of unexpected rules in a discovered association rule set. Generally speaking, the proposed approach investigates the discovered association rules using the user’s domain knowledge, which is represented by a fuzzy domain ontology. Next, we rank the discovered rules according to the conceptual distances of the rules

    Knowledge-based Systems and Interestingness Measures: Analysis with Clinical Datasets

    Get PDF
    Knowledge mined from clinical data can be used for medical diagnosis and prognosis. By improving the quality of knowledge base, the efficiency of prediction of a knowledge-based system can be enhanced. Designing accurate and precise clinical decision support systems, which use the mined knowledge, is still a broad area of research. This work analyses the variation in classification accuracy for such knowledge-based systems using different rule lists. The purpose of this work is not to improve the prediction accuracy of a decision support system, but analyze the factors that influence the efficiency and design of the knowledge base in a rule-based decision support system. Three benchmark medical datasets are used. Rules are extracted using a supervised machine learning algorithm (PART). Each rule in the ruleset is validated using nine frequently used rule interestingness measures. After calculating the measure values, the rule lists are used for performance evaluation. Experimental results show variation in classification accuracy for different rule lists. Confidence and Laplace measures yield relatively superior accuracy: 81.188% for heart disease dataset and 78.255% for diabetes dataset. The accuracy of the knowledge-based prediction system is predominantly dependent on the organization of the ruleset. Rule length needs to be considered when deciding the rule ordering. Subset of a rule, or combination of rule elements, may form new rules and sometimes be a member of the rule list. Redundant rules should be eliminated. Prior knowledge about the domain will enable knowledge engineers to design a better knowledge base

    MINING MULTIDIMENSIONAL FUZZY ASSOCIATION RULES FROM A DATABASE OF MEDICAL RECORD PATIENTS

    Get PDF
    Mining association rules is one of the important tasks in the process of data mining application. In general, the input as used in the process of generating rules is taken from a certain data table by which all the corresponding values of every domain data have correlations one to each others as given in the table. A problem arises when we need to generate the rules expressing the relationship between two or more domains that belong to several different tables in a normalized database. To overcome the problem, before generating rules it is necessary to join the participant tables into a general table by a process called Denormalization Process. This paper shows a process of generating Multidimensional Fuzzy Association Rules mining from a normalized database of medical record patients. The process consists of two sub-processes, namely sub-process of join tables (Denormalization Process) and sub-process of generating fuzzy rules. In general, the process of generating the fuzzy rules has been discussed in our previous papers [1, 2, 3, 4]. In addition to the process of generating fuzzy rules, this paper proposes a correlation measure of the rules as an additional consideration for evaluating interestingness of provided rules

    A logic approach for exceptions and anomalies in association rules

    Get PDF
    Association rules have been used for obtaining information hidden in a database. Recent researches have pointed out that simple associations are insu cient for representing the diverse kinds of knowledge collected in a database. The use of exceptions and anomalies deal with a di erent type of knowledge sometimes more useful than simple associations. Moreover ex- ceptions and anomalies provide a more comprehensive understanding of the information provided by a database. This work intends to go deeper in the logic model studied in [5]. In the model, association rules can be viewed as general relations between two or more attributes quanti ed by means of a convenient quanti er. Using this formulation we establish the true semantics of the distinct kinds of knowledge we can nd in the database hidden in the four folds of the contingency table. The model is also useful for providing some measures for assessing the validity of those kinds of rulesPeer Reviewe

    Data mining in soft computing framework: a survey

    Get PDF
    The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included
    corecore