929 research outputs found

Spark solutions for discovering fuzzy association rules in Big Data

Author: Fernández Basso Carlos Jesús
Martín Bautista María José
Ruiz Jiménez María Dolores
Publication venue: 'Elsevier BV'
Publication date: 24/07/2021

The research reported in this paper was partially supported by the COPKIT project from the 8th Programme Framework (H2020) research and innovation programme (grant agreement No 786687) and by the BIGDATAMED projects with references B-TIC-145-UGR18 and P18-RT-2947.

The computational cost of mining fuzzy association rules grows significantly with very large data sets, in many cases triggering a memory overflow error that makes the experiment fail before completion. In these cases, Big Data techniques can help the experiment run to completion. In this paper, several Spark algorithms are therefore proposed to handle massive fuzzy data and discover interesting association rules. The algorithms rely on a decomposition of interestingness measures in terms of α-cuts, and we experimentally demonstrate that it is sufficient to consider only 10 equidistributed α-cuts in order to mine all significant fuzzy association rules. Additionally, all the proposals are compared and analysed in terms of efficiency and speed-up on several datasets, including a real dataset comprising sensor measurements from an office building.
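The α-cut decomposition mentioned in this abstract can be illustrated with a small sketch. It is not the paper's Spark implementation: it assumes the minimum t-norm for itemset membership and a simple level-weighted sum of crisp supports, and the names `alpha_levels`, `crisp_support` and `fuzzy_support` are hypothetical.

```python
from typing import Dict, List, Sequence

# A fuzzy transaction maps each item to a membership degree in [0, 1].
Transaction = Dict[str, float]


def alpha_levels(k: int = 10) -> List[float]:
    """Return k equidistributed alpha levels in descending order: 1.0, ..., 1/k."""
    return [i / k for i in range(k, 0, -1)]


def crisp_support(transactions: Sequence[Transaction],
                  itemset: Sequence[str], alpha: float) -> float:
    """Crisp support of the itemset in the alpha-cut of the data.

    A transaction counts if every item has membership >= alpha
    (equivalent to thresholding the minimum membership).
    """
    hits = sum(
        1 for t in transactions
        if all(t.get(item, 0.0) >= alpha for item in itemset)
    )
    return hits / len(transactions)


def fuzzy_support(transactions: Sequence[Transaction],
                  itemset: Sequence[str], k: int = 10) -> float:
    """Approximate fuzzy support as a weighted sum of crisp supports.

    Assumed decomposition: sum over levels of
    (alpha_i - alpha_{i+1}) * crisp_support(alpha_i), with alpha_{k+1} = 0.
    """
    levels = alpha_levels(k) + [0.0]
    return sum(
        (levels[i] - levels[i + 1]) * crisp_support(transactions, itemset, levels[i])
        for i in range(k)
    )
```

Because each α-cut is an ordinary crisp dataset, every level can in principle be counted independently, which is what makes this kind of decomposition a natural fit for a distributed engine such as Spark.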

Repositorio Institucional Universidad de Granada

Unexpected rules using a conceptual distance based on fuzzy ontology

Author: Hamani Mohamed Said
Kissoum Yacine
Maamri Ramdane
Sedrati Maamar
Publication venue: Production and hosting by Elsevier B.V.
Publication date: 01/01/2014

One of the major drawbacks of data mining methods is that they generate a notably large number of rules that are often obvious, useless or, occasionally, outside the user’s interest. To address these drawbacks, we propose in this paper an approach that detects a set of unexpected rules in a discovered association rule set. Generally speaking, the proposed approach investigates the discovered association rules using the user’s domain knowledge, which is represented by a fuzzy domain ontology. Next, we rank the discovered rules according to the conceptual distances of the rules

Elsevier - Publisher Connector

Directory of Open Access Journals

Knowledge-based Systems and Interestingness Measures: Analysis with Clinical Datasets

Author: Jabez J. Christopher
Kannan Arputharaj
Khanna H. Nehemiah
Publication venue: 'Faculty of Electrical Engineering and Computing, Univ. of Zagreb'
Publication date: 01/01/2016

Knowledge mined from clinical data can be used for medical diagnosis and prognosis. By improving the quality of the knowledge base, the prediction efficiency of a knowledge-based system can be enhanced. Designing accurate and precise clinical decision support systems, which use the mined knowledge, is still a broad area of research. This work analyses the variation in classification accuracy for such knowledge-based systems using different rule lists. The purpose of this work is not to improve the prediction accuracy of a decision support system, but to analyze the factors that influence the efficiency and design of the knowledge base in a rule-based decision support system. Three benchmark medical datasets are used. Rules are extracted using a supervised machine learning algorithm (PART). Each rule in the ruleset is validated using nine frequently used rule interestingness measures. After calculating the measure values, the rule lists are used for performance evaluation. Experimental results show variation in classification accuracy for different rule lists. The Confidence and Laplace measures yield relatively superior accuracy: 81.188% for the heart disease dataset and 78.255% for the diabetes dataset. The accuracy of the knowledge-based prediction system depends predominantly on the organization of the ruleset. Rule length needs to be considered when deciding the rule ordering. A subset of a rule, or a combination of rule elements, may form new rules and sometimes be a member of the rule list. Redundant rules should be eliminated. Prior knowledge about the domain will enable knowledge engineers to design a better knowledge base.
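To show why the choice of interestingness measure can change the order of a rule list, here is a minimal sketch of the two measures named in this abstract. It uses the commonly cited definitions (confidence as the fraction of covered cases that are correct, Laplace as a smoothed version of it); the abstract does not give its exact formulas, and the rule names and counts below are invented for illustration.

```python
from typing import List, Tuple

# A rule is described here by (name, n_both, n_antecedent):
#   n_antecedent = records matching the rule's antecedent
#   n_both       = records matching antecedent AND consequent
Rule = Tuple[str, int, int]


def confidence(n_both: int, n_antecedent: int) -> float:
    """Confidence: fraction of covered records that the rule classifies correctly."""
    return n_both / n_antecedent


def laplace(n_both: int, n_antecedent: int, n_classes: int = 2) -> float:
    """Laplace accuracy: confidence smoothed towards 1 / n_classes.

    n_classes = 2 is an assumption suited to binary diagnosis tasks.
    """
    return (n_both + 1) / (n_antecedent + n_classes)


def rank_by_confidence(rules: List[Rule]) -> List[str]:
    """Rule names ordered by descending confidence."""
    return [r[0] for r in sorted(rules, key=lambda r: confidence(r[1], r[2]), reverse=True)]


def rank_by_laplace(rules: List[Rule]) -> List[str]:
    """Rule names ordered by descending Laplace accuracy."""
    return [r[0] for r in sorted(rules, key=lambda r: laplace(r[1], r[2]), reverse=True)]


# Hypothetical rules: one covers a single record, the other covers 100.
example_rules: List[Rule] = [("narrow_rule", 1, 1), ("broad_rule", 80, 100)]
```

With these invented counts, confidence places `narrow_rule` first, while Laplace places `broad_rule` first because the smoothing penalises rules supported by very few records.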

Crossref

Directory of Open Access Journals

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

MINING MULTIDIMENSIONAL FUZZY ASSOCIATION RULES FROM A DATABASE OF MEDICAL RECORD PATIENTS

Author: Handojo Andreas
Intan Rolly
Yuliana Oviliani Yenty
Publication venue: 'Petra Christian University'
Publication date: 01/01/2008

Mining association rules is one of the important tasks in data mining applications. In general, the input used to generate rules is taken from a single data table, in which the values of every domain are related to one another as given in the table. A problem arises when we need to generate rules expressing the relationship between two or more domains that belong to several different tables in a normalized database. To overcome this problem, before generating rules it is necessary to join the participating tables into a general table by a process called the Denormalization Process. This paper shows a process for mining Multidimensional Fuzzy Association Rules from a normalized database of patient medical records. The process consists of two sub-processes, namely the sub-process of joining tables (Denormalization Process) and the sub-process of generating fuzzy rules. In general, the process of generating the fuzzy rules has been discussed in our previous papers [1, 2, 3, 4]. In addition to the process of generating fuzzy rules, this paper proposes a correlation measure of the rules as an additional consideration for evaluating the interestingness of the generated rules.
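The two sub-processes described in this abstract can be sketched as follows. This is only an illustration under assumptions: the table layouts, the `young` membership function and the sigma-count style support are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Row = Dict[str, object]


def denormalize(patients: List[Row], diagnoses: List[Row]) -> List[Row]:
    """Join two normalized tables on patient_id into one general table."""
    by_id = {p["patient_id"]: p for p in patients}
    return [
        {**by_id[d["patient_id"]], **d}
        for d in diagnoses
        if d["patient_id"] in by_id
    ]


def young(age: float) -> float:
    """Hypothetical membership function for the fuzzy label 'young'."""
    if age <= 20:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 20


def fuzzy_support_young_and(rows: List[Row], disease: str) -> float:
    """Fuzzy support of {age is young, disease = given value}.

    Sums the 'young' membership over rows with the disease and
    divides by the total number of rows in the joined table.
    """
    total = sum(young(r["age"]) for r in rows if r["disease"] == disease)
    return total / len(rows)


# Hypothetical normalized tables.
patients = [
    {"patient_id": 1, "age": 20},
    {"patient_id": 2, "age": 30},
    {"patient_id": 3, "age": 50},
]
diagnoses = [
    {"patient_id": 1, "disease": "flu"},
    {"patient_id": 2, "disease": "flu"},
    {"patient_id": 3, "disease": "asthma"},
]
joined = denormalize(patients, diagnoses)
```

Once the tables are joined, attributes from different source tables (here age and disease) sit in the same row, so a rule relating them can be evaluated in a single pass.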

Neliti

Crossref

Jurnal Informatika

Directory of Open Access Journals

A logic approach for exceptions and anomalies in association rules

Author: Delgado M.
Ruiz M.D.
Sánchez Daniel
Publication venue: Universitat Politècnica de Catalunya. Secció de Matemàtiques i Informàtica
Publication date: 01/01/2008

Association rules have been used for obtaining information hidden in a database. Recent research has pointed out that simple associations are insufficient for representing the diverse kinds of knowledge collected in a database. Exceptions and anomalies deal with a different type of knowledge that is sometimes more useful than simple associations. Moreover, exceptions and anomalies provide a more comprehensive understanding of the information provided by a database. This work intends to go deeper into the logic model studied in [5]. In the model, association rules can be viewed as general relations between two or more attributes quantified by means of a convenient quantifier. Using this formulation we establish the true semantics of the distinct kinds of knowledge we can find in the database hidden in the four folds of the contingency table. The model is also useful for providing some measures for assessing the validity of those kinds of rules.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Data mining in soft computing framework: a survey

Author: Mitra P.
Mitra S.
Pal S. K.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally, fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.