4,644 research outputs found

    Encapsulation of Soft Computing Approaches within Itemset Mining - A Survey

    Data Mining discovers patterns and trends by extracting knowledge from large databases. Soft Computing techniques such as fuzzy logic, neural networks, genetic algorithms, and rough sets aim to exploit the tolerance for imprecision and uncertainty in order to achieve tractability, robustness, and low-cost solutions. Fuzzy logic and rough sets are suitable for handling different types of uncertainty. Neural networks provide good learning and generalization. Genetic algorithms provide efficient search algorithms for selecting a model from mixed media data. Data mining refers to information extraction, while soft computing is used for information processing. For effective knowledge discovery from large databases, Soft Computing and Data Mining can be combined. Association rule mining (ARM) and itemset mining focus on finding the most frequent itemsets and the corresponding association rules, and on extracting rare itemsets, including temporal and fuzzy concepts in the discovered patterns. This survey paper explores the usage of soft computing approaches in itemset utility mining.

    A Model-Based Frequency Constraint for Mining Associations from Transaction Data

    Mining frequent itemsets is a popular method for finding associated items in databases. For this method, support, the co-occurrence frequency of the items which form an association, is used as the primary indicator of the association's significance. A single user-specified support threshold is used to decide whether associations should be investigated further. Support has some known problems with rare items, favors shorter itemsets, and sometimes produces misleading associations. In this paper we develop a novel model-based frequency constraint as an alternative to a single, user-specified minimum support. The constraint utilizes knowledge of the process generating transaction data by applying a simple stochastic mixture model (the NB model), which allows for the typically highly skewed item frequency distribution of transaction data. A user-specified precision threshold is used together with the model to find local frequency thresholds for groups of itemsets. Based on the constraint we develop the notion of NB-frequent itemsets and adapt a mining algorithm to find all NB-frequent itemsets in a database. In experiments with publicly available transaction databases we show that the new constraint provides improvements over a single minimum support threshold and that the precision threshold is more robust and easier for the user to set and interpret.
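The role of a single minimum support threshold can be illustrated with a minimal Python sketch. It only shows plain support counting and thresholding, not the NB model from the abstract; the toy transactions and the names `support`, `frequent`, `TRANSACTIONS`, and `CANDIDATES` are hypothetical and chosen for illustration.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Return the fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def frequent(candidates, transactions, min_support: float):
    """Keep the candidate itemsets whose support reaches one global threshold."""
    return [c for c in candidates if support(c, transactions) >= min_support]


# Hypothetical toy data: "caviar" is a rare item.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk", "caviar"}),
]

CANDIDATES = [
    frozenset({"bread"}),
    frozenset({"bread", "milk"}),
    frozenset({"caviar"}),
]

if __name__ == "__main__":
    for c in CANDIDATES:
        print(sorted(c), support(c, TRANSACTIONS))
    print(frequent(CANDIDATES, TRANSACTIONS, min_support=0.5))
```

With a threshold of 0.5, the itemset containing the rare item is dropped regardless of how strongly it is associated with other items, and a longer itemset can never have higher support than its subsets. These are the two effects of a single global threshold that the abstract mentions.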

    Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

    Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are highly needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy. Hence a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis because they are easy to interpret. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression data. With fuzzy granulation, information loss in the process of gene selection is decreased. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification. Hence we expect that the selected genes can be more helpful for further biological studies.

    Knowledge Discovery in Databases: An Information Retrieval Perspective

    The current trend of increasing capabilities in data generation and collection has resulted in an urgent need for data mining applications, also called knowledge discovery in databases. This paper identifies and examines the issues involved in extracting useful grains of knowledge from large amounts of data. It describes a framework to categorise data mining systems. The author also gives an overview of the issues pertaining to data preprocessing, as well as various information gathering methodologies and techniques. The paper covers some popular tools such as classification, clustering, and generalisation. A summary of the statistical and machine learning techniques currently in use is also provided.

    Mining Time-delayed Gene Regulation Patterns from Gene Expression Data

    Discovered gene regulation networks are very helpful for predicting unknown gene functions. The activating and deactivating relations between genes are mined from microarray gene expression data. There is evidence that delays of multiple time units exist in a gene regulation process. Association rule mining is well suited to finding regulation relations among genes. However, current association rule mining techniques cannot handle temporally ordered transactions. We propose a modified association rule mining technique for efficiently discovering time-delayed regulation relationships among genes. By analyzing gene expression data, we can discover gene relations; thus, we use modified association rules to mine gene regulation patterns. Our proposed method, BC3, is designed to mine time-delayed gene regulation patterns of length 3 from time series gene expression data. The first two items are regulators, and the last item is the target they affect. First we use Apriori to find the frequent 2-itemsets (L2) in order to work backward to BL1. Apriori mines the frequent 2-itemsets at the same time point, so we split L2 into length-one items that are related at the same time point. Then we combine BL1 with L1 into a new ordered set, BC2, with time-delayed relations. After pruning BC2 with the threshold, BL2 is derived. The results are obtained by joining BL2 with itself to form BC3 and sifting BL3 from BC3. We use yeast gene expression data to evaluate our method and analyze the results to show that our approach is efficient.

    Information fusion from multiple databases using meta-association rules

    Nowadays, data volume, distribution, and volatility make it difficult to search for global patterns by applying traditional Data Mining techniques. For data in a distributed environment, a separate local analysis of each dataset is sometimes adequate, but at other times a global decision based on an analysis of the entire data is needed. Association rule discovery methods typically require a single uniform dataset, and handling the entire set of distributed data is not possible due to its size. To address the scenarios in which satisfying this requirement is not practical or even feasible, we propose a new method for fusing information, in the form of rules, extracted from multiple datasets. The proposed model produces meta-association rules, i.e. rules in which the antecedent or the consequent may contain rules as well, for finding joint correlations among trends found individually in each dataset. In this paper, we describe the formulation and the implementation of two alternative frameworks that obtain, respectively, crisp meta-rules and fuzzy meta-rules. We compare our proposal with the information obtained when the datasets are not separated, in order to see the main differences between traditional association rules and meta-association rules. We also compare crisp and fuzzy methods for meta-association rule mining, observing that the fuzzy approach offers several advantages: it is more accurate since it incorporates the strength or validity of the previous information, produces a more manageable set of rules for human inspection, and allows contextual information, expressed in a more human-friendly format, to be incorporated into the mining process.

    A comparative study of the AHP and TOPSIS methods for implementing load shedding scheme in a pulp mill system

    Advances in technology have encouraged the design and creation of useful equipment and devices for a wide range of applications. The pulp mill is a heavy industry that consumes a large amount of electricity in production, so any equipment malfunction can cause heavy losses to the company. In particular, the breakdown of one generator would cause the other generators to be overloaded. In that case, loads are shed in sequence until the remaining generators can supply power to the other loads. Once the fault has been fixed, the load shedding scheme can be deactivated. A load shedding scheme is thus the best way of handling such a condition. Selected loads are shed under this scheme in order to protect the generators from damage. Multi Criteria Decision Making (MCDM) can be applied to determine the load shedding scheme in an electric power system. In this thesis, two methods, the Analytic Hierarchy Process (AHP) and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), are introduced and applied. A series of analyses is conducted, and the results show that, of the two, TOPSIS is the better MCDM method for the load shedding scheme in the pulp mill system, because it achieves the higher percentage effectiveness of load shedding. The results of the AHP and TOPSIS analyses of the pulp mill system are very promising.
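For readers unfamiliar with TOPSIS, the general technique can be sketched in a few lines of Python. This is the textbook procedure (vector normalization, weighting, distance to the ideal and anti-ideal points), not the thesis's actual model; the function name `topsis`, the decision matrix, the weights, and the criteria are all hypothetical, and the sketch assumes no criterion column is all zeros and the alternatives are not all identical.

```python
import math
from typing import List, Sequence


def topsis(matrix: Sequence[Sequence[float]],
           weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """Return a closeness score in [0, 1] per alternative (higher is better).

    matrix[i][j] is the value of alternative i on criterion j.
    benefit[j] is True if larger values of criterion j are preferred.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal and anti-ideal points, taking the criterion direction into account.
    columns = list(zip(*v))
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


if __name__ == "__main__":
    # Hypothetical alternatives A, B, C scored on one benefit and one cost criterion.
    matrix = [[1.0, 3.0], [3.0, 1.0], [2.0, 2.0]]
    print(topsis(matrix, weights=[0.5, 0.5], benefit=[True, False]))
```

In this toy matrix, alternative B is best on both criteria, so it coincides with the ideal point and ranks first; A coincides with the anti-ideal point and ranks last.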

    Data Mining Based on Semantic Similarity to mine new Association Rules

    The problem of mining association rules in a database is introduced. Most association rule mining approaches consider only exact matches between items in transactions. A new algorithm called “Improved Data Mining Based on Semantic Similarity to mine new Association Rules” (IDM) is presented, which considers not only exact matches between items but also the semantic similarity between them. IDM uses concepts supplied by an expert to represent the degree of similarity between items, and proposes a new way of obtaining support and confidence for the association rules containing these items. An example of an association rule for a grocery store is “30% of transactions that contain bread also contain milk; 2% of all transactions contain both of these items”. Here 30% is called the confidence of the rule and 2% the support of the rule, and the rule is written as Bread → Milk. The problem is to find all association rules that satisfy user-specified minimum support and minimum confidence constraints. The paper concludes that the new rules bring more information about the database.
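The bread and milk example can be reproduced with a minimal Python sketch of exact-match support and confidence. It does not implement the semantic-similarity extension described in the abstract; the function name `rule_metrics` and the synthetic transaction counts (6, 14, and 280 out of 300) are assumptions chosen only so that the numbers match the quoted 2% and 30%.

```python
from typing import FrozenSet, Iterable, List, Tuple


def rule_metrics(antecedent: Iterable[str],
                 consequent: Iterable[str],
                 transactions: List[FrozenSet[str]]) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    support = n_both / len(transactions) if transactions else 0.0
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


# Synthetic data: 300 transactions, 20 with bread, 6 of those also with milk.
TRANSACTIONS = (
    [frozenset({"bread", "milk"})] * 6
    + [frozenset({"bread"})] * 14
    + [frozenset({"eggs"})] * 280
)

if __name__ == "__main__":
    s, c = rule_metrics({"bread"}, {"milk"}, TRANSACTIONS)
    print(f"support={s:.0%} confidence={c:.0%}")  # support=2% confidence=30%
```

Support is measured against all transactions, while confidence is measured only against the transactions that contain the antecedent, which is why the two figures differ so widely for the same rule.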

    A case study of predicting banking customers behaviour by using data mining

    Data Mining (DM) is a technique that examines information stored in a large database or data warehouse and finds patterns or trends in the data that are not yet known or suspected. DM techniques have been applied to a variety of different domains, including Customer Relationship Management (CRM). In this research, a new Customer Knowledge Management (CKM) framework based on data mining is proposed. The proposed framework manages relationships between banking organizations and their customers. Two typical data mining techniques, Neural Network and Association Rules, are applied to predict the behavior of customers and to improve the decision-making processes for recalling valued customers in the banking industry. Experiments on a real-world dataset are conducted, and different metrics are used to evaluate the performance of the two data mining models. The results indicate that the Neural Network model achieves better accuracy but takes longer to train.

    Preparation and characterization of magnetite (Fe3O4) nanoparticles By Sol-Gel method

    Magnetite (Fe3O4) nanoparticles were successfully synthesized and annealed under vacuum at different temperatures. The Fe3O4 nanoparticles, prepared via a sol-gel assisted method and annealed at 200-400°C, were characterized by Fourier Transform Infrared Spectroscopy (FTIR), X-ray Diffraction (XRD), Field Emission Scanning Electron Microscopy (FESEM) and Atomic Force Microscopy (AFM). The XRD results indicate the presence of Fe3O4 nanoparticles, and the mean particle size calculated with Scherrer's formula is in the range of 2-25 nm. The FESEM results show that the particles annealed at 400°C are more spherical and partially agglomerated, while the EDS results indicate the presence of Fe3O4 by showing the Fe-O group of elements. AFM was used to analyze the 3D topography and roughness of the sample; the Fe3O4 nanoparticles have a minimum diameter of 79.04 nm, which is in agreement with the FESEM result. According to some of the literature, the synthesis of Fe3O4 nanoparticles using FeCl3 and FeCl2 has in many cases not been achieved, but this research was able to obtain Fe3O4 nanoparticles based on the characterization results.
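For reference, the Scherrer formula mentioned in the abstract is usually written as follows. The abstract gives no values for the constants, so the symbols are stated here in their conventional meaning only: $D$ is the mean crystallite size, $K$ is a dimensionless shape factor (often taken to be about 0.9, an assumption that depends on particle shape), $\lambda$ is the X-ray wavelength, $\beta$ is the full width at half maximum of the diffraction peak in radians, and $\theta$ is the Bragg angle.

```latex
D = \frac{K \lambda}{\beta \cos\theta}
```

Because $\beta$ appears in the denominator, broader diffraction peaks correspond to smaller crystallites.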