    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tends to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.
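
    As a hedged illustration of the kind of data preparation the abstract refers to, the minimal sketch below cleans a hypothetical tabular environmental dataset. The column names, plausible-range bounds, and imputation choice are illustrative assumptions, not taken from the paper.

```python
# Minimal pre-processing sketch for a hypothetical tabular environmental dataset.
# Column names and plausible-range bounds are illustrative assumptions.
import pandas as pd

# Assumed physically plausible ranges; readings outside them are treated as sensor noise.
PLAUSIBLE_RANGES = {"temperature": (-40.0, 60.0), "no2": (0.0, 500.0)}

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Basic KDD-style preparation: drop duplicates, mask implausible readings, impute gaps."""
    df = df.drop_duplicates().copy()
    for col, (low, high) in PLAUSIBLE_RANGES.items():
        # Out-of-range readings are masked (set to NaN) as presumed noise.
        df[col] = df[col].where(df[col].between(low, high))
        # Fill gaps with the column median (one simple choice among many).
        df[col] = df[col].fillna(df[col].median())
    return df

# Hypothetical air-quality readings with a gap and an implausible spike.
raw = pd.DataFrame({"temperature": [21.0, None, 22.5, 400.0], "no2": [30.0, 31.0, None, 29.0]})
clean = preprocess(raw)  # the 400.0 spike is masked and imputed along with the gaps
```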

    An Integrated Principal Component Analysis And Weighted Apriori-T Algorithm For Imbalanced Data Root Cause Analysis

    Root Cause Analysis (RCA) is often used in manufacturing analysis to prevent the recurrence of undesired events. Association rule mining (ARM) was introduced in RCA to extract frequently occurring patterns, interesting correlations, associations or causal structures among items in the database. However, frequent pattern mining (FPM) using Apriori-like algorithms and the support-confidence framework inherently suffers from the rare item problem. This greatly reduces the performance of RCA, especially in the manufacturing domain, where imbalanced data is the norm in a production plant. In addition, the exponential growth of data causes high computational costs in Apriori-like algorithms. Hence, this research proposes a two-stage FPM, integrating Principal Component Analysis (PCA) and Weighted Apriori-T (PCA-WAT), to address these problems. PCA is used to generate item weights by considering maximally distributed covariance to normalise the effect of rare items. Using PCA, a significant rare item receives a higher weight while a less significant high-occurrence item receives a lower weight. On the other hand, Apriori-T with an indexing enumeration tree is used for low-cost FPM. A semiconductor manufacturing case study with Work In Progress data and true alarm data is used to validate the proposed algorithm. The proposed PCA-WAT algorithm is benchmarked against the Apriori and Apriori-T algorithms. A comparison analysis on weighted support was performed to evaluate the capability of PCA to normalise the items' support values. The experimental results show that PCA is able to normalise the item support values and reduce the influence of imbalanced data in FPM. Both quality and performance measures are used for evaluation. The quality measures compare the frequent itemsets and interesting rules generated across different support and confidence thresholds, ranging from 5% to 20% and from 10% to 90% respectively. The rule validation involves a business analyst from the related field. The domain expert verified that the generated rules are able to explain the contributing factors towards failure analysis. However, significant rare rules are not easily discovered because the normalised weighted support values are generally lower than the original support values. The performance measures compare the execution time in seconds (s) and the execution Random Access Memory (RAM) in megabytes (MB). The experimental results show that the implementation of Apriori-T lowered the computational cost by at least 90% in computation time and 35.33% in computation RAM compared to Apriori. The primary contribution of this study is a two-stage FPM for performing RCA in the manufacturing domain in the presence of imbalanced datasets. In conclusion, the proposed algorithm overcomes the rare item issue by implementing covariance-based support value normalisation, and the high computational cost issue by implementing an indexing enumeration tree structure. Future work should focus on rule interpretation, to generate rules that are more understandable to data mining novices. In addition, suitable support and confidence thresholds are needed after the normalisation process to better discover the significant rare itemsets.
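
    The abstract does not give the exact weighting formula, so the sketch below is only one plausible reading of the idea: item weights are derived from the magnitude of PCA loadings (scaled by explained variance) over a binary transaction-item matrix, and an itemset's weighted support is its plain support scaled by the mean weight of its items. The toy matrix and the weighting scheme are assumptions for illustration, not the paper's PCA-WAT definition.

```python
# Hedged sketch: PCA-derived item weights and a weighted support measure.
import numpy as np
from sklearn.decomposition import PCA

# Binary transaction-item matrix (rows = transactions, columns = items); toy data.
X = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],   # item 3 occurs only once (a rare item)
])

# Item weights from the magnitude of PCA loadings, weighted by explained variance
# and rescaled so the weights sum to the number of items.
pca = PCA(n_components=min(X.shape)).fit(X)
loading_strength = np.abs(pca.components_).T @ pca.explained_variance_ratio_
weights = loading_strength * len(loading_strength) / loading_strength.sum()

def weighted_support(itemset, X, weights):
    """Support of an itemset scaled by the mean weight of its items."""
    mask = X[:, list(itemset)].all(axis=1)
    return mask.mean() * weights[list(itemset)].mean()

print(weighted_support((0, 1), X, weights))   # a frequent pair of items
print(weighted_support((3,), X, weights))     # the single-occurrence (rare) item
```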

    A Survey on Data Mining Algorithm for Market Basket Analysis

    Association rule mining identifies remarkable associations or relationships within large sets of data items. With huge quantities of data constantly being obtained and stored in databases, many industries are becoming interested in mining association rules from their databases. For example, the discovery of interesting association relationships among large quantities of business transaction data can assist in catalog design, cross-marketing, loss-leader analysis, and various business decision-making processes. A typical example of association rule mining is market basket analysis. This method examines customer buying patterns by identifying associations among the various items that customers place in their shopping baskets. The identification of such associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers. It is useful for examining customer purchasing behaviour, and it helps increase sales and conserve inventory by focusing on point-of-sale transaction data. This work serves as a broad starting point for researchers to develop better data mining algorithms. This paper presents a survey of the existing data mining algorithms for market basket analysis.
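
    As a self-contained illustration of the support-confidence framework that such market basket algorithms build on, the toy sketch below counts pairwise co-occurrences in a handful of hypothetical transactions and prints the rules that clear minimum support and confidence thresholds (the transactions and thresholds are invented for the example).

```python
# Toy market-basket sketch: pairwise association rules under minimum support/confidence.
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "cereal"},
]

MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.6
n = len(transactions)

item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(frozenset(p) for t in transactions for p in combinations(sorted(t), 2))

for pair, count in pair_counts.items():
    support = count / n                       # fraction of baskets containing both items
    if support < MIN_SUPPORT:
        continue
    for antecedent in pair:
        consequent = (pair - {antecedent}).pop()
        confidence = count / item_counts[antecedent]   # P(consequent | antecedent)
        if confidence >= MIN_CONFIDENCE:
            print(f"{antecedent} -> {consequent}  support={support:.2f} confidence={confidence:.2f}")
```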

    A Process to Implement an Artificial Neural Network and Association Rules Techniques to Improve Asset Performance and Energy Efficiency

    In this paper, we address the problem of asset performance monitoring, with the intention of both detecting any potential reliability problem and predicting any loss of energy consumption efficiency. This is an important concern for many industries and utilities with very intensive capitalization in very long-lasting assets. To overcome this problem, we propose an approach that combines an Artificial Neural Network (ANN) with Data Mining (DM) tools, specifically with Association Rule (AR) mining. The combination of these two techniques can now be done using software able to handle large volumes of data (big data), but the process still needs to ensure that the required amount of data will be available during the assets' life cycle and that its quality is acceptable. The combination of these two techniques in the proposed sequence differs from previous works found in the literature, giving researchers new options to face the problem. Practical implementation of the proposed approach may lead to novel predictive maintenance models (emerging predictive analytics) that detect with unprecedented precision any asset's lack of performance and help manage the asset's O&M accordingly. The approach is illustrated using specific examples where asset performance monitoring is rather complex under normal operational conditions. Ministerio de Economía y Competitividad DPI2015-70842-
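
    The sketch below is a hedged reading of how the two techniques might be chained, not the paper's actual process: an ANN models expected energy consumption from operating conditions, records that consume noticeably more than predicted are flagged, and simple co-occurrence counts (a stand-in for full association rule mining) are taken over the conditions of the flagged records. Feature names, thresholds, and the synthetic data are assumptions for illustration.

```python
# Hedged sketch: ANN residuals flag under-performing records, then item co-occurrence
# is counted over the flagged operating conditions (a simplified stand-in for AR mining).
import numpy as np
from collections import Counter
from itertools import combinations
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
load = rng.uniform(0.3, 1.0, 200)              # synthetic asset load factor
ambient = rng.uniform(10, 35, 200)             # synthetic ambient temperature
energy = 5 * load + 0.1 * ambient + rng.normal(0, 0.2, 200)
energy[::25] += 2.0                            # inject a few inefficient records

# ANN learns the expected energy consumption as a function of operating conditions.
X = np.column_stack([load, ambient])
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0).fit(X, energy)
residual = energy - ann.predict(X)
flagged = residual > 1.0                       # records consuming far more than expected

# Turn the flagged operating conditions into categorical "items" and count co-occurring pairs.
items = [("high_load" if l > 0.7 else "low_load", "hot" if a > 25 else "mild")
         for l, a, f in zip(load, ambient, flagged) if f]
print(Counter(frozenset(p) for t in items for p in combinations(t, 2)).most_common(3))
```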