    Incremental algorithm for association rule mining under dynamic threshold

    © 2019 The Authors. Published by MDPI AG. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.3390/app9245398
    Data mining discovers new knowledge from a database through an iterative process, which can be time consuming for massive datasets. Association rule mining (ARM) is a widely used knowledge discovery method, despite its shortcomings in mining large databases, and several approaches have been proposed to address them. Most of the proposed algorithms address incremental data, especially the case where a large amount of data is added to the database after the latest mining run. The three basic manipulation operations performed on a database are add, delete, and update, and any method designed for incremental data has to support all three. The changing threshold is a long-standing problem within the data mining field: because decision making is an active process, the threshold changes over time. Accordingly, the present study proposes an algorithm that avoids rescanning a database that has already been mined and allows retrieval of knowledge that satisfies several thresholds without repeating the mining process from scratch. In experiments, the proposed approach displayed high accuracy and reduced processing time by almost two-thirds of the original mining execution time.
    This research was funded by University Malaya through a postgraduate research grant (PPP), grant number PG106-2015B.
    Published onlin
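The idea of answering several thresholds without rescanning can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it simply assumes that itemset counts are stored after one scan, so that newly added transactions are folded in and any support threshold is applied as a filter. The function names, the `max_size` limit, and the toy transactions are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Scan the transactions once and count every itemset up to max_size items."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def add_transactions(counts, new_transactions, max_size=2):
    """Fold newly added transactions into the stored counts; old data is not rescanned."""
    counts.update(count_itemsets(new_transactions, max_size))
    return counts


def frequent_itemsets(counts, n_transactions, min_support):
    """Apply any support threshold to the stored counts."""
    return {
        itemset: count / n_transactions
        for itemset, count in counts.items()
        if count / n_transactions >= min_support
    }


# Toy data (hypothetical).
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
counts = count_itemsets(transactions)

# Two different thresholds are answered from the same stored counts.
high = frequent_itemsets(counts, len(transactions), 0.75)
low = frequent_itemsets(counts, len(transactions), 0.5)

# An "add" operation only touches the new transaction.
counts = add_transactions(counts, [["butter", "milk"]])
```

Storing all counts trades memory for time, and the sketch covers only the add operation. Delete and update would need the matching counts subtracted, which is left out here.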

    Randomized Response Technique in Data Mining

    Data mining is a process in which data is collected from different sources and summarized into useful information. It is also known as knowledge discovery in databases (KDD). Privacy and accuracy are important issues in data mining when data is shared, and a fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Most methods use random permutation techniques to mask the data and so preserve the privacy of sensitive data. Randomized response (RR) techniques were developed mainly to protect the privacy of survey respondents and to avoid answer bias. The RR technique adds a certain degree of randomness to each answer to protect the data. The objective of this thesis is to enhance the privacy level of the RR technique using four group schemes. First, following the algorithm, random attributes a, b, c, d were considered. Then randomization was performed on every dataset according to the values of theta. Finally, the ID3 and CART algorithms were applied to the randomized data. The results show that increasing the number of groups increases the privacy level.
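As a rough illustration of how randomization by theta works, here is a minimal Python sketch of the basic single-attribute randomized response scheme (Warner's model). It is not the thesis's four-group scheme. It assumes binary answers that are reported truthfully with probability `theta` and flipped otherwise. The function names and the synthetic data are hypothetical.

```python
import random


def randomize(answers, theta, rng):
    """Report each true binary answer with probability theta, otherwise its opposite."""
    return [a if rng.random() < theta else 1 - a for a in answers]


def estimate_proportion(reported, theta):
    """Estimate the true share of 1s from randomized answers (requires theta != 0.5).

    P(report = 1) = theta * p + (1 - theta) * (1 - p), solved here for p.
    """
    observed = sum(reported) / len(reported)
    return (observed - (1 - theta)) / (2 * theta - 1)


# Synthetic survey (hypothetical): 30% of respondents hold the sensitive attribute.
rng = random.Random(0)
true_answers = [1] * 3000 + [0] * 7000
reported = randomize(true_answers, theta=0.8, rng=rng)
estimate = estimate_proportion(reported, theta=0.8)
```

No individual reported answer can be trusted, yet the aggregate proportion is still recoverable. Values of theta closer to 0.5 give more privacy and a noisier estimate.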

    Adopting Data Mining as a Knowledge Discovery Tool: The Influential Factors from the Perspectives of Information Systems Managers

    Data mining is the process of discovering patterns in large sets of data, based on methods at the intersection of machine learning, statistics, and database systems. As a form of knowledge discovery, the process uncovers concealed patterns to forecast possible results. This study applied a cross-sectional quantitative research approach. The data was gathered from managers in the fields of Information Technology (IT) and information systems (IS) at large companies operating in e-commerce, digital business, and marketing in Jordan. A total of 309 responses were collected and analyzed using structural equation modeling in Analysis of Moment Structures (AMOS V.21). The proposed conceptual model confirmed that all the identified variables had positive coefficients relating data mining adoption to data warehouse, data accuracy, perceived usefulness, perceived ease of use, and information system performance. The study concludes with research insights on this topic and suggests further research to deepen the understanding of data mining related issues.

    A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

    This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a data mining viewpoint are scalability, data-privacy control and automatic parallelization
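Computing fitness by SQL query can be sketched in Python with the standard `sqlite3` module. This is only an illustration under assumptions, not the paper's framework. The `customers` table, its columns, and the rule conditions are hypothetical, and an in-memory SQLite database stands in for the (parallel) database server.

```python
import sqlite3

# Hypothetical training data held in the database rather than in program memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age INTEGER, income INTEGER, buys INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(25, 30000, 0), (40, 80000, 1), (35, 60000, 1), (22, 20000, 0), (50, 40000, 0)],
)


def fitness(conn, condition, target_class=1):
    """Accuracy of the rule 'IF condition THEN buys = target_class'.

    The condition is assumed to be generated from a GP individual, never taken
    from user input, because it is spliced directly into the SQL text.
    The database does the counting; only one number is returned.
    """
    query = (
        "SELECT AVG(CASE WHEN (" + condition + ") = (buys = ?) "
        "THEN 1.0 ELSE 0.0 END) FROM customers"
    )
    return conn.execute(query, (target_class,)).fetchone()[0]


# Two candidate individuals, written as SQL predicates.
fitness_income = fitness(conn, "income > 50000")
fitness_age = fitness(conn, "age > 30")
```

Because each evaluation returns a single aggregate, the raw rows never leave the database. This is one way to read the scalability and data-privacy advantages mentioned above.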