8 research outputs found

    Synthesizing high-frequency rules from different data sources


    Parallel Mining Algorithms for Generalized Association Rules with Classification Hierarchy

    Association rule mining has recently attracted strong attention. Usually, a classification hierarchy over the data items is available. Users are interested in generalized association rules that span different levels of the hierarchy, since more interesting rules can sometimes be derived by taking the hierarchy into account. In this paper, we propose new parallel algorithms for mining association rules with a classification hierarchy on a shared-nothing parallel machine to improve performance. Our algorithms partition the candidate itemsets over the processors, which exploits the aggregate memory of the system effectively. If the candidate itemsets are partitioned without considering the classification hierarchy, both the items and all of their ancestor items have to be transmitted, which causes a prohibitively large amount of communication. Our method minimizes interprocessor communication by considering the hierarchy. Moreover, in our algorithm, the available memory space is fully utili..
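As a rough illustration of what "generalized" rules over a classification hierarchy mean, the sketch below counts item pairs after extending each transaction with the ancestors of its items. It is a sequential toy, not the parallel algorithm from the abstract; the taxonomy `PARENT`, the sample `TRANSACTIONS`, and all function names are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "shoes": "footwear",
    "hiking boots": "footwear",
}


def ancestors(item, parent=PARENT):
    """Return all ancestors of an item, nearest first."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result


def extend(transaction, parent=PARENT):
    """Add every ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended


def is_redundant(itemset, parent=PARENT):
    """True if the itemset holds an item together with one of its ancestors."""
    return any(a in itemset for item in itemset for a in ancestors(item, parent))


def generalized_pair_counts(transactions, parent=PARENT):
    """Count support of 2-itemsets that may mix levels of the hierarchy."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(extend(t, parent)), 2):
            if not is_redundant(pair, parent):
                counts[pair] += 1
    return counts


# Hypothetical sample data.
TRANSACTIONS = [
    {"shirt"},
    {"jacket", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"shoes"},
    {"jacket"},
]

counts = generalized_pair_counts(TRANSACTIONS)
```

In this toy data, the cross-level pair `("hiking boots", "outerwear")` is supported by two transactions even though no single leaf-level pair is, which is the kind of rule the hierarchy makes visible. Extending every transaction with all ancestors is also what makes naive partitioning communication-heavy, since each item drags its ancestors along.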

    Parallel Mining of Association Rules Using a Lattice Based Approach

    The discovery of interesting patterns from database transactions is one of the major problems in knowledge discovery in databases. Association rules extracted from these transactions are one such pattern. Parallel algorithms are required for mining association rules because of the very large databases used to store the transactions. In this paper we present a parallel algorithm for mining association rules that uses a lattice approach. Dynamic Distributed Rule Mining (DDRM) is a lattice-based algorithm that partitions the lattice into sublattices, which are assigned to processors for processing and identification of frequent itemsets. Experimental results show that DDRM utilizes the processors efficiently and performs better than the prefix-based and partition algorithms that use a static approach to assign classes to the processors. The DDRM algorithm scales well and shows good speedup.
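For orientation, the sketch below shows one way a static, prefix-based assignment of the kind the abstract compares against might look: frequent 2-itemsets are grouped into classes by their first item, and whole classes are handed to processors greedily before mining starts. It is not DDRM, whose dynamic scheme the abstract does not detail; the weighting heuristic, the sample `PAIRS`, and all names here are assumptions for illustration.

```python
from collections import defaultdict


def prefix_classes(frequent_pairs):
    """Group frequent 2-itemsets into classes that share their first item."""
    classes = defaultdict(list)
    for a, b in sorted(tuple(sorted(p)) for p in frequent_pairs):
        classes[a].append((a, b))
    return dict(classes)


def class_weight(members):
    """Assumed cost estimate: number of pairwise joins inside the class."""
    k = len(members)
    return k * (k - 1) // 2


def assign_static(classes, n_procs):
    """Greedy static assignment: heaviest class first, to the least-loaded processor."""
    loads = [0] * n_procs
    assignment = [[] for _ in range(n_procs)]
    ordered = sorted(classes.items(), key=lambda kv: (-class_weight(kv[1]), kv[0]))
    for prefix, members in ordered:
        target = loads.index(min(loads))
        assignment[target].append(prefix)
        loads[target] += class_weight(members)
    return assignment, loads


# Hypothetical frequent 2-itemsets over items A..E.
PAIRS = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("B", "D"), ("B", "E"),
    ("C", "D"), ("C", "E"),
    ("D", "E"),
]

classes = prefix_classes(PAIRS)
assignment, loads = assign_static(classes, n_procs=2)
```

With two processors, class `A` lands alone on one processor (estimated load 6) while `B`, `C` and `D` share the other (estimated load 4). Because the split is fixed up front from an estimate, any misjudged class leaves a processor idle, which is the imbalance a dynamic scheme aims to avoid.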

    Dynamic causal mining

    Causality plays a central role in human reasoning, in particular, in common human decision-making, by providing a basis for strategy selection. The main aim of the research reported in this thesis is to develop a new way to identify dynamic causal relationships between attributes of a system. The first part of the thesis introduces the development of a new data mining algorithm, called Dynamic Causal Mining (DCM), which extracts rules from data sets based on simultaneous time stamps. The rules derived can be combined into policies, which can simulate the future behaviour of systems. New rules can be added to the policies depending on the degree of accuracy. In addition, facilities to process categorical or numerical attributes directly and approaches to prune the rule set efficiently are implemented in the DCM algorithm. The second part of the thesis discusses how to improve the DCM algorithm in order to identify delay and feedback relationships. Fuzzy logic is applied to manage the rules and policies flexibly and accurately during the learning process and help the algorithm to find feasible solutions. The third part of the thesis describes the application of the suggested algorithm to a problem in the game-theoretic domain. This part concludes with the suggestion to use concept lattices as a method to represent and structure the discovered knowledge

    Tree algorithms for mining association rules

    With the increasing reliability of digital communication, the falling cost of hardware and increased computational power, the gathering and storage of data has become easier than at any other time in history. Commercial and public agencies are able to hold extensive records about all aspects of their operations. Witness the proliferation of point of sale (POS) transaction recording within retailing, digital storage of census data and computerized hospital records. Whilst the gathering of such data has uses in terms of answering specific queries and allowing visualisation of certain trends, the volumes of data can hide significant patterns that would be impossible to locate manually. These patterns, once found, could provide an insight into customer behaviour, demographic shifts and patient diagnosis hitherto unseen and unexpected. Remaining competitive in a modern business environment, or delivering services in a timely and cost-effective manner for public services, is a crucial part of modern economics. Analysis of the data held by an organisation, by a system that "learns", can allow predictions to be made based on historical evidence. Users may guide the process but essentially the software is exploring the data unaided. The research described within this thesis develops current ideas regarding the exploration of large data volumes. Particular areas of research are the reduction of the search space within the dataset and the generation of rules which are deduced from the patterns within the data. These issues are discussed within an experimental framework which extracts information from binary data.

    Front Matter - Soft Computing for Data Mining Applications

    Efficient tools and algorithms for knowledge discovery in large data sets have been devised in recent years. These methods exploit the capability of computers to search huge amounts of data in a fast and effective manner. However, the data to be analyzed is imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio and video, the data might moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, in order to make the information mining process more robust, or, say, to obtain human-like methods for searching and learning, tolerance towards imprecision, uncertainty and exceptions is required. Such methods have approximate reasoning capabilities and are capable of handling partial truth. Properties of the aforementioned kind are typical of soft computing. Soft computing techniques like Genetic