
    Adaptive two-phase spatial association rules mining method

    Since huge amounts of spatial data can easily be collected from applications ranging from remote sensing technology to geographical information systems, extracting and understanding spatial knowledge is an increasingly important task. Many studies on Remote Sensed Images (RSI) have investigated potential relationships involving crop yield. However, most of them suffer from performance problems because their association rule mining techniques are based on the Apriori algorithm. In this paper, two efficient algorithms, two-phase spatial association rules mining and adaptive two-phase spatial association rules mining, are proposed to address this problem. Both methods run in two phases: they first build Histogram Generators to quickly generate coarse-grained spatial association rules, and then mine fine-grained spatial association rules with respect to the coarse-grained frequent patterns obtained in the first phase. The adaptive method additionally partitions the image to efficiently quantize out non-frequent patterns, which speeds up the subsequent two-phase process. Extensive experimental results in the paper show that these two-phase approaches save considerable computation.
    Facultad de Informátic
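
    The coarse-to-fine idea behind the two phases can be illustrated with a minimal Python sketch. It assumes each pixel is a tuple of integer band values and treats a quantized cell as a pattern; the function names, bin widths (64 and 16) and support threshold are hypothetical, and the paper's Histogram Generators and adaptive partitioning are not reproduced. Phase 2 refines only pixels whose coarse cell was frequent, which is safe because a fine cell can never be more frequent than the coarse cell that contains it.

```python
from collections import Counter

def quantize(pixel, width):
    """Map each band value of a pixel to the index of a bin of the given width."""
    return tuple(v // width for v in pixel)

def two_phase_frequent_cells(pixels, min_support, coarse=64, fine=16):
    """Return frequent coarse cells and frequent fine cells with their supports."""
    # Fine bins must nest inside coarse bins for the phase-2 pruning to be safe.
    assert coarse % fine == 0
    n = len(pixels)
    # Phase 1: a single pass builds a coarse histogram of quantized cells.
    coarse_hist = Counter(quantize(p, coarse) for p in pixels)
    frequent_coarse = {c for c, cnt in coarse_hist.items() if cnt / n >= min_support}
    # Phase 2: count fine cells only for pixels whose coarse cell is frequent.
    fine_hist = Counter(
        quantize(p, fine) for p in pixels if quantize(p, coarse) in frequent_coarse
    )
    frequent_fine = {c: cnt / n for c, cnt in fine_hist.items() if cnt / n >= min_support}
    return frequent_coarse, frequent_fine

pixels = [(10, 200), (12, 205), (15, 210), (130, 40), (250, 250)]
print(two_phase_frequent_cells(pixels, 0.4))
```

    With these five pixels and a minimum support of 0.4, only coarse cell (0, 3) survives phase 1, and only fine cell (0, 12) survives phase 2; the pixels in the other coarse cells are never re-quantized at the fine level.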

    Mining Interesting Positive and Negative Association Rule Based on Improved Genetic Algorithm (MIPNAR_GA)

    Association rule mining is a very efficient technique for finding strong relations between correlated data items, and these correlations make the extraction process meaningful. A variety of algorithms, such as Apriori and tree-based algorithms, are used to mine positive and negative rules. Several of these algorithms perform well but produce a large number of negative association rules and also suffer from the multi-scan problem. This paper aims to eliminate these problems and reduce the number of negative rules. We propose an improved approach to mining interesting positive and negative rules that combines the MLMS algorithm with a genetic algorithm. In this method, the data table is represented as 0s and 1s with multi-level multiple supports, and this division reduces database scanning time. The proposed algorithm, MIPNAR_GA, mines interesting positive and negative rules from frequent and infrequent pattern sets in three phases: a) extract frequent and infrequent pattern sets using the Apriori method; b) efficiently generate positive and negative rules; c) prune redundant rules by applying interestingness measures. Rule optimization is performed by the genetic algorithm, and the algorithm is evaluated on real-world datasets such as heart disease data and on standard datasets from the UCI Machine Learning Repository.
    Keywords: association rule mining, negative and positive rules, frequent and infrequent pattern sets, genetic algorithm
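
    Phase b) needs a way to measure negative rules. A common approach, shown here as a minimal Python sketch rather than the MIPNAR_GA implementation, derives the measures of A => not B from positive supports: supp(A and not B) = supp(A) - supp(A and B), and conf(A => not B) = supp(A and not B) / supp(A). The function name and toy transactions are hypothetical; the MLMS thresholds and the genetic optimization step are not shown.

```python
def negative_rule_measures(transactions, a, b):
    """Support and confidence of the negative rule A => not B.

    "not B" is taken to mean that the itemset B does not fully occur.
    Both values are derived from the supports of the positive itemsets
    A and A | B, which a frequent/infrequent itemset miner already has.
    """
    n = len(transactions)
    supp_a = sum(1 for t in transactions if a <= t) / n
    supp_ab = sum(1 for t in transactions if (a | b) <= t) / n
    supp_a_not_b = supp_a - supp_ab          # supp(A and not B)
    conf = supp_a_not_b / supp_a if supp_a > 0 else 0.0
    return supp_a_not_b, conf

txs = [{"bread", "milk"}, {"bread"}, {"bread", "butter"}, {"milk"}]
print(negative_rule_measures(txs, {"bread"}, {"milk"}))
```

    For the rule bread => not milk, two of the four transactions contain bread without milk, so the support is 0.5 and the confidence is 0.5 / 0.75 = 2/3.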

    A Parallel FP-Growth Mining Algorithm with Load Balancing Constraints for Traffic Crash Data

    Traffic safety is an important part of sustainable roadway development. Freeway traffic crashes typically cause serious casualties and property losses, posing a serious threat to public safety. Identifying potential correlations between risk factors and revealing their coupling mechanisms are effective ways to explore and identify the causes of freeway crashes. However, existing association rule mining algorithms still have limitations in both efficiency and accuracy. With this in mind, this research used freeway traffic crash data obtained from WDOT (Washington Department of Transportation) to construct a multi-dimensional, multilevel system for traffic crash analysis. With load balancing taken into account, the FP-Growth (Frequent Pattern Growth) algorithm was parallelized and optimized on the Hadoop platform to achieve efficient and accurate association rule mining over massive amounts of traffic crash data; then, based on the coupling mechanisms found among crash precursors, the causes of freeway traffic crashes were identified. The results show that the parallel FP-growth algorithm with load balancing constraints runs faster than both the conventional FP-growth algorithm and the parallel FP-growth algorithm when processing big data. This improved algorithm makes full use of Hadoop cluster resources and is better suited to mining large traffic crash datasets, while retaining the original advantages of conventional association rule mining algorithms. In addition, the association rule mining model with improved multi-dimensional interaction proposed in this research can accurately and efficiently capture the occurrence mechanisms of freeway traffic crashes with serious consequences (which probably have lower support).
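
    In the parallel FP-growth (PFP) scheme, frequent items are split into groups and each group's transactions are mined independently on a worker, so uneven groups leave some workers idle while others finish. The abstract does not say how its load balancing constraints are defined; the minimal Python sketch below shows one simple, assumed heuristic, a greedy assignment that uses item frequency as a proxy for mining cost. The function name and item counts are hypothetical.

```python
import heapq

def balance_groups(item_counts, num_groups):
    """Assign frequent items to groups so that estimated group loads stay even.

    Item frequency is used as a rough proxy for the mining cost of an item
    (an assumption). Items are placed heaviest first, each into the group
    with the smallest current load (greedy longest-processing-time rule).
    """
    heap = [(0, g) for g in range(num_groups)]   # (current load, group id)
    heapq.heapify(heap)
    groups = {g: [] for g in range(num_groups)}
    for item, count in sorted(item_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (load + count, g))
    return groups

counts = {"speeding": 50, "wet_road": 30, "night": 25, "curve": 20, "truck": 10}
print(balance_groups(counts, 2))
```

    Here the two groups end up with estimated loads of 70 and 65. A plain round-robin or hash split of the same items can be far more lopsided when a few items are very frequent, which is the situation load balancing is meant to handle.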

    Efficient Mining Support-Confidence Based Framework Generalized Association Rules

    Mining association rules is one of the most critical data mining problems and has been studied intensively since its inception. Several approaches have been proposed in the literature to extend the basic association rule framework to extract more general rules, including rules with the negation operator. This extension is expected to give the user valuable knowledge about an examined dataset. However, extracting such rules efficiently is challenging, especially for sparse datasets. This paper focuses on the extraction of literalsets, i.e., sets of present and absent items, from which generalized association rules can be straightforwardly derived. To this end, we introduce a theorem, and prove its soundness, that paves the way to speeding up the costly computation of the support of a literalset. Furthermore, we introduce FasterIE, an efficient algorithm that puts this theorem to work to extract the whole set of frequent literalsets. FasterIE relies on efficient strategies that minimize, as far as possible, the number of node visits in the explored search space. Finally, experiments on benchmark datasets support the claim that the proposed algorithm is more effective than its competitors.
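
    The support of a literalset can always be computed from ordinary (positive) itemset supports with the inclusion-exclusion principle, but the number of terms doubles with every absent item, which is one plausible reason the computation is costly. The minimal Python sketch below shows only this textbook baseline, not the FasterIE theorem or algorithm, and checks it against a direct scan; all function names and data are hypothetical.

```python
from itertools import combinations

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t) / len(transactions)

def literalset_support(transactions, present, absent):
    """Support of a literalset: all `present` items occur and no `absent` item occurs.

    Uses inclusion-exclusion over positive supports:
    supp(X, not Y) = sum over subsets Z of Y of (-1)^|Z| * supp(X | Z).
    """
    absent = sorted(absent)
    total = 0.0
    for k in range(len(absent) + 1):
        for z in combinations(absent, k):
            total += (-1) ** k * support(transactions, set(present) | set(z))
    return total

def literalset_support_direct(transactions, present, absent):
    """Reference value obtained by scanning the transactions directly."""
    hits = sum(1 for t in transactions if present <= t and not (absent & t))
    return hits / len(transactions)

txs = [{"a", "b"}, {"a"}, {"a", "c"}, {"b", "c"}, {"a"}]
print(literalset_support(txs, {"a"}, {"b", "c"}))
print(literalset_support_direct(txs, {"a"}, {"b", "c"}))
```

    For the literalset {a, not b, not c}, both functions agree on a support of 0.4 up to floating-point rounding: supp(a) - supp(ab) - supp(ac) + supp(abc) = 0.8 - 0.2 - 0.2 + 0.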

    Learning lost temporal fuzzy association rules

    Fuzzy association rule mining discovers patterns in transactions, such as shopping baskets in a supermarket or Web page accesses by a visitor to a Web site. Temporal patterns can be present in fuzzy association rules because the underlying process generating the data can be dynamic. However, existing solutions may not discover all interesting patterns because of a previously unrecognised problem that is revealed in this thesis: the contextual meaning of fuzzy association rules changes as the data changes, so a static fuzzy representation and a traditional search method are inadequate. The Genetic Iterative Temporal Fuzzy Association Rule Mining (GITFARM) framework solves this problem by using flexible fuzzy representations from a fuzzy rule-based system (FRBS). The combined temporal, fuzzy and itemset space is searched simultaneously with a genetic algorithm (GA), and the framework transforms the dataset into a graph so that it can be searched efficiently. The choice of model for the fuzzy representation provides a trade-off between an approximate and a descriptive model. A method for verifying the solution to the hypothesised problem is presented. The proposed GA-based solution was compared with a traditional approach that uses exhaustive search, and the GA-based solution discovered rules that the traditional approach did not. This shows that simultaneously searching for rules and membership functions with a GA is a suitable solution for mining temporal fuzzy association rules. In practice, more knowledge can therefore be discovered for making well-informed decisions, knowledge that would otherwise be lost with a traditional approach.
    EPSRC DT
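
    The minimal Python sketch below illustrates the kind of change the thesis describes: with a static triangular membership function, the fuzzy support of the same term can differ sharply between time periods. Support is computed with the minimum t-norm, a common choice; the attribute, membership parameters and window length are hypothetical, and GITFARM's genetic search over rules and membership functions is not shown.

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a to a peak at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_support(records, terms):
    """Average over records of the minimum membership across the rule's fuzzy terms.

    `terms` maps an attribute name to the (a, b, c) triangle of its fuzzy term.
    """
    if not records:
        return 0.0
    total = sum(min(triangular(r[attr], *abc) for attr, abc in terms.items())
                for r in records)
    return total / len(records)

def support_by_window(records, terms, window):
    """Fuzzy support computed separately for each fixed-length time window."""
    buckets = {}
    for r in records:
        buckets.setdefault(r["t"] // window * window, []).append(r)
    return {start: fuzzy_support(rs, terms) for start, rs in sorted(buckets.items())}

# Hypothetical term "medium quantity" with a fixed (static) membership function.
medium_qty = {"qty": (0, 5, 10)}
records = [{"t": 0, "qty": 5}, {"t": 1, "qty": 5}, {"t": 10, "qty": 9}, {"t": 11, "qty": 1}]
print(support_by_window(records, medium_qty, 10))
```

    The term "medium quantity" has fuzzy support 1.0 in the first window but only 0.2 in the second, even though the membership function never changed; letting the search adapt membership functions over time is the direction the thesis takes.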

    Mining data quality rules based on T-dependence

    Since their introduction in 1976, edit rules have been a standard tool in statistical analysis. Edit rules are a compact representation of non-permitted combinations of values in a dataset. In this paper, we propose a technique for automatically finding edit rules using the concept of T-dependence. We first generalize the traditional notion of lift to T-lift, in which stochastic independence is generalized to T-dependence. A combination of values is declared an edit rule under a t-norm T if it shows a strong negative correlation under T-dependence. We show several interesting properties of this approach; in particular, under the minimum t-norm, edit rules can be computed efficiently using frequent pattern trees. Experimental results show only a weak to medium correlation between the rank orders of edit rules obtained under T_M (the minimum t-norm) and T_P (the product t-norm), indicating that the semantics of these two kinds of dependence differ.
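
    Ordinary lift compares the observed frequency of a value combination with the frequency expected under independence, P(A)P(B). A natural reading of T-lift, assumed here rather than taken from the paper, replaces that product with a t-norm T(P(A), P(B)), so the product t-norm T_P recovers ordinary lift. The minimal Python sketch below computes this assumed T-lift for a hypothetical "child and married" combination; the function names and rows are illustrative only, and the FP-tree-based computation under T_M is not shown.

```python
def t_min(x, y):
    """Minimum t-norm T_M."""
    return min(x, y)

def t_prod(x, y):
    """Product t-norm T_P; with it, T-lift reduces to ordinary lift."""
    return x * y

def t_lift(rows, col_a, val_a, col_b, val_b, tnorm):
    """Observed joint frequency divided by the frequency expected under T-dependence.

    Values well below 1 indicate a strong negative correlation, i.e. a
    candidate edit rule. Returns None if either value never occurs.
    """
    n = len(rows)
    p_a = sum(1 for r in rows if r[col_a] == val_a) / n
    p_b = sum(1 for r in rows if r[col_b] == val_b) / n
    p_ab = sum(1 for r in rows if r[col_a] == val_a and r[col_b] == val_b) / n
    expected = tnorm(p_a, p_b)
    return p_ab / expected if expected > 0 else None

rows = ([{"age": "child", "status": "single"}] * 3
        + [{"age": "adult", "status": "married"}] * 4
        + [{"age": "adult", "status": "single"}] * 2
        + [{"age": "child", "status": "married"}])
for name, tnorm in [("T_P", t_prod), ("T_M", t_min)]:
    print(name, t_lift(rows, "age", "child", "status", "married", tnorm))
```

    The same combination scores 0.5 under T_P and 0.25 under T_M, so the two t-norms can rank candidate edit rules differently. Because P(A and B) never exceeds min(P(A), P(B)), this T-lift under T_M is always at most 1.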