126 research outputs found
Discovering High-Utility Itemsets at Multiple Abstraction Levels
High-Utility Itemset Mining (HUIM) is a relevant data mining task. The goal is to discover recurrent combinations of items characterized by high prot from transactional datasets. HUIM has a wide range of applications among which market basket analysis and service proling. Based on the observation that items can be clustered into domain-specic categories, a parallel research issue is generalized itemset mining. It entails generating correlations among data items at multiple abstraction levels. The extraction of multiple-level patterns affords new insights into the analyzed data from dierent viewpoints. This paper aims at discovering a novel pattern that combines the expressiveness of generalized and High-Utility itemsets. According to a user-defined taxonomy items are rst aggregated into semantically related categories. Then, a new type of pattern,namely the Generalized High-utility Itemset (GHUI), is extracted. It represents a combinations of items at different granularity levels characterized by high prot (utility). While protable combinations of item categories provide interesting high-level information, GHUIs at lower abstraction levels represent more specic correlationsamong protable items. A single-phase algorithm is proposed to efficiently discover utility itemsets at multiple abstraction levels. The experiments, which were performed on both real and synthetic data, demonstrate the effectiveness and usefulness of the proposed approach
CAS-MINE: Providing personalized services in context-aware applications by means of generalized rules
Context-aware systems acquire and exploit information on the user context to tailor services to a particular user, place, time, and/or event. Hence, they allowservice providers to adapt their services to actual user needs, by offering personalized services depending on the current user context. Service providers are usually interested in profiling users both
to increase client satisfaction and to broaden the set of offered services. Novel and efficient techniques are needed to tailor service supply to the user (or the user category) and to the situation inwhich he/she is involved. This paper presents the CAS-Mine framework to efficiently
discover relevant relationships between user context data and currently asked services for both user and service profiling. CAS-Mine efficiently extracts generalized association rules, which provide a high-level abstraction of both user habits and service characteristics depending
on the context. A lazy (analyst-provided) taxonomy evaluation performed on different attributes (e.g., a geographic hierarchy on spatial coordinates, a classification of provided services) drives the rule generalization process. Extracted rules are classified into groups according to their semantic meaning and ranked by means of quality indices, thus allowing a domain expert to focus on the most relevant patterns. Experiments performed on three context-aware datasets, obtained by logging user requests and context information for three
real applications, show the effectiveness and the efficiency of the CAS-Mine framework in mining different valuable types of correlations between user habits, context information, and provided services
Novel Algorithms for Cross-Ontology Multi-Level Data Mining
The wide spread use of ontologies in many scientific areas creates a wealth of ontologyannotated data and necessitates the development of ontology-based data mining algorithms. We have developed generalization and mining algorithms for discovering cross-ontology relationships via ontology-based data mining. We present new interestingness measures to evaluate the discovered cross-ontology relationships. The methods presented in this dissertation employ generalization as an ontology traversal technique for the discovery of interesting and informative relationships at multiple levels of abstraction between concepts from different ontologies. The generalization algorithms combine ontological annotations with the structure and semantics of the ontologies themselves to discover interesting crossontology relationships. The first algorithm uses the depth of ontological concepts as a guide for generalization. The ontology annotations are translated to higher levels of abstraction one level at a time accompanied by incremental association rule mining. The second algorithm conducts a generalization of ontology terms to all their ancestors via transitive ontology relations and then mines cross-ontology multi-level association rules from the generalized transactions. Our interestingness measures use implicit knowledge conveyed by the relation semantics of the ontologies to capture the usefulness of cross-ontology relationships. We describe the use of information theoretic metrics to capture the interestingness of cross-ontology relationships and the specificity of ontology terms with respect to an annotation dataset. Our generalization and data mining agorithms are applied to the Gene Ontology and the postnatal Mouse Anatomy Ontology. The results presented in this work demonstrate that our generalization algorithms and interestingness measures discover more interesting and better quality relationships than approaches that do not use generalization. Our algorithms can be used by researchers and ontology developers to discover inter-ontology connections. Additionally, the cross-ontology relationships discovered using our algorithms can be used by researchers to understand different aspects of entities that interest them
Information fusion from multiple databases using meta-association rules
Nowadays, data volume, distribution, and volatility make it difficult to search global patterns by applying traditional Data Mining techniques. In the case of data in a distributed environment, sometimes a local analysis of each dataset separately is adequate but some other times a global decision is needed by the analysis of the entire data. Association rules discovering methods typically require a single uniform dataset and managing with the entire set of distributed data is not possible due to its size. To address the scenarios in which satisfying this requirement is not practical or even feasible, we propose a new method for fusing information, in the form of rules, extracted from multiple datasets. The proposed model produces meta-association rules, i.e. rules in which the antecedent or the consequent may contain rules as well, for finding joint correlations among trends found individually in each dataset. In this paper, we describe the formulation and the implementation of two alternative frameworks that obtain, respectively, crisp meta-rules and fuzzy meta-rules. We compare our proposal with the information obtained when the datasets are not separated, in order to see the main differences between traditional association rules and meta-association rules. We also compare crisp and fuzzy methods for meta-association rule mining, observing that the fuzzy approach offers several advantages: it is more accurate since it incorporates the strength or validity of the previous information, produces a more manageable set of rules for human inspection, and allows the incorporation of contextual information to the mining process expressed in a more human-friendly format
Recommended from our members
MapReduce network enabled algorithms for classification based on association rules
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.There is growing evidence that integrating classification and association rule mining can produce more efficient and accurate classifiers than traditional techniques. This thesis introduces a new MapReduce based association rule miner for extracting strong rules from large datasets. This miner is used later to develop a new large scale classifier. Also new MapReduce simulator was developed to evaluate the scalability of proposed algorithms on MapReduce clusters.
The developed associative rule miner inherits the MapReduce scalability to huge datasets and to thousands of processing nodes. For finding frequent itemsets, it uses hybrid approach between miners that uses counting methods on horizontal datasets, and miners that use set intersections on datasets of vertical formats. The new miner generates same rules that usually generated using apriori-like algorithms because it uses the same confidence and support thresholds definitions.
In the last few years, a number of associative classification algorithms have been proposed, i.e. CPAR, CMAR, MCAR, MMAC and others. This thesis also introduces a new MapReduce classifier that based MapReduce associative rule mining. This algorithm employs different approaches in rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation methods. The new classifier works on multi-class datasets and is able to produce multi-label predications with probabilities for each predicted label. To evaluate the classifier 20 different datasets from the UCI data collection were used. Results show that the proposed approach is an accurate and effective classification technique, highly competitive and scalable if compared with other traditional and associative classification approaches.
Also a MapReduce simulator was developed to measure the scalability of MapReduce based applications easily and quickly, and to captures the behaviour of algorithms on cluster environments. This also allows optimizing the configurations of MapReduce clusters to get better execution times and hardware utilization
- …