371 research outputs found
Expressive generalized itemsets
Generalized itemset mining is a powerful tool to discover multiple-level correlations among the analyzed data. A taxonomy is used to aggregate data items into higher-level concepts and to discover frequent recurrences among data items at different granularity levels. However, since traditional high-level itemsets may also represent the knowledge covered by their lower-level frequent descendant itemsets, the expressiveness of high-level itemsets can be rather limited. To overcome this issue, this article proposes two novel itemset types, called Expressive Generalized Itemset (EGI) and Maximal Expressive Generalized Itemset (Max-EGI), in which the frequency of occurrence of a high-level itemset is evaluated only on the portion of data not yet covered by any of its frequent descendants. Specifically, EGIs represent, at a high level of abstraction, the knowledge associated with sets of infrequent itemsets, while Max-EGIs compactly represent all the infrequent descendants of a generalized itemset. Furthermore, we also propose an algorithm to discover Max-EGIs on top of the traditionally mined itemsets. Experiments, performed on both real and synthetic datasets, demonstrate the effectiveness, efficiency, and scalability of the proposed approach.
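The idea of evaluating a high-level itemset only on data not covered by its frequent descendants can be illustrated with a minimal sketch. It is not the article's algorithm: the one-level taxonomy, the restriction to single-item itemsets, and the names `TAXONOMY`, `matches`, `support`, and `uncovered_support` are all assumptions made for illustration.

```python
# Hypothetical one-level taxonomy: leaf item -> parent category (illustrative only).
TAXONOMY = {"apple": "fruit", "pear": "fruit", "kiwi": "fruit"}


def matches(transaction, item):
    """True if the transaction contains `item` itself or a leaf generalized by it."""
    return any(i == item or TAXONOMY.get(i) == item for i in transaction)


def support(transactions, item):
    """Traditional (absolute) support: number of transactions matching `item`."""
    return sum(1 for t in transactions if matches(t, item))


def uncovered_support(transactions, category, min_sup):
    """Support of `category` counted only on transactions that contain
    none of its frequent descendants (an EGI-style count, simplified)."""
    leaves = [leaf for leaf, parent in TAXONOMY.items() if parent == category]
    frequent = [leaf for leaf in leaves if support(transactions, leaf) >= min_sup]
    return sum(
        1
        for t in transactions
        if matches(t, category) and not any(leaf in t for leaf in frequent)
    )


transactions = [{"apple"}, {"apple"}, {"apple"}, {"pear"}, {"kiwi"}, {"bread"}]
print(support(transactions, "fruit"))               # traditional count
print(uncovered_support(transactions, "fruit", 2))  # count on uncovered data only
```

In this toy data the traditional support of "fruit" is dominated by the frequent item "apple", while the uncovered count reflects only the infrequent items "pear" and "kiwi".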
Efficient Closed Pattern Mining in the Presence of Tough Block Constraints
In recent years, various constrained frequent pattern mining problem formulations and associated algorithms have been developed that enable the user to specify various itemset-based constraints that better capture the underlying application requirements and characteristics. In this paper we introduce a new class of block constraints that determine the significance of an itemset pattern by considering the dense block that is formed by the pattern's items and its associated set of transactions. Block constraints provide a natural framework by which a number of important problems can be specified and make it possible to solve numerous problems on binary and real-valued datasets. However, developing computationally efficient algorithms to find these block constraints poses a number of challenges as, unlike the different itemset-based constraints studied earlier, these block constraints are tough as they are neither anti-monotone, monotone, nor convertible. To overcome this problem, we introduce a new class of pruning methods that can be used to significantly reduce the overall search space and make it possible to develop computationally efficient block constraint mining algorithms. We present an algorithm called CBMiner that takes advantage of these pruning methods to develop an algorithm for finding the closed itemsets that satisfy the block constraints. Our extensive performance study shows that CBMiner generates a more concise result set and can be order(s) of magnitude faster than the traditional frequent closed itemset mining algorithms.
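To see why a constraint on the block formed by a pattern's items and its supporting transactions can be neither monotone nor anti-monotone, consider a minimal sketch. It assumes one simple measure, block size = number of items × number of supporting transactions. The abstract does not define the exact constraints used, so `block_size` and the toy data are illustrative assumptions, not CBMiner itself.

```python
def supporting_transactions(transactions, itemset):
    """Transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return [t for t in transactions if itemset <= t]


def block_size(transactions, itemset):
    """Assumed block measure: |items| * |supporting transactions|."""
    itemset = frozenset(itemset)
    return len(itemset) * len(supporting_transactions(transactions, itemset))


transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a"}),
    frozenset({"a", "d"}),
]

# Extending {"a"} can enlarge the block (more items, enough support left)...
print(block_size(transactions, {"a"}))
print(block_size(transactions, {"a", "b", "c"}))
# ...or shrink it (support drops faster than the item count grows).
print(block_size(transactions, {"a", "d"}))
```

Because a superset's block can be either larger or smaller than its subset's, a threshold on this measure cannot be used for the usual support-style pruning, which is the difficulty the abstract describes.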
Discover, recycle and reuse frequent patterns in association rule mining
Ph.D. DOCTOR OF PHILOSOPHY
Colossal Trajectory Mining: A unifying approach to mine behavioral mobility patterns
Spatio-temporal mobility patterns are at the core of strategic applications such as urban planning and monitoring. Depending on the strength of spatio-temporal constraints, different mobility patterns can be defined. While existing approaches work well in the extraction of groups of objects sharing fine-grained paths, the huge volume of large-scale data calls for coarse-grained solutions. In this paper, we introduce Colossal Trajectory Mining (CTM) to efficiently extract heterogeneous mobility patterns out of a multidimensional space that, along with space and time dimensions, can consider additional trajectory features (e.g., means of transport or activity) to characterize behavioral mobility patterns. The algorithm is natively designed in a distributed fashion, and the experimental evaluation shows its scalability with respect to the involved features and the cardinality of the trajectory dataset.
Exploring Data Hierarchies to Discover Knowledge in Different Domains
The abstract is in the attachment.
Enhancing web marketing by using ontology
The existence of the Web has a major impact on people's lifestyles. Online shopping, online banking, email, instant messenger services, search engines and bulletin boards have gradually become parts of our daily life. All kinds of information can be found on the Web. Web marketing is one of the ways to make use of online information. By extracting demographic information and interest information from the Web, marketing knowledge can be augmented by applying data mining algorithms. Therefore, this knowledge, which connects customers to products, can be used for marketing purposes and for targeting existing and potential customers. The Web Marketing Project with Ontology Support aims to find and improve marketing knowledge.
In the Web Marketing Project, association rules about marketing knowledge have been derived by applying data mining algorithms to existing Web users' data. An ontology was used as a knowledge backbone to enhance data mining for marketing. The Raising Method was developed by taking advantage of the ontology. Data are preprocessed by Raising before being fed into data mining algorithms. Raising improves the quality of the set of mined association rules by increasing the average support value. Also, new rules have been discovered after applying Raising. This dissertation thoroughly describes the development and analysis of the Raising method. Moreover, a new structure, called Intersection Ontology, is introduced to represent customer groups on demand. Only needed customer nodes are created. Such an ontology is used to simplify the marketing knowledge representation. Finally, some additional ontology usages are mentioned. By integrating an ontology into Web marketing, the marketing process support has been greatly improved.
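The effect of ontology-based preprocessing on support can be sketched as follows. The abstract does not spell out how Raising works, so this sketch assumes it replaces each item by its ancestor in the ontology before mining. The `ONTOLOGY` mapping, the record data, and the function names are hypothetical.

```python
# Hypothetical ontology fragment: child concept -> parent concept.
ONTOLOGY = {"tennis": "sports", "soccer": "sports", "sports": "interests"}


def raise_item(item, levels=1):
    """Replace an item by its ancestor `levels` steps up; unknown items stay as-is."""
    for _ in range(levels):
        item = ONTOLOGY.get(item, item)
    return item


def raise_records(records, levels=1):
    """Apply the assumed raising step to every item of every record."""
    return [frozenset(raise_item(i, levels) for i in r) for r in records]


def support(records, itemset):
    """Relative support: fraction of records containing all items of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for r in records if itemset <= r) / len(records)


records = [
    frozenset({"tennis", "buys_racket"}),
    frozenset({"soccer", "buys_racket"}),
    frozenset({"tennis"}),
    frozenset({"chess"}),
]
raised = raise_records(records)

print(support(records, {"tennis", "buys_racket"}))  # support before raising
print(support(raised, {"sports", "buys_racket"}))   # support after raising
```

Merging "tennis" and "soccer" into "sports" pools their records, which is one way an itemset over the raised data can reach a higher support than any of its lower-level counterparts.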