Search CORE

283 research outputs found

Mining High Utility Itemsets with Regular Occurrence

Author: Amphawan Komate
Jitpattanakul Anuchit
Lenca Philippe
Surarerks Athasit
Publication venue: LPPM ITBis Lembah Dempo
Publication date: 31/08/2016
Field of study

High utility itemset mining (HUIM) plays an important role in the data mining community and in a wide range of applications. For example, in retail business it is used for finding sets of sold products that give high profit, low cost, etc. These itemsets can help improve marketing strategies, make promotions/ advertisements, etc. However, since HUIM only considers utility values of items/itemsets, it may not be sufficient to observe product-buying behavior of customers such as information related to "regular purchases of sets of products having a high profit margin". To address this issue, the occurrence behavior of itemsets (in the term of regularity) simultaneously with their utility values was investigated. Then, the problem of mining high utility itemsets with regular occurrence (MHUIR) to find sets of co-occurrence items with high utility values and regular occurrence in a database was considered. An efficient single-pass algorithm, called MHUIRA, was introduced. A new modified utility-list structure, called NUL, was designed to efficiently maintain utility values and occurrence information and to increase the efficiency of computing the utility of itemsets. Experimental studies on real and synthetic datasets and complexity analyses are provided to show the efficiency of MHUIRA combined with NUL in terms of time and space usage for mining interesting itemsets based on regularity and utility constraints

Journal of ICT Research and Applications

Directory of Open Access Journals

HAL-Université de Bretagne Occidentale

ITB Journal

Misleading Generalized Itemset discovery

Author: Amphawan
Brin
Gharib
Han
Hilderman
Kunkle
Luca Cagliero
Luigi Grimaudo
Nahar
Paolo Garza
Sriphaew
Tan
Tania Cerquitelli
Wu
Wu
Publication venue: ELSEVIER
Publication date: 01/01/2014
Field of study

Frequent generalized itemset mining is a data mining technique utilized to discover a high-level view of interesting knowledge hidden in the analyzed data. By exploiting a taxonomy, patterns are usually extracted at any level of abstraction. However, some misleading high-level patterns could be included in the mined set. This paper proposes a novel generalized itemset type, namely the Misleading Generalized Itemset (MGI). Each MGI represents a frequent generalized itemset X and its set E of low-level frequent descendants for which the correlation type is in contrast to the one of X. To allow experts to analyze the misleading high-level data correlations separately and exploit such knowledge by making different decisions, MGIs are extracted only if the low-level descendant itemsets that represent contrasting correlations cover almost the same portion of data as the high-level (misleading) ancestor. An algorithm to mine MGIs at the top of traditional generalized itemsets is also proposed. The experiments performed on both real and synthetic datasets demonstrate the effectiveness and efficiency of the proposed approac

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Mining Traversal Patterns from Weighted Traversals and Graph

Author: 이성대
Publication venue: 한국해양대학교
Publication date: 01/08/2007
Field of study

실세계의 많은 문제들은 그래프와 그 그래프를 순회하는 트랜잭션으로 모델링될 수 있다. 예를 들면, 웹 페이지의 연결구조는 그래프로 표현될 수 있고, 사용자의 웹 페이지 방문경로는 그 그래프를 순회하는 트랜잭션으로 모델링될 수 있다. 이와 같이 그래프를 순회하는 트랜잭션으로부터 중요하고 가치 있는 패턴을 찾아내는 것은 의미 있는 일이다. 이러한 패턴을 찾기 위한 지금까지의 연구에서는 순회나 그래프의 가중치를 고려하지 않고 단순히 빈발하는 패턴만을 찾는 알고리즘을 제안하였다. 이러한 알고리즘의 한계는 보다 신뢰성 있고 정확한 패턴을 탐사하는 데 어려움이 있다는 것이다. 본 논문에서는 순회나 그래프의 정점에 부여된 가중치를 고려하여 패턴을 탐사하는 두 가지 방법들을 제안한다. 첫 번째 방법은 그래프를 순회하는 정보에 가중치가 존재하는 경우에 빈발 순회 패턴을 탐사하는 것이다. 그래프 순회에 부여될 수 있는 가중치로는 두 도시간의 이동 시간이나 웹 사이트를 방문할 때 한 페이지에서 다른 페이지로 이동하는 시간 등이 될 수 있다. 본 논문에서는 좀 더 정확한 순회 패턴을 마이닝하기 위해 통계학의 신뢰 구간을 이용한다. 즉, 전체 순회의 각 간선에 부여된 가중치로부터 신뢰 구간을 구한 후 신뢰 구간의 내에 있는 순회만을 유효한 것으로 인정하는 방법이다. 이러한 방법을 적용함으로써 더욱 신뢰성 있는 순회 패턴을 마이닝할 수 있다. 또한 이렇게 구한 패턴과 그래프 정보를 이용하여 패턴 간의 우선순위를 결정할 수 있는 방법과 성능 향상을 위한 알고리즘도 제시한다. 두 번째 방법은 그래프의 정점에 가중치가 부여된 경우에 가중치가 고려된 빈발 순회 패턴을 탐사하는 방법이다. 그래프의 정점에 부여될 수 있는 가중치로는 웹 사이트 내의 각 문서의 정보량이나 중요도 등이 될 수 있다. 이 문제에서는 빈발 순회 패턴을 결정하기 위하여 패턴의 발생 빈도뿐만 아니라 방문한 정점의 가중치를 동시에 고려하여야 한다. 이를 위해 본 논문에서는 정점의 가중치를 이용하여 향후에 빈발 패턴이 될 가능성이 있는 후보 패턴은 각 마이닝 단계에서 제거하지 않고 유지하는 알고리즘을 제안한다. 또한 성능 향상을 위해 후보 패턴의 수를 감소시키는 알고리즘도 제안한다. 본 논문에서 제안한 두 가지 방법에 대하여 다양한 실험을 통하여 수행 시간 및 생성되는 패턴의 수 등을 비교 분석하였다. 본 논문에서는 순회에 가중치가 있는 경우와 그래프의 정점에 가중치가 있는 경우에 빈발 순회 패턴을 탐사하는 새로운 방법들을 제안하였다. 제안한 방법들을 웹 마이닝과 같은 분야에 적용함으로써 웹 구조의 효율적인 변경이나 웹 문서의 접근 속도 향상, 사용자별 개인화된 웹 문서 구축 등이 가능할 것이다.Abstract ⅶ Chapter 1 Introduction 1.1 Overview 1.2 Motivations 1.3 Approach 1.4 Organization of Thesis Chapter 2 Related Works 2.1 Itemset Mining 2.2 Weighted Itemset Mining 2.3 Traversal Mining 2.4 Graph Traversal Mining Chapter 3 Mining Patterns from Weighted Traversals on Unweighted Graph 3.1 Definitions and Problem Statements 3.2 Mining Frequent Patterns 3.2.1 Augmentation of Base Graph 3.2.2 In-Mining Algorithm 3.2.3 Pre-Mining Algorithm 3.2.4 Priority of Patterns 3.3 Experimental Results Chapter 4 Mining Patterns from Unweighted Traversals on Weighted Graph 4.1 Definitions and Problem Statements 4.2 Mining Weighted Frequent Patterns 4.2.1 Pruning by Support Bounds 4.2.2 Candidate Generation 4.2.3 Mining Algorithm 4.3 Estimation of Support Bounds 4.3.1 Estimation by All Vertices 4.3.2 Estimation by Reachable Vertices 4.4 Experimental Results Chapter 5 Conclusions and Further Works Reference

한국해양대학교(KMOU)

Parallel Algorithms for Discovery of Association Rules

Author: D Cheung
R Agrawal
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1997
Field of study

Crossref

A Safety Support System for Children\u27s Antiloss

Author: Nellutla Sai Ram
Publication venue: The Research Repository @ WVU
Publication date: 01/01/2015
Field of study

In the recent past, crimes against children and the number of the missing children have been stayed at high. It is a tragic disaster for a family if their child is missing. Feeling safe about their children is very important for the parents. Therefore, there is an urgent requirement for safety support systems to prevent crimes against children and for anti-loss, particularly when the children are on their own, such as on the ways to and from schools. Thanks to the highly development of telecommunication and mobile technologies, preventive devices such as child ID kits, family trackers have come to light. However, they haven\u27t been impressive solutions yet as they only track current positions of the children and lack of intimations for the parents when their children are under potential dangers. In this thesis, a data mining framework is introduced, in which secure areas and secure paths of the children are learned based on their location histories. When the system predicts the children to be potentially unsafe (e.g., in a strange area or on a strange route), automatic reports will be sent to their parents. Furthermore, an indoor positioning method utilizing Bluetooth is also proposed. Based on the android platform, a prototype of the application for both children and parents is developed incorporating with the proposed techniques in this thesis

The Research Repository @ WVU (West Virginia University)

Parallel Mining of Association Rules Using a Lattice Based Approach

Author: Thomas Wessel Morant
Publication venue: NSUWorks
Publication date: 01/01/2009
Field of study

The discovery of interesting patterns from database transactions is one of the major problems in knowledge discovery in database. One such interesting pattern is the association rules extracted from these transactions. Parallel algorithms are required for the mining of association rules due to the very large databases used to store the transactions. In this paper we present a parallel algorithm for the mining of association rules. We implemented a parallel algorithm that used a lattice approach for mining association rules. The Dynamic Distributed Rule Mining (DDRM) is a lattice-based algorithm that partitions the lattice into sublattices to be assigned to processors for processing and identification of frequent itemsets. Experimental results show that DDRM utilizes the processors efficiently and performed better than the prefix-based and partition algorithms that use a static approach to assign classes to the processors. The DDRM algorithm scales well and shows good speedup

NSU Works

Mining association rules for the quality improvement of the production process

Author: Kamsu-Foguem Bernard
Mauget Félix
Rigal Fabien
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

Academics and practitioners have a common interest in the continuing development of methods and computer applications that support or perform knowledge-intensive engineering tasks. Operations management dysfunctions and lost production time are problems of enormous magnitude that impact the performance and quality of industrial systems as well as their cost of production. Association rule mining is a data mining technique used to find out useful and invaluable information from huge databases. This work develops a better conceptual base for improving the application of association rule mining methods to extract knowledge on operations and information management. The emphasis of the paper is on the improvement of the operations processes. The application example details an industrial experiment in which association rule mining is used to analyze the manufacturing process of a fully integrated provider of drilling products. The study reports some new interesting results with data mining and knowledge discovery techniques applied to a drill production process. Experiment’s results on real-life data sets show that the proposed approach is useful in finding effective knowledge associated to dysfunctions causes

Open Archive Toulouse Archive Ouverte

Combining Association Rule Mining and Clustering

Author: Syed Hussain Tanzeel
Publication venue
Publication date: 08/11/2019
Field of study

Trepo - Institutional Repository of Tampere University