Parallel Algorithm for Frequent Itemset Mining on Intel Many-core Systems
Frequent itemset mining leads to the discovery of associations and
correlations among items in large transactional databases. Apriori is a
classical frequent itemset mining algorithm that makes iterative passes
over the database, generating candidate itemsets from the frequent
itemsets found in the previous iteration and pruning clearly infrequent
itemsets. The Dynamic Itemset Counting (DIC) algorithm is a variation of
Apriori, which tries to reduce the number of passes made over a transactional
database while keeping the number of itemsets counted in a pass relatively low.
In this paper, we address the problem of accelerating DIC on the Intel Xeon Phi
many-core system for the case when the transactional database fits in main
memory. Intel Xeon Phi provides a large number of small compute cores with
vector processing units. The paper presents a parallel implementation of DIC
based on OpenMP and thread-level parallelism. We use a
bit-based internal layout for transactions and itemsets. This technique reduces
the memory needed to store the transactional database, reduces support
counting to logical bitwise operations, and allows that step to be
vectorized. Experimental evaluation on the platforms of the Intel Xeon CPU and the
Intel Xeon Phi coprocessor with large synthetic and real databases showed good
performance and scalability of the proposed algorithm.
Comment: Accepted for publication in Journal of Computing and Information
Technology (http://cit.fer.hr
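
The bit-based layout described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes items are small non-negative integer IDs, uses Python integers as bitmasks, and the names `to_bitmask` and `support_count` are hypothetical. It only shows why support counting reduces to a bitwise AND followed by a comparison.

```python
def to_bitmask(items):
    """Encode a collection of small non-negative item IDs as an integer bitmask."""
    mask = 0
    for item in items:
        mask |= 1 << item  # set the bit at the item's position
    return mask


def support_count(itemset_mask, transaction_masks):
    """Count the transactions that contain every item of the itemset."""
    # A transaction contains the itemset exactly when AND-ing the two masks
    # leaves the itemset mask unchanged.
    return sum(1 for t in transaction_masks if (t & itemset_mask) == itemset_mask)


# Hypothetical toy database of four transactions over items 0..3.
transactions = [{0, 1, 2}, {0, 2}, {1, 3}, {0, 1, 2, 3}]
transaction_masks = [to_bitmask(t) for t in transactions]

candidate = to_bitmask({0, 2})
print(support_count(candidate, transaction_masks))  # 3
```

In a compiled implementation the same test would run over fixed-width machine words, which is what makes the step a natural candidate for vectorization.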
Customers Behavior Modeling by Semi-Supervised Learning in Customer Relationship Management
Using growing volumes of data to analyze the customer base, and so to attract
and retain the most valuable customers, is a major problem
facing companies in the information age. Data mining technologies extract
hidden information and knowledge from large volumes of data stored in databases or data
warehouses, thereby supporting corporate decision making. Customer relationship
management (CRM) uses data mining techniques, one of its elements, to interact with customers.
This study investigates the use of semi-supervised learning for
the management and analysis of customer-related data warehouses and information.
The idea of semi-supervised learning is to learn not only from the labeled
training data but also from the structural information in additionally
available unlabeled data. The proposed semi-supervised method is a model based on
a feed-forward neural network (multi-layer perceptron) trained by the
backpropagation algorithm in order to predict the category of an unknown
customer (a potential customer). In addition, this technique can be used with
RapidMiner tools for both labeled and unlabeled data.
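
The idea of learning from both labeled and unlabeled data can be sketched with self-training (pseudo-labeling), one common form of semi-supervised learning. This is an illustration under stated assumptions, not the method of the study: a nearest-centroid classifier is swapped in for the multi-layer perceptron to keep the example short, and the function names, the `min_margin` threshold, and the toy customer data are all hypothetical.

```python
def centroids(points, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(cents, p):
    """Return (label, margin): the nearest centroid's label and the gap in
    squared distance between the nearest and second-nearest centroid."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, c)), y) for y, c in cents.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labeled, unlabeled, min_margin=1.0, rounds=5):
    """Repeatedly pseudo-label confident unlabeled points and refit."""
    points = [p for p, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        cents = centroids(points, labels)
        confident, rest = [], []
        for p in pool:
            y, margin = predict(cents, p)
            (confident if margin >= min_margin else rest).append((p, y))
        if not confident:
            break  # nothing new to learn from
        for p, y in confident:
            points.append(p)
            labels.append(y)
        pool = [p for p, _ in rest]
    return centroids(points, labels)


# Hypothetical customers described by two numeric features.
labeled = [((0.0, 0.0), "low"), ((10.0, 10.0), "high")]
unlabeled = [(1.0, 1.0), (9.0, 9.0), (2.0, 1.0), (8.0, 9.0)]

model = self_train(labeled, unlabeled)
print(predict(model, (3.0, 2.0))[0])  # low
```

The unlabeled points shift each class centroid toward the region where that class actually lies, which is the structural information the labeled data alone does not provide.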