80 research outputs found

    Prime-based method for interactive mining of frequent patterns

    Over the past decade, an increasing number of efficient mining algorithms have been proposed to mine frequent patterns that satisfy a user-specified threshold called minimum support (minsup). However, determining an appropriate value of minsup to find proper frequent patterns in different applications is extremely difficult. Since rerunning the mining algorithms from scratch can be very time consuming, researchers have introduced interactive mining, which finds proper patterns by reusing the current mining model with various values of minsup. Thus far, a few efficient interactive mining algorithms have been proposed. However, their runtimes do not meet the need for short runtimes in real-time applications, especially where data is sparse and proper frequent patterns are mined with very low values of minsup. In response to these challenges, this study develops an interactive mining method based on prime numbers and their special characteristic, “uniqueness”, by which the content of the relevant data is transformed into a compact layout. First, a general architecture for interactive mining is proposed, consisting of two isolated components: the mining model and the mining process. Then, the proposed method is developed on this architecture such that the mining model is constructed once and can be mined repeatedly with various values of minsup. During mining model construction, the content of the relevant data is captured with one database scan by a novel tree structure called the PC-tree, and the mining materials are formed accordingly. The PC-tree is a well-organized tree structure that is systematically built using descendant making, a technique introduced in this study. Moreover, this study introduces a mining algorithm called PC-miner to mine the mining model repeatedly with various values of minsup. Using the Apriori principle, it grows an effective candidate head set, also introduced in this study, starting from the longest candidate patterns.
Meanwhile, as the candidate head set grows in each round, the longest candidate patterns are used to find maximal frequent patterns, from which the frequent patterns can be derived. Moreover, the PC-miner reduces the number of candidate patterns and comparisons by using several pruning techniques. A comprehensive experimental analysis, comprising several experiments and scenarios, evaluates the correctness and effectiveness of the proposed method, especially for interactive mining. The experimental results verify that the proposed method constructs the mining model once, independently of minsup, and that this enables the model to be mined repeatedly. The results also show that the proposed method mines frequent patterns correctly and efficiently. Moreover, the results verify that the proposed method speeds up interactive mining of frequent patterns over both sparse and dense datasets, with a more scalable total runtime for very low values of minsup over sparse datasets compared with previous work.
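The two-component architecture in this abstract (a mining model built once, then mined with different minsup values) can be illustrated with a minimal Python sketch. It is not the PC-tree or PC-miner: the model here is just a counter of distinct transactions, and the mining process is a generic bottom-up, level-wise search with Apriori pruning, whereas the abstract describes starting from the longest candidates. The names `build_model`, `support`, and `mine` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_model(transactions):
    """One database scan: count identical transactions (independent of minsup)."""
    return Counter(frozenset(t) for t in transactions)

def support(model, pattern):
    """Number of transactions in the model that contain the pattern."""
    return sum(count for t, count in model.items() if pattern <= t)

def mine(model, minsup):
    """Generic level-wise mining against a prebuilt model (stand-in, not PC-miner)."""
    items = sorted({item for t in model for item in t})
    level = [frozenset([i]) for i in items
             if support(model, frozenset([i])) >= minsup]
    frequent = {}
    while level:
        for pattern in level:
            frequent[pattern] = support(model, pattern)
        k = len(level[0]) + 1
        current = set(level)
        # Join step: unions of two frequent patterns that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(model, c) >= minsup]
    return frequent

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"],
                ["b", "c"], ["a", "b", "c"]]
model = build_model(transactions)   # built once
for minsup in (4, 3, 2):            # mined repeatedly with different thresholds
    print(minsup, len(mine(model, minsup)))
```

The point of the sketch is only that `build_model` never sees minsup, so changing the threshold costs one call to `mine` rather than a rescan of the database.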

    Cold-start Problem in Collaborative Recommender Systems: Efficient Methods Based on Ask-to-rate Technique

    Collaborative filtering is the best-known approach to developing a recommender system; it considers the ratings of users who have similar rating profiles or rating patterns. Accordingly, it can compute the similarity of users when users have expressed enough ratings. A major challenge of the collaborative filtering approach is therefore how to make recommendations for a new user, which is called the cold-start user problem. To solve this problem, a few efficient methods based on the ask-to-rate technique have been proposed, in which the profile of a new user is built by integrating information gained from a quick interview. This paper reviews these methods and how they use the ask-to-rate technique. The methods are categorized into non-adaptive and adaptive methods. Then, each category is analyzed and its methods are compared.
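A minimal Python sketch of why a new user is a problem for similarity-based collaborative filtering. The abstract does not name a similarity measure; cosine similarity over co-rated items is assumed here, and `user_similarity` and the sample ratings are hypothetical.

```python
from math import sqrt

def user_similarity(ratings_u, ratings_v):
    """Cosine similarity over co-rated items; None when it cannot be computed."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return None  # no shared ratings: the cold-start situation
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return None
    return dot / (norm_u * norm_v)

alice = {"m1": 5, "m2": 3}
bob = {"m1": 5, "m2": 3, "m3": 1}
new_user = {}                       # no ratings yet
print(user_similarity(alice, bob))
print(user_similarity(new_user, alice))

# After a short ask-to-rate interview the new user has a profile to compare.
new_user = {"m1": 4, "m2": 2}
print(user_similarity(new_user, alice))
```

With an empty profile the function has nothing to compare, which is the gap the ask-to-rate interview is meant to fill.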

    CCSA: Conscious Neighborhood-based Crow Search Algorithm for Solving Global Optimization Problems

    © 2019 Elsevier B.V. In this paper, a conscious neighborhood-based crow search algorithm (CCSA) is proposed for solving global optimization and engineering design problems. It is an improvement that tackles the imbalanced search strategy and premature convergence problems of the crow search algorithm. CCSA introduces three new search strategies, called neighborhood-based local search (NLS), non-neighborhood-based global search (NGS), and wandering-around-based search (WAS), to improve the movement of crows in different search spaces. Moreover, a neighborhood concept is defined to select the movement strategy between NLS and NGS consciously, which enhances the balance between local and global search. The proposed CCSA is evaluated on several benchmark functions and four applied problems of engineering design. In all experiments, CCSA is compared with other state-of-the-art swarm intelligence algorithms: CSA, BA, CLPSO, GWO, EEGWO, WOA, KH, ABC, GABC, and Best-so-far ABC. The experimental and statistical results show that CCSA is very competitive, especially for large-scale optimization problems, and that it is significantly superior to the compared algorithms. Furthermore, the proposed algorithm finds the best solution for the applied problems of engineering design.

    A numerical method for frequent pattern mining

    Frequent pattern mining is one of the active research themes in data mining. It plays an important role in all data mining tasks such as clustering, classification, prediction, and association analysis. Identifying all frequent patterns is the most time-consuming process because of the massive number of patterns generated. A reasonable solution is to identify the maximal frequent patterns, which form the smallest representative set of patterns from which all frequent patterns can be generated. In this paper, an efficient numerical method for mining frequent patterns is proposed. This method is based on prime number characteristics and generates all frequent patterns from the maximal frequent ones. Two new elements are introduced in this method: a novel tree structure called PC_Tree and the PC_Miner algorithm. The PC_Tree is a simple tree structure, yet it is capable of capturing the whole of the transaction information with an efficient data transformation technique that utilizes prime number theory. The PC_Miner algorithm traverses the PC_Tree by using an efficient pruning technique. The experimental results verify the compactness of the proposed method and the efficiency of its mining.
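Why the maximal frequent patterns suffice as a representative set can be shown with a short Python sketch: every non-empty subset of a frequent pattern is itself frequent (the downward-closure property), so enumerating the subsets of the maximal patterns yields all frequent patterns. The function name `frequent_from_maximal` and the sample data are hypothetical, and the sketch recovers only the patterns, not their support counts.

```python
from itertools import combinations

def frequent_from_maximal(maximal_patterns):
    """Enumerate all frequent patterns as non-empty subsets of the maximal ones."""
    frequent = set()
    for pattern in maximal_patterns:
        items = sorted(pattern)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                frequent.add(frozenset(subset))
    return frequent

maximal = [{"a", "b", "c"}, {"c", "d"}]
patterns = frequent_from_maximal(maximal)
print(sorted("".join(sorted(p)) for p in patterns))
```

The two maximal patterns above expand to nine frequent patterns, with the shared item `c` counted once.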

    English-Persian Plagiarism Detection based on a Semantic Approach

    Plagiarism, which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them”, poses a major challenge to the publication and spread of knowledge. Plagiarism has been placed in four categories: direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism, which is sometimes referred to as cross-lingual plagiarism. In cross-lingual translation, writers meld a translation with their own words and ideas. Building on monolingual plagiarism detection methods, this paper aims to find a way to detect cross-lingual plagiarism. A framework called Multi-Lingual Plagiarism Detection (MLPD) is presented for cross-lingual plagiarism analysis, with the ultimate objective of detecting plagiarism cases. English is the reference language, and Persian materials are back-translated using translation tools. The data for the assessment of MLPD were obtained from the English-Persian Mizan parallel corpus. Apache’s Solr was also applied to record the crawl of the documents and their indexing. The mean accuracy of the proposed method was 98.82% when employing highly accurate translation tools, which indicates the high accuracy of the proposed method. With the Google translation service, the mean accuracy was 56.9%. These tests demonstrate that improved translation tools enhance the accuracy of the proposed method.

    A new method for mining maximal frequent itemsets

    In this paper, we propose a new method for mining maximal frequent itemsets. Our method introduces an efficient database encoding technique, a novel tree structure called PC_Tree, and the PC_Miner algorithm. The database encoding technique utilizes prime number characteristics and transforms each transaction into a positive integer that retains all the properties of its items. The PC_Tree is a simple tree structure, yet it is powerful enough to capture all transactions with one database scan. The PC_Miner algorithm traverses the PC_Tree and builds the gcd (greatest common divisor) set of its nodes to mine maximal frequent itemsets. Experiments verify the efficiency and advantages of the proposed method.
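The encoding idea in this abstract can be sketched in a few lines of Python, assuming each item is assigned a distinct prime and a transaction is encoded as the product of its items' primes. The abstract does not say how primes are assigned, so the mapping `ITEM_PRIMES` and the helper names are hypothetical. By unique factorization the product identifies the item set exactly, divisibility tests containment, and the gcd of two encoded transactions encodes their common items.

```python
from math import gcd, prod

# Hypothetical item-to-prime assignment (an assumption, not from the paper).
ITEM_PRIMES = {"a": 2, "b": 3, "c": 5, "d": 7}

def encode(transaction):
    """Encode a set of distinct items as the product of their primes."""
    return prod(ITEM_PRIMES[item] for item in transaction)

def decode(value):
    """Recover the item set from an encoded value by divisibility checks."""
    return {item for item, p in ITEM_PRIMES.items() if value % p == 0}

def contains(transaction_value, pattern_value):
    """A transaction contains a pattern iff the pattern's value divides it."""
    return transaction_value % pattern_value == 0

t1 = encode({"a", "b", "c"})   # 2 * 3 * 5 = 30
t2 = encode({"b", "c", "d"})   # 3 * 5 * 7 = 105
common = gcd(t1, t2)           # 15 = 3 * 5
print(decode(common))          # the items shared by both transactions
```

This shows only the arithmetic behind the encoding; how PC_Miner organizes the gcd set over the tree nodes is not described in the abstract and is not reproduced here.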