11 research outputs found

    An improved multiple minimum support based approach to mine rare association rules


    A Novel Approach for Finding Rare Items Based on Multiple Minimum Support Framework

    Pattern mining methods extract valuable items from the large volumes of records stored in corporate datasets and repositories. The literature has focused almost exclusively on frequent itemsets, but in many applications rare ones are of greater interest. In a medical dataset, for example, a rare combination of prodromes can be vital to physicians. Because rare items carry worthwhile information, researchers are working on effective methodologies to extract them. This paper analyzes the complete set of rare items in order to find almost all possible rare association rules in a dataset, where a rare association rule is a rule containing rare items. The proposed approach uses the maximum constraint model to extract the rare items. Based on a study of the relevant data structures of the mining space, it uses a tree structure to identify the rare items. Finally, the paper reports that the new approach is more effective and robust than the existing algorithms.
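The multiple minimum support idea in this abstract can be illustrated with a small sketch. It assumes the usual reading of the maximum constraint model, in which each item has its own minimum item support (MIS) and an itemset's threshold is the largest MIS among its items. The transactions, the MIS values, and the function names are hypothetical, and the brute-force enumeration stands in for the tree structure the paper uses, which the abstract does not describe.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def itemset_threshold(itemset, mis):
    """Maximum constraint (assumed): threshold = largest MIS among the items."""
    return max(mis[item] for item in itemset)


def qualifying_itemsets(transactions, mis, max_size=2):
    """Brute-force search for itemsets that meet their own threshold."""
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(mis), k):
            s = support(combo, transactions)
            if s >= itemset_threshold(combo, mis):
                result[combo] = s
    return result


# Hypothetical market-basket data: "caviar" is the rare item.
transactions = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "milk"}, {"bread"}, {"milk"},
    {"bread", "caviar"}, {"bread", "milk"}, {"milk"},
    {"bread", "milk", "caviar"}, {"bread"}, {"milk"},
)]

# Hypothetical per-item minimum supports: lower for the rare item.
mis = {"bread": 0.35, "milk": 0.35, "caviar": 0.15}

found = qualifying_itemsets(transactions, mis)
for itemset, s in found.items():
    print(itemset, round(s, 2))
```

With a single uniform threshold of 0.35, the rare item would be discarded. With a per-item threshold it survives, while mixed itemsets such as bread and caviar are still held to the stricter of their items' thresholds.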

    Mining Interesting Positive and Negative Association Rule Based on Improved Genetic Algorithm (MIPNAR_GA)

    Association rule mining is an efficient technique for finding strong relations between correlated data. Correlation in the data is what makes the extraction process meaningful. A variety of algorithms, such as Apriori and tree-based algorithms, are used to mine positive and negative rules. Several of these perform well but produce a large number of negative association rules and suffer from the multi-scan problem. This paper aims to eliminate these problems and to reduce the number of negative rules. We propose an improved approach for mining interesting positive and negative rules that combines a multi-level multiple support (MLMS) algorithm with a genetic algorithm. The method applies multi-level multiple support to a data table encoded as 0 and 1, and dividing the process in this way reduces database scanning time. The new algorithm, MIPNAR_GA, mines interesting positive and negative rules from frequent and infrequent pattern sets in three phases: (a) extract frequent and infrequent pattern sets using the Apriori method; (b) efficiently generate positive and negative rules; (c) prune redundant rules by applying interestingness measures. Rule optimization is performed by the genetic algorithm. The algorithm was evaluated on real-world datasets, such as heart disease data, and on standard datasets from the UCI Machine Learning Repository.
    Keywords: association rule mining, negative and positive rules, frequent and infrequent pattern sets, genetic algorithm
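To make the distinction between positive and negative rules concrete, the sketch below computes support and confidence for a positive rule A ⇒ B and for the negative rule A ⇒ ¬B. It relies only on the standard identities supp(A ∪ ¬B) = supp(A) − supp(A ∪ B) and conf(A ⇒ ¬B) = 1 − conf(A ⇒ B). The transactions and the function name are hypothetical, and the sketch does not reproduce MIPNAR_GA, its MLMS step, or its genetic optimization.

```python
def rule_metrics(antecedent, consequent, transactions):
    """Support and confidence of A => B and of the negative rule A => not B."""
    a = frozenset(antecedent)
    ab = a | frozenset(consequent)
    n = len(transactions)
    n_a = sum(a <= t for t in transactions)    # transactions containing A
    n_ab = sum(ab <= t for t in transactions)  # transactions containing A and B
    if n_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return {
        "positive_support": n_ab / n,
        "positive_confidence": n_ab / n_a,
        # A occurs and B is absent: supp(A) - supp(A and B)
        "negative_support": (n_a - n_ab) / n,
        "negative_confidence": (n_a - n_ab) / n_a,
    }


# Hypothetical transactions.
transactions = [frozenset(t) for t in (
    {"tea", "coffee"}, {"tea"}, {"tea"}, {"coffee"}, {"coffee", "sugar"},
)]

metrics = rule_metrics({"tea"}, {"coffee"}, transactions)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy data the negative rule tea ⇒ ¬coffee has higher confidence than the positive rule, which is the kind of pattern a positive-only miner would not report.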

    Comparison of deposition methods of ZnO thin film on flexible substrate

    This paper reports the effect of different deposition methods on the crystal quality and film thickness of ZnO nanostructures on a polyimide substrate. ZnO films were deposited by spray pyrolysis, sol-gel, and RF sputtering. Each method yields a different ZnO thin-film nanostructure. The sol-gel method produces a nanoflower ZnO thin film with a thickness of 600 nm. It also gives the best piezoelectric effect in terms of electrical performance: 5.0 V at a frequency of 12 MHz, which is higher than the frequencies obtained with spray pyrolysis and RF sputtering.

    Identificación de relaciones entre los nodos de una red social

    This paper reviews the representation and classification of membership relations among the nodes of a social network. To that end, it covers Natural Language Processing, Text Mining, Information Retrieval, and Named Entities. Each topic is described, and outstanding academic work in it is referenced and discussed.

    A hybrid recommendation approach for a tourism system

    Many current e-commerce systems personalize the content they show to users. Recommender systems, in particular, make personalized suggestions and provide information about the items available in the system. A vast number of methods, including data mining techniques, can now be employed for personalization in recommender systems. However, these methods remain vulnerable to limitations and shortcomings inherent to the recommender environment. To address some of them, this work implements a recommendation methodology that applies classification based on association in a recommender system for tourism. Classification based on association methods, also called associative classification methods, are an alternative data mining technique that combines concepts from classification and association so that association rules can be used for prediction. The proposed methodology was evaluated in several case studies, which showed that it can reduce the limitations of recommender systems and improve recommendation quality.
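As a rough illustration of associative classification, the sketch below mines rules whose consequent is a class label, ranks them by confidence and then support, and predicts with the first rule that matches. This is a generic simplification under stated assumptions, not the methodology evaluated in the paper: the visitor profiles, the thresholds, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_class_rules(records, min_support=0.2, min_confidence=0.6, max_len=2):
    """Mine rules (antecedent, label, support, confidence) from labeled records."""
    n = len(records)
    counts = defaultdict(Counter)  # antecedent -> label -> count
    for features, label in records:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(features), k):
                counts[frozenset(combo)][label] += 1

    rules = []
    for antecedent, label_counts in counts.items():
        total = sum(label_counts.values())
        for label, c in label_counts.items():
            supp, conf = c / n, c / total
            if supp >= min_support and conf >= min_confidence:
                rules.append((antecedent, label, supp, conf))
    # Rank by confidence, then support, then shorter antecedents first.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def predict(rules, features, default=None):
    """Return the label of the highest-ranked rule whose antecedent matches."""
    for antecedent, label, _, _ in rules:
        if antecedent <= features:
            return label
    return default


# Hypothetical visitor profiles and the activity each one chose.
records = [
    (frozenset({"family", "summer"}), "beach"),
    (frozenset({"family", "summer"}), "beach"),
    (frozenset({"solo", "winter"}), "ski"),
    (frozenset({"solo", "summer"}), "hiking"),
    (frozenset({"family", "winter"}), "ski"),
    (frozenset({"solo", "winter"}), "ski"),
]

rules = mine_class_rules(records)
print(predict(rules, frozenset({"family", "winter"})))
print(predict(rules, frozenset({"family", "summer"})))
```

Ranking by confidence first means a specific, reliable rule wins over a broader but weaker one. Profiles that match no rule fall back to the `default` value.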

    Developing and deploying data mining techniques in healthcare

    Improving healthcare is a top priority for all nations. US healthcare expenditure was $3 trillion in 2014. In the same year, the share of GDP assigned to healthcare expenditure was 17.5%. These statistics show the importance of improving the healthcare delivery system. In this research, we developed several data mining methods and algorithms to address healthcare problems. These methods can also be applied to problems in other domains. The first part of this dissertation is about the rare item problem in association analysis. This problem concerns discovering rare rules, that is, rules that include rare items. In this study, we introduced a novel assessment metric, called adjusted support, to address this problem. By applying this metric, we can retrieve rare rules without over-generating association rules. We applied this method to perform association analysis on complications of diabetes. The second part of this dissertation develops a clinical decision support system for predicting retinopathy. Retinopathy is the leading cause of vision loss among American adults. In this research, we analyzed data from more than 1.4 million diabetic patients and developed four sets of predictive models: basic, comorbid, over-sampled, and ensemble models. The results show that incorporating comorbidity data and oversampling improved the accuracy of prediction. In addition, we developed a novel "confidence margin" ensemble approach that outperformed the existing ensemble models. We also addressed the issue of ties in voting-based ensemble models by comparing the confidence margins of the base predictors. The third part of this dissertation addresses the problem of imbalanced data learning, which is a major challenge in machine learning. While a standard machine learning technique can perform well on balanced datasets, its performance deteriorates dramatically when it is applied to imbalanced datasets. This poor performance is especially troublesome in detecting the minority class, which is usually the class of interest. In this study, we proposed a synthetic informative minority over-sampling (SIMO) algorithm embedded into a support vector machine. We applied SIMO to 15 publicly available benchmark datasets and assessed its performance in comparison with seven existing approaches. The results showed that SIMO outperformed all existing approaches.
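The tie-breaking idea mentioned in the second part can be sketched as follows. The abstract does not define the confidence margin, so this sketch assumes a binary task in which each base predictor outputs a probability for the positive class and its margin is the distance of that probability from the decision threshold. The function name and the rule of comparing summed margins are hypothetical, not the dissertation's method.

```python
def vote_with_margin_tiebreak(probabilities, threshold=0.5):
    """Majority vote over base predictors; ties go to the more confident side.

    `probabilities` holds each base predictor's estimate of P(positive).
    The margin (assumed here) is the distance from `threshold`.
    Returns 1 for the positive class and 0 for the negative class.
    """
    if not probabilities:
        raise ValueError("at least one base predictor is required")
    positive = [p for p in probabilities if p >= threshold]
    negative = [p for p in probabilities if p < threshold]

    # Clear majority: no tie-breaking needed.
    if len(positive) != len(negative):
        return int(len(positive) > len(negative))

    # Tie: compare the total confidence margin on each side.
    positive_margin = sum(p - threshold for p in positive)
    negative_margin = sum(threshold - p for p in negative)
    return int(positive_margin >= negative_margin)


# Two predictors disagree; the confident one decides.
print(vote_with_margin_tiebreak([0.90, 0.45]))
print(vote_with_margin_tiebreak([0.55, 0.10]))
```

Compared with breaking ties at random or always in favor of one class, this uses information the base predictors already provide.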