372 research outputs found

    Mining Profitable and Concise Patterns in Large-Scale Internet of Things Environments

    In recent years, high-utility itemset mining (HUIM) has been investigated extensively and applied in many domains, especially market-basket analysis and related applications. Since current market-basket scenarios also involve IoT equipment (e.g., sensors or smart devices) to collect information, it is necessary to consider the mining of high-utility itemsets (HUIs) in large-scale databases, especially in IoT settings. In this work, a GA-based MapReduce model called GMR-Miner is presented for mining closed high-utility patterns in large-scale databases. The k-means model is first adopted to group transactions by their correlation based on the frequency factor. A genetic algorithm (GA) is then used in the developed MapReduce framework to explore potential candidates in a limited time. The developed 3-tier MapReduce model can also be easily deployed on Spark to handle large-scale databases for knowledge discovery of closed high-utility patterns. Extensive experiments were conducted to evaluate GMR-Miner against the well-known, state-of-the-art CLS-Miner. The in-depth results show that GMR-Miner outperforms CLS-Miner on several criteria, namely memory usage, scalability, and runtime.

    A COLLABORATIVE FILTERING APPROACH TO PREDICT WEB PAGES OF INTEREST FROM NAVIGATION PATTERNS OF PAST USERS WITHIN AN ACADEMIC WEBSITE

    This dissertation is a simulation study of the factors and techniques involved in designing hyperlink recommender systems, which recommend to users web pages that past users with similar navigation behaviors found interesting. The methodology involves identifying pertinent factors or techniques and, for each one, addressing the following questions: (a) whether there is room for improvement; (b) whether there is a better approach; and (c) the performance characteristics of the technique in the environments in which hyperlink recommender systems operate. The following four problems are addressed: Web Page Classification. A new metric (PageRank × Inverse Links-to-Word count ratio) is proposed for classifying web pages as content or navigation, to help in the discovery of user navigation behaviors from web user access logs. Results of a small user study suggest that this metric leads to desirable results. Data Mining. A new Apriori algorithm for mining association rules from large databases is proposed. The new algorithm addresses the scaling problem of the classical Apriori algorithm by eliminating an expensive join step and applying the Apriori property to every row of the database. In this study, association rules show the correlations between user navigation behaviors and the web pages users find interesting. The new algorithm has better space complexity than the classical one, with better time efficiency under some conditions and comparable time efficiency under others. Prediction Models for User Interests. We demonstrate that association rules showing the correlations between user navigation patterns and the web pages users find interesting can be transformed into collaborative filtering data. We investigate collaborative filtering prediction models based on two approaches for computing prediction scores: simple averages and weighted averages. Our findings suggest that the weighted averages scheme computes predictions of user interests more accurately than the simple averages scheme does. Clustering. Clustering techniques are frequently applied in the design of personalization systems. We studied the performance of the CLARANS clustering algorithm in high-dimensional space relative to the PAM and CLARA clustering algorithms. While CLARA had the best time performance, CLARANS produced clusters with the lowest intra-cluster dissimilarities, and so was the most effective in this regard.
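The difference between the two prediction-score schemes named in the abstract above can be illustrated with a minimal sketch. The abstract does not say what the weights are; the version below assumes, purely for illustration, that each past user's interest score is weighted by a similarity value. The function names, scores, and similarities are all hypothetical.

```python
from typing import Sequence


def simple_average(scores: Sequence[float]) -> float:
    """Prediction score as the plain mean of past users' interest scores."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


def weighted_average(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Prediction score where each past user's score counts in proportion
    to a weight (assumed here to be a similarity to the current user)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical data: interest of three past users in one page (1 = interested)
# and their assumed similarity to the current user.
scores = [1.0, 0.0, 1.0]
similarities = [0.9, 0.1, 0.5]

simple = simple_average(scores)
weighted = weighted_average(scores, similarities)
```

In this made-up case the weighted score is higher than the simple one because the one uninterested past user is also the least similar, so that user's score contributes little.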

    State of the Art in Privacy Preserving Data Mining

    Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when data mining techniques are used. Such a trend, especially in the context of public databases or of sensitive information related to critical infrastructures, nowadays represents a non-negligible threat. Privacy Preserving Data Mining (PPDM) algorithms have recently been introduced with the aim of modifying the database in such a way as to prevent the discovery of sensitive information. This is a very complex task, and several different approaches to the problem exist in the scientific literature. In this work we present a survey of the current PPDM methodologies that seem promising for the future. JRC.G.6-Sensors, radar technologies and cybersecurit

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent technological advances, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various data mining methodologies is highlighted. Generally, association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for the diagnosis of erythemato-squamous diseases. Methods used for this purpose include neural networks, genetic algorithms, and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find a structure in the given data by finding similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.

    An evolutionary model to mine high expected utility patterns from uncertain databases

    In recent decades, the number of mobile and Internet of Things (IoT) devices has increased dramatically in many domains and applications. Thus, a massive amount of data is generated. The collected data contain a large amount of interesting information (e.g., interestingness, weight, frequency, or uncertainty), yet most existing generic pattern mining algorithms consider only a single objective and precise data to discover the required information. Meanwhile, since the collected information is huge, it is necessary to discover meaningful and up-to-date information in a limited time. In this paper, we consider both utility and uncertainty as the major objectives to efficiently mine the interesting high expected utility patterns (HEUPs) in a limited time based on a multi-objective evolutionary framework. The benefit of the designed model (called MOEA-HEUPM) is that it can discover valuable HEUPs in an uncertain environment without pre-defined threshold values (i.e., minimum utility and minimum uncertainty). Two encoding methodologies are also considered in the developed MOEA-HEUPM to show its effectiveness. Based on the developed MOEA-HEUPM model, the set of non-dominated HEUPs can be discovered in a limited time for decision-making. Experiments are then conducted to show the effectiveness and efficiency of the designed MOEA-HEUPM model in terms of convergence, hypervolume, and number of discovered patterns compared to the generic approaches.
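The notion of a set of non-dominated patterns over two objectives, as mentioned in the abstract above, can be sketched as a plain Pareto filter. This is not the MOEA-HEUPM algorithm itself (the abstract gives no details of its encoding or evolutionary operators); it only shows what "non-dominated" means when both utility and probability are to be maximized. The pattern names and numbers are hypothetical.

```python
from typing import List, Tuple

# (pattern name, expected utility, existence probability); all values are made up.
Pattern = Tuple[str, float, float]


def dominates(a: Pattern, b: Pattern) -> bool:
    """a dominates b if it is at least as good on both objectives
    and strictly better on at least one (both objectives are maximized)."""
    at_least_as_good = a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[1] > b[1] or a[2] > b[2]
    return at_least_as_good and strictly_better


def non_dominated(patterns: List[Pattern]) -> List[Pattern]:
    """Keep the patterns that no other pattern dominates (the Pareto front)."""
    return [p for p in patterns if not any(dominates(q, p) for q in patterns)]


candidates: List[Pattern] = [
    ("ab", 50.0, 0.30),
    ("ac", 40.0, 0.60),
    ("bc", 35.0, 0.50),   # worse than "ac" on both objectives
    ("abc", 60.0, 0.10),
]

front = non_dominated(candidates)
```

Because the front keeps every trade-off between the two objectives, no minimum-utility or minimum-probability threshold has to be fixed in advance, which matches the threshold-free property the abstract claims. The quadratic pairwise comparison is chosen here for clarity, not efficiency.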