
    An evolutionary model to mine high expected utility patterns from uncertain databases

    In recent decades, the number of mobile and Internet of Things (IoT) devices has increased dramatically across many domains and applications, and a massive amount of data is generated as a result. The collected data carry many kinds of information (such as interestingness, weight, frequency, or uncertainty), yet most existing generic pattern mining algorithms consider only a single objective and precise data when discovering the required information. Moreover, because the collected data are huge, meaningful and up-to-date information must be discovered within a limited time. In this paper, we treat both utility and uncertainty as the major objectives and efficiently mine interesting high expected utility patterns (HEUPs) within a limited time using a multi-objective evolutionary framework. The designed model (called MOEA-HEUPM) can discover valuable HEUPs in an uncertain environment without pre-defined threshold values (i.e., minimum utility and minimum uncertainty). Two encoding methodologies are also considered in the developed MOEA-HEUPM to show its effectiveness. With the developed MOEA-HEUPM model, the set of non-dominated HEUPs can be discovered within a limited time for decision-making. Experiments show the effectiveness and efficiency of the designed MOEA-HEUPM model in terms of convergence, hypervolume, and number of discovered patterns compared to generic approaches.
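    The notion of a non-dominated pattern set mentioned above can be illustrated with a minimal sketch. This is not the MOEA-HEUPM algorithm itself: the `Pattern` type, its `utility` and `probability` fields, and the sample values are hypothetical, and the sketch only shows a plain Pareto filter over two objectives that are both maximized.

```python
from typing import List, NamedTuple


class Pattern(NamedTuple):
    # Hypothetical representation: the abstract does not define these fields.
    items: frozenset
    utility: float      # assumed utility score of the pattern (higher is better)
    probability: float  # assumed existence probability (higher is better)


def dominates(a: Pattern, b: Pattern) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    return (
        a.utility >= b.utility
        and a.probability >= b.probability
        and (a.utility > b.utility or a.probability > b.probability)
    )


def non_dominated(patterns: List[Pattern]) -> List[Pattern]:
    """Keep only the patterns that no other pattern dominates (the Pareto front)."""
    return [p for p in patterns if not any(dominates(q, p) for q in patterns)]


# Illustrative values only, not taken from any dataset.
sample = [
    Pattern(frozenset({"a"}), 10.0, 0.9),
    Pattern(frozenset({"b"}), 20.0, 0.5),
    Pattern(frozenset({"a", "b"}), 8.0, 0.4),
    Pattern(frozenset({"c"}), 15.0, 0.5),
]
front = non_dominated(sample)
```

    Because neither of the two surviving patterns is better on both objectives, no utility or probability threshold is needed to choose between them; that choice is left to the decision-maker.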

    Frequent itemset mining in big data with effective single scan algorithms

    © 2013 IEEE. This paper considers frequent itemset mining in transactional databases. It introduces a new accurate single-scan approach for frequent itemset mining (SSFIM), a heuristic alternative (EA-SSFIM), and a parallel implementation on Hadoop clusters (MR-SSFIM). EA-SSFIM and MR-SSFIM target sparse and big databases, respectively. The proposed approach (in all its variants) requires only one scan to extract the candidate itemsets, and it has the advantage of generating a fixed number of candidate itemsets independent of the minimum support value. This accelerates the scan process compared with existing approaches when dealing with sparse and big databases. Numerical results show that SSFIM outperforms state-of-the-art FIM approaches on medium and large databases. Moreover, EA-SSFIM provides performance similar to SSFIM while considerably reducing the runtime for large databases. The results also reveal the superiority of MR-SSFIM over existing HPC-based solutions for FIM on sparse and big databases.
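    The single-scan idea described above can be sketched as follows. This is one plausible reading of the abstract, not the authors' implementation: it assumes that candidates are the non-empty subsets of each transaction, counted in a hash table during one pass, and that the minimum support is applied only afterwards. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple

Itemset = Tuple[str, ...]


def single_scan_counts(transactions: Iterable[Sequence[str]]) -> Counter:
    """Scan the database once and count every non-empty subset of each transaction."""
    counts: Counter = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order so equal itemsets share a key
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    return counts


def frequent_itemsets(counts: Counter, min_support: int) -> Dict[Itemset, int]:
    """Filter the counted candidates; the candidate set itself never depends on min_support."""
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Illustrative toy database.
db = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
counts = single_scan_counts(db)
frequent = frequent_itemsets(counts, min_support=2)
```

    In this sketch the number of candidates is determined by the data alone, which matches the abstract's claim of independence from the minimum support. The subset enumeration is exponential in transaction length, so the sketch is only practical for short transactions.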

    Implementation of an interactive pattern mining framework on electronic health record datasets

    Large collections of electronic patient records contain a broad range of clinical information highly relevant for data analysis. However, they are maintained primarily for patient administration, and automated methods are required to extract valuable knowledge for predictive, preventive, personalized and participatory medicine. Sequential pattern mining is a fundamental task in data mining that can be used to find statistically relevant, non-trivial temporal dependencies between events, such as disease comorbidities. The objective of this work is to use this mining technique to identify disease associations based on ICD-9-CM code data for the entire Taiwanese population, obtained from Taiwan’s National Health Insurance Research Database. This thesis reports the development and implementation of the Disease Pattern Miner, a pattern mining framework for the medical domain. The framework was designed as a Web application that can run several state-of-the-art sequence mining algorithms on electronic health records, collect and filter the results to reduce the number of patterns to a meaningful size, and visualize the disease associations as an interactive model for a specific population group. This may be crucial for discovering new disease associations and may offer novel insights into disease pathogenesis. A structured evaluation of the data and models is required before medical data scientists can use this application as a tool for further research toward a better understanding of disease comorbidities.