20,239 research outputs found

    Pattern Mining and Sense-Making Support for Enhancing the User Experience

    Get PDF
    While data mining techniques such as frequent itemset and sequence mining are well established as powerful pattern discovery tools in domains from science, medicine to business, a detriment is the lack of support for interactive exploration of high numbers of patterns generated with diverse parameter settings and the relationships among the mined patterns. To enhance the user experience, real-time query turnaround times and improved support for interactive mining are desired. There is also an increasing interest in applying data mining solutions for mobile data. Patterns mined over mobile data may enable context-aware applications ranging from automating frequently repeated tasks to providing personalized recommendations. Overall, this dissertation addresses three problems that limit the utility of data mining, namely, (a.) lack of interactive exploration tools for mined patterns, (b.) insufficient support for mining localized patterns, and (c.) high computational mining requirements prohibiting mining of patterns on smaller compute units such as a smartphone. This dissertation develops interactive frameworks for the guided exploration of mined patterns and their relationships. Contributions include the PARAS pre- processing and indexing framework; enabling analysts to gain key insights into rule relationships in a parameter space view due to the compact storage of rules that enables query-time reconstruction of complete rulesets. Contributions also include the visual rule exploration framework FIRE that presents an interactive dual view of the parameter space and the rule space, that together enable enhanced sense-making of rule relationships. This dissertation also supports the online mining of localized association rules computed on data subsets by selectively deploying alternative execution strategies that leverage multidimensional itemset-based data partitioning index. Finally, we designed OLAPH, an on-device context-aware service that learns phone usage patterns over mobile context data such as app usage, location, call and SMS logs to provide device intelligence. Concepts introduced for modeling mobile data as sequences include compressing context logs to intervaled context events, adding generalized time features, and identifying meaningful sequences via filter expressions

    A customizable multi-agent system for distributed data mining

    Get PDF
    We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances

    Dynamic load balancing for the distributed mining of molecular structures

    Get PDF
    In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids
    • …
    corecore