
    Sequential Step Towards Pattern Warehousing

    With the massive increase in data, analysts increasingly demand proper repositories in which they can analyse specific data patterns in order to make smart and quick decisions for the benefit of a business, an organization or some social work. The pattern warehouse has proved to be the best solution. This paper discusses the existing architecture and, in particular, the algorithms needed for retrieving optimal patterns from the pattern warehouse. It also includes a detailed study of the sequential emergence of association rule algorithms, which initially derive patterns that are later optimized according to the interests of the analyst.

    Mining subjectively interesting patterns in rich data


    Pattern Mining and Sense-Making Support for Enhancing the User Experience

    While data mining techniques such as frequent itemset and sequence mining are well established as powerful pattern discovery tools in domains ranging from science and medicine to business, one drawback is the lack of support for interactive exploration of the large numbers of patterns generated with diverse parameter settings and of the relationships among the mined patterns. To enhance the user experience, real-time query turnaround times and improved support for interactive mining are desired. There is also increasing interest in applying data mining solutions to mobile data. Patterns mined over mobile data may enable context-aware applications ranging from automating frequently repeated tasks to providing personalized recommendations. Overall, this dissertation addresses three problems that limit the utility of data mining, namely (a) the lack of interactive exploration tools for mined patterns, (b) insufficient support for mining localized patterns, and (c) high computational requirements that prohibit mining patterns on smaller compute units such as a smartphone. This dissertation develops interactive frameworks for the guided exploration of mined patterns and their relationships. Contributions include the PARAS preprocessing and indexing framework, which enables analysts to gain key insights into rule relationships in a parameter space view through compact storage of rules that permits query-time reconstruction of complete rulesets. Contributions also include the visual rule exploration framework FIRE, which presents an interactive dual view of the parameter space and the rule space that together enable enhanced sense-making of rule relationships. This dissertation also supports the online mining of localized association rules computed on data subsets by selectively deploying alternative execution strategies that leverage a multidimensional itemset-based data partitioning index.
Finally, we designed OLAPH, an on-device context-aware service that learns phone usage patterns over mobile context data such as app usage, location, call and SMS logs to provide device intelligence. Concepts introduced for modeling mobile data as sequences include compressing context logs to intervaled context events, adding generalized time features, and identifying meaningful sequences via filter expressions.
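The idea of compressing context logs to intervaled context events can be illustrated with a minimal Python sketch. The abstract does not describe OLAPH's data model, so everything here is an assumption: the log is taken to be a list of `(timestamp, value)` samples, and the hypothetical helper `to_intervals` merges each run of consecutive samples that share a value into one `(value, start, end)` event.

```python
from itertools import groupby
from operator import itemgetter


def to_intervals(samples):
    """Collapse runs of consecutive samples with the same value.

    `samples` is an iterable of (timestamp, value) pairs (an assumed format).
    Returns a list of (value, start, end) interval events.
    """
    intervals = []
    ordered = sorted(samples, key=itemgetter(0))  # order by timestamp
    for value, run in groupby(ordered, key=itemgetter(1)):
        times = [t for t, _ in run]
        intervals.append((value, times[0], times[-1]))
    return intervals


# Hypothetical location log sampled every 5 time units.
log = [(0, "home"), (5, "home"), (10, "work"), (15, "work"), (20, "home")]
events = to_intervals(log)
```

Five raw samples become three interval events here, which gives a sense of why such compression shortens the sequences a miner has to process.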

    Doctor of Philosophy

    With the growing national dissemination of the electronic health record (EHR), there are expectations that the public will benefit from biomedical research and discovery enabled by electronic health data. Clinical data are needed for many diseases and conditions to meet the demands of rapidly advancing genomic and proteomic research. Many biomedical research advancements require rapid access to clinical data as well as broad population coverage. A fundamental issue in the secondary use of clinical data for scientific research is the identification of study cohorts of individuals with a disease or medical condition of interest. The problem addressed in this work is the need for generalized, efficient methods to identify cohorts in the EHR for use in biomedical research. To approach this problem, an associative classification framework was designed with the goal of accurate and rapid identification of cases for biomedical research: (1) a set of exemplars for a given medical condition is presented to the framework, (2) a predictive rule set composed of EHR attributes is generated by the framework, and (3) the rule set is applied to the EHR to identify additional patients who may have the specified condition. Based on this functionality, the approach was termed the 'cohort amplification' framework. The development and evaluation of the cohort amplification framework are the subject of this dissertation. An overview of the framework design is presented. Improvements to some standard associative classification methods are described and validated. A qualitative evaluation of predictive rules to identify diabetes cases and a study of the accuracy of identification of asthma cases in the EHR using framework-generated prediction rules are reported. The framework demonstrated accurate and reliable rules to identify diabetes and asthma cases in the EHR and contributed to methods for identification of biomedical research cohorts.
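The three steps listed in this abstract (exemplars in, rule set generated, rule set applied) can be sketched in Python as a toy. This is not the dissertation's method: single-attribute rules filtered by support and confidence stand in for a real associative classifier, and the function names, thresholds and attribute names are all hypothetical.

```python
def learn_rules(records, labels, min_support=2, min_confidence=0.8):
    """Step 2: keep attributes that predict the condition among the exemplars.

    `records` is a list of attribute sets, `labels` a parallel list of 0/1.
    """
    rules = []
    attributes = set().union(*records)
    for attr in sorted(attributes):
        covered = [lab for rec, lab in zip(records, labels) if attr in rec]
        positives = sum(covered)
        if positives >= min_support and positives / len(covered) >= min_confidence:
            rules.append(attr)
    return rules


def amplify(rules, ehr):
    """Step 3: return ids of patients whose record matches at least one rule."""
    return [pid for pid, rec in ehr.items() if any(r in rec for r in rules)]


# Step 1: hypothetical exemplars (1 = has the condition, 0 = does not).
records = [{"metformin", "hba1c_high"}, {"metformin"}, {"statin"}, {"hba1c_high", "statin"}]
labels = [1, 1, 0, 1]

rules = learn_rules(records, labels)
ehr = {"p1": {"statin"}, "p2": {"metformin", "statin"}, "p3": {"hba1c_high"}}
candidates = amplify(rules, ehr)
```

With these made-up exemplars, `statin` is rejected because it covers only one positive case, so only patients matching one of the two surviving attributes are returned as candidates.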

    Graph BI & analytics: current state and future challenges

    In an increasingly competitive market, making well-informed decisions requires the analysis of a wide range of heterogeneous, large and complex data. This paper focuses on the emerging field of graph warehousing. Graphs are widespread structures that offer great expressive power. They are used for modeling highly complex and interconnected domains and for efficiently solving emerging big data applications. This paper presents the current status and open challenges of graph BI and analytics, and motivates the need for new warehousing frameworks aware of the topological nature of graphs. We survey the topics of graph modeling, management, processing and analysis in graph warehouses. We conclude by discussing future research directions and positioning them within a unified architecture of a graph BI and analytics framework. Peer reviewed. Postprint (author's final draft).

    Quality and interestingness of association rules derived from data mining of relational and semi-structured data

    Deriving useful and interesting rules from a data mining system is an essential and important task. Problems such as the discovery of random and coincidental patterns or patterns of no significant value, and the generation of a large volume of rules from a database, commonly occur. Work on sustaining the interestingness of rules generated by data mining algorithms is actively and constantly being examined and developed. As data mining techniques are data-driven, it is beneficial to affirm the rules using a statistical approach. It is important to establish the ways in which the existing statistical measures and constraint parameters can be effectively utilized and the sequence of their usage. In this thesis, a systematic way is presented to evaluate the association rules discovered from frequent, closed and maximal itemset mining algorithms, and from frequent subtree mining algorithms, including rules based on induced, embedded and disconnected subtrees. With reference to frequent subtree mining, a new direction is also explored based on the DSM approach, which is capable of preserving all information from a tree-structured database in a flat data format, consequently enabling the direct application of a wider range of data mining analyses and techniques to tree-structured data. Implications of this approach were investigated, and it was found that basing rules on disconnected subtrees can be useful in terms of increasing the accuracy and coverage rate of the rule set. A strategy is developed that combines data mining and statistical measurement techniques such as sampling, redundancy and contradictive checks, and correlation and regression analysis to evaluate the rules. This framework is then applied to real-world datasets that represent diverse characteristics of data/items.
Empirical results show that with a proper combination of data mining and statistical analysis, the proposed framework is capable of eliminating a large number of non-significant, redundant and contradictive rules while preserving relatively valuable high-accuracy rules. Moreover, the results reveal the important characteristics of, and differences between, mining frequent, closed or maximal itemsets and mining frequent subtrees, including rules based on induced, embedded and disconnected subtrees, as well as the impact of the confidence measure on the prediction and classification task.
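A redundancy check of the kind mentioned in this abstract can be illustrated with a short Python sketch. The abstract does not give the thesis's definition, so one common convention is assumed: a rule is redundant if a more general rule (one whose antecedent is a proper subset) predicts the same consequent with at least the same confidence. The tuple layout and the name `prune_redundant` are hypothetical.

```python
def prune_redundant(rules):
    """Drop rules dominated by a more general rule with the same consequent.

    Each rule is (antecedent: frozenset, consequent: str, confidence: float).
    """
    kept = []
    for ante, cons, conf in rules:
        dominated = any(
            o_cons == cons and o_ante < ante and o_conf >= conf
            for o_ante, o_cons, o_conf in rules
        )
        if not dominated:
            kept.append((ante, cons, conf))
    return kept


# Hypothetical rule set: {a} -> x, {a, b} -> x, {a, c} -> x.
rules = [
    (frozenset({"a"}), "x", 0.90),
    (frozenset({"a", "b"}), "x", 0.85),
    (frozenset({"a", "c"}), "x", 0.95),
]
kept = prune_redundant(rules)
```

Under this convention the rule `{a, b} -> x` is dropped because adding `b` does not improve on `{a} -> x`, while `{a, c} -> x` survives because the extra item raises the confidence.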