
    Reductions for Frequency-Based Data Mining Problems

    Studying the computational complexity of problems is one of the fundamental questions, if not the fundamental question, in computer science. Yet surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. They show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems. Comment: This is an extended version of a paper of the same title to appear in the Proceedings of the 17th IEEE International Conference on Data Mining (ICDM'17)
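For readers unfamiliar with the term, the following is a minimal sketch of what "maximal frequent itemsets" means. It is a brute-force illustration, not the reduction studied in the paper; the function names and the toy transactions are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Map each itemset contained in at least min_support transactions to its support."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent[itemset] = support
                found_any = True
        if not found_any:
            # Support is anti-monotone: if no itemset of this size is
            # frequent, no larger itemset can be frequent either.
            break
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy data: five transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
maximal = maximal_frequent_itemsets(transactions, min_support=3)
```

With a support threshold of 3, every pair is frequent but the full set {a, b, c} is not, so the three pairs are the maximal frequent itemsets.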

    Using Answer Set Programming for pattern mining

    Serial pattern mining consists of extracting the frequent sequential patterns from a single sequence of itemsets. This paper explores the ability of a declarative language, such as Answer Set Programming (ASP), to solve this problem efficiently. We propose several ASP implementations of the frequent sequential pattern mining task: a non-incremental and an incremental resolution. The results show that the incremental resolution is more efficient than the non-incremental one, but both ASP programs are less efficient than dedicated algorithms. Nonetheless, this approach can be seen as a first step toward a generic framework for sequential pattern mining with constraints. Comment: Intelligence Artificielle Fondamentale (2014)

    Co-prescription patterns of cardiovascular preventive treatments: A cross-sectional study in the Aragon Workers' Health Study (Spain)

    Objectives: To identify combinations of cardiovascular disease (CVD) preventive treatments, both with one another and with other drugs, and to determine their prevalence in a cohort of Spanish workers. Design: Cross-sectional study. Setting: Aragon Workers' Health Study (AWHS) cohort in Spain. Participants: 5577 workers belonging to the AWHS cohort. From these subjects, we selected those who had at least three prescriptions of the same therapeutic subgroup in 2014 (n=4605). Primary and secondary outcome measures: Drug consumption was obtained from the Aragon Pharmaceutical Consumption Registry (Farmasalud). Prevalence analyses were conducted to assess treatment utilisation. Frequent itemset mining techniques were applied to identify drug co-prescription patterns. All results were stratified by sex and age. Results: 42.3% of men and 18.8% of women in the cohort received at least three prescriptions of a CVD preventive treatment in 2014. The most prescribed CVD treatments were antihypertensives (men: 28.2%, women: 9.2%). The most frequent association observed among CVD preventive treatments was agents acting on the renin-angiotensin system combined with lipid-lowering drugs (5.1% of treated subjects). Co-prescription increased with age, especially after 50 years of age, both in frequency and in number of associations, and was higher in men. Regarding the association between CVD preventive treatments and other drugs, the most frequent pattern observed was lipid-lowering drugs combined with drugs used for acid-related disorders (4.2% of treated subjects). Conclusions: A substantial number of co-prescription patterns involve CVD preventive treatments. These patterns increase with age and are more frequent in men. Mining techniques are a useful tool for identifying pharmacological patterns that are not evident in individual clinical practice, and can thereby help improve the appropriateness of drug prescription.

    An efficient parallel method for mining frequent closed sequential patterns

    Mining frequent closed sequential patterns (FCSPs) has attracted a great deal of research attention because it is an important task in sequence mining. Recently, many studies have focused on mining frequent closed sequential patterns because such patterns have proved to be more efficient and compact than frequent sequential patterns, while still allowing the information to be fully extracted. In this paper, we propose an efficient parallel approach called parallel dynamic bit vector frequent closed sequential patterns (pDBV-FCSP), which uses a multi-core processor architecture to mine FCSPs from large databases. pDBV-FCSP divides the search space to reduce the required storage space and performs closure checking of prefix sequences early to reduce the execution time of mining frequent closed sequential patterns. This approach overcomes the problems of parallel mining, such as the overhead of communication, synchronization, and data replication. It also addresses load balancing between the processors with a dynamic mechanism that redistributes the work when some processes run out of work, minimizing idle CPU time.
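To make the notion of a closed sequential pattern concrete, here is a minimal single-threaded sketch. It is not the pDBV-FCSP algorithm and uses no bit vectors or parallelism; it assumes sequences of single items rather than itemsets, requires min_support of at least 1, and all function names and the toy database are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, gaps allowed."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def frequent_sequences(database, min_support):
    """Breadth-first enumeration by appending one item at a time (min_support >= 1)."""
    items = sorted({x for seq in database for x in seq})
    frequent = {}
    frontier = [()]
    while frontier:
        next_frontier = []
        for prefix in frontier:
            for item in items:
                candidate = prefix + (item,)
                sup = support(candidate, database)
                if sup >= min_support:
                    frequent[candidate] = sup
                    next_frontier.append(candidate)
        frontier = next_frontier
    return frequent


def closed_sequences(frequent):
    """Drop every pattern that has a longer super-sequence with the same support."""
    closed = {}
    for p, sup in frequent.items():
        absorbed = any(
            len(q) > len(p) and q_sup == sup and is_subsequence(p, q)
            for q, q_sup in frequent.items()
        )
        if not absorbed:
            closed[p] = sup
    return closed


# Hypothetical toy database of three sequences.
database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c")]
frequent = frequent_sequences(database, min_support=2)
closed = closed_sequences(frequent)
```

On this toy input, seven patterns are frequent but only two are closed, which illustrates why the closed set is the more compact representation: the support of every other frequent pattern can be read off a closed super-sequence.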

    Constraint-based Sequential Pattern Mining with Decision Diagrams

    Constrained sequential pattern mining aims at identifying frequent patterns on a sequential database of items while observing constraints defined over the item attributes. We introduce novel techniques for constraint-based sequential pattern mining that rely on a multi-valued decision diagram representation of the database. Specifically, our representation can accommodate multiple item attributes and various constraint types, including a number of non-monotone constraints. To evaluate the applicability of our approach, we develop an MDD-based prefix-projection algorithm and compare its performance against a typical generate-and-check variant, as well as a state-of-the-art constraint-based sequential pattern mining algorithm. Results show that our approach is competitive with or superior to these other methods in terms of scalability and efficiency. Comment: AAAI201
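As a rough illustration of prefix projection with a constraint, the sketch below mines a plain list-based database. It does not use a multi-valued decision diagram and is not the authors' algorithm; it assumes single-item sequence elements and a constraint that is anti-monotone (once a pattern violates it, so does every extension). The names `prefix_span` and `accept` are hypothetical.

```python
def prefix_span(database, min_support, accept=lambda pattern: True):
    """Mine frequent sequential patterns that satisfy an anti-monotone constraint."""
    results = {}

    def mine(prefix, projected):
        # projected holds one (sequence index, start position) pair per
        # sequence that contains prefix; the suffix from start onward is
        # the projected sequence.
        occurrences = {}
        for sid, start in projected:
            seen = set()
            for pos in range(start, len(database[sid])):
                item = database[sid][pos]
                if item not in seen:
                    # Record only the first occurrence per sequence.
                    seen.add(item)
                    occurrences.setdefault(item, []).append((sid, pos + 1))
        for item in sorted(occurrences):
            next_projected = occurrences[item]
            if len(next_projected) < min_support:
                continue
            pattern = prefix + (item,)
            if not accept(pattern):
                # Pruning the whole branch is sound only for
                # anti-monotone constraints.
                continue
            results[pattern] = len(next_projected)
            mine(pattern, next_projected)

    mine((), [(sid, 0) for sid in range(len(database))])
    return results


# Hypothetical toy database and a maximum-length constraint.
database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c")]
patterns = prefix_span(database, min_support=2, accept=lambda p: len(p) <= 2)
```

Non-monotone constraints, which the abstract says the MDD representation can handle, cannot be pruned this way; in this sketch they would have to be checked on the final patterns instead.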