Reductions for Frequency-Based Data Mining Problems
Studying the computational complexity of problems is one of the fundamental
questions in computer science, if not the fundamental one. Yet, surprisingly little is known
about the computational complexity of many central problems in data mining. In
this paper we study frequency-based problems and propose a new type of
reduction that allows us to compare the complexities of the maximal frequent
pattern mining problems in different domains (e.g. graphs or sequences). Our
results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader
range of data mining problems. Our results show that, by allowing constraints
in the pattern space, the complexities of many maximal frequent pattern mining
problems collapse. These problems include maximal frequent subgraphs in
labelled graphs, maximal frequent itemsets, and maximal frequent subsequences
with no repetitions. In addition to theoretical interest, our results might
yield more efficient algorithms for the studied problems.
Comment: This is an extended version of a paper of the same title to appear in
the Proceedings of the 17th IEEE International Conference on Data Mining
(ICDM'17)
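As an illustration of the maximal frequent itemset problem named in the abstract above, here is a minimal brute-force sketch in Python. It is not the reduction studied in the paper; the function names, the toy transactions, and the support threshold are all assumptions made for this example. An itemset is frequent if at least `min_support` transactions contain it, and maximal if no proper superset is also frequent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset found in at least
    min_support transactions (brute force, level by level)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
                found_any = True
        if not found_any:
            # Support is anti-monotone: if no itemset of this size is
            # frequent, no larger itemset can be frequent either.
            break
    return result


def maximal_itemsets(frequent):
    """Keep only the frequent itemsets with no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy database of five transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
frequent = frequent_itemsets(transactions, min_support=3)
maximal = maximal_itemsets(frequent)
```

With this toy data every single item and every pair is frequent, while {a, b, c} occurs only twice, so the three pairs are the maximal frequent itemsets.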
Using Answer Set Programming for pattern mining
Serial pattern mining consists in extracting the frequent sequential patterns
from a unique sequence of itemsets. This paper explores the ability of a
declarative language, such as Answer Set Programming (ASP), to solve this issue
efficiently. We propose several ASP implementations of the frequent sequential
pattern mining task: a non-incremental and an incremental resolution. The
results show that the incremental resolution is more efficient than the
non-incremental one, but both ASP programs are less efficient than dedicated
algorithms. Nonetheless, this approach can be seen as a first step toward a
generic framework for sequential pattern mining with constraints.
Comment: Intelligence Artificielle Fondamentale (2014)
Co-prescription patterns of cardiovascular preventive treatments: A cross-sectional study in the Aragon Workers' Health Study (Spain)
Objectives: To identify combinations of cardiovascular disease (CVD) preventive treatments, both with each other and with other drugs, and to determine their prevalence in a cohort of Spanish workers.
Design: Cross-sectional study. Setting: Aragon Workers' Health Study (AWHS) cohort in Spain. Participants: 5577 workers belonging to the AWHS cohort. From these subjects, we selected those who had at least three prescriptions of the same therapeutic subgroup in 2014 (n=4605). Primary and secondary outcome measures: Drug consumption was obtained from the Aragon Pharmaceutical Consumption Registry (Farmasalud). Prevalence analyses were conducted to assess treatment utilisation. Frequent item set mining techniques were applied to identify drug co-prescription patterns. All the results were stratified by sex and age.
Results: 42.3% of men and 18.8% of women in the cohort received at least three prescriptions of a CVD preventive treatment in 2014. The most prescribed CVD treatments were antihypertensives (men: 28.2%, women: 9.2%). The most frequent association observed among CVD preventive treatments was agents acting on the renin-angiotensin system combined with lipid-lowering drugs (5.1% of treated subjects). Co-prescription increased with age, especially after 50 years of age, both in frequency and in number of associations, and was higher in men. Regarding the association between CVD preventive treatments and other drugs, the most frequent pattern observed was lipid-lowering drugs combined with drugs used for acid-related disorders (4.2% of treated subjects).
Conclusions: A substantial number of co-prescription patterns involve CVD preventive treatments. These patterns increase with age and are more frequent in men. Mining techniques are a useful tool for identifying pharmacological patterns that are not evident in individual clinical practice, in order to improve the appropriateness of drug prescription.
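The abstract above applies frequent item set mining to prescription records. A minimal sketch of the simplest case, counting how often two therapeutic subgroups are prescribed to the same subject, could look as follows. The function name, the subgroup labels, the four toy subjects, and the threshold are hypothetical and are not taken from the study or from the Farmasalud registry.

```python
from collections import Counter
from itertools import combinations


def pair_prevalence(prescriptions, min_share):
    """Return {(group_a, group_b): share of subjects} for every pair of
    therapeutic subgroups co-prescribed to at least min_share of subjects."""
    n = len(prescriptions)
    counts = Counter()
    for groups in prescriptions:
        # Count each unordered pair once per subject, in sorted order.
        counts.update(combinations(sorted(set(groups)), 2))
    return {pair: c / n for pair, c in counts.items() if c / n >= min_share}


# Hypothetical toy cohort: one list of therapeutic subgroups per subject.
subjects = [
    ["ras_agents", "lipid_lowering"],
    ["ras_agents", "lipid_lowering", "acid_disorders"],
    ["lipid_lowering", "acid_disorders"],
    ["ras_agents"],
]
patterns = pair_prevalence(subjects, min_share=0.5)
```

In this toy cohort two pairs reach the 50% threshold: lipid-lowering drugs with renin-angiotensin agents, and lipid-lowering drugs with drugs for acid-related disorders.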
An efficient parallel method for mining frequent closed sequential patterns
Mining frequent closed sequential patterns (FCSPs) has attracted a great deal of research attention because it is an important task in sequence mining. Recently, many studies have focused on mining frequent closed sequential patterns because such patterns have proved to be more efficient and compact than frequent sequential patterns, while the information they carry can still be fully extracted. In this paper, we propose an efficient parallel approach called parallel dynamic bit vector frequent closed sequential patterns (pDBV-FCSP), which uses a multi-core processor architecture to mine FCSPs from large databases. pDBV-FCSP divides the search space to reduce the required storage space and performs closure checking of prefix sequences early to reduce execution time. This approach overcomes the problems of parallel mining such as the overhead of communication, synchronization, and data replication. It also solves the load-balancing problem between processors with a dynamic mechanism that redistributes work when some processes run out of work, minimizing idle CPU time.
Constraint-based Sequential Pattern Mining with Decision Diagrams
Constrained sequential pattern mining aims at identifying frequent patterns
on a sequential database of items while observing constraints defined over the
item attributes. We introduce novel techniques for constraint-based sequential
pattern mining that rely on a multi-valued decision diagram representation of
the database. Specifically, our representation can accommodate multiple item
attributes and various constraint types, including a number of non-monotone
constraints. To evaluate the applicability of our approach, we develop an
MDD-based prefix-projection algorithm and compare its performance against a
typical generate-and-check variant, as well as a state-of-the-art
constraint-based sequential pattern mining algorithm. Results show that our
approach is competitive with or superior to these other methods in terms of
scalability and efficiency.
Comment: AAAI201
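To make the prefix-projection idea mentioned in the abstract above concrete, here is a minimal Python sketch of plain prefix projection over sequences of single items, with one simple constraint (a maximum pattern length). It does not use a multi-valued decision diagram and is not the authors' algorithm; the function name, the constraint, and the toy sequences are assumptions made for this example.

```python
from collections import defaultdict


def prefix_span(sequences, min_support, max_length):
    """Frequent sequential patterns by prefix projection, restricted to
    patterns of at most max_length items. Returns {pattern: support}."""
    patterns = {}

    def mine(prefix, projected):
        # projected holds one (sequence index, start position) pair per
        # sequence that contains the current prefix.
        if len(prefix) >= max_length:
            return  # length constraint: do not extend the prefix further
        # For each item, record where the suffix starts after its first
        # occurrence in every projected sequence.
        occurrences = defaultdict(dict)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                occurrences[seq[pos]].setdefault(seq_id, pos + 1)
        for item, firsts in sorted(occurrences.items()):
            if len(firsts) >= min_support:
                new_prefix = prefix + (item,)
                patterns[new_prefix] = len(firsts)
                mine(new_prefix, list(firsts.items()))

    mine((), [(i, 0) for i in range(len(sequences))])
    return patterns


# Hypothetical toy database of four sequences.
sequences = [
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["a", "b"],
    ["b", "c"],
]
found = prefix_span(sequences, min_support=3, max_length=2)
```

Here each single item is frequent, and the only frequent pattern of length two is a followed by b, which occurs in three of the four sequences. Because the length bound is checked before a prefix is extended, the search never generates candidates that violate it, in contrast to a generate-and-check approach.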