8,547 research outputs found
An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming
The main advantage of Constraint Programming (CP) approaches for sequential
pattern mining (SPM) is their modularity, which includes the ability to add new
constraints (regular expressions, length restrictions, etc). The current best
CP approach for SPM uses a global constraint (module) that computes the
projected database and enforces the minimum frequency; it does this with a
filtering algorithm similar to the PrefixSpan method. However, the resulting
system is not as scalable as some of the most advanced mining systems like
Zaki's cSPADE. We show how, using techniques from both data mining and CP, one
can use a generic constraint solver and yet outperform existing specialized
systems. This is mainly due to two improvements in the module that computes the
projected frequencies: first, computing the projected database can be sped up
by pre-computing the positions at which an symbol can become unsupported by a
sequence, thereby avoiding to scan the full sequence each time; and second by
taking inspiration from the trailing used in CP solvers to devise a
backtracking-aware data structure that allows fast incremental storing and
restoring of the projected database. Detailed experiments show how this
approach outperforms existing CP as well as specialized systems for SPM, and
that the gain in efficiency translates directly into increased efficiency for
other settings such as mining with regular expressions.Comment: frequent sequence mining, constraint programmin
データマイニングにおけるユーザベリティ向上と応用に関する研究
制度:新 ; 文部省報告番号:甲2574号 ; 学位の種類:博士(工学) ; 授与年月日:2008/3/15 ; 早大学位記番号:新473
A Knowledge Discovery Framework for Learning Task Models from User Interactions in Intelligent Tutoring Systems
Domain experts should provide relevant domain knowledge to an Intelligent
Tutoring System (ITS) so that it can guide a learner during problemsolving
learning activities. However, for many ill-defined domains, the domain
knowledge is hard to define explicitly. In previous works, we showed how
sequential pattern mining can be used to extract a partial problem space from
logged user interactions, and how it can support tutoring services during
problem-solving exercises. This article describes an extension of this approach
to extract a problem space that is richer and more adapted for supporting
tutoring services. We combined sequential pattern mining with (1) dimensional
pattern mining (2) time intervals, (3) the automatic clustering of valued
actions and (4) closed sequences mining. Some tutoring services have been
implemented and an experiment has been conducted in a tutoring system.Comment: Proceedings of the 7th Mexican International Conference on Artificial
Intelligence (MICAI 2008), Springer, pp. 765-77
Implementation of an interactive pattern mining framework on electronic health record datasets
Large collections of electronic patient records contain a broad range of clinical information highly relevant for data analysis. However, they are maintained primarily for patient administration, and automated methods are required to extract valuable knowledge for predictive, preventive, personalized and participatory medicine. Sequential pattern mining is a fundamental task in data mining which can be used to find statistically relevant, non-trivial temporal dependencies of events such as disease comorbidities. This works objective is to use this mining technique to identify disease associations based on ICD-9-CM codes data of the entire Taiwanese population obtained from Taiwan’s National Health Insurance Research Database.
This thesis reports the development and implementation of the Disease Pattern Miner – a pattern mining framework in a medical domain. The framework was designed as a Web application which can be used to run several state-of-the-art sequence mining algorithms on electronic health records, collect and filter the results to reduce the number of patterns to a meaningful size, and visualize the disease associations as an interactive model in a specific population group. This may be crucial to discover new disease associations and offer novel insights to explain disease pathogenesis. A structured evaluation of the data and models are required before medical data-scientist may use this application as a tool for further research to get a better understanding of disease comorbidities
- …