Much current data mining research focuses on discovering sets of attributes that discriminate data entities into classes, such as shopping trends for a particular demographic group. In contrast, we are developing data mining techniques that discover patterns consisting of complex relationships between entities. Our research is particularly applicable to domains in which the data is event-driven or relationally structured. In this paper we present approaches to two related challenges: the need to assimilate incremental data updates and the need to mine monolithic datasets. Many realistic problems are continuous in nature and therefore require a data mining approach that can evolve discovered knowledge over time. Similarly, many problems present datasets that are too large to fit into dynamic memory on conventional computer systems. We address incremental data mining by introducing a mechanism for summarizing discoveries from previous data increments, so that the globally best patterns can be computed by mining only the new data increment. To address monolithic datasets, we introduce a technique by which these datasets can be partitioned and mined serially with minimal impact on result quality. We present applications of our work in both the counter-terrorism and bioinformatics domains.
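The idea of summarizing earlier increments so that only the new increment needs to be mined can be illustrated with a minimal sketch. It is not the authors' mechanism: patterns are reduced to single labeled edges, plain frequency stands in for whatever evaluation measure a real graph miner would use, and the names `mine_increment`, `update_summary`, and `best_pattern` are hypothetical.

```python
from collections import Counter

def mine_increment(edges):
    """Count candidate patterns in one increment.

    Hypothetical stand-in for a real miner: each pattern is a single
    labeled edge (source label, relation, target label).
    """
    return Counter(edges)

def update_summary(summary, local_counts):
    """Fold one increment's counts into the running summary."""
    summary.update(local_counts)
    return summary

def best_pattern(counts):
    """Return the highest-scoring pattern; ties break by sorted order."""
    return max(sorted(counts), key=lambda p: counts[p])

# Two increments arriving one after the other (illustrative data only).
increments = [
    [("person", "calls", "person"),
     ("person", "calls", "person"),
     ("person", "visits", "place")],
    [("person", "visits", "place"),
     ("person", "visits", "place"),
     ("person", "visits", "place")],
]

summary = Counter()
local_bests = []
for edges in increments:
    local = mine_increment(edges)      # only the new increment is mined
    local_bests.append(best_pattern(local))
    update_summary(summary, local)     # earlier data is never revisited

global_best = best_pattern(summary)
```

In this toy data the best pattern of the first increment (`calls`) differs from the best pattern over all data seen so far (`visits`), which is the situation a summary of earlier increments is meant to handle without re-mining old data.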
Structure Discovery from Sequential Data
In this paper we describe I-Subdue, an extension to the Subdue graph-based data mining system. I-Subdue operates over sequentially received relational data to incrementally discover the most representative substructures. The ability to incrementally refine discoveries from serially acquired data is important for many applications, particularly as computer systems become more integrated into human lives as interactive assistants. This paper describes initial work to overcome the challenge of locally optimal substructures overshadowing those that are globally optimal. We conclude with an overview of additional challenges for sequential structure discovery.