307 research outputs found
A BELIEF-DRIVEN DISCOVERY FRAMEWORK BASED ON DATA MONITORING AND TRIGGERING
A new knowledge-discovery framework, called Data Monitoring and Discovery Triggering (DMDT),
is defined, in which the user specifies monitors that "watch" for significant changes to the data
and to the user-defined system of beliefs. Once such changes are detected, knowledge
discovery processes, in the form of data mining queries, are triggered. The proposed framework
follows from an observation, made in the authors' previous work, that changes to
the user-defined beliefs indicate interesting patterns in the data. In this
paper, we present an approach for finding these interesting patterns using data monitoring and
belief-driven discovery techniques. Our approach is especially useful in applications where
data changes rapidly over time, as in some On-Line Transaction Processing (OLTP) systems.
The proposed approach integrates active databases, data mining queries, and subjective
measures of interestingness based on user-defined systems of beliefs in a novel and synergistic
way to yield a new type of data mining system.
Information Systems Working Papers Series
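The monitor-and-trigger idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `BeliefMonitor`, the use of the fraction of rows satisfying a belief as its "degree", and the fixed change threshold are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]


class BeliefMonitor:
    """Hypothetical monitor: watches how strongly the data supports a belief."""

    def __init__(self, belief: Callable[[Row], bool], threshold: float,
                 on_trigger: Callable[[List[Row]], None]) -> None:
        self.belief = belief              # user-defined belief, as a row predicate
        self.threshold = threshold        # assumed: minimum change that counts as significant
        self.on_trigger = on_trigger      # stands in for a data mining query
        self.last_support: Optional[float] = None

    def observe(self, rows: List[Row]) -> None:
        # Support = fraction of rows consistent with the belief (0.0 if no rows).
        support = sum(1 for r in rows if self.belief(r)) / len(rows) if rows else 0.0
        changed = (self.last_support is not None
                   and abs(support - self.last_support) >= self.threshold)
        self.last_support = support
        if changed:
            self.on_trigger(rows)         # a significant belief change triggers discovery


# Usage: the belief "orders are small" weakens sharply, so discovery is triggered once.
fired: List[int] = []
monitor = BeliefMonitor(lambda r: r["amount"] < 100, 0.3, lambda rows: fired.append(len(rows)))
monitor.observe([{"amount": 10.0}, {"amount": 20.0}])    # first snapshot: baseline only
monitor.observe([{"amount": 10.0}, {"amount": 500.0}])   # support drops from 1.0 to 0.5
```

In this sketch the first snapshot only sets a baseline; a trigger fires when a later snapshot shifts the belief's support by at least the threshold.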
Discovery of Actionable Patterns in Databases: The Action Hierarchy Approach
An approach to defining actionability as a measure of
interestingness of patterns is proposed. This approach
is based on the concept of an action hierarchy which
is defined as a tree of actions with patterns and pattern
templates (data mining queries) assigned to its
nodes. A method for discovering actionable patterns
is presented and various techniques for optimizing the
discovery process are proposed.
Information Systems Working Papers Series
Interactive Constrained Association Rule Mining
We investigate ways to support interactive mining sessions, in the setting of
association rule mining. In such sessions, users specify conditions (queries)
on the associations to be generated. Our approach is a combination of the
integration of querying conditions inside the mining phase, and the incremental
querying of already generated associations. We present several concrete
algorithms and compare their performance.
Comment: A preliminary report on this work was presented at the Second
International Conference on Knowledge Discovery and Data Mining (DaWaK 2000)
Resilient store: a heuristic-based data format selector for intermediate results
The final publication is available at link.springer.com
Large-scale data analysis is an important activity in many organizations that typically requires the deployment of data-intensive workflows. As data is processed, these workflows generate large intermediate results, which are typically pipelined from one operator to the next. However, if materialized, these results become reusable, so subsequent workflows need not recompute them. There are already many solutions that materialize
intermediate results, but all of them assume a fixed data format. A fixed format, however, may not be optimal for every situation. For example, it is well known that different data fragmentation strategies (e.g., horizontal and
vertical) behave better or worse depending on the access patterns of the subsequent operations. In this paper, we present ResilientStore, which assists in selecting the most appropriate data format for materializing intermediate
results. Given a workflow and a set of materialization points, it uses rule-based heuristics to choose the best storage data format based on subsequent access patterns. We have implemented ResilientStore for HDFS and three different
data formats: SequenceFile, Parquet and Avro. Experimental results show that our solution gives 18% better performance than any solution based on a single fixed format.
Peer Reviewed. Postprint (author's final draft)
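To illustrate what a rule-based format selector of this kind might look like, here is a minimal sketch. The rules, the `AccessPattern` fields, and the 0.5 projection threshold are hypothetical and are not taken from the paper; the sketch relies only on the general fact that Parquet is a columnar format while Avro and SequenceFile are row-oriented.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    """Assumed summary of how downstream operators read an intermediate result."""
    columns_read: int     # columns the subsequent operators actually touch
    columns_total: int    # columns present in the intermediate result
    needs_schema: bool    # whether consumers rely on a schema stored with the data


def choose_format(p: AccessPattern, projection_threshold: float = 0.5) -> str:
    """Pick a storage format with simple, illustrative rules."""
    fraction = p.columns_read / p.columns_total
    if fraction < projection_threshold:
        return "Parquet"        # columnar layout suits narrow projections
    if p.needs_schema:
        return "Avro"           # row-oriented, schema travels with the file
    return "SequenceFile"       # row-oriented key/value container


# Usage: a downstream operator that reads 2 of 20 columns favours the columnar format.
narrow = choose_format(AccessPattern(columns_read=2, columns_total=20, needs_schema=True))
wide = choose_format(AccessPattern(columns_read=18, columns_total=20, needs_schema=True))
plain = choose_format(AccessPattern(columns_read=20, columns_total=20, needs_schema=False))
```

A real selector would presumably weigh more signals (selectivity, data size, write cost); the point here is only the shape of a rule that maps access patterns to a format.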
Exploiting Graphic Card Processor Technology to Accelerate Data Mining Queries in SAP NetWeaver BIA
Within business intelligence contexts, the importance of data mining algorithms is continuously increasing, particularly from the perspective of applications and users that demand novel algorithms on the one hand and an efficient implementation exploiting novel system architectures on the other. In this paper, we focus on the latter issue and report our experience with the exploitation of graphic card processor technology within the SAP NetWeaver business intelligence accelerator (BIA). The BIA is a highly distributed analytical engine that supports OLAP and data mining processing primitives. The system organizes data entities in a column-wise fashion, and its operation is completely main-memory-based. Since case studies have shown that classic data mining queries spend a large portion of their runtime on scanning and filtering the data as a necessary prerequisite to the actual mining step, our main goal was to speed up this expensive scanning and filtering process. In a first step, the paper outlines the basic data mining processing techniques within SAP NetWeaver BIA and illustrates the implementation of scans and filters. In a second step, we give insight into the main features of a hybrid system architecture design exploiting graphic card processor technology. Finally, we sketch the implementation and give details of our extensive evaluations.
A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing
The rapidly increasing amount of stored data has spurred researchers
to seek different methods for taking optimal advantage of it. Most of these
methods have faced a response-time problem as a result of the enormous size of
the data. Most solutions have suggested materialization as the preferred remedy.
However, such a solution cannot deliver Real-Time answers. In this paper
we propose a framework illustrating the barriers to, and suggested solutions
for, achieving Real-Time OLAP answers, which are widely used in decision
support systems and data warehouses.