307 research outputs found

    A BELIEF-DRIVEN DISCOVERY FRAMEWORK BASED ON DATA MONITORING AND TRIGGERING

    Get PDF
    A new knowledge-discovery framework, called Data Monitoring and Discovery Triggering (DMDT), is defined, where the user specifies monitors that âwatch" for significant changes to the data and changes to the user-defined system of beliefs. Once these changes are detected, knowledge discovery processes, in the form of data mining queries, are triggered. The proposed framework is the result of an observation, made in the previous work of the authors, that when changes to the user-defined beliefs occur, this means that, there are interesting patterns in the data. In this paper, we present an approach for finding these interesting patterns using data monitoring and belief-driven discovery techniques. Our approach is especially useful in those applications where data changes rapidly with time, as in some of the On-Line Transaction Processing (OLTP) systems. The proposed approach integrates active databases, data mining queries and subjective measures of interestingness based on user-defined systems of beliefs in a novel and synergetic way to yield a new type of data mining systems.Information Systems Working Papers Serie

    Discovery of Actionable Patterns in Databases: The Action Hierarchy Approach

    Get PDF
    An approach to defining actionability as a measure of interestingness of patterns is proposed. This approach is based on the concept of an action hierarchy which is defined as a tree of actions with patterns and pattern templates (data mining queries) assigned to its nodes. A method for discovering actionable patterns is presented and various techniques for optimizing the discovery process are proposed.Information Systems Working Papers Serie

    Interactive Constrained Association Rule Mining

    Full text link
    We investigate ways to support interactive mining sessions, in the setting of association rule mining. In such sessions, users specify conditions (queries) on the associations to be generated. Our approach is a combination of the integration of querying conditions inside the mining phase, and the incremental querying of already generated associations. We present several concrete algorithms and compare their performance.Comment: A preliminary report on this work was presented at the Second International Conference on Knowledge Discovery and Data Mining (DaWaK 2000

    Resilient store: a heuristic-based data format selector for intermediate results

    Get PDF
    The final publication is available at link.springer.comLarge-scale data analysis is an important activity in many organizations that typically requires the deployment of data-intensive workflows. As data is processed these workflows generate large intermediate results, which are typically pipelined from one operator to the following. However, if materialized, these results become reusable, hence, subsequent workflows need not recompute them. There are already many solutions that materialize intermediate results but all of them assume a fixed data format. A fixed format, however, may not be the optimal one for every situation. For example, it is well-known that different data fragmentation strategies (e.g., horizontal and vertical) behave better or worse according to the access patterns of the subsequent operations. In this paper, we present ResilientStore, which assists on selecting the most appropriate data format for materializing intermediate results. Given a workflow and a set of materialization points, it uses rule-based heuristics to choose the best storage data format based on subsequent access patterns.We have implemented ResilientStore for HDFS and three different data formats: SequenceFile, Parquet and Avro. Experimental results show that our solution gives 18% better performance than any solution based on a single fixed format.Peer ReviewedPostprint (author's final draft

    Exploiting Graphic Card Processor Technology to Accelerate Data Mining Queries in SAP NetWeaver BIA

    Get PDF
    Within business Intelligence contexts, the importance of data mining algorithms is continuously increasing, particularly from the perspective of applications and users that demand novel algorithms on the one hand and an efficient implementation exploiting novel system architectures on the other hand. Within this paper, we focus on the latter issue and report our experience with the exploitation of graphic card processor technology within the SAP NetWeaver business intelligence accelerator (BIA). The BIA represents a highly distributed analytical engine that supports OLAP and data mining processing primitives. The system organizes data entities in column-wise fashion and its operation is completely main-memory-based. Since case studies have shown that classic data mining queries spend a large portion of their runtime on scanning and filtering the data as a necessary prerequisite to the actual mining step, our main goal was to speed up this expensive scanning and filtering process. In a first step, the paper outlines the basic data mining processing techniques within SAP NetWeaver BIA and illustrates the implementation of scans and filters. In a second step, we give insight into the main features of a hybrid system architecture design exploiting graphic card processor technology. Finally, we sketch the implementation and give details of our vast evaluations

    A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing

    Full text link
    The overwhelmingly increasing amount of stored data has spurred researchers seeking different methods in order to optimally take advantage of it which mostly have faced a response time problem as a result of this enormous size of data. Most of solutions have suggested materialization as a favourite solution. However, such a solution cannot attain Real- Time answers anyhow. In this paper we propose a framework illustrating the barriers and suggested solutions in the way of achieving Real-Time OLAP answers that are significantly used in decision support systems and data warehouses
    corecore