307 research outputs found

A Belief-Driven Discovery Framework Based on Data Monitoring and Triggering

Author: Silberschatz Avi
Tuzhilin Alexander
Publication venue: Stern School of Business, New York University
Publication date: 01/12/1996

A new knowledge-discovery framework, called Data Monitoring and Discovery Triggering (DMDT), is defined, in which the user specifies monitors that "watch" for significant changes to the data and to the user-defined system of beliefs. Once these changes are detected, knowledge discovery processes, in the form of data mining queries, are triggered. The proposed framework results from an observation, made in the authors' previous work, that changes to the user-defined beliefs indicate interesting patterns in the data. In this paper, we present an approach for finding these interesting patterns using data monitoring and belief-driven discovery techniques. Our approach is especially useful in applications where data changes rapidly over time, as in some On-Line Transaction Processing (OLTP) systems. The proposed approach integrates active databases, data mining queries, and subjective measures of interestingness based on user-defined systems of beliefs in a novel and synergistic way to yield a new type of data mining system. Information Systems Working Papers Series

New York University Faculty Digital Archive

Discovery of Actionable Patterns in Databases: The Action Hierarchy Approach

Author: Adomavicius Gediminas
Tuzhilin Alexander
Publication venue: Stern School of Business, New York University
Publication date: 01/01/1997

An approach to defining actionability as a measure of the interestingness of patterns is proposed. This approach is based on the concept of an action hierarchy, which is defined as a tree of actions with patterns and pattern templates (data mining queries) assigned to its nodes. A method for discovering actionable patterns is presented, and various techniques for optimizing the discovery process are proposed. Information Systems Working Papers Series

New York University Faculty Digital Archive

Interactive Constrained Association Rule Mining

Author: Bussche Jan Van den
Goethals Bart
Publication date: 01/01/2003

We investigate ways to support interactive mining sessions, in the setting of association rule mining. In such sessions, users specify conditions (queries) on the associations to be generated. Our approach is a combination of the integration of querying conditions inside the mining phase, and the incremental querying of already generated associations. We present several concrete algorithms and compare their performance. Comment: A preliminary report on this work was presented at the Second International Conference on Knowledge Discovery and Data Mining (DaWaK 2000)

arXiv.org e-Print Archive

CiteSeerX

Resilient store: a heuristic-based data format selector for intermediate results

Author: Abelló Gamazo Alberto
Bilalli Besim
Lehner Wolfgang
Munir Rana Faisal
Romero Moral Óscar
Thiele Maik
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

The final publication is available at link.springer.com. Large-scale data analysis is an important activity in many organizations that typically requires the deployment of data-intensive workflows. As data is processed, these workflows generate large intermediate results, which are typically pipelined from one operator to the next. However, if materialized, these results become reusable, so subsequent workflows need not recompute them. There are already many solutions that materialize intermediate results, but all of them assume a fixed data format. A fixed format, however, may not be the optimal one for every situation. For example, it is well known that different data fragmentation strategies (e.g., horizontal and vertical) behave better or worse according to the access patterns of the subsequent operations. In this paper, we present ResilientStore, which assists in selecting the most appropriate data format for materializing intermediate results. Given a workflow and a set of materialization points, it uses rule-based heuristics to choose the best storage data format based on subsequent access patterns. We have implemented ResilientStore for HDFS and three different data formats: SequenceFile, Parquet and Avro. Experimental results show that our solution gives 18% better performance than any solution based on a single fixed format. Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

Exploiting Graphic Card Processor Technology to Accelerate Data Mining Queries in SAP NetWeaver BIA

Author: Faerber Franz
Lehner Wolfgang
Mindnich Tobias
Weyerhaeuser Christoph
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 15/06/2022

Within business intelligence contexts, the importance of data mining algorithms is continuously increasing, particularly from the perspective of applications and users that demand novel algorithms on the one hand and an efficient implementation exploiting novel system architectures on the other. In this paper, we focus on the latter issue and report our experience with the exploitation of graphic card processor technology within the SAP NetWeaver business intelligence accelerator (BIA). The BIA is a highly distributed analytical engine that supports OLAP and data mining processing primitives. The system organizes data entities in a column-wise fashion, and its operation is completely main-memory-based. Since case studies have shown that classic data mining queries spend a large portion of their runtime on scanning and filtering the data as a necessary prerequisite to the actual mining step, our main goal was to speed up this expensive scanning and filtering process. In a first step, the paper outlines the basic data mining processing techniques within SAP NetWeaver BIA and illustrates the implementation of scans and filters. In a second step, we give insight into the main features of a hybrid system architecture design exploiting graphic card processor technology. Finally, we sketch the implementation and give details of our extensive evaluation.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing

Author: Alzeini H I
Habaebi M H
Hameed Sh A
Publication date: 01/12/2013

The rapidly growing volume of stored data has spurred researchers to seek methods for exploiting it, most of which have faced response-time problems because of the sheer size of the data. Most proposed solutions favour materialization. However, materialization alone cannot deliver Real-Time answers. In this paper we propose a framework that illustrates the barriers to, and suggested solutions for, achieving Real-Time OLAP answers, which are widely used in decision support systems and data warehouses.

arXiv.org e-Print Archive

Crossref

The International Islamic University Malaysia Repository