
    Interactive Data Exploration with Smart Drill-Down

    We present smart drill-down, an operator for interactively exploring a relational table to discover and summarize "interesting" groups of tuples. Each group of tuples is described by a rule. For instance, the rule (a, b, ⋆, 1000) tells us that there are a thousand tuples with value a in the first column and b in the second column (and any value in the third column). Smart drill-down presents an analyst with a list of rules that together describe interesting aspects of the table. The analyst can tailor the definition of interesting, and can interactively apply smart drill-down on an existing rule to explore that part of the table. We demonstrate that the underlying optimization problems are NP-Hard, and describe an algorithm for finding the approximately optimal list of rules to display when the user applies smart drill-down, and a dynamic sampling scheme for efficiently interacting with large tables. Finally, we perform experiments on real datasets with our experimental prototype to demonstrate the usefulness of smart drill-down and study the performance of our algorithms.
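To make the rule notation concrete, here is a minimal sketch of how the count in a rule such as (a, b, ⋆, 1000) could be obtained. It is an illustration only, not the paper's algorithm: the names `STAR` and `rule_count` and the toy table are assumptions introduced for this example.

```python
from typing import Hashable, Optional, Sequence

# Hypothetical wildcard marker standing in for the star symbol in a rule.
STAR: Optional[Hashable] = None


def rule_count(table: Sequence[Sequence[Hashable]],
               pattern: Sequence[Optional[Hashable]]) -> int:
    """Count the tuples whose columns agree with every non-wildcard entry."""
    return sum(
        all(p is STAR or p == v for p, v in zip(pattern, row))
        for row in table
    )


# Toy three-column table (invented data).
table = [
    ("a", "b", "x"),
    ("a", "b", "y"),
    ("a", "c", "x"),
    ("d", "b", "x"),
]

# The rule (a, b, STAR) covers the first two tuples of the toy table.
count = rule_count(table, ("a", "b", STAR))
print(("a", "b", "*", count))
```

Choosing which rules to display is the hard optimization problem the abstract refers to; the sketch covers only the counting step.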

    Data mining with the SAP NetWeaver BI accelerator

    The new SAP NetWeaver Business Intelligence accelerator is an engine that supports online analytical processing. It performs aggregation in memory and at query run time over large volumes of structured data. This paper first briefly describes the accelerator and its main architectural features, and cites test results that indicate its power. Then it describes in detail how the accelerator may be used for data mining. The accelerator can perform data mining in the same large repositories of data and using the same compact index structures that it uses for analytical processing. A first such implementation of data mining is described and the results of a performance evaluation are presented. Association rule mining in a distributed architecture was implemented with a variant of the BUC iceberg cubing algorithm. Test results suggest that useful online mining should be possible with wait times of less than 60 seconds on business data that has not been preprocessed.
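For readers unfamiliar with BUC (bottom-up computation of iceberg cubes), the following is a minimal single-machine sketch of the plain algorithm with COUNT as the measure. It is not the distributed variant the paper implements, and the function name `buc_iceberg` and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def buc_iceberg(rows, num_dims, min_support):
    """Return {cell: count} for every group-by cell with count >= min_support.

    A cell is a tuple of (dimension index, value) pairs; () is the apex cell.
    """
    cells = {}

    def recurse(part, start_dim, prefix):
        cells[prefix] = len(part)
        for dim in range(start_dim, num_dims):
            # Partition the current rows on this dimension.
            groups = defaultdict(list)
            for row in part:
                groups[row[dim]].append(row)
            for value, group in groups.items():
                # Iceberg pruning: a partition below the threshold cannot
                # contain any qualifying, more specific cell.
                if len(group) >= min_support:
                    recurse(group, dim + 1, prefix + ((dim, value),))

    if len(rows) >= min_support:
        recurse(list(rows), 0, ())
    return cells


rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]
cells = buc_iceberg(rows, num_dims=2, min_support=2)
for cell, n in sorted(cells.items()):
    print(cell, n)
```

The surviving cells are the frequent attribute-value combinations from which association rules can then be derived.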

    Decision Tables: Scalable Classification Exploring RDBMS Capabilities

    In this paper, we report our success in building efficient, scalable classifiers in the form of decision tables by exploiting the capabilities of modern relational database management systems. In addition to high classification accuracy, the unique features of the approach include its high training speed, linear scalability, and simplicity of implementation. More importantly, the major computation required by the approach can be implemented using standard functions provided by modern relational DBMSs. This not only makes the classifier extremely easy to implement, but also means that further performance improvements can be expected when better processing strategies for those computations are developed and implemented in RDBMSs. The novel classification approach based on grouping and counting and its implementation on top of an RDBMS are described. The result
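The "grouping and counting" idea can be sketched with an ordinary GROUP BY and COUNT query. The example below uses Python's built-in `sqlite3` module and an invented `train` table with columns `outlook`, `windy`, and `label`; it illustrates the general idea under those assumptions and is not the authors' actual classifier.

```python
import sqlite3


def build_decision_table(rows):
    """Map each (outlook, windy) combination to its most frequent label."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE train (outlook TEXT, windy TEXT, label TEXT)")
    conn.executemany("INSERT INTO train VALUES (?, ?, ?)", rows)
    # The grouping and counting is done entirely by the database engine.
    counts = conn.execute(
        "SELECT outlook, windy, label, COUNT(*) AS n "
        "FROM train GROUP BY outlook, windy, label "
        "ORDER BY outlook, windy, label"
    ).fetchall()
    conn.close()

    best = {}
    for outlook, windy, label, n in counts:
        key = (outlook, windy)
        # Keep the label with the highest count; ties keep the first label seen.
        if key not in best or n > best[key][1]:
            best[key] = (label, n)
    return {key: label for key, (label, _) in best.items()}


rows = [
    ("sunny", "no", "play"),
    ("sunny", "no", "play"),
    ("sunny", "no", "stay"),
    ("rainy", "yes", "stay"),
    ("rainy", "yes", "stay"),
    ("rainy", "no", "play"),
]
decision_table = build_decision_table(rows)
print(decision_table[("sunny", "no")])
```

Classifying a new tuple then amounts to a lookup on its attribute values; how unseen combinations are handled is left open here.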

    Mining multi-level association rules using data cubes and mining N-most interesting itemsets.

    by Kwong, Wang-Wai Renfrew. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 102-105). Abstracts in English and Chinese.

    Abstract
    Acknowledgments
    Chapter 1 --- Introduction
        1.1 --- Data Mining Tasks
            1.1.1 --- Characterization
            1.1.2 --- Discrimination
            1.1.3 --- Classification
            1.1.4 --- Clustering
            1.1.5 --- Prediction
            1.1.6 --- Description
            1.1.7 --- Association Rule Mining
        1.2 --- Motivation
            1.2.1 --- Motivation for Mining Multi-level Association Rules Using Data Cubes
            1.2.2 --- Motivation for Mining N-most Interesting Itemsets
        1.3 --- Outline of the Thesis
    Chapter 2 --- Survey on Previous Work
        2.1 --- Data Warehousing
            2.1.1 --- Data Cube
        2.2 --- Data Mining
            2.2.1 --- Association Rules
            2.2.2 --- Multi-level Association Rules
            2.2.3 --- Multi-Dimensional Association Rules Using Data Cubes
            2.2.4 --- Apriori Algorithm
    Chapter 3 --- Mining Multi-level Association Rules Using Data Cubes
        3.1 --- Use of Multi-level Concept
            3.1.1 --- Multi-level Concept
            3.1.2 --- Criteria of Using Multi-level Concept
            3.1.3 --- Use of Multi-level Concept in Association Rules
        3.2 --- Use of Data Cube
            3.2.1 --- Data Cube
            3.2.2 --- Mining Multi-level Association Rules Using Data Cubes
            3.2.3 --- Definition
        3.3 --- Method for Mining Multi-level Association Rules Using Data Cubes
            3.3.1 --- Algorithm
            3.3.2 --- Example
        3.4 --- Experiment
            3.4.1 --- Simulation of Data Cube by Array
            3.4.2 --- Simulation of Data Cube by B+ Tree
        3.5 --- Discussion
    Chapter 4 --- Mining the N-most Interesting Itemsets
        4.1 --- Mining the N-most Interesting Itemsets
            4.1.1 --- Criteria of Mining the N-most Interesting Itemsets
            4.1.2 --- Definition
            4.1.3 --- Property
        4.2 --- Method for Mining N-most Interesting Itemsets
            4.2.1 --- Algorithm
            4.2.2 --- Example
        4.3 --- Experiment
            4.3.1 --- Synthetic Data
            4.3.2 --- Real Data
        4.4 --- Discussion
    Chapter 5 --- Conclusion
    Bibliography
    Appendix
    Chapter A --- Programs for Mining the N-most Interesting Itemset
        A.1 --- Programs
        A.2 --- Data Structures
        A.3 --- Global Variables
        A.4 --- Functions
        A.5 --- Result Format
    Chapter B --- Programs for Mining the Multi-level Association Rules Using Data Cube
        B.1 --- Programs
        B.2 --- Data Structure
        B.3 --- Variables
        B.4 --- Functions

    OLEMAR: An Online Environment for Mining Association Rules in Multidimensional Data

    Data warehouses and OLAP (online analytical processing) provide tools to explore and navigate through data cubes in order to extract interesting information under different perspectives and levels of granularity. Nevertheless, OLAP techniques do not allow the identification of relationships, groupings, or exceptions that could hold in a data cube. To that end, we propose to enrich OLAP techniques with data mining facilities to benefit from the capabilities they offer. In this chapter, we propose an online environment for mining association rules in data cubes. Our environment, called OLEMAR (online environment for mining association rules), is designed to extract associations from multidimensional data. It allows the extraction of inter-dimensional association rules from data cubes according to a sum-based aggregate measure, a more general indicator than the aggregate values provided by the traditional COUNT measure. In our approach, OLAP users are able to drive a mining process guided by a meta-rule that meets their analysis objectives. In addition, the environment is based on a formalization that exploits aggregate measures to revisit the definition of the support and the confidence of discovered rules. This formalization also helps evaluate the interestingness of association rules according to two additional quality measures: lift and Loevinger. Furthermore, in order to focus on the discovered associations and validate them, we provide a visual representation based on the principles of graphic semiology. Such a representation consists of a graphic encoding of frequent patterns and association rules in the same multidimensional space as the one associated with the mined data cube. We have developed our approach as a component in a general online analysis platform called Miningcubes according to an Apriori-like algorithm, which helps extract inter-dimensional association rules directly from materialized multidimensional structures of data. In order to illustrate the effectiveness and the efficiency of our proposal, we analyze a real-life case study about breast cancer data and conduct performance experiments on the mining process.
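The abstract does not spell out its formulas, so the sketch below uses the textbook definitions of support, confidence, lift, and the Loevinger index, with probabilities estimated from a summed measure rather than a row count. The function name `rule_metrics`, the fact layout, and the sample figures are assumptions for illustration and may differ from the chapter's exact formalization.

```python
def rule_metrics(facts, antecedent, consequent):
    """Quality measures for antecedent -> consequent, weighted by a measure.

    facts is a list of (dims, measure) pairs, where dims maps a dimension
    name to its value. Assumes the antecedent matches at least one fact and
    the consequent does not cover the whole cube (no zero denominators).
    """
    def matches(dims, condition):
        return all(dims.get(k) == v for k, v in condition.items())

    total = sum(m for _, m in facts)
    sum_x = sum(m for d, m in facts if matches(d, antecedent))
    sum_y = sum(m for d, m in facts if matches(d, consequent))
    sum_xy = sum(
        m for d, m in facts
        if matches(d, antecedent) and matches(d, consequent)
    )

    p_y = sum_y / total
    confidence = sum_xy / sum_x
    return {
        "support": sum_xy / total,
        "confidence": confidence,
        "lift": confidence / p_y,
        "loevinger": (confidence - p_y) / (1 - p_y),
    }


# Invented two-dimensional cube; the measure could be, for example, a sales sum.
facts = [
    ({"age": "young", "city": "A"}, 30),
    ({"age": "young", "city": "B"}, 10),
    ({"age": "old", "city": "A"}, 20),
    ({"age": "old", "city": "B"}, 40),
]
metrics = rule_metrics(facts, {"age": "young"}, {"city": "A"})
print(metrics)
```

With a measure of 1 for every fact, the same code reduces to the usual COUNT-based definitions.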

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its wide public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each section deeply, the two books present useful hints and strategies for solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.