7 research outputs found

    Data Mining Models for Student Databases

    Fast frequent pattern mining.

    Yabo Xu. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 57-60). Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgement
    Chapter 1. Introduction
        1.1 Frequent Pattern Mining
        1.2 Biosequence Pattern Mining
        1.3 Organization of the Thesis
    Chapter 2. PP-Mine: Fast Mining Frequent Patterns In-Memory
        2.1 Background
        2.2 The Overview
        2.3 PP-tree Representations and Its Construction
        2.4 PP-Mine
        2.5 Discussions
        2.6 Performance Study
    Chapter 3. Fast Biosequence Patterns Mining
        3.1 Background
            3.1.1 Differences in Biosequences
            3.1.2 Mining Sequential Patterns
            3.1.3 Mining Long Patterns
            3.1.4 Related Works in Bioinformatics
        3.2 The Overview
            3.2.1 The Problem
            3.2.2 The Overview of Our Approach
        3.3 The Segment Phase
            3.3.1 Finding Frequent Segments
            3.3.2 The Index-based Querying
            3.3.3 The Compression-based Querying
        3.4 The Pattern Phase
            3.4.1 The Pruning Strategies
            3.4.2 The Querying Strategies
        3.5 Experiment
            3.5.1 Synthetic Data Sets
            3.5.2 Biological Data Sets
    Chapter 4. Conclusion
    Bibliography

    Data mining and database systems: integrating conceptual clustering with a relational database management system.

    Many clustering algorithms have been developed and improved over the years to cater for large-scale data clustering. However, much of this work has gone into numeric-based algorithms that use efficient summarisations to scale to large data sets. There is a growing need for scalable categorical clustering algorithms because, although numeric-based algorithms can be adapted to categorical data, they do not always produce good results. This thesis presents a categorical conceptual clustering algorithm that scales to large data sets using appropriate data summarisations. Data mining is distinguished from machine learning by its use of larger data sets, which are often stored in database management systems (DBMSs). Many clustering algorithms require data to be extracted from the DBMS and reformatted for input to the algorithm. This thesis presents an approach that integrates conceptual clustering with a DBMS. The presented approach makes the algorithm independent of main memory and supports on-line data mining.
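    As a rough illustration of computing a data summarisation inside the DBMS rather than extracting the rows first, the sketch below asks SQLite for per-attribute value counts with `GROUP BY`. It is not the thesis's algorithm: the `animals` table, its columns, and the `category_counts` helper are hypothetical, and attribute-value counts are only one plausible kind of summary for categorical data.

```python
import sqlite3


def category_counts(conn, table, columns):
    """Return {column: {value: count}}, with the counting done by the DBMS."""
    summary = {}
    for col in columns:
        # Identifiers cannot be bound as SQL parameters, so this sketch
        # assumes table and column names are trusted constants.
        query = f'SELECT "{col}", COUNT(*) FROM "{table}" GROUP BY "{col}"'
        summary[col] = dict(conn.execute(query).fetchall())
    return summary


# Hypothetical categorical data set held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (cover TEXT, legs TEXT)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?)",
    [("fur", "four"), ("fur", "four"), ("feathers", "two"), ("scales", "none")],
)

summary = category_counts(conn, "animals", ["cover", "legs"])
print(summary["cover"])
```

    Only the small summary dictionary crosses into application memory, so the memory needed on the client side depends on the number of distinct values rather than the number of rows.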

    Data mining and database systems : integrating conceptual clustering with a relational database management system

    Many clustering algorithms have been developed and improved over the years to cater for large-scale data clustering. However, much of this work has gone into numeric-based algorithms that use efficient summarisations to scale to large data sets. There is a growing need for scalable categorical clustering algorithms because, although numeric-based algorithms can be adapted to categorical data, they do not always produce good results. This thesis presents a categorical conceptual clustering algorithm that scales to large data sets using appropriate data summarisations. Data mining is distinguished from machine learning by its use of larger data sets, which are often stored in database management systems (DBMSs). Many clustering algorithms require data to be extracted from the DBMS and reformatted for input to the algorithm. This thesis presents an approach that integrates conceptual clustering with a DBMS. The presented approach makes the algorithm independent of main memory and supports on-line data mining.
    Source: EThOS - Electronic Theses Online Service, GB (United Kingdom).

    A Formal Concept Analysis Approach to Association Rule Mining: The QuICL Algorithms

    Association rule mining (ARM) is the task of identifying meaningful implication rules exhibited in a data set. Most research has focused on extracting frequent item (FI) sets and has thus fallen short of the overall ARM objective. FI miners fail to identify the upper covers that are needed to generate a set of association rules of a size that an end user can exploit. An alternative to FI mining can be found in formal concept analysis (FCA), a branch of applied mathematics. FCA derives a concept lattice whose concepts identify closed FI sets and whose connections identify the upper covers. However, most FCA algorithms construct a complete lattice and therefore include item sets that are not frequent. An iceberg lattice, on the other hand, is a concept lattice whose concepts contain only FI sets. Only three algorithms for constructing an iceberg lattice were found in the literature. Given that an iceberg concept lattice provides an analysis tool to succinctly identify association rules, this study investigated additional algorithms for constructing one. This report presents the development and analysis of the Quick Iceberg Concept Lattice (QuICL) algorithms, which construct an iceberg lattice incrementally. QuICL uses recursion instead of iteration to navigate the lattice and establish connections, thereby eliminating costly processing incurred by past algorithms. The QuICL algorithms were evaluated against leading FI miners and FCA construction algorithms using benchmarks cited in the literature. Results demonstrate that QuICL provides performance on the order of FI miners yet additionally derives the upper covers. QuICL, when combined with known algorithms to extract a basis of association rules from a lattice, offers a best-known ARM solution. Beyond this, the QuICL algorithms proved very efficient, providing order-of-magnitude gains over other incremental lattice construction algorithms. For example, on the Mushroom data set, QuICL completes in less than 3 seconds, whereas past algorithms exceed 200 seconds. On T10I4D100k, QuICL completes in less than 120 seconds, whereas past algorithms approach 10,000 seconds. QuICL is shown to be the best-known all-around incremental lattice construction algorithm. Runtime complexity is shown to be O(l d i), where l is the cardinality of the lattice, d is the average degree of the lattice, and i is a mean function on the frequent item extents.
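    To make the terms in this abstract concrete, the sketch below builds an iceberg lattice for a tiny, made-up transaction set by brute force: it keeps the closed item sets that meet a minimum support and then links each one to its upper covers. This is not QuICL, which is incremental and recursive; the data, the function names, and the exhaustive enumeration are all assumptions chosen for readability, and the approach is only practical for a handful of items.

```python
from itertools import combinations

# Hypothetical toy transactions (not a benchmark from the report).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def extent(itemset, transactions):
    """Indices of the transactions that contain every item in itemset."""
    return [i for i, t in enumerate(transactions) if itemset <= t]


def closure(itemset, transactions):
    """Largest item set shared by all transactions that contain itemset."""
    rows = [set(transactions[i]) for i in extent(itemset, transactions)]
    if not rows:  # no supporting transaction: closure is the full item set
        return frozenset(set().union(*transactions))
    return frozenset(set.intersection(*rows))


def iceberg_concepts(transactions, min_support):
    """Map each closed frequent item set to its support (brute force)."""
    items = sorted(set().union(*transactions))
    closed = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = len(extent(candidate, transactions))
            if support >= min_support:
                closed[closure(candidate, transactions)] = support
    return closed


def upper_covers(closed):
    """For each closed set, the closed proper subsets with nothing in between."""
    covers = {}
    for x in closed:
        smaller = [y for y in closed if y < x]
        covers[x] = {y for y in smaller if not any(y < z for z in smaller)}
    return covers


closed = iceberg_concepts(TRANSACTIONS, min_support=3)
covers = upper_covers(closed)
for itemset in sorted(closed, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), closed[itemset], [sorted(c) for c in covers[itemset]])
```

    In this ordering a concept's upper covers are the closed item sets immediately above it, that is, with a smaller intent and a larger extent. Those links are what a frequent item set miner alone does not provide.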