Search CORE

16,373 research outputs found

An Investigation in Efficient Spatial Patterns Mining

Author: Wang Lizhen
Publication venue
Publication date
Field of study

The technical progress in computerized spatial data acquisition and storage results in the growth of vast spatial databases. Faced with large amounts of increasing spatial data, a terminal user has more difficulty in understanding them without the helpful knowledge from spatial databases. Thus, spatial data mining has been brought under the umbrella of data mining and is attracting more attention. Spatial data mining presents challenges. Differing from usual data, spatial data includes not only positional data and attribute data, but also spatial relationships among spatial events. Further, the instances of spatial events are embedded in a continuous space and share a variety of spatial relationships, so the mining of spatial patterns demands new techniques. In this thesis, several contributions were made. Some new techniques were proposed, i.e., fuzzy co-location mining, CPI-tree (Co-location Pattern Instance Tree), maximal co-location patterns mining, AOI-ags (Attribute-Oriented Induction based on Attributes’ Generalization Sequences), and fuzzy association prediction. Three algorithms were put forward on co-location patterns mining: the fuzzy co-location mining algorithm, the CPI-tree based co-location mining algorithm (CPI-tree algorithm) and the orderclique- based maximal prevalence co-location mining algorithm (order-clique-based algorithm). An attribute-oriented induction algorithm based on attributes’ generalization sequences (AOI-ags algorithm) is further given, which unified the attribute thresholds and the tuple thresholds. On the two real-world databases with time-series data, a fuzzy association prediction algorithm is designed. Also a cell-based spatial object fusion algorithm is proposed. Two fuzzy clustering methods using domain knowledge were proposed: Natural Method and Graph-Based Method, both of which were controlled by a threshold. The threshold was confirmed by polynomial regression. Finally, a prototype system on spatial co-location patterns’ mining was developed, and shows the relative efficiencies of the co-location techniques proposed The techniques presented in the thesis focus on improving the feasibility, usefulness, effectiveness, and scalability of related algorithm. In the design of fuzzy co-location Abstract mining algorithm, a new data structure, the binary partition tree, used to improve the process of fuzzy equivalence partitioning, was proposed. A prefix-based approach to partition the prevalent event set search space into subsets, where each sub-problem can be solved in main-memory, was also presented. The scalability of CPI-tree algorithm is guaranteed since it does not require expensive spatial joins or instance joins for identifying co-location table instances. In the order-clique-based algorithm, the co-location table instances do not need be stored after computing the Pi value of corresponding colocation, which dramatically reduces the executive time and space of mining maximal colocations. Some technologies, for example, partitions, equivalence partition trees, prune optimization strategies and interestingness, were used to improve the efficiency of the AOI-ags algorithm. To implement the fuzzy association prediction algorithm, the “growing window” and the proximity computation pruning were introduced to reduce both I/O and CPU costs in computing the fuzzy semantic proximity between time-series. For new techniques and algorithms, theoretical analysis and experimental results on synthetic data sets and real-world datasets were presented and discussed in the thesis

University of Huddersfield Repository

Attribute oriented induction with star schema

Author: H Spits Warnars H. L.
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 29/05/2010
Field of study

This paper will propose a novel star schema attribute induction as a new attribute induction paradigm and as improving from current attribute oriented induction. A novel star schema attribute induction will be examined with current attribute oriented induction based on characteristic rule and using non rule based concept hierarchy by implementing both of approaches. In novel star schema attribute induction some improvements have been implemented like elimination threshold number as maximum tuples control for generalization result, there is no ANY as the most general concept, replacement the role concept hierarchy with concept tree, simplification for the generalization strategy steps and elimination attribute oriented induction algorithm. Novel star schema attribute induction is more powerful than the current attribute oriented induction since can produce small number final generalization tuples and there is no ANY in the results.Comment: 23 Pages, IJDM

arXiv.org e-Print Archive

CiteSeerX

Crossref

A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

Author: Freitas Alex A.
Publication venue: Morgan Kaufmann
Publication date: 01/01/1997
Field of study

This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a data mining viewpoint are scalability, data-privacy control and automatic parallelization

CiteSeerX

Kent Academic Repository

Towards Large-Scale Knowledge Discovery in Databases (KDD) by Exploiting Parallelism in Generic KDD Primitives

Author: Freitas Alex A.
Publication venue
Publication date: 01/07/1997
Field of study

Kent Academic Repository

Combining the Attribute Oriented Induction and Graph Visualization to Enhancement Association Rules Interpretation

Author: Abdullah Zahraa
Al-Mamory Safaa
Publication venue: University of Information and Technology Communications
Publication date: 31/12/2016
Field of study

The important methods of data mining is large and from these methods is mining of association rule. The miningof association rule gives huge number of the rules. These huge rules make analyst consuming more time when searchingthrough the large rules for finding the interesting rules. One of the solutions for this problem is combing between one of theAssociation rules visualization method and generalization method. Association rules visualization method is graph-basedmethod. Generalization method is Attribute Oriented Induction algorithm (AOI). AOI after combing calls ModifiedAOI because it removes and changes in the steps of the traditional AOI. The graph technique after combing also callsgrouped graph method because it displays the aggregated that results rules from AOI. The results of this paper are ratio ofcompression that gives clarity of visualization. These results provide the ability for test and drill down in the rules orunderstand and roll up

Iraqi Journal for Computers and Informatics

On the role of pre and post-processing in environmental data mining

Author: Athanasiadis Ioannis
Comas Joaquim
Gibert Karina
Holmes Geoffrey
Izquierdo Joaquin
Sanchez-Marre Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2008
Field of study

The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed

Research Commons@Waikato