Decision Tree Classification of Spatial Data Streams Using Peano Trees

Abstract

Many organizations have large quantities of spatial data collected in various application areas, including remote sensing, geographical information systems (GIS), astronomy, computer cartography, and environmental assessment and planning. These data collections are growing rapidly and can therefore be considered spatial data streams. For data stream classification, time is a major issue. However, these spatial data sets are too large to be classified effectively in a reasonable amount of time using existing methods. In this paper, we develop a new method for decision tree classification on spatial data streams using a data structure called the Peano Count Tree (P-tree). The Peano Count Tree is a spatial data organization that provides a lossless compressed representation of a spatial data set and facilitates efficient classification and other data mining techniques. Using the P-tree structure, measurements such as information gain can be calculated quickly. We compare P-tree based decision tree induction and a classical decision tree induction method with respect to the speed at which the classifier can be built (and rebuilt when substantial amounts of new data arrive). Experimental results show that the P-tree method is significantly faster than existing classification methods, making it the preferred method for mining spatial data streams.
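To make the idea of a quadrant-count tree concrete, the following is a minimal sketch in Python, not the paper's implementation. It assumes a square 0/1 grid whose side is a power of two, stores the count of 1-bits per quadrant, and stops subdividing at pure quadrants (all 0s or all 1s), which is where the compression comes from. The names `build_ptree`, `entropy`, and `info_gain` are hypothetical, and the abstract does not specify how counts from the trees are combined to obtain the class counts that information gain needs, so those counts are simply passed in here.

```python
from math import log2

def build_ptree(bits, r0=0, c0=0, size=None):
    """Build a quadrant-count tree for a square 2^k x 2^k grid of 0/1 values.

    A node is (count, children). children is None when the quadrant is
    pure (all 0s or all 1s), so it is not subdivided any further.
    """
    if size is None:
        size = len(bits)
    if size == 1:
        return (bits[r0][c0], None)
    half = size // 2
    # Quadrant order: upper-left, upper-right, lower-left, lower-right.
    kids = [build_ptree(bits, r0 + dr, c0 + dc, half)
            for dr in (0, half) for dc in (0, half)]
    count = sum(k[0] for k in kids)
    if count == 0 or count == size * size:
        return (count, None)  # pure quadrant: the count alone describes it
    return (count, kids)

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def info_gain(class_counts, partition_counts):
    """Information gain of a split.

    class_counts: counts per class before the split.
    partition_counts: one list of per-class counts for each attribute value.
    """
    total = sum(class_counts)
    remainder = sum(sum(p) / total * entropy(p)
                    for p in partition_counts if sum(p))
    return entropy(class_counts) - remainder

# Hypothetical 4 x 4 bit grid used only for illustration.
grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
tree = build_ptree(grid)
gain = info_gain([2, 2], [[2, 0], [0, 2]])
```

In this sketch the upper-left and lower-right quadrants of `grid` are pure, so each is stored as a single count, while the two mixed quadrants are expanded one more level. Because every count needed for `entropy` is a sum over quadrants, such counts can be read from the tree without rescanning the raw grid.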
