14 research outputs found

    A Spatial Semi-supervised Learning Method for Mining Multi-spectral Remote Sensing Imagery

    Supervised learning, which is often used in land cover (thematic) classification of remote sensing imagery, has two limitations: first, these techniques require large amounts of accurate training data to estimate the underlying statistical model parameters reliably; second, the independent and identically distributed (i.i.d.) assumptions made by these techniques do not hold for high-resolution satellite images. Semi-supervised learning techniques that combine large unlabeled samples with small labeled training sets have recently become popular in machine learning, especially in text data mining. These techniques provide a viable solution to the small training dataset problem; however, they do not exploit spatial context. In this paper we explore methods that utilize unlabeled samples in supervised learning for classification of multi-spectral remote sensing imagery, while also taking the spatial context into account in the learning process. We extended the classical Expectation-Maximization (EM) technique to model spatial context via Markov Random Fields (MRF). We have conducted several experiments on real data sets, and our classification procedure shows an improvement of 10% in overall classification accuracy. Further studies are necessary to assess the true potential and usefulness of this technique in varying geographic settings. Keywords: MAP, MLE, EM, Spatial Context, Auto-correlation, MRF, semi-supervised learning, mixture model
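To make the semi-supervised idea in this abstract concrete, the following is a minimal sketch of plain EM over a Gaussian mixture that mixes labeled and unlabeled samples. It is only the non-spatial baseline: the MRF spatial prior described in the abstract is not modelled here. The one-dimensional feature, the function names (`m_step`, `semi_supervised_em`), the variance floor, and the toy data are all assumptions made for illustration, not details from the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def m_step(xs, resp, n_classes, var_floor=1e-3):
    """Re-estimate (prior, mean, variance) per class from soft assignments.

    Assumes every class has at least one labeled sample, so no class
    weight is zero. var_floor is a hypothetical safeguard against
    collapsing variances.
    """
    total_weight = sum(sum(r) for r in resp)
    params = []
    for c in range(n_classes):
        weight = sum(r[c] for r in resp)
        mean = sum(r[c] * x for r, x in zip(resp, xs)) / weight
        var = sum(r[c] * (x - mean) ** 2 for r, x in zip(resp, xs)) / weight
        params.append((weight / total_weight, mean, max(var, var_floor)))
    return params

def semi_supervised_em(labeled, unlabeled, n_classes, iterations=50):
    """EM where labeled samples keep fixed one-hot responsibilities."""
    xs = [x for x, _ in labeled] + list(unlabeled)
    # Labeled samples: one-hot. Unlabeled samples: zero weight until the first E-step.
    resp = [[1.0 if c == y else 0.0 for c in range(n_classes)] for _, y in labeled]
    resp += [[0.0] * n_classes for _ in unlabeled]
    params = m_step(xs, resp, n_classes)  # initial estimate from labeled data only
    for _ in range(iterations):
        # E-step: update responsibilities of the unlabeled samples only.
        for i in range(len(labeled), len(xs)):
            likes = [p * gaussian_pdf(xs[i], m, v) for p, m, v in params]
            total = sum(likes)
            if total > 0:
                resp[i] = [like / total for like in likes]
        # M-step: refit using labeled and unlabeled samples together.
        params = m_step(xs, resp, n_classes)
    return params

# Toy data: two labeled samples per class plus ten unlabeled samples.
labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [-1.0, 0.2, 0.8, 1.5, 2.0, 8.0, 8.5, 9.2, 10.5, 11.0]
params = semi_supervised_em(labeled, unlabeled, n_classes=2)
for prior, mean, var in params:
    print(round(prior, 3), round(mean, 3), round(var, 3))
```

A spatial variant in the spirit of the abstract would replace the class priors in the E-step with neighbourhood-dependent terms; that step is deliberately left out of this sketch.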

    GX-Means: A model-based divide and merge algorithm for geospatial image clustering

    One of the practical issues in clustering is the specification of the appropriate number of clusters, which is not obvious when analyzing geospatial datasets, partly because they are huge (both in size and spatial extent) and high dimensional. In this paper we present a computationally efficient model-based split and merge clustering algorithm that incrementally finds model parameters and the number of clusters. Additionally, we attempt to provide insights into this problem and other data mining challenges that are encountered when clustering geospatial data. The basic algorithm we present is similar to the G-means and X-means algorithms; however, our proposed approach avoids certain limitations of these well-known clustering algorithms that are pertinent when dealing with geospatial data. We compare the performance of our approach with the G-means and X-means algorithms. Experimental evaluation on simulated data and on multispectral and hyperspectral remotely sensed image data demonstrates the effectiveness of our algorithm.

    A Comparative Study on Web Prefetching

    The growth of the World Wide Web has emphasized the need to reduce user-perceived latency. Increasing use of dynamic pages, frequent changes in site structure, and user access patterns on the internet have limited the efficacy of caching techniques and emphasized the need for prefetching. Since prefetching increases bandwidth usage, it is important that the prediction model is highly accurate and computationally feasible. It has been observed that in a web environment, certain sets of pages exhibit stronger correlations than others, a fact which can be used to predict future requests. Previous studies on predictive models are mainly based on pair interactions of pages and TOP-N approaches. In this paper we study a model based on page interactions of higher order, where we exploit set relationships among the pages of a web site. We also compare the performance of this approach with the models based on pairwise interaction and the TOP-N approach. We have conducted a comparative study of these models on a real server log and five synthetic logs with varying page frequency distributions to simulate different real-life web sites, and identified dominance zones for each of these models. We find that the model based on higher order page interaction is more robust and gives competitive performance in a variety of situations.
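The two baseline models named in this abstract can be illustrated with a short sketch. The code below shows one plausible reading of a pairwise-interaction predictor (count which page directly follows which) and of a TOP-N predictor (always suggest the globally most requested pages). The function names, the session format, and the toy log are assumptions for illustration; the paper's actual models, including the higher-order one, may differ.

```python
from collections import Counter, defaultdict

def train_pairwise(sessions):
    """Count how often each page is directly followed by each other page."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            transitions[current_page][next_page] += 1
    return transitions

def predict_pairwise(transitions, current_page, k=1):
    """Return up to k pages most often requested right after current_page."""
    successors = transitions.get(current_page, Counter())
    return [page for page, _ in successors.most_common(k)]

def top_n(sessions, n):
    """Return the n most requested pages overall, ignoring the current page."""
    counts = Counter(page for session in sessions for page in session)
    return [page for page, _ in counts.most_common(n)]

# Hypothetical server log, already split into per-user sessions.
sessions = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["home", "news", "sports"],
    ["about", "home", "news"],
]
model = train_pairwise(sessions)
print(predict_pairwise(model, "news"))  # candidates to prefetch after "news"
print(top_n(sessions, 2))               # candidates to prefetch regardless of context
```

The pairwise model conditions on the current page, while TOP-N does not, which is one reason the two can dominate in different traffic patterns.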

    Spatial Contextual Classification and Prediction Models for Mining Geospatial Data

    Modeling spatial context (e.g., autocorrelation) is a key challenge in classification problems that arise in geospatial domains. Markov random fields (MRFs) are a popular model for incorporating spatial context into image segmentation and land-use classification problems. The spatial autoregression (SAR) model, which is an extension of the classical regression model for incorporating spatial dependence, is popular for prediction and classification of spatial data in regional economics, natural resources, and ecological studies. There is little literature comparing these alternative approaches to facilitate the exchange of ideas (e.g., solution procedures). We argue that the SAR model makes more restrictive assumptions about the distribution of feature values and class boundaries than MRF. The relationship between SAR and MRF is analogous to the relationship between regression and Bayesian classifiers. This paper provides comparisons between the two models using a probabilistic and an experimental framework.

    Comparing exact and approximate spatial auto-regression model solutions for spatial data analysis

    The spatial auto-regression (SAR) model is a popular spatial data analysis technique, which has been used in many applications with geo-spatial datasets. However, exact solutions for estimating SAR parameters are computationally expensive due to the need to compute all the eigenvalues of a very large matrix. Recently we developed a dense-exact parallel formulation of the SAR parameter estimation procedure using data parallelism and a hybrid programming technique. Though this parallel implementation showed scalability up to eight processors, the exact solution still suffers from high computational complexity and memory requirements. These limitations have led us to investigate approximate solutions for SAR model parameter estimation, with the main objective of scaling the SAR model to large spatial data analysis problems. In this paper we present two candidate approximate-semi-sparse solutions of the SAR model based on Taylor series expansion and Chebyshev polynomials. Our initial experiments showed that these new techniques scale well for very large data sets, such as remote sensing images having millions of pixels. The results also show that the differences between exact and approximate SAR parameter estimates are within 0.7% and 8.2% for Chebyshev polynomials and Taylor series expansion, respectively, and have no significant effect on the prediction accuracy.
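The trade-off described in this abstract can be sketched on a toy problem. The example below assumes that the expensive quantity is the log-determinant term ln|I - rho*W| of the SAR likelihood, which the abstract does not state explicitly, and compares an exact value with the truncated Taylor series -sum_{k=1..q} rho^k * tr(W^k) / k. The series holds when the spectral radius of rho*W is below 1, which is the case for a row-normalized W and |rho| < 1. The function names, the four-cell ring neighbourhood, and the choice of q are illustrative assumptions; the Chebyshev variant is not shown.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

def exact_logdet(w, rho):
    """ln|det(I - rho*W)| via Gaussian elimination with partial pivoting."""
    n = len(w)
    m = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)]
         for i in range(n)]
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        logdet += math.log(abs(m[col][col]))
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n):
                m[row][j] -= factor * m[col][j]
    return logdet

def taylor_logdet(w, rho, terms):
    """Truncated series: -sum_{k=1..terms} rho^k * tr(W^k) / k."""
    total = 0.0
    power = w  # holds W^k, starting at k = 1
    for k in range(1, terms + 1):
        total -= (rho ** k) * trace(power) / k
        power = matmul(power, w)
    return total

# Row-normalized neighbourhood matrix for four cells arranged in a ring.
W = [
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
]
rho = 0.5
exact = exact_logdet(W, rho)
approx = taylor_logdet(W, rho, terms=20)
print(exact, approx)
```

For a sparse W, the traces of its powers can be obtained or estimated without forming any eigenvalues, which is the general motivation for series-based approximations of this kind.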

    Open-Source GIS

    The chapter explains the components from which an open source GIS is built. They comprise the core software component (mapserver), open source geospatial libraries, a typical open source GIS (Quantum GIS), and the presently most widespread open source database (PostgreSQL) including its geospatial extension (PostGIS); the chapter also gives an overview of the most important license models. A mapserver can broadly be defined as a software platform for dynamically generating spatially referenced digital map products. The University of Minnesota MapServer, or UMN MapServer, or simply MapServer, is one such system. Its basic features are visualization, overlay, and query. The mapserver architecture consists of a client, a server, and a database. The server is split into three layers: the CGI layer tying in to the network hardware, the geospatial analysis system, and the communication layer. Client and server perform load balancing for optimal performance. The architecture is built upon the standards of the Open Geospatial Consortium, of which those regarding interoperability are most important. The section concludes with a number of examples. The following section names and explains many of the geospatial open source libraries, starting with GDAL (raster) and OGR (vector). The other libraries are FDO (Feature Data Objects), JTS Topology Suite (JTS), GEOS, JCS Conflation Suite (JCS), MetaCRS, and GPSBabel. The application examples include derived GIS software and data format conversions. The following section provides a detailed explanation of Quantum GIS, its origin, and its applications. The features include a rich GUI, attribute tables, vector symbols, labeling, editing functions, projections, georeferencing, GPS support, analysis, and Web Map Server functionality. The architecture of Quantum GIS comprises a hierarchical set of several layers that ranges from data access via analysis to application. Future developments will address mobile applications, 3-D, and multithreading.
    The next section is dedicated to the database part. The origins of PostgreSQL are outlined and PostGIS is discussed in detail. It extends PostgreSQL by implementing the Simple Feature standard. This provides a rich set of geospatial capabilities such as geometry types (e.g., polygons), relationships (e.g., within), and analysis functions (e.g., convex hull). The last part of the chapter explains the most important open source licenses, such as the GNU General Public License (GPL), the GNU Lesser General Public License (LGPL), the MIT license, and the BSD license, as well as the role of the Creative Commons.

    Knowledge Discovery from Sensor Data (SensorKDD)

    Wide-area sensor infrastructures, remote sensors, RFIDs, and wireless sensor networks yield massive volumes of disparate, dynamic, and geographically distributed data. As such sensors are becoming ubiquitous, a set of broad requirements is beginning to emerge across high-priority applications including adaptability to climate change, electric grid monitoring, disaster preparedness and management, national or homeland security, and the management of critical infrastructures. The raw data from sensors need to be efficiently managed and transformed to usable information through data fusion, which in turn must be converted to predictive insights via knowledge discovery, ultimately facilitating automated or human-induced tactical decisions or strategic policy based on decision sciences and decision support systems. Keeping in view the requirements of the emerging field of knowledge discovery from sensor data, we took the initiative to develop a community of researchers with common interests and scientific goals, which culminated in the organization of the SensorKDD series of workshops in conjunction with th