24,350 research outputs found

    Spatial clustering method for geographic data

    In visualizing quantitative spatial data, attribute values must be classified into a set of class divisions. In a previous paper, the author proposed a classification method that minimizes the loss of information contained in the original data. That method can be regarded as a kind of smoothing that neglects the characteristics of the spatial distribution. To understand the spatial structure of the data, a complementary smoothing method that accounts for the characteristics of the spatial distribution is also needed. In this paper, a spatial clustering method based on Akaike's Information Criterion is proposed, and numerical examples of its application are shown using actual spatial data for the Tokyo metropolitan area.
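
    The abstract does not reproduce the paper's formulation, but the underlying model-selection idea can be illustrated with a short sketch: fit candidate class divisions to the attribute values and keep the division that minimizes AIC. The Gaussian mixture model and scikit-learn usage below are illustrative assumptions, not the author's actual method.

    # Hedged sketch: choose the number of classes for a quantitative
    # attribute by minimizing AIC. Gaussian mixtures stand in for the
    # paper's actual model, which is not specified in the abstract.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    values = rng.normal(loc=[10, 50, 90], scale=5, size=(200, 3)).ravel()
    X = values.reshape(-1, 1)  # one attribute value per spatial unit

    best_k, best_aic = None, np.inf
    for k in range(1, 8):  # candidate numbers of class divisions
        gm = GaussianMixture(n_components=k, random_state=0).fit(X)
        aic = gm.aic(X)  # AIC = 2 * n_params - 2 * log-likelihood
        if aic < best_aic:
            best_k, best_aic = k, aic

    print(f"AIC-selected number of classes: {best_k}")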

    Testing for Localisation Using Micro-Geographic Data

    To study the detailed location patterns of industries, and particularly the tendency for industries to cluster relative to overall manufacturing, we develop distance-based tests of localisation. In contrast to previous studies, our approach allows us to assess the statistical significance of departures from randomness. In addition, we treat space as continuous instead of using an arbitrary collection of geographical units. This avoids problems relating to scale and borders. We apply these tests to an exhaustive UK data set. For four-digit industries, we find that (i) only 51% of them are localised at a 5% confidence level, (ii) localisation takes place mostly at small scales below 50 kilometres, (iii) the degree of localisation is very skewed, and (iv) industries follow broad sectoral patterns with respect to localisation. Depending on the industry, smaller establishments can be the main drivers of both localisation and dispersion. Three-digit sectors show similar patterns of localisation at small scales as well as a tendency to localise at medium scales.
    Keywords: localisation, clusters, K-density, spatial statistics.
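
    A minimal sketch of the distance-based logic, in the spirit of the K-density approach: compare the kernel density of an industry's pairwise plant distances against counterfactual densities obtained by drawing the same number of plants from all manufacturing sites. The simulated data, default bandwidth, and pointwise bands below are illustrative simplifications of the paper's estimator.

    # Hedged sketch of a distance-based localisation test: compare an
    # industry's pairwise-distance density with counterfactuals drawn
    # from the set of all manufacturing sites.
    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(1)
    all_sites = rng.uniform(0, 500, size=(2000, 2))   # km coordinates, illustrative
    industry = all_sites[rng.choice(2000, 80, replace=False)]

    grid = np.linspace(1, 200, 100)                   # distances of interest (km)
    k_obs = gaussian_kde(pdist(industry))(grid)       # observed K-density

    # Counterfactuals: same number of plants, drawn at random from all sites
    sims = np.array([
        gaussian_kde(pdist(all_sites[rng.choice(2000, 80, replace=False)]))(grid)
        for _ in range(200)
    ])
    lo, hi = np.percentile(sims, [2.5, 97.5], axis=0)  # pointwise 5% bands

    localised_at = grid[k_obs > hi]                    # excess density = localisation
    print("localised at distances (km):", localised_at[:5])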

    Defining Areas: Linking Geographic Data in New Zealand

    This paper develops a match quality statistic to quantify the trade-off between 'specificity' and 'completeness' when mapping one regional aggregation onto another. We apply this statistic to calculate the degree of mismatch between various regional aggregations for New Zealand using 1991 and 2001 Census data. A program to calculate mismatch statistics is included as an appendix, as a Stata® ado file.
    Keywords: match quality; geographic aggregation.
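
    The paper's exact statistic is not given in this abstract; the sketch below illustrates one plausible reading, where for each region of aggregation A, 'specificity' is the share of its population falling in its best-matching region of B, and 'completeness' is the share of that B region's population it accounts for. Both definitions are assumptions for illustration, not the paper's formula.

    # Hedged sketch: a match-quality measure between two regional
    # aggregations, built from a population crosstab. The definitions of
    # "specificity" and "completeness" here are illustrative assumptions.
    import pandas as pd

    # Populations of small source units, with their region in each aggregation
    units = pd.DataFrame({
        "agg_a": ["A1", "A1", "A2", "A2", "A3"],
        "agg_b": ["B1", "B2", "B2", "B2", "B1"],
        "pop":   [100,  20,   50,   200,  80],
    })

    xtab = units.pivot_table(index="agg_a", columns="agg_b",
                             values="pop", aggfunc="sum", fill_value=0)

    for a, row in xtab.iterrows():
        b = row.idxmax()                           # best-matching B region
        specificity = row[b] / row.sum()           # share of A's pop inside b
        completeness = row[b] / xtab[b].sum()      # share of b's pop coming from A
        print(a, "->", b, f"specificity={specificity:.2f}",
              f"completeness={completeness:.2f}")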

    OpenPaths: empowering personal geographic data

    OpenPaths, created by the New York Times Company R&D Lab, is a platform that demonstrates the collective value of personal data sovereignty. It was developed in response to public outrage over the location record generated by Apple iOS devices. OpenPaths participants store their encrypted geographic data online while maintaining ownership and programmatic control. Projects of many kinds, from mobility research to expressive artwork, petition individuals for access to their data. In the context of locative media practice, OpenPaths expands the notion of the tracing to address the components of an ethical implementation of crowd-sourced geographic systems in the age of "big data".
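
    The abstract does not describe OpenPaths' implementation. As a purely illustrative sketch of the participant-held encryption idea, a location record might be encrypted client-side before upload as follows; the library choice and record format are assumptions.

    # Purely illustrative sketch of participant-controlled encryption of a
    # location record; OpenPaths' actual scheme is not described in the
    # abstract, and the library and record format here are assumptions.
    import json
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()        # held by the participant, not the platform
    record = json.dumps({"lat": 40.7566, "lon": -73.9904, "t": 1302000000})

    token = Fernet(key).encrypt(record.encode())    # ciphertext stored server-side
    restored = json.loads(Fernet(key).decrypt(token))
    assert restored["lat"] == 40.7566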

    Cluster Analysis Using Geographic Data

    Many businesses suffer losses after opening because the location was chosen without proper research. The method proposed in this paper identifies the best possible location for a new establishment: a target list of Grand Rapids neighbourhoods is web-scraped with the BeautifulSoup library and passed to the geocoder library to retrieve geographical coordinates. API calls are then made to the Foursquare API with each coordinate as a parameter, returning a JSON response listing the surrounding venues. After several pre-processing stages, such as data cleaning, normalization, and feature engineering, the data are fed to a clustering algorithm such as K-means, an unsupervised learning technique that chooses its centroids to minimize the inertia of the given data. The number of centroids in K-means is determined using two methods, namely the Silhouette and Elbow methods. The best location is determined by scrutinizing the frequency of coffee shops, and hence the competition and demand for coffee shops in each area, to suggest the best possible spot for a new coffee shop. Grand Rapids was chosen as the location for this project. Of course, like any other business decision, opening a new coffee shop requires various other factors to be considered, such as the audience in the area or nearby schools. Nevertheless, determining a location for the new establishment is the primary step any individual would take.
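
    A minimal sketch of the clustering step described above, with the data-collection stages (BeautifulSoup scraping, geocoding, Foursquare calls) omitted: the feature matrix is simulated venue-category frequencies, k is chosen by the silhouette score with inertia recorded for the elbow method, and the "coffee shop" column is an illustrative assumption.

    # Hedged sketch of the clustering step: pick k for K-means using the
    # silhouette score (inertia is kept for the elbow method), then
    # cluster neighbourhoods by venue-category frequencies. Input data
    # are simulated; the paper builds them from Foursquare responses.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(2)
    # rows = neighbourhoods, cols = normalized venue-category frequencies
    X = rng.random((40, 10))

    scores = {}
    for k in range(2, 9):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        scores[k] = (silhouette_score(X, km.labels_), km.inertia_)

    best_k = max(scores, key=lambda k: scores[k][0])   # highest silhouette
    labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

    # Rank clusters by mean frequency of the "coffee shop" feature
    # (column 0, an illustrative assumption) to flag under-served areas.
    coffee = [X[labels == c, 0].mean() for c in range(best_k)]
    print("best k:", best_k, "| coffee-shop frequency per cluster:",
          np.round(coffee, 2))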

    Using Biotic Interaction Networks for Prediction in Biodiversity and Emerging Diseases

    Networks offer a powerful tool for understanding and visualizing inter-species interactions within an ecosystem. Previously considered examples, such as trophic networks, are representations of experimentally observed direct interactions. However, species interactions are so rich and complex that it is not feasible to observe more than a small fraction of them directly. In this paper, using data mining techniques, we show how potential interactions can be inferred from geographic data rather than from direct observation. An important application area for such a methodology is that of emerging diseases, where little is often known about inter-species interactions, such as those between vectors and reservoirs. Here, we show how biotic interaction networks that model statistical dependencies between species distributions can be inferred from geographic data and used to understand inter-species interactions. Furthermore, we show how such networks can be used to build prediction models, for example for predicting the most important reservoirs of a disease or the degree of disease risk associated with a geographical area. We illustrate the general methodology by considering an important emerging disease, Leishmaniasis. This data mining approach allows geographic data to be used to construct inferential biotic interaction networks, which can then be used to build prediction models with a wide range of applications in ecology, biodiversity, and emerging diseases.
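
    The general idea can be sketched as follows: grid the study area, record species presence per cell, and link two species whenever their co-occurrence deviates significantly from the independence expectation. The z-score criterion used below is one common choice and not necessarily the paper's exact statistic.

    # Hedged sketch: infer a biotic interaction network from gridded
    # presence data by testing pairwise co-occurrence against the
    # expectation under independence.
    import itertools
    import numpy as np

    rng = np.random.default_rng(3)
    n_cells, n_species = 500, 6
    presence = rng.random((n_species, n_cells)) < 0.2   # presence/absence grid

    edges = []
    for i, j in itertools.combinations(range(n_species), 2):
        n_ij = np.sum(presence[i] & presence[j])        # observed co-occurrences
        p_i, p_j = presence[i].mean(), presence[j].mean()
        expected = n_cells * p_i * p_j                  # independence expectation
        sd = np.sqrt(n_cells * p_i * p_j * (1 - p_i * p_j))
        z = (n_ij - expected) / sd
        if abs(z) > 1.96:                               # 5% two-sided threshold
            edges.append((i, j, round(z, 2)))           # candidate interaction

    print("inferred dependency edges (i, j, z):", edges)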

    Geographic Data Mining and Knowledge Discovery

    Geographic data are information associated with a location on the surface of the Earth. They comprise spatial attributes (latitude, longitude, and altitude) and non-spatial attributes (facts related to a location). Traditionally, Physical Geography datasets were considered more valuable and thus attracted most research interest. But with advancements in remote sensing technologies and the widespread use of GPS-enabled cellphones and IoT (Internet of Things) devices, recent years have witnessed explosive growth in the amount of available Human Geography datasets. However, methods and tools capable of analyzing and modeling these datasets remain limited, because Human Geography data are inherently difficult to model due to their characteristics (non-stationarity, uneven distribution, etc.). In the past few years, many algorithms have been developed to address these challenges, especially non-stationarity, such as Geographically Weighted Regression, Multiscale GWR, and Geographical Random Forest. They have proven much more effective than general machine learning algorithms that are not specifically designed to deal with non-stationarity. However, such algorithms are far from perfect and leave considerable room for improvement. This dissertation proposes multiple algorithms for modeling non-stationary geographic data. The main contributions are: (1) a novel method to evaluate non-stationarity and its impact on regression models; (2) the Geographic R-Partition tree for modeling non-stationary data; (3) the IDW-RF algorithm, which uses the strengths of Random Forests to handle extremely unevenly distributed geographic datasets; (4) the LVRF algorithm, which models geographic data using a latent-variable-based method. Experiments show that these algorithms are efficient and outperform other state-of-the-art algorithms in certain scenarios.
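
    Of the methods named, Geographically Weighted Regression is the simplest to sketch: fit a separate weighted least-squares regression at each location, down-weighting distant observations with a distance kernel. The Gaussian kernel and fixed bandwidth below are illustrative assumptions; practical GWR implementations also calibrate the bandwidth.

    # Hedged sketch of Geographically Weighted Regression: a local
    # weighted least-squares fit at each location, with a Gaussian
    # distance kernel. Bandwidth and data are illustrative.
    import numpy as np

    rng = np.random.default_rng(4)
    coords = rng.uniform(0, 10, size=(200, 2))          # observation locations
    x = rng.normal(size=200)
    # Non-stationary process: the coefficient on x drifts with longitude
    y = (1 + 0.5 * coords[:, 0]) * x + rng.normal(scale=0.3, size=200)

    X = np.column_stack([np.ones(200), x])              # intercept + predictor
    bandwidth = 2.0                                     # assumed, not calibrated

    def local_coefs(site):
        d = np.linalg.norm(coords - site, axis=1)
        w = np.exp(-(d / bandwidth) ** 2 / 2)           # Gaussian kernel weights
        W = np.diag(w)
        # Weighted least squares: beta = (X'WX)^-1 X'Wy
        return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

    betas = np.array([local_coefs(s) for s in coords])
    print("slope at west vs east edge:",
          betas[coords[:, 0].argmin(), 1], betas[coords[:, 0].argmax(), 1])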