14 research outputs found

    Predicting GDP of Indonesia Using K-Nearest Neighbour Regression

    The impact of the global recession in 1998, which originated from the recession in the US, affects the projected economies in Asia, including Indonesia, both directly and indirectly. In this study, we predict Indonesia's GDP during the economic crisis that hit Indonesia starting in 1998. Instead of well-known prediction algorithms such as neural networks and linear regression, K-Nearest Neighbour (k-NN) is selected because it is easy and fast to use on a small dataset. We use a dataset covering 1980-2002, consisting of rice prices, premium prices, Japan's GDP, American GDP, currency exchange rates, Indonesian government consumption, and the value of Indonesia's oil exports. For evaluation, we compare the k-NN regression predictions with those of a back-propagation neural network and a multiple linear regression algorithm. The results show that k-NN regression predicts Indonesia's GDP from a small dataset better than the neural network and the multiple linear regression method.
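
    As a rough illustration of the k-NN regression idea named in this abstract, the sketch below predicts a target as the mean of the k nearest training targets under Euclidean distance. It is a minimal sketch, not the authors' implementation: the function name `knn_regress`, the value of k, and the toy numbers are assumptions made for illustration only.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """Predict a value as the mean target of the k nearest training rows.

    Hypothetical helper for illustration; the abstract does not specify
    the distance function, the value of k, or any feature scaling.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Euclidean distance from the query to every training row
    scored = [(math.dist(row, query), y) for row, y in zip(train_X, train_y)]
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    return sum(y for _, y in nearest) / k


# Toy data (made up, not the 1980-2002 dataset from the paper)
toy_X = [[0.0], [1.0], [2.0], [10.0]]
toy_y = [0.0, 1.0, 2.0, 10.0]
prediction = knn_regress(toy_X, toy_y, [1.1], k=3)
```

    In practice the features in such a dataset (prices, exchange rates, GDP figures) sit on very different scales, so some form of normalisation before computing distances is usually advisable; whether the paper does this is not stated in the abstract.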

    A generalization of the Minkowski distance and a new definition of the ellipse

    In this paper, we generalize the Minkowski distance by defining a new distance function in n-dimensional space, and we show that this function, like the Minkowski distance, also determines a family of metrics. We then consider three special cases of this family, which generalize the taxicab, Euclidean and maximum metrics respectively, and finally we determine their circles, together with some of their properties, in the real plane. While determining some properties of the circles of the generalized Minkowski distance, we also discover a new definition of the ellipse. Comment: 18 pages, 18 figures
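
    For orientation, the sketch below computes the standard Minkowski distance and its three familiar special cases (taxicab, Euclidean, maximum). It does not implement the paper's generalization, whose definition is not given in the abstract; the function name `minkowski` is an assumption for illustration.

```python
import math


def minkowski(u, v, p=2.0):
    """Standard Minkowski distance of order p between two points.

    p = 1 gives the taxicab metric, p = 2 the Euclidean metric, and
    p = math.inf the maximum (Chebyshev) metric. For 0 < p < 1 the
    formula still evaluates, but the triangle inequality can fail.
    """
    if len(u) != len(v):
        raise ValueError("points must have the same dimension")
    if p <= 0:
        raise ValueError("p must be positive")
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


taxicab = minkowski((0, 0), (3, 4), p=1)            # |3| + |4|
euclidean = minkowski((0, 0), (3, 4), p=2)          # sqrt(9 + 16)
maximum = minkowski((0, 0), (3, 4), p=math.inf)     # max(|3|, |4|)
```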

    Road distance and travel time for an improved house price Kriging predictor

    The paper designs an automated valuation model to predict the price of residential property in Coventry, United Kingdom, by means of geostatistical Kriging, a widely used distance-based learning method. Unlike traditional applications of distance-based learning, this paper implements non-Euclidean distance metrics by approximating road distance, travel time and a linear combination of both, which the paper hypothesizes to be more closely related to house prices than straight-line (Euclidean) distance. Given that a valid variogram must be produced in order to undertake Kriging, this paper exploits the conforming properties of the Minkowski distance function to approximate a road distance and travel time metric. A least squares approach is put forward for variogram parameter selection, and an ordinary Kriging predictor is implemented for interpolation. The predictor is then validated with 10-fold cross-validation and a spatially aware checkerboard hold-out method against the almost exclusively employed Euclidean metric. Comparing the results for each distance metric, this paper finds a goodness of fit (r²) of 0.6901 ± 0.18 SD for real estate price prediction, compared with a suboptimal r² value of 0.66 ± 0.21 SD for the traditional (Euclidean) approach.

    Distance metric choice can both reduce and induce collinearity in geographically weighted regression

    This paper explores the impact of different distance metrics on collinearity in local regression models such as geographically weighted regression. Using a case study of house price data collected in Hà Nội, Vietnam, and by fully varying both power and rotation parameters to create different Minkowski distances, the analysis shows that local collinearity can be both negatively and positively affected by distance metric choice. The Minkowski distance that maximised collinearity in a geographically weighted regression was close to a Manhattan distance (power = 0.70) with a rotation of 30°, and that which minimised collinearity was parameterised with power = 0.05 and a rotation of 70°. The results indicate that distance metric choice can provide a useful extra tuning component to address local collinearity issues in spatially varying coefficient modelling, and that understanding the interaction of distance metric and collinearity can provide insight into the nature and structure of the data relationships. The discussion considers, first, the exploration and selection of different distance metrics to minimise collinearity as an alternative to localised ridge regression, lasso and elastic net approaches. Second, it discusses how distance metric choice could extend the methods that additionally optimise local model fit (lasso and elastic net) by selecting a distance metric that further helps minimise local collinearity. Third, it identifies the need to investigate the relationship between kernel bandwidth, distance metrics and collinearity as an area of further work.
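
    One plausible reading of "power and rotation parameters" is a Minkowski distance computed after rotating the coordinate axes. The sketch below illustrates that reading for points in the plane. It is an assumption for illustration only: the helper name `rotated_minkowski`, the rotation convention, and the use of degrees are not taken from the paper.

```python
import math


def rotated_minkowski(a, b, power=2.0, rotation_deg=0.0):
    """Minkowski distance in the plane after rotating the coordinate axes.

    Hypothetical helper: the difference vector b - a is expressed in axes
    rotated by `rotation_deg` degrees, then the Minkowski formula with the
    given power is applied. Powers below 1 (as reported in the abstract)
    still evaluate, although the result is then not a true metric.
    """
    if power <= 0:
        raise ValueError("power must be positive")
    theta = math.radians(rotation_deg)
    dx, dy = b[0] - a[0], b[1] - a[1]
    # Components of the difference vector in the rotated axes
    rx = dx * math.cos(theta) + dy * math.sin(theta)
    ry = -dx * math.sin(theta) + dy * math.cos(theta)
    return (abs(rx) ** power + abs(ry) ** power) ** (1.0 / power)


# Euclidean distance (power = 2) is unchanged by rotation ...
d_euclid = rotated_minkowski((0, 0), (3, 4), power=2, rotation_deg=30)
# ... whereas Manhattan distance (power = 1) depends on the axis orientation.
d_manhattan_0 = rotated_minkowski((0, 0), (1, 1), power=1, rotation_deg=0)
d_manhattan_45 = rotated_minkowski((0, 0), (1, 1), power=1, rotation_deg=45)
```

    The comparison at the end shows why rotation matters only for non-Euclidean powers: with power = 2 the distance is rotation-invariant, so the rotation parameter adds tuning freedom only away from the Euclidean case.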

    Geographically weighted regression with parameter-specific distance metrics

    Geographically weighted regression (GWR) is an important local technique to model spatially varying relationships. A single distance metric (Euclidean or non-Euclidean) is generally used to calibrate a standard GWR model. However, variations in spatial relationships within a GWR model might also vary in intensity with respect to location and direction. This assertion has led to extensions of the standard GWR model to mixed (or semiparametric) GWR and to flexible bandwidth GWR models. In this article, we present a strongly related extension in fitting a GWR model with parameter-specific distance metrics (PSDM GWR). As with mixed and flexible bandwidth GWR models, a back-fitting algorithm is used for the calibration of the PSDM GWR model. The value of this new GWR model is demonstrated using a London house price data set as a case study. The results indicate that the PSDM GWR model can clearly improve the model calibration in terms of both goodness of fit and prediction accuracy, in contrast to model fits in which only a single metric is used. Moreover, the PSDM GWR model provides added value in understanding how a regression model's relationships may vary at different spatial scales, according to the bandwidths and distance metrics selected. PSDM GWR deals with spatial heterogeneities in data relationships in a general way, although questions remain on its model diagnostics, distance metric specification, and computational efficiency, providing options for further research.

    GWmodel

    In GWmodel, we introduce techniques from a particular branch of spatial statistics, termed geographically-weighted (GW) models. GW models suit situations when data are not described well by some global model, but where there are spatial regions where a suitably localised calibration provides a better description. GWmodel includes functions to calibrate: GW summary statistics, GW principal components analysis, GW discriminant analysis and various forms of GW regression; some of which are provided in basic and robust (outlier resistant) forms.

    GWmodelS: a standalone software to train geographically weighted models

    With the recent increase in studies on spatial heterogeneity, geographically weighted (GW) models have become an essential set of local techniques, attracting a wide range of users from different domains. In this study, we demonstrate a newly developed standalone GW software, GWmodelS, using a community-level house price data set for Wuhan, China. In detail, a number of fundamental GW models are illustrated, including GW descriptive statistics, basic and multiscale GW regression, and GW principal component analysis. Additionally, functionality in spatial data management and batch mapping is presented as an essential supplementary activity for GW modeling. The software provides significant advantages in terms of a user-friendly graphical user interface, operational efficiency, and accessibility, which facilitate its usage for users from a wide range of domains.

    Examining The Role of Job by Geographically Weighted Poisson Regression in The Post-Migration Adaptation Process: The Case of Van

    Migration is a process of social change that involves the geographical relocation of people from one settlement to another, either permanently or temporarily, in order to spend all or part of their future lives. Van province is among the provinces receiving migration due to its geopolitical location and level of development. Migrant individuals need to adapt to that society in order to normalise their relations with the resident population over time. Individuals are in constant contact with the society due to the work done after migration, so it is thought that the work done has an important effect on the adaptation process. In this study, the effect of work on the post-migration adjustment process of individuals migrating from the first and second degree border neighbouring provinces of Van province was analysed using Poisson and Geographically Weighted Poisson Regression methods. The aim of the study is to determine the relationship between the contribution of work in the post-migration adjustment process and independent variables and to analyse which of the models used for the analysis gives stronger results. In the study, a face-to-face survey was conducted with 440 individuals and it was observed that the Geographically Weighted Poisson Regression method gave stronger results according to AIC, AICc and R^2 values. In addition, the effect and significance of the relationship between the dependent variable and independent variables according to provinces and districts are visualised and given with maps

    Analysis of Factors Influencing Regional Waterlogging Using Geographically Weighted Regression and Shapley Additive Explanations

    Thesis (Master's) -- Seoul National University Graduate School: Graduate School of Environmental Studies, Department of Landscape Architecture, 2022. 8. 송영근.

    The landscape is considered a key component of ecosystem intervention. Human activities have significantly changed surface characteristics, for example by affecting the circulation and flow of natural materials and energy, or by weakening the rainwater collection and storage functions and the runoff drainage capacity of watersheds. These changes have led to waterlogging disasters and increased the risk to the living environment. Landscape planners and decision-makers therefore need to continually improve and optimize the landscape pattern to maintain the ecosystem's dynamic balance while reducing waterlogging. The development of remote sensing technology makes it possible to study large-scale watershed units, and experiments on such large-scale sites can be verified by theory. Existing research on the verification of theories has ignored important interactions within the landscape pattern, because traditional linear regression models (a subfield of supervised learning) such as Geographically Weighted Regression (GWR) cannot analyze the relationships among independent variables while analyzing the relationship between independent and dependent variables. In recent years, the development of interpretable machine learning models has been making up for this shortcoming. Among them, Shapley Additive Explanations (SHAP) is a representative method that provides an interpretable machine learning model based on game theory. It can not only analyze the relationship between independent and dependent variables, but also take into account correlations among multiple independent variables and produce an importance ranking according to the degree of contribution.

    Through verification and comparative analysis of the two methods, we first find that the Shannon Diversity Index (SHDI, a representative landscape metric) is seriously underestimated in the GWR results, while in the SHAP results SHDI shows a large impact on waterlogging at every scale of watershed unit. At the same time, according to the prediction mean squared error (MSE), although the error of GWR is small, SHAP is still far more accurate than GWR. Secondly, the water cycle process characteristically produces multi-scale geographical watersheds. To take into account the dynamic balance of hydrology, a comparative analysis of multi-level watershed-scale units is necessary. Our results show that finer-scale watersheds are not necessarily a suitable research scale for waterlogging research. In this study, we find that analysis of waterlogging in the Seoul Capital Area (SCA) based on large-scale watershed units (LSWU) is the most appropriate and accurate. Finally, a threshold is assumed to exist for landscape pattern characteristics. When the impact on waterlogging reaches this critical point, the role of a characteristic in promoting or alleviating waterlogging changes. By estimating threshold values of landscape pattern characteristics, waterlogging disaster mitigation can be achieved accurately and at low cost. In summary, this study explores a new method for analysing interactions between landscape patterns and waterlogging, and provides a reference for methods and results of waterlogging control based on landscape ecology.

    Contents:
    Chapter 1. Introduction: 1.1 Urbanization and Human Intelligence; 1.2 Landscape and Landscape Ecology; 1.3 Land Use Land Cover and Landscape Pattern Metrics; 1.4 Natural Water Cycle and Urban Waterlogging; 1.5 Comparison with Previous Studies; 1.6 Workflow and Study Area
    Chapter 2. Materials and Methods: 2.1 Land Use Land Cover and Landscape Pattern Metrics; 2.2 Waterlogging Degree of Watershed Units; 2.3 Geographically Weighted Regression (GWR); 2.4 Shapley Additive Explanations (SHAP); 2.5 Prediction Mean Squared Error (MSE); 2.6 Piecewise Linear Model
    Chapter 3. Results: 3.1 Geographically Weighted Regression (GWR); 3.2 Shapley Additive Explanations (SHAP); 3.3 Prediction Mean Squared Error (MSE); 3.4 Piecewise Linear Model
    Chapter 4. Discussion: 4.1 Selection of Data and Tools; 4.2 Supervised Learning and Interpretive Machine Learning; 4.3 Landscape Threshold and Hydrological Disaster; 4.4 Rational Use of Limited Land Resources; 4.5 Limitation and Future Direction
    Chapter 5. Conclusion
    Appendix
    References
    Abstract in Korean