248 research outputs found

    Land valuation using an innovative model combining machine learning and spatial context

    Valuation predictions are used by buyers, sellers, regulators, and authorities to assess the fairness of an asking price. Urbanization demands a modern and efficient land valuation system, since the conventional approach is costly, slow, and relatively subjective with respect to locational factors. This necessitates alternative methods that are faster, user-friendly, and digitally based. These approaches should use geographic information systems and strong analytical tools to produce reliable and accurate valuations. Location information in the form of spatial data is crucial because the price can vary significantly with the neighborhood and context in which the parcel is located. In this thesis, a model is proposed that combines machine learning and spatial context. It integrates raster information derived from remote sensing with vector information from geospatial analytics to predict land values in the City of Springfield. These inputs are used to investigate whether a joint model can improve value estimation. The study also identifies the factors that are most influential in driving these models. A geodatabase was created by calculating proximity and accessibility to key locations, integrating socio-economic variables, and adding statistics on green space density and vegetation index derived from Sentinel-2 satellite data. The models were trained by supervised machine learning, using Greene County government appraised land values as ground truth, and the impact of each data type on price prediction was explored. Two types of modeling were conducted. Initially, only spatial context data were used to assess their predictive capability. Subsequently, socio-economic variables were added to the dataset to compare the performance of the models. The results showed only a slight difference in performance between the random forest and gradient boosting algorithms, and likewise between models using only GIS-derived distance measures and models that also included socio-economic variables. Furthermore, spatial autocorrelation analysis was conducted to investigate how the spatial distribution of similar attributes around a parcel affects its value. This analysis also aimed to identify disparities in socio-economic structure and to measure their magnitude. Includes bibliographical references.

    EPC Green Premium in Two Different European Climate Zones: A Comparative Study between Barcelona and Turin

    Energy performance certificates (EPCs) are important tools aimed at improving buildings' energy performance. They play a central role in the context of the Energy Performance of Buildings Directive (EPBD), which asks member states (MS) to take the necessary measures to establish a complete certification system. In this study, an application of the hedonic price method (HPM) assessing the effect of energy labels derived from the EPC on real estate market value is presented. The estimation methodology was applied to two European cities characterized by different climate conditions. The analysis was based on two datasets of listing prices from the multi-family residential markets in Turin (Italy) and Barcelona (Spain). Four models were applied to each dataset to capture the marginal price of green attributes and to control for spatial autocorrelation among values. The findings showed how the EPC has been applied in the two countries and how it has influenced the real estate market. Turin's buyers pay more attention to the EPC label, while buyers in Barcelona place more value on individual characteristics, such as air conditioning and a swimming pool, which are popular attributes of contemporary buildings in this climate zone. The results suggest that implementation of EPC schemes is still uneven across EU countries and must be strengthened through a standardized rating model.

    Estimation of Detached House Prices Based on Non-parametric Spatial Models and Ensemble Learning

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Geography, 2015. 8. Park Ki-ho.
    Real estate price estimation models are now used in more fields than ever before, owing to the growing openness and availability of real estate data. They are used in tasks such as constructing asset portfolios, estimating the value of collateral for financial institutions, and judging the feasibility of real estate development. Tax assessment in particular is one of the most important of these fields, since its effects reach the entire population. However, it has long been pointed out that the officially announced prices, which serve as the representative taxable value in Korea, capture only a low share of market value and lack price uniformity. The causes can be found partly in political factors such as tax resistance, but the problem also stems from the tax assessment process that underlies taxation, and more specifically from flawed price estimation models. This study began as a search for a more accurate real estate price estimation methodology. In addition, social science research that uses mathematical or quantitative models has so far been dominated by explanatory modeling, under the implicit assumption that a model with good explanatory power will also predict well. These two kinds of performance do not always coincide, however, and this study builds predictive models that emphasize predictive power for new observations, even at the expense of model interpretability.
    The models traditionally used to estimate real estate prices were mostly parametric models, which carry many strict assumptions such as independence of the explanatory variables, normality of the data, and absence of model specification error. They also oversimplify the data, for example by presupposing that the price function is a linear combination of parameters and explanatory variables. A variety of prediction-oriented models that impose neither such unrealistic statistical assumptions nor a preset form of the price function have been proposed in the field of machine learning, and by their nature most of them are non-parametric models. This study seeks to extend real estate price estimation methodology, which has so far concentrated on parametric models, to prediction-oriented non-parametric models. Furthermore, rather than selecting the single model found to perform best among the non-parametric models, it introduces into the price determination process the concept of ensemble learning, which appropriately combines the estimates of the individual models. Finally, beyond refining the models, it directly estimates house prices for case study areas in order to identify the differences between model-derived prices, actual transaction prices, and the current officially announced house prices.
    Actual transaction price data reported between 2011 and 2014 were used as input. Gangnam-gu in Seoul, Deokjin-gu in Jeonju, and Haenam-gun in Jeollanam-do were selected as case study areas to represent a large city, a small or medium-sized city, and a rural county, respectively. The main results are as follows. Among the non-parametric models proposed in machine learning, recently developed models such as SVM (Support Vector Machine) and MARS (Multivariate Adaptive Regression Splines) performed comparatively well, which suggests that they should be applied more widely. By region, estimation accuracy was lower in Deokjin-gu than in Gangnam-gu, and lower in Haenam-gun than in Deokjin-gu, which is interpreted as reflecting the greater heterogeneity of the housing stock in more rural areas. To identify efficiently where the price estimation models are especially weak, a local diagnosis of model performance based on a regression tree algorithm was carried out. It indicated that accuracy would improve if the data were first stratified by land area (or house size) before the main model building.
    The non-parametric models proposed in machine learning basically consider only attribute information and pay little attention to spatial dependence, a defining characteristic of spatial features. To incorporate spatial dependence, this study refined the SVM by interpreting its scale parameter as the geographic range over which spatial dependence operates. In addition, so that the approach could be applied to all of the non-parametric models, a spatially lagged variable WY, representing the average price level of surrounding houses, was constructed and included as a model component. The final predicted price of a house was determined not by taking the prediction of the best-performing individual model but by ensemble averaging, a weighted average of the predictions produced by the individual models. Ensemble averaging performed outstandingly where the correlation among the individual models' predictions was low, as in Haenam-gun. Finally, the ensemble predicted prices were compared with actual transaction prices and the current officially announced prices. In several respects the ensemble predicted prices reflected actual transaction prices more closely than the announced prices did. This comparison should be interpreted with care, however, because the characteristics and quality of the announced prices reflect errors accumulated across several procedures, including data collection steps such as the selection of standard houses and the stakeholder consultation step.
    Contents:
    Chapter 1. Introduction: 1.1 Research background and purpose; 1.2 Scope and methods of the study; 1.3 Content and structure of the study.
    Chapter 2. Theoretical review: 2.1 Hedonic price models and key issues (valuation techniques; the hedonic model; key issues in hedonic models); 2.2 Nonlinearity of the hedonic price function (linear and nonlinear models; nonlinearity between real estate prices and explanatory variables; non-parametric models for capturing nonlinearity); 2.3 Criteria for diagnosing model performance; 2.4 Types of non-parametric models (polynomial regression; generalized additive models; tree-based models; MARS (Multivariate Adaptive Regression Splines); SVM (Support Vector Machines)).
    Chapter 3. Application of non-parametric models and diagnosis of model performance: 3.1 Cleaning the actual transaction price data (selection of case study areas; nature and limitations of the data; screening of appropriate transaction data; descriptive statistics); 3.2 Application of the linear regression model (OLS); 3.3 Application of non-parametric models (generalized additive model (GAM); Random Forest; Boosting; MARS; SVM); 3.4 Comparison of model performance (comparison between regions; local comparison within regions (Local Approach)).
    Chapter 4. Non-parametric models reflecting spatial dependence: 4.1 Application of an SVM model using the variogram; 4.2 Application of models using the spatially lagged variable (construction of the spatial weight matrix; degree of model improvement).
    Chapter 5. Determining estimated prices using ensemble learning: 5.1 Application of ensemble averaging; 5.2 Interpretation of ensemble average prices; 5.3 Comparison with officially announced detached house prices and implications (the detached house price announcement system; comparison with announced prices).
    Chapter 6. Conclusion. References. Abstract.
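    The two devices named in the thesis abstract above, the spatially lagged variable WY and Ensemble Averaging, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the k-nearest-neighbour, row-standardised weight matrix, the function names `spatial_lag` and `ensemble_average`, and the toy numbers.

```python
import numpy as np

def spatial_lag(coords, prices, k=3):
    """Return WY: the mean price of each house's k nearest neighbours.

    Assumption: W is a row-standardised k-nearest-neighbour matrix;
    the thesis may construct its spatial weight matrix differently.
    """
    coords = np.asarray(coords, dtype=float)
    prices = np.asarray(prices, dtype=float)
    # Pairwise Euclidean distances between all houses.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a house is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros_like(dist)
    rows = np.arange(len(prices))[:, None]
    W[rows, nearest] = 1.0 / k  # each row of W sums to 1
    return W @ prices

def ensemble_average(predictions, weights):
    """Weighted average of per-model predictions.

    predictions: array of shape (n_models, n_houses)
    weights: one non-negative weight per model (hypothetical values;
             normalised here so that they sum to 1)
    """
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ predictions

# Toy data: four houses on a line, prices in arbitrary units.
coords = [[0, 0], [1, 0], [2, 0], [10, 0]]
prices = [100, 200, 300, 400]
wy = spatial_lag(coords, prices, k=2)

# Two hypothetical models' predictions for two houses, weighted 1:3.
combined = ensemble_average([[100, 200], [200, 400]], [1, 3])
```

    In this sketch WY would be added as one more column of the feature matrix before fitting any of the non-parametric models, which is why it applies to all of them alike. How the ensemble weights are chosen is left open here.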

    Can machine learning algorithms associated with text mining from internet data improve housing price prediction performance?

    Housing frenzies in China have attracted widespread global attention over the past few years, but the key is how to more accurately forecast housing prices in order to establish an effective real estate policy. Based on the ubiquitousness and immediacy of Internet data, this research adopts a broader version of text mining to search for keywords in relation to housing prices and then evaluates the predictive abilities using machine learning algorithms. Our findings indicate that this new method, especially random forest, not only detects turning points, but also offers prediction ability that clearly outperforms traditional regression analysis. Overall, the prediction based on online search data through a machine learning mechanism helps us better understand the trends of house prices in China. First published online 10 June 202