Bogotá : 13th International Summer School and Conference, 8–12 September 2025, Bogotá (Colombia)
Abstract
Accurate demographic predictions at fine spatial resolution play a critical role in helping urban planners allocate resources efficiently for diverse populations. Traditional demographic data collection methods, such as censuses and surveys, suffer from limitations in frequency, spatial resolution, and cost, which make it difficult to capture dynamic demographic changes in urban areas.
This research addresses this gap by using machine learning (ML) models to enrich building data with predicted demographic characteristics at the building level. The study trains Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) models on Stuttgart, Germany, and tests them on Dresden, Germany, to examine how well the models generalize across cities with different urban structures. The model features were produced through feature engineering on several datasets, primarily two-dimensional building data derived from 3D CityGML data, 100-meter grid census data, and OpenStreetMap points of interest (POI). The results show that RF outperformed XGBoost: it was less affected by errors, although the R² values of the two models differed only slightly. The population model performed well (R² ≈ 0.75), while predictions of residents' age were weaker because the demographic data had to be disaggregated from grid cells to buildings without building-level reference data. The POI data had a minor effect on the age pattern predictions but not on the population predictions. This research highlights the potential of integrating ML with urban data as an alternative to traditional demographic data collection; however, enhancing building data quality and feature engineering could improve accuracy and allow a better assessment of POI impact.
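The disaggregation of grid census counts to buildings can be illustrated with a minimal sketch. The abstract does not state which weighting scheme the study used, so the proportional weighting below is an assumption, as are the field names (`id`, `cell`, `weight`) and the toy numbers. The `weight` value stands for any building-size proxy, for example footprint area times number of storeys. The `r_squared` helper shows how an R² score of the kind reported above is computed from observed and predicted values.

```python
from collections import defaultdict


def disaggregate(grid_population, buildings):
    """Split each grid cell's population across its buildings by weight.

    grid_population: dict mapping cell id -> census population of that cell.
    buildings: list of dicts with hypothetical keys 'id', 'cell', 'weight'.
    Returns a dict mapping building id -> estimated number of residents.
    """
    # Sum the building weights inside every grid cell.
    weight_per_cell = defaultdict(float)
    for b in buildings:
        weight_per_cell[b["cell"]] += b["weight"]

    # Each building receives its share of the cell's population.
    estimates = {}
    for b in buildings:
        total = weight_per_cell[b["cell"]]
        share = b["weight"] / total if total > 0 else 0.0
        estimates[b["id"]] = grid_population.get(b["cell"], 0.0) * share
    return estimates


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy data (invented for illustration, not from the study).
grid_population = {"c1": 120, "c2": 40}
buildings = [
    {"id": "b1", "cell": "c1", "weight": 300.0},
    {"id": "b2", "cell": "c1", "weight": 100.0},
    {"id": "b3", "cell": "c2", "weight": 50.0},
]
estimates = disaggregate(grid_population, buildings)
```

Because the building-level values produced this way are themselves estimates rather than measured reference data, any model trained on them inherits the disaggregation error, which is consistent with the weaker age predictions noted above.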