
    Design and validation of novel methods for long-term road traffic forecasting

    Road traffic management is a critical aspect of the design and planning of complex urban transport networks, for which vehicle flow forecasting is an essential component. As a testimony to its paramount relevance in transport planning and logistics, thousands of scientific research works have covered the traffic forecasting topic during the last 50 years. In the beginning, most approaches relied on autoregressive models and other analysis methods suited for time series data. During the last two decades, the development of new technology, platforms and techniques for massive data processing under the Big Data umbrella, the availability of data from multiple sources fostered by the Open Data philosophy, and an ever-growing need of decision makers for accurate traffic predictions have shifted the spotlight to data-driven procedures. Even in this convenient context, with an abundance of open data to experiment with and advanced techniques to exploit them, most predictive models reported in the literature aim for short-term forecasts, and their performance degrades when the prediction horizon is increased. Long-term forecasting strategies are scarcer, and are commonly based on the detection of patterns and the assignment of new observations to them. These approaches can perform reasonably well unless an unexpected event provokes unpredictable changes, or the allocation to a pattern is inaccurate. The core of the work in this Thesis has revolved around data-driven traffic forecasting, ultimately pursuing long-term forecasts. This has broadly entailed a deep analysis and understanding of the state of the art, and dealing with incompleteness of data, among other lesser issues. In addition, the second part of this dissertation presents an application outlook of the developed techniques, providing methods and unexpected insights into the local impact of traffic on pollution. The obtained results reveal that the impact of vehicular emissions on pollution levels is overshadowed…
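    The pattern-based long-term forecasting strategy mentioned above can be illustrated with a minimal sketch: cluster historical daily flow profiles into patterns, map calendar features to their most common pattern, and return the pattern centroid as the forecast for an arbitrary future day. This is a generic illustration on synthetic data, not the method developed in the thesis.

```python
# Illustrative sketch of pattern-based long-term traffic forecasting:
# detect patterns by clustering daily profiles, then forecast a future day
# by assigning it to a pattern via day-of-week and returning the centroid.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_days, n_steps = 365, 96                         # one year of 15-minute flow counts
base = 200 + 150 * np.sin(np.linspace(0, 2 * np.pi, n_steps))
profiles = np.array([
    base * (0.6 if d % 7 in (5, 6) else 1.0) + rng.normal(0, 20, n_steps)
    for d in range(n_days)
])                                                # weekends carry lower flows (synthetic)

# Pattern detection: cluster the daily profiles.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(profiles)

# Pattern assignment: map each day-of-week to its most common cluster.
labels = kmeans.labels_
dow_to_cluster = {
    dow: np.bincount(labels[np.arange(n_days) % 7 == dow]).argmax()
    for dow in range(7)
}

def long_term_forecast(day_index: int) -> np.ndarray:
    """Forecast the flow profile for an arbitrary future day index."""
    cluster = dow_to_cluster[day_index % 7]
    return kmeans.cluster_centers_[cluster]

print(long_term_forecast(400)[:5])                # 35 days beyond the training year
```

    As the abstract notes, such a scheme degrades exactly when a day does not follow any learned pattern or is assigned to the wrong one.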

    Forecasting in Mathematics

    Mathematical probability and statistics are an attractive, thriving, and respectable part of mathematics. Some mathematicians and philosophers of science say they are the gateway to mathematics’ deepest mysteries. Moreover, mathematical statistics denotes an accumulation of mathematical methods connected with efforts to collect and use numerical data, subject to random or deterministic variation, as efficiently as possible. Currently, probability and mathematical statistics have become fundamental notions of modern science and the philosophy of nature. This book illustrates the use of mathematics to solve specific problems in engineering, statistics, and science in general.

    Estimation of hourly near surface air temperature across Israel using an ensemble model

    Mapping of near-surface air temperature (Ta) at high spatio-temporal resolution is essential for unbiased assessment of human health exposure to temperature extremes, not least given the observed trend of urbanization and global climate change. Data constraints have led previous studies to focus merely on daily Ta metrics, rather than hourly ones, making them insufficient for intra-day assessment of health exposure. In this study, we present a three-stage machine learning-based ensemble model to estimate hourly Ta at a high spatial resolution of 1 × 1 km², incorporating remotely sensed surface skin temperature (Ts) from geostationary satellites, reanalysis synoptic variables, and observations from weather stations, as well as auxiliary geospatial variables, which account for spatio-temporal variability of Ta. The Stage 1 model gap-fills hourly Ts at 4 × 4 km² from the Spinning Enhanced Visible and InfraRed Imager (SEVIRI), which are subsequently fed into the Stage 2 model to estimate hourly Ta at the same spatio-temporal resolution. The Stage 3 model downscales the residuals between estimated and measured Ta to a grid of 1 × 1 km², taking into account additionally the monthly diurnal pattern of Ts derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) data. In each stage, the ensemble model synergizes estimates from the constituent base learners, random forest (RF) and extreme gradient boosting (XGBoost), by applying a geographically weighted generalized additive model (GAM), which allows the weights of results from individual models to vary over space and time. Demonstrated for Israel for the period 2004–2017, the proposed ensemble model outperformed each of the two base learners. It also attained excellent five-fold cross-validated performance, with overall root mean square error (RMSE) of 0.8 and 0.9 °C, mean absolute error (MAE) of 0.6 and 0.7 °C, and R² of 0.95 and 0.98 in Stage 1 and Stage 2, respectively. The Stage 3 model for downscaling Ta residuals to 1 km MODIS grids achieved overall RMSE of 0.3 °C, MAE of 0.5 °C, and R² of 0.63. The generated hourly 1 × 1 km² Ta thus serves as a foundation for monitoring and assessing human health exposure to temperature extremes at a larger geographical scale, helping to further minimize exposure misclassification in epidemiological studies.
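    A minimal sketch of the Stage 2 ensemble idea follows: random forest and XGBoost base learners whose predictions are blended by a meta-model. The paper blends them with a geographically weighted GAM; here a plain linear meta-learner on the base predictions plus longitude/latitude is used as a simplified stand-in, and all data and variable names are synthetic placeholders.

```python
# Two-learner ensemble sketch: RF and XGBoost base learners, blended by a
# simple meta-model whose inputs include coordinates so the blend can vary
# in space (simplified stand-in for the geographically weighted GAM).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 5000
# Illustrative predictors: surface skin temperature Ts, hour of day, elevation, lon, lat.
X = np.column_stack([
    rng.normal(25, 8, n),        # Ts (°C)
    rng.integers(0, 24, n),      # hour of day
    rng.uniform(-400, 1200, n),  # elevation (m)
    rng.uniform(34.2, 35.9, n),  # longitude
    rng.uniform(29.5, 33.3, n),  # latitude
])
y = 0.8 * X[:, 0] - 0.004 * X[:, 2] + 2 * np.sin(2 * np.pi * X[:, 1] / 24) + rng.normal(0, 1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
xgb = XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=0).fit(X_tr, y_tr)

# Meta-features: base-learner predictions plus lon/lat.
meta_tr = np.column_stack([rf.predict(X_tr), xgb.predict(X_tr), X_tr[:, 3:5]])
meta_te = np.column_stack([rf.predict(X_te), xgb.predict(X_te), X_te[:, 3:5]])
blend = LinearRegression().fit(meta_tr, y_tr)

rmse = np.sqrt(np.mean((blend.predict(meta_te) - y_te) ** 2))
print(f"ensemble RMSE on held-out synthetic data: {rmse:.2f} °C")
```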

    Air pollution exposure assessment in sparsely monitored settings; applying machine-learning methods with remote sensing data in South Africa.

    Air pollution is one of the leading environmental risk factors for human health: both short- and long-term exposure to air pollution impact human health, accounting for over 4 million deaths. Although the risk of exposure to air pollution has been quantified in different settings and countries of the world, the majority of these studies are from high-income countries with historical air pollutant measurement data and corresponding health outcome data to support such epidemiological studies. Air pollution exposure levels in these high-income settings are lower than the exposure levels in low-income countries, and exposure levels in sub-Saharan African (SSA) countries have continued to increase due to rapid industrialization and urbanization. In addition, the underlying susceptibility profile of the SSA population differs from that of populations in high-income settings. However, a major limitation to conducting epidemiological studies to quantify the exposure-response relationship between air pollution and adverse health outcomes in SSA is the paucity of historical air pollution measurement data to inform such studies. South Africa, an SSA country with air quality monitoring stations, especially in areas classified as air pollution priority areas, has historical measurement data for particulate matter with an aerodynamic diameter of 10 micrometres or less (PM10, μg/m³). PM10 is one of the most widely monitored criteria air pollutants in South Africa. Satellite-derived aerosol optical depth (AOD), available at high spatial and temporal resolutions, provides information about how particles in the atmosphere prevent sunlight from reaching the ground, and has been used as a proxy variable to explain ground-level air pollution in different settings. The main objective of this thesis was to use satellite-derived AOD to bridge the gap in ground-monitored PM10 across four provinces of South Africa (Gauteng, Mpumalanga, KwaZulu-Natal and Western Cape). We collected PM10 ground monitor measurement data from the South African Weather Service across the four provinces for the years 2010–2017. Because of the gaps in daily PM10 across the sites and years, Study I compared methods for imputing daily ground-level PM10 data at sites across the four provinces for the years 2010–2017 using random forest (RF) models. The reliability of air pollution exposure models depends on how well they capture the spatial and temporal variation of air pollution; thus, Study II explored the spatial and temporal variations in ground-monitored PM10 across the four provinces for the years 2010–2017. To explore the feasibility of using satellite-derived AOD and other spatial and temporal predictor variables, Study III used an ensemble machine-learning framework of RF, extreme gradient boosting (XGBoost) and support vector regression (SVR) to calibrate daily ground-level PM10 at 1 × 1 km spatial resolution across the four provinces for the year 2016. In conclusion, we developed a spatiotemporal model to predict daily PM10 concentrations across four provinces of South Africa at 1 × 1 km spatial resolution for 2016. This model is the first attempt to use a satellite-derived product to fill the gap in ground-monitored air pollution data in SSA.
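    The Study I imputation step can be sketched as follows: train a random forest on the days where PM10 was observed at a site, using temporal and meteorological predictors, and predict the missing days. The predictor names and data below are synthetic placeholders, not the thesis' actual inputs.

```python
# Random-forest imputation sketch for daily PM10 with gaps at a single site.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
dates = pd.date_range("2016-01-01", "2016-12-31", freq="D")
df = pd.DataFrame({
    "doy": dates.dayofyear,                      # day of year
    "dow": dates.dayofweek,                      # day of week
    "temp": rng.normal(20, 6, len(dates)),       # synthetic temperature
    "wind": rng.gamma(2.0, 1.5, len(dates)),     # synthetic wind speed
}, index=dates)
df["pm10"] = (40 + 15 * np.cos(2 * np.pi * df["doy"] / 365)
              - 3 * df["wind"] + rng.normal(0, 5, len(df)))
df.loc[rng.random(len(df)) < 0.2, "pm10"] = np.nan   # ~20% of days missing at random

predictors = ["doy", "dow", "temp", "wind"]
observed = df["pm10"].notna()

# Fit on observed days, predict the gaps.
rf = RandomForestRegressor(n_estimators=300, random_state=2)
rf.fit(df.loc[observed, predictors], df.loc[observed, "pm10"])

df["pm10_filled"] = df["pm10"]
df.loc[~observed, "pm10_filled"] = rf.predict(df.loc[~observed, predictors])
print(df["pm10_filled"].isna().sum(), "missing values remain")
```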

    A dynamic nonstationary spatio-temporal model for short term prediction of precipitation

    Precipitation is a complex physical process that varies in space and time. Predictions and interpolations at unobserved times and/or locations help to solve important problems in many areas. In this paper, we present a hierarchical Bayesian model for spatio-temporal data and apply it to obtain short-term predictions of rainfall. The model incorporates physical knowledge about the underlying processes that determine rainfall, such as advection, diffusion and convection. It is based on a temporal autoregressive convolution with spatially colored and temporally white innovations. By linking the advection parameter of the convolution kernel to an external wind vector, the model is temporally nonstationary. Further, it allows for nonseparable and anisotropic covariance structures. With the help of a Voronoi tessellation, we construct a natural parametrization that is consistent across space and time resolutions for data lying on irregular grid points. In the application, the statistical model combines forecasts of three other meteorological variables obtained from a numerical weather prediction model with past precipitation observations. The model is then used to predict three-hourly precipitation over 24 hours. It performs better than a separable, stationary and isotropic version, performs comparably to a deterministic numerical weather prediction model for precipitation, and has the advantage that it quantifies prediction uncertainty. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/), http://dx.doi.org/10.1214/12-AOAS564, by the Institute of Mathematical Statistics (http://www.imstat.org).
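    The core dynamics of the model, a temporal autoregression whose kernel advects and diffuses the current field and adds spatially colored, temporally white innovations, can be illustrated on a regular grid. The sketch below is only a simplified forward simulation under assumed parameter values; the paper's model is a hierarchical Bayesian formulation on an irregular Voronoi grid with the advection linked to forecast winds.

```python
# Simplified illustration of an advection-diffusion autoregression for a
# precipitation-like field: each step advects, diffuses (convolves), damps,
# and adds spatially colored, temporally white noise.
import numpy as np
from scipy.ndimage import gaussian_filter, shift

rng = np.random.default_rng(3)
ny, nx = 60, 60
field = gaussian_filter(rng.normal(size=(ny, nx)), sigma=4)   # initial field

rho = 0.9                 # autoregressive coefficient (assumed)
wind = (1.5, 0.5)         # advection in pixels per step, e.g. from an external wind vector
diff_sigma = 1.0          # diffusion via Gaussian convolution (assumed)

def step(field: np.ndarray) -> np.ndarray:
    """One time step: advect, diffuse, damp, and add spatially colored innovations."""
    advected = shift(field, wind, order=1, mode="nearest")
    diffused = gaussian_filter(advected, sigma=diff_sigma)
    innovation = gaussian_filter(rng.normal(size=field.shape), sigma=3)  # colored in space, white in time
    return rho * diffused + innovation

for _ in range(8):        # e.g. eight 3-hourly steps covering a 24-hour horizon
    field = step(field)
print(field.mean(), field.std())
```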

    Imputation, modelling and optimal sampling design for digital camera data in recreational fisheries monitoring

    Digital camera monitoring has evolved as an active application-oriented scheme to help address questions in areas such as fisheries, ecology, computer vision, artificial intelligence, and criminology. In recreational fisheries research, digital camera monitoring has become a viable option for probability-based survey methods, and is also used for corroborative and validation purposes. In comparison to on-site surveys (e.g. boat ramp surveys), digital cameras provide a cost-effective method of monitoring boating activity and fishing effort, including night-time fishing activities. However, there are challenges in the use of digital camera monitoring that need to be resolved; notably, missing data problems and the cost of data interpretation are among the most pertinent. This study provides relevant statistical support to address these challenges of digital camera monitoring of boating effort, to improve its utility for recreational fisheries management in Western Australia and elsewhere, with capacity to extend to other areas of application. Digital cameras can provide continuous recordings of boating and other recreational fishing activities; however, interruptions of camera operations can lead to significant gaps within the data. To fill these gaps, climatic and other temporal classification variables were considered as predictors of boating effort (defined as the number of powerboat launches and retrievals). A generalized linear mixed effects model built on a fully conditional specification multiple imputation framework was considered to fill in the gaps in the camera dataset. Specifically, the zero-inflated Poisson model was found to satisfactorily impute plausible values for missing observations over varied durations of outages in the digital camera monitoring data of recreational boating effort. Additional modelling options were explored to guide both short- and long-term forecasting of boating activity and to support management decisions in monitoring recreational fisheries. Autoregressive conditional Poisson (ACP) and integer-valued autoregressive (INAR) models were identified as useful time series models for predicting the short-term behaviour of such data. In Western Australia, digital camera monitoring data that coincide with 12-month state-wide boat-based surveys (now conducted on a triennial basis) have been read, but the periods between the surveys have not. A Bayesian regression framework was applied to describe the temporal distribution of recreational boating effort using climatic and temporally classified variables to help construct data for such missing periods. This can potentially provide a useful, cost-saving alternative for obtaining continuous time series data on boating effort. Finally, data from digital camera monitoring are often manually interpreted, and the associated cost can be substantial, especially if multiple sites are involved. Empirical support for low-level digital camera monitoring schemes has been provided: manual interpretation of camera footage for 40% of the days within a year can be deemed an adequate level of sampling effort to obtain unbiased, precise and accurate estimates that meet broad management objectives. A well-balanced low-level monitoring scheme will ultimately reduce the cost of manual interpretation and produce unbiased estimates of recreational fishing indices from digital camera surveys.
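    A single-imputation sketch of the zero-inflated Poisson idea is given below: fit a ZIP model to observed daily launch counts with simple temporal predictors, then fill simulated camera outages with the fitted mean counts. The thesis embeds this in a fully conditional specification multiple-imputation framework with mixed effects; this sketch is deliberately simpler, and the data and predictor names are synthetic.

```python
# Zero-inflated Poisson sketch for filling camera-outage gaps in daily
# powerboat launch counts (single imputation with the fitted mean).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(4)
dates = pd.date_range("2021-01-01", "2021-12-31", freq="D")
weekend = (dates.dayofweek >= 5).astype(float)
season = np.cos(2 * np.pi * (dates.dayofyear - 15) / 365)
lam = np.exp(1.0 + 0.8 * weekend + 0.5 * season)
counts = np.where(rng.random(len(dates)) < 0.25, 0, rng.poisson(lam))   # extra zeros

df = pd.DataFrame({"launches": counts.astype(float),
                   "weekend": weekend, "season": season}, index=dates)
df.loc[rng.random(len(df)) < 0.15, "launches"] = np.nan                 # simulated outages

obs = df["launches"].notna()
X = sm.add_constant(df[["weekend", "season"]])
zip_model = ZeroInflatedPoisson(df.loc[obs, "launches"], X.loc[obs],
                                exog_infl=np.ones((int(obs.sum()), 1)),
                                inflation="logit")
res = zip_model.fit(maxiter=200, disp=False)

# Impute outage days with the fitted mean count (rounded to whole launches).
df.loc[~obs, "launches"] = np.round(
    res.predict(X.loc[~obs], exog_infl=np.ones((int((~obs).sum()), 1)))
)
print(int(df["launches"].isna().sum()), "gaps remain after imputation")
```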

    Missing data imputation of high-resolution temporal climate time series data

    Analysis of high-resolution data offers greater opportunity to understand the nature of data variability, behaviours and trends, and to detect small changes. Climate studies often require complete time series data, which, in the presence of missing data, means imputation must be undertaken. Research on the imputation of high-resolution temporal climate time series data is still at an early phase. In this study, multiple approaches to the imputation of missing values were evaluated, including a structural time series model with Kalman smoothing, an autoregressive integrated moving average (ARIMA) model with Kalman smoothing, and multiple linear regression. The methods were applied to complete subsets of data from 12-month time series of hourly temperature, humidity and wind speed data from four locations along the coast of Western Australia. Assuming that observations were missing at random, artificial gaps of missing observations were studied using a five-fold cross-validation methodology with the proportion of missing data set to 10%. The techniques were compared using the pooled mean absolute error, root mean square error and symmetric mean absolute percentage error. The multiple linear regression model was generally the best model based on the pooled performance indicators, followed by the ARIMA model with Kalman smoothing. However, the low error values obtained from each of the approaches suggested that the models competed closely and imputed highly plausible values. To some extent, the performance of the models varied among locations. It can be concluded that the modelling approaches studied are suitable for imputing missing data in hourly temperature, humidity and wind speed data and are therefore recommended for application in other fields where high-resolution data with missing values are common.
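    The best-performing approach reported above, multiple linear regression imputation evaluated on artificial gaps, can be sketched as follows with synthetic hourly data standing in for the Western Australian station records: 10% of the temperature values are removed at random, imputed by regression on humidity, wind and diurnal/seasonal harmonics, and scored with MAE and RMSE.

```python
# Multiple-linear-regression imputation of hourly temperature, evaluated on
# artificially introduced gaps (10% missing at random).
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
idx = pd.date_range("2019-01-01", periods=24 * 365, freq="h")
hour = idx.hour.to_numpy()
doy = idx.dayofyear.to_numpy()
humidity = np.clip(rng.normal(60, 15, len(idx)), 5, 100)
wind = rng.gamma(2.0, 2.0, len(idx))
temp = (20 + 8 * np.sin(2 * np.pi * (doy - 30) / 365)
        + 5 * np.sin(2 * np.pi * (hour - 9) / 24)
        - 0.05 * humidity + rng.normal(0, 1, len(idx)))

df = pd.DataFrame({
    "temp": temp, "humidity": humidity, "wind": wind,
    "sin_h": np.sin(2 * np.pi * hour / 24), "cos_h": np.cos(2 * np.pi * hour / 24),
    "sin_d": np.sin(2 * np.pi * doy / 365), "cos_d": np.cos(2 * np.pi * doy / 365),
}, index=idx)

# Introduce artificial gaps in temperature (10% missing at random), keep the truth.
mask = rng.random(len(df)) < 0.10
truth = df.loc[mask, "temp"].copy()
df.loc[mask, "temp"] = np.nan

predictors = ["humidity", "wind", "sin_h", "cos_h", "sin_d", "cos_d"]
obs = df["temp"].notna()
mlr = LinearRegression().fit(df.loc[obs, predictors], df.loc[obs, "temp"])
imputed = mlr.predict(df.loc[mask, predictors])

mae = np.mean(np.abs(imputed - truth))
rmse = np.sqrt(np.mean((imputed - truth) ** 2))
print(f"MAE = {mae:.2f} °C, RMSE = {rmse:.2f} °C on the artificial gaps")
```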