
    Estimation of missing air pollutant data using a spatiotemporal convolutional autoencoder

    A key challenge in building machine learning models for time series prediction is the incompleteness of the datasets. Missing data can arise for a variety of reasons, including sensor failure and network outages, leaving datasets without significant periods of measurements. Models built on such datasets can therefore be biased. Although various methods have been proposed to handle missing data in many application areas, missing-data prediction for air quality requires additional investigation. This study proposes an autoencoder model with spatiotemporal considerations to estimate missing values in air quality data. The model consists of one-dimensional convolution layers, making it flexible enough to cover the spatial and temporal behaviour of air contaminants. It exploits data from nearby stations to enhance predictions at the target station with missing data, and it does not require additional external features such as weather and climate data. The results show that the proposed method effectively imputes missing data for discontinuous and long-interval interrupted datasets. Our model achieves up to 65% RMSE improvement over univariate imputation techniques (most-frequent, median and mean imputation) and 20–40% over multivariate imputation techniques (decision tree, extra-trees, k-nearest neighbours and Bayesian ridge regressors). Imputation performance degrades when neighbouring stations are negatively or only weakly correlated.
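    A minimal sketch of the kind of one-dimensional convolutional autoencoder the abstract describes, assuming readings arranged as (batch, stations, time) so that nearby stations act as input channels; the layer sizes, mask ratio and PyTorch framing are illustrative assumptions, not the authors' implementation:

```python
# Sketch of a 1-D convolutional autoencoder for imputing missing pollutant
# readings. Inputs are shaped (batch, stations, time): each channel is a
# nearby monitoring station, so convolutions mix spatial and temporal cues.
import torch
import torch.nn as nn

class ConvImputer(nn.Module):
    def __init__(self, n_stations: int, hidden: int = 32):
        super().__init__()
        # Encoder mixes station channels while scanning along the time axis.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_stations, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Decoder maps the latent sequence back to one series per station.
        self.decoder = nn.Conv1d(hidden, n_stations, kernel_size=5, padding=2)

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Train by hiding known values and reconstructing them; at inference the
# network's output fills the genuinely missing entries.
model = ConvImputer(n_stations=4)
x = torch.randn(8, 4, 168)                      # a week of hourly data, 4 stations
mask = (torch.rand_like(x) > 0.2).float()       # 1 = kept, 0 = artificially hidden
loss = ((model(x * mask) - x) ** 2 * (1 - mask)).mean()  # error on hidden entries
loss.backward()
```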

    Design and validation of novel methods for long-term road traffic forecasting

    132 p. Road traffic management is a critical aspect of the design and planning of complex urban transport networks, for which vehicle flow forecasting is an essential component. As a testimony to its paramount relevance in transport planning and logistics, thousands of scientific research works have covered the traffic forecasting topic during the last 50 years. In the beginning, most approaches relied on autoregressive models and other analysis methods suited for time series data. During the last two decades, the development of new technology, platforms and techniques for massive data processing under the Big Data umbrella, the availability of data from multiple sources fostered by the Open Data philosophy, and an ever-growing need of decision makers for accurate traffic predictions have shifted the spotlight to data-driven procedures. Even in this convenient context, with an abundance of open data to experiment with and advanced techniques to exploit them, most predictive models reported in the literature aim for short-term forecasts, and their performance degrades as the prediction horizon increases. Long-term forecasting strategies are scarcer, and commonly based on the detection of patterns and the assignment of new observations to them. These approaches can perform reasonably well unless an unexpected event provokes unpredictable changes, or the allocation to a pattern is inaccurate. The main core of the work in this Thesis has revolved around data-driven traffic forecasting, ultimately pursuing long-term forecasts. This has broadly entailed a deep analysis and understanding of the state of the art, and dealing with incompleteness of data, among other lesser issues. Besides, the second part of this dissertation presents an application outlook of the developed techniques, providing methods and unexpected insights into the local impact of traffic on pollution. The obtained results reveal that the impact of vehicular emissions on pollution levels is overshadowed
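    As an illustration of the pattern-detection-and-assignment strategy the thesis mentions for long-term forecasting, a hedged sketch on synthetic daily flow profiles; the clustering choice (k-means), the weekday heuristic and all data are assumptions, not the thesis's method:

```python
# Cluster historical daily traffic profiles into patterns, then forecast a
# far-future day by assigning it to a pattern and reusing the centroid.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# 200 historical days of hourly traffic flow (synthetic).
history = rng.poisson(lam=100, size=(200, 24)).astype(float)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(history)

def weekday_of(day_index: int) -> int:
    return day_index % 7

# Assign the target day to the pattern that dominates among past days
# sharing its calendar profile (here: the same weekday).
target_weekday = weekday_of(214)
same_weekday = [i for i in range(200) if weekday_of(i) == target_weekday]
labels = kmeans.labels_[same_weekday]
dominant = np.bincount(labels).argmax()
forecast = kmeans.cluster_centers_[dominant]   # 24-hour flow profile
```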

    Human spatial dynamics for electricity demand forecasting: the case of France during the 2022 energy crisis

    Accurate electricity demand forecasting is crucial for energy security and efficiency, especially when relying on intermittent renewable energy sources. Recently, massive savings have been observed in Europe, following an unprecedented global energy crisis. However, assessing the impact of such a crisis and of government incentives on electricity consumption behaviour is challenging. Moreover, standard statistical models based on meteorological and calendar data have difficulty adapting to such abrupt changes. Here, we show that mobility indices based on mobile network data significantly improve the performance of state-of-the-art models in electricity demand forecasting during the sobriety period. We start by documenting the drop in French electricity consumption during the winter of 2022-2023. We then show how our mobile network data captures work dynamics, and how adding these mobility indices outperforms the state of the art during this atypical period. Our results characterise the effect of work behaviours on electricity demand.
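    A hedged sketch of the kind of comparison the paper reports: the same demand model fitted with and without a mobility feature, scored by RMSE. The feature names, coefficients and data below are synthetic assumptions, not the paper's models:

```python
# Compare a baseline calendar+weather regression against one augmented with
# a mobility index derived (in the paper) from mobile network data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
n = 500
temperature = rng.normal(10, 5, n)
is_workday = rng.integers(0, 2, n).astype(float)
mobility = 0.6 * is_workday + rng.normal(0, 0.1, n)   # proxy for work presence
demand = 50 - 1.2 * temperature + 8 * mobility + rng.normal(0, 1, n)

X_base = np.column_stack([temperature, is_workday])
X_mob = np.column_stack([temperature, is_workday, mobility])

for name, X in [("calendar+weather", X_base), ("+ mobility index", X_mob)]:
    model = LinearRegression().fit(X[:400], demand[:400])
    rmse = mean_squared_error(demand[400:], model.predict(X[400:])) ** 0.5
    print(f"{name}: RMSE = {rmse:.2f}")
```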

    Traffic Prediction using Artificial Intelligence: Review of Recent Advances and Emerging Opportunities

    Traffic prediction plays a crucial role in alleviating traffic congestion, a critical problem globally that results in negative consequences such as lost hours of additional travel time and increased fuel consumption. Integrating emerging technologies into transportation systems provides opportunities for improving traffic prediction significantly and brings about new research problems. In order to lay the foundation for understanding the open research challenges in traffic prediction, this survey aims to provide a comprehensive overview of traffic prediction methodologies. Specifically, we focus on the recent advances and emerging research opportunities in Artificial Intelligence (AI)-based traffic prediction methods, due to their recent success and potential in traffic prediction, with an emphasis on multivariate traffic time series modeling. We first provide a list and explanation of the various data types and resources used in the literature. Next, the essential data preprocessing methods within the traffic prediction context are categorized, and the prediction methods and applications are subsequently summarized. Lastly, we present primary research challenges in traffic prediction and discuss some directions for future research. Comment: Published in Transportation Research Part C: Emerging Technologies (TR_C), Volume 145, 202

    Versatile Deep Learning Forecasting Application with Metamorphic Quality Assurance

    Accurate estimates of fresh produce (FP) yields and prices are crucial for fair bidding prices by retailers along with informed asking prices by farmers, leading to the best prices for customers. To obtain accurate estimates, this thesis improves the state-of-the-art deep learning (DL) models for forecasting FP yields and prices, both station-based and satellite-based, by providing a new deep learning model structure. The scope of this work covers forecasting a horizon of 5 weeks ahead for fresh produce yields and prices. The proposed structure is an ensemble (DFNNGRU-ADGRU ENS) of an Attention Deep Feedforward Neural Network with Gated Recurrent Units (ADGRU) and a Deep Feedforward Neural Network with embedded GRU units (DFNNGRU). The station-based version of the ensemble is trained and tested using soil moisture and temperature parameters retrieved from land stations as input; it outperforms the literature model by 24% in AGM score for yield forecasting and by 37.5% for price forecasting.
    For the satellite-based model, the best satellite-image preprocessing technique must be found to represent the images with less data for efficiency. Therefore, a preprocessing approach based on averaging is proposed, implemented and compared with the literature approach based on histograms; the proposed approach improves performance by 20%. The proposed DFNNGRU ensembled with ADGRU is then tested against well-performing models of a Stacked AutoEncoder (SAE) ensembled with Convolutional Neural Networks with Long Short-Term Memory (CNNLSTM), and the proposed model outperforms the literature model by 12.5%. In addition, interpolation techniques are used to estimate missing vegetation index (VI) values caused by the low frequency at which Landsat captures satellite images. A comparative analysis is conducted to choose the most effective technique, which is found to be Cubic Spline interpolation. The effect of adding the VIs as input parameters on the forecasting performance of the deep learning model is assessed and the most effective VIs are selected. One VI, the Normalized Difference Vegetation Index (NDVI), proves to be the most effective index in forecasting yield, with an enhancement of 12.5% in AGM score.
    A novel transfer learning (TL) framework is proposed for better generalizability. After finding the best DL forecasting model, a TL framework is proposed to extend that model's generalization to other FPs by using FP similarity, clustering, and TL techniques customized to fit the problem at hand. Furthermore, the similarity algorithms found in the literature are improved by considering time series features rather than the absolute values of their points. In addition, the FPs are clustered using a hierarchical clustering technique utilizing the complete linkage of a dendrogram, automating the process of finding similarity thresholds and avoiding setting them arbitrarily. Finally, transfer learning is applied by freezing some layers of the proposed ensemble model and fine-tuning the rest, leading to a significant improvement in AGM compared to the best literature model. A forecasting application is then implemented to facilitate the use of the proposed models by end users through a friendly interface.
    To test the quality of the application's deployed code and models, metamorphic testing is applied to assess the effectiveness of the machine learning models, while machine learning is used to automatically detect the main metamorphic relations in the software code. The interactive role played by metamorphic testing and machine learning is investigated through the quality assurance of the forecasting application. The datasets used to train and test the deep learning forecasting models, as well as the forecasting models themselves, are verified using metamorphic tests, and the metamorphic relations in the generalization code are automatically detected using Support Vector Machine (SVM) models. Testing revealed unmatched requirements, which are fixed to bring forward a valid application with sound data, effective models, and valid generalization code.
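    A small sketch of the cubic-spline gap filling the thesis selects for vegetation indices, assuming sparse weekly NDVI observations between Landsat revisits; the weeks and NDVI values below are made-up illustrations:

```python
# Fill the weeks with no usable Landsat image by fitting a cubic spline
# through the observed NDVI points and evaluating it on a dense weekly grid.
import numpy as np
from scipy.interpolate import CubicSpline

observed_weeks = np.array([0, 2, 4, 7, 9, 12])   # weeks with a usable image
observed_ndvi = np.array([0.31, 0.38, 0.45, 0.52, 0.49, 0.40])

spline = CubicSpline(observed_weeks, observed_ndvi)
all_weeks = np.arange(0, 13)
ndvi_filled = spline(all_weeks)   # dense weekly NDVI for the forecasting model
```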

    Evaluating Sensor Data in the Context of Mobile Crowdsensing

    With the recent rise of the Internet of Things, the prevalence of mobile sensors in our daily life has surged. Mobile crowdsensing (MCS) is an emerging paradigm that exploits the utility and ubiquity of smartphones, and more precisely their incorporated smart sensors. Because an MCS application uses the mobile phones and data of ordinary citizens, many problems have to be solved in its design: What data is needed to obtain the desired results? Should the calculations be executed locally or on a server? How can the quality of data be improved? How can the data best be evaluated? These problems are addressed by designing a streamlined approach for creating an MCS application with all of them in mind. To design this approach, an exhaustive literature review of existing MCS applications was conducted, and to validate it, a new application was designed with its help. The procedure of designing and implementing this application went smoothly, demonstrating the applicability of the approach.

    Combining wearables and nearables for patient state analysis

    Recently, ambient patient monitoring using wearable and nearable sensors has become more prevalent, especially in neurodegenerative (Rett syndrome) and sleep disorder (obstructive sleep apnea) populations. While wearables capture localized physiological data such as pulse rate, wrist acceleration and brain signals, nearables record global passive data including body movements, ambient sound and environmental variables. Together, wearables and nearables provide a more comprehensive understanding of the patient state. Processing the data captured from wearables and nearables poses multiple challenges, including handling missing data, time synchronization between sensors, and developing data fusion techniques for multimodal analysis. The research described in this thesis addresses these issues while working on data captured in the wild. First, we describe a Rett syndrome severity estimator using a wearable biosensor and uncover physio-motor biomarkers. Second, we present the applications of an edge computing and ambient data capture system for home and clinical environments. Finally, we describe a transfer learning and multimodal data fusion based sleep-wake detector for a mixed-disorder elderly population. We show that combining data from wearables and nearables improves the performance of sleep-wake detection in terms of the F1-score and Cohen's kappa compared to unimodal models.
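    A hedged sketch of the unimodal-versus-multimodal evaluation described above: a wearable-only sleep-wake classifier is scored against one that fuses a nearable channel, using F1 and Cohen's kappa. The features, classifier choice and data are synthetic assumptions, not the thesis's fusion model:

```python
# Compare sleep-wake detection from a wearable channel alone against a
# fused wearable + nearable feature set, on held-out synthetic epochs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, cohen_kappa_score

rng = np.random.default_rng(2)
n = 1000
sleep = rng.integers(0, 2, n)                          # 1 = asleep, 0 = awake
wrist_accel = sleep * 0.2 + rng.normal(0.5, 0.3, n)    # wearable channel
bed_movement = sleep * -0.4 + rng.normal(0.5, 0.3, n)  # nearable channel

wearable_only = wrist_accel.reshape(-1, 1)
fused = np.column_stack([wrist_accel, bed_movement])

for name, X in [("wearable only", wearable_only), ("wearable + nearable", fused)]:
    clf = LogisticRegression().fit(X[:800], sleep[:800])
    pred = clf.predict(X[800:])
    print(name,
          "F1:", round(f1_score(sleep[800:], pred), 3),
          "kappa:", round(cohen_kappa_score(sleep[800:], pred), 3))
```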

    Prediction Of Heart Failure Decompensations Using Artificial Intelligence - Machine Learning Techniques

    Sections 4.4.1, 4.4.2 and 4.4.3 of chapter 4 are subject to confidentiality by the author. 203 p. Heart failure (HF) is a major public health concern. Its total impact is increased by its high incidence and prevalence and its unfavourable medium-term prognosis. In addition, HF leads to huge healthcare resource consumption. Moreover, efforts to develop a deterministic understanding of rehospitalization have been difficult, as no specific patient or hospital factors have been shown to consistently predict 30-day readmission after hospitalization for HF. Taking all these facts into account, we wanted to develop a project to improve the care of patients with HF. Up to now, we were using telemonitoring with a codification system that generated alarms depending on the received values. However, these simple rules generated a large number of false alerts and were, hence, not trustworthy. The final aims of this work are: (i) to assess the benefits of remote patient telemonitoring (RPT); (ii) to improve the results obtained with RPT using ML techniques, detecting which parameters measured by telemonitoring best predict HF decompensations and creating predictive models that reduce false alerts and detect early decompensations that would otherwise lead to hospital admissions; and (iii) to determine the influence of environmental factors on HF decompensations. All in all, the conclusions of this study are:
    1. Assessing the benefits of RPT: telemonitoring has not shown a statistically significant reduction in the number of HF-related hospital admissions. Nevertheless, we have observed a statistically significant reduction in mortality in the intervention group, with a considerable percentage of deaths from non-cardiovascular causes. Moreover, patients have considered the RPT programme a tool that can help them control their chronic disease and improve their relationship with health professionals.
    2. Improving the results obtained with RPT using machine learning techniques: significant weight increases, desaturation below 90%, and perception of clinical worsening, including development of oedema, worsening of functional class and orthopnoea, are good predictors of heart failure decompensation. In addition, machine learning techniques have improved the current alert system implemented in our hospital. The system reduces the number of false alerts notably, although it entails a decrease in sensitivity. The best results are achieved with the predictive model built by applying Naive Bayes (NB) with Bernoulli to the combination of telemonitoring alerts and questionnaire alerts (weight + ankle + well-being, plus the yellow alerts of systolic blood pressure, diastolic blood pressure, O2 saturation and heart rate).
    3. Determining the influence of environmental factors on HF decompensations: air temperature is the most significant environmental factor (negative correlation) in our study, although some other attributes, such as precipitation, are also relevant. This work also shows a consistent association between increasing levels of SO2 and NOx in the air and HF hospitalizations.
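    A minimal sketch of the best-performing configuration named in conclusion 2, Bernoulli naive Bayes over binary alert features; the alert columns follow the abstract's list, but the data, label construction and scikit-learn framing are illustrative assumptions:

```python
# Bernoulli naive Bayes over binary telemonitoring and questionnaire alerts.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(3)
n = 600
# Columns: weight alert, ankle oedema, well-being, and "yellow" alerts for
# systolic BP, diastolic BP, O2 saturation and heart rate (1 = alert fired).
X = rng.integers(0, 2, size=(n, 7))
# Synthetic decompensation label, loosely tied to weight + oedema alerts.
y = (((X[:, 0] & X[:, 1]) == 1) | (rng.random(n) < 0.05)).astype(int)

clf = BernoulliNB().fit(X[:500], y[:500])
risk = clf.predict_proba(X[500:])[:, 1]   # decompensation risk per patient-day
```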

    Towards Aggregating Time-Discounted Information in Sensor Networks

    Sensor networks are deployed to monitor a seemingly endless list of events in a multitude of application domains. Through data collection and aggregation enhanced with data mining and machine learning techniques, sensor networks can find many static and dynamic patterns. The aggregation problem is complicated by the fact that the perceived value of the data collected by the sensors is affected by many factors, such as time, location and user valuation. In addition, the value of information often deteriorates dramatically over time. Through our research, we have already achieved some results: a formal algebraic analysis of information discounting, especially discounting over time, with one general model and two specific models that formalize exponential and linear time-discounting; and an algebraic analysis of the aggregation of values that decay exponentially with time, in which three types of aggregators that offset discounting effects are formalized and analyzed, and a natural synthesis of these three aggregators is discovered and modeled. We apply our theoretical models to emergency response with thresholding and confirm them with extensive simulation. For long-term monitoring tasks, we laid out a theoretical foundation for discovering an emergency through generations of sensors, analysed the achievability of a long-term task, and found an optimal way to distribute sensors in a monitored area to maximize achievability. We proposed an implementation for our alert system with state-of-the-art wireless microcontrollers, sensors, real-time operating systems and embedded internet protocols. By allowing the aggregation of time-discounted information to proceed in an arbitrary, not necessarily pairwise, manner, our results are also applicable to other homeland security and military application domains where there is a strong need to model not only the timely aggregation of data collected by individual sensors, but also the dynamics of this aggregation.
    Our research can be applied to many real-world scenarios. A typical scenario is monitoring wildfire in a forest: a batch of first-generation sensors is deployed by UAVs to monitor the forest for possible wildfire. They monitor various weather quantities and recognize the area with the highest probability of producing a fire, the so-called area of interest (AoI). Since the environment changes dynamically, after a certain time the sensors re-identify the AoI. Because the value of the knowledge they learned about the previous AoI decays quickly with time, our methods for aggregating time-discounted information can be applied to obtain updated knowledge. When the current generation of sensors is close to depleting its energy, a new generation of sensors is deployed and inherits the knowledge from the current generation. In this way, monitoring long-term tasks becomes feasible.
    At the end of this thesis, we propose some extensions and directions for our current research: generalize and extend the special classes of Type 1 and Type 2 aggregation operators; analyze aggregation operators of Type 3 and Type 4 and find some special applicable candidates; aggregate data across consecutive generations of sensors in order to learn about events with discounting that take a long time to manifest themselves; study the network implications of various aggregation strategies; design algorithms for implementing some special classes of aggregators; and implement a wireless sensor network that can autonomously learn and recognize patterns of emergencies, predict incidents and trigger alarms through machine learning.
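    A minimal sketch of the exponential time-discount model and a simple age-aware aggregator in the spirit described above; the decay rate, readings and alert threshold are illustrative assumptions:

```python
# Exponential time-discounting: a reading's value decays as v0 * exp(-lam * age),
# and an aggregator combines readings after discounting each by its age.
import math

LAMBDA = 0.1   # decay rate per time unit (assumed)

def discounted_value(v0: float, age: float) -> float:
    """Value of a reading of initial worth v0 after 'age' time units."""
    return v0 * math.exp(-LAMBDA * age)

def aggregate(readings, now: float) -> float:
    """Sum (value, timestamp) readings, discounting each by its age at 'now'."""
    return sum(discounted_value(v, now - t) for v, t in readings)

# Three sensors observing the same event at different times.
readings = [(0.9, 0.0), (0.7, 2.0), (0.8, 5.0)]
alert = aggregate(readings, now=6.0) > 1.0   # threshold-based emergency alert
```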