101 research outputs found
Graph-based Virtual Sensing from Sparse and Partial Multivariate Observations
Virtual sensing techniques allow for inferring signals at new unmonitored
locations by exploiting spatio-temporal measurements coming from physical
sensors at different locations. However, as the sensor coverage becomes sparse
due to costs or other constraints, physical proximity cannot be used to support
interpolation. In this paper, we overcome this challenge by leveraging
dependencies between the target variable and a set of correlated variables
(covariates) that can frequently be associated with each location of interest.
From this viewpoint, covariates provide partial observability, and the problem
consists of inferring values for unobserved channels by exploiting observations
at other locations to learn how such variables can correlate. We introduce a
novel graph-based methodology to exploit such relationships and design a graph
deep learning architecture, named GgNet, implementing the framework. The
proposed approach relies on propagating information over a nested graph
structure that is used to learn dependencies between variables as well as
locations. GgNet is extensively evaluated under different virtual sensing
scenarios, demonstrating higher reconstruction accuracy compared to the
state-of-the-art. Comment: Accepted at ICLR 202
Analysed potential of big data and supervised machine learning techniques in effectively forecasting travel times from fused data
Travel time forecasting is relevant to many ITS services. The growing number of data collection sensors increases the availability of predictor variables, but it also highlights the heavy processing demands that come with data at this scale. In this paper we analyse the potential of big data and supervised machine learning techniques for effectively forecasting travel times. For this purpose we used fused data from three data sources (Global Positioning System vehicle tracks, road network infrastructure data and meteorological data) and four machine learning techniques (k-nearest neighbours, support vector machines, boosting trees and random forest). To evaluate the forecasting results we compared them across road classes in terms of absolute values, measured in minutes, and the mean squared percentage error. For road classes with high average speeds and long road segments, the machine learning techniques forecasted travel times with small relative error, while for road classes with low average speeds and short segments this was a more demanding task. All three data sources proved to have a high impact on travel time forecast accuracy, and the best results (taking into account all road classes) were achieved with the k-nearest neighbours and random forest techniques.
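The per-road-class comparison by mean squared percentage error described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the definition used here (mean of squared relative errors, expressed as a fraction rather than a percentage) is an assumption, and the function name `mspe_by_class` and the sample records are hypothetical.

```python
from collections import defaultdict


def mspe_by_class(records):
    """Mean squared percentage error per road class.

    records: iterable of (road_class, observed_minutes, forecast_minutes).
    Assumed definition: mean of ((observed - forecast) / observed) ** 2.
    """
    squared_errors = defaultdict(list)
    for road_class, observed, forecast in records:
        if observed <= 0:
            raise ValueError("observed travel time must be positive")
        squared_errors[road_class].append(((observed - forecast) / observed) ** 2)
    # Average the squared relative errors within each road class.
    return {cls: sum(errs) / len(errs) for cls, errs in squared_errors.items()}


# Made-up travel times in minutes, for illustration only.
sample = [
    ("motorway", 10.0, 11.0),
    ("motorway", 20.0, 18.0),
    ("local", 2.0, 3.0),
]
scores = mspe_by_class(sample)
```

With this definition, a one-minute miss on a short local segment weighs far more than the same miss on a long motorway segment, which is consistent with the abstract's remark that short, slow road classes are the harder case.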
Traffic Prediction using Artificial Intelligence: Review of Recent Advances and Emerging Opportunities
Traffic prediction plays a crucial role in alleviating traffic congestion
which represents a critical problem globally, resulting in negative
consequences such as lost hours of additional travel time and increased fuel
consumption. Integrating emerging technologies into transportation systems
provides opportunities for improving traffic prediction significantly and
brings about new research problems. In order to lay the foundation for
understanding the open research challenges in traffic prediction, this survey
aims to provide a comprehensive overview of traffic prediction methodologies.
Specifically, we focus on the recent advances and emerging research
opportunities in Artificial Intelligence (AI)-based traffic prediction methods,
due to their recent success and potential in traffic prediction, with an
emphasis on multivariate traffic time series modeling. We first provide a list
and explanation of the various data types and resources used in the literature.
Next, the essential data preprocessing methods within the traffic prediction
context are categorized, and the prediction methods and applications are
subsequently summarized. Lastly, we present primary research challenges in
traffic prediction and discuss some directions for future research. Comment: Published in Transportation Research Part C: Emerging Technologies
(TR_C), Volume 145, 202
Development of a regional feature selection-based machine learning system (RFSML v1.0) for air pollution forecasting over China
With the explosive growth of atmospheric data, machine learning models have achieved great success in air pollution forecasting because of their higher computational efficiency than the traditional chemical transport models. However, in previous studies, new prediction algorithms have only been tested at stations or in a small region; a large-scale air quality forecasting model remains lacking to date. Huge dimensionality also means that redundant input data may lead to increased complexity and therefore the over-fitting of machine learning models. Feature selection is a key topic in machine learning development, but it has not yet been explored in atmosphere-related applications. In this work, a regional feature selection-based machine learning (RFSML) system was developed, which is capable of predicting air quality in the short term with high accuracy at the national scale. Ensemble-Shapley additive global importance analysis is combined with the RFSML system to extract significant regional features and eliminate redundant variables at an affordable computational expense. The significance of the regional features is also explained physically. Compared with a standard machine learning system fed with relative features, the RFSML system driven by the selected key features results in superior interpretability, less training time, and more accurate predictions. This study also provides insights into the difference in interpretability among machine learning models (i.e., random forest, gradient boosting, and multi-layer perceptron models).
Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities
With the increasing amount of spatial-temporal (ST) ocean data, numerous
spatial-temporal data mining (STDM) studies have been conducted to address
various oceanic issues, e.g., climate forecasting and disaster warning.
Compared with typical ST data (e.g., traffic data), ST ocean data is more
complicated with some unique characteristics, e.g., diverse regionality and
high sparsity. These characteristics make it difficult to design and train STDM
models. Unfortunately, an overview of these studies is still missing, which
keeps computer scientists from identifying research issues in ocean science and
discourages ocean scientists from applying advanced STDM techniques. To remedy
this situation, we provide a comprehensive survey to summarize existing STDM
studies in ocean. Concretely, we first summarize the widely-used ST ocean
datasets and identify their unique characteristics. Then, typical ST ocean data
quality enhancement techniques are discussed. Next, we classify existing STDM
studies for ocean into four types of tasks, i.e., prediction, event detection,
pattern mining, and anomaly detection, and elaborate the techniques for these
tasks. Finally, promising research opportunities are highlighted. This survey
will help scientists from the fields of both computer science and ocean science
have a better understanding of the fundamental concepts, key techniques, and
open challenges of STDM in ocean.
Geo-physical parameter forecasting on imagery-based data sets using machine learning techniques
Magister Scientiae - MSc
This research objectively investigates the effectiveness of machine learning (ML) tools
towards predicting several geo-physical parameters. This is based on a large number
of studies that have reported high levels of prediction success using ML in the field.
Therefore, several widely used ML tools coupled with a number of different feature sets
are used to predict six geophysical parameters, namely rainfall, groundwater, evaporation,
humidity, temperature, and wind. The results of the research indicate that: a)
a large number of related studies in the field are prone to specific pitfalls that lead to
over-estimated results in favour of ML tools; b) the use of Gaussian mixture models as
global features can provide a higher accuracy compared to other local feature sets; c)
ML tools never outperform simple statistically-based estimators on highly-seasonal
parameters, and providing error bars is key to objectively evaluating the relative performance
of the ML tools used; and d) ML tools can be effective for parameters that are
slow-changing, such as groundwater.
Statistical modelling with additive Gaussian process priors
Regression with Gaussian process (GP) priors has become increasingly popular due to its ability to model complex relationships between variables and handle auto-correlation in the data through the covariance function of the process, called the kernel. Despite its popularity, the statistical modelling aspect of GP regression has received relatively limited attention. In this thesis, we explore a regression model where the regression function can be decomposed into a sum of lower-dimensional functions, akin to the principles of Generalised Additive Models (Hastie and Tibshirani, 1990). We propose additive interaction modelling using a class of hierarchical ANOVA decomposition kernels. This flexible statistical modelling framework naturally accommodates interaction effects of any order without increasing the number of model parameters. Our approach facilitates straightforward assessment and comparison of models with different interaction structures through the model marginal likelihood. We also demonstrate how this framework enhances the interpretability of complex data structures, especially when combined with the concept of kernel centring. The second segment of the thesis focuses on the computational aspects of implementing the proposed additive models for handling large-scale data structured in multidimensional grids. Such structured data often arise in scenarios involving multilevel repeated measurements, as commonly seen in spatio-temporal analysis or medical, behavioural, and psychological studies. Leveraging the Kronecker product structure within the covariance matrix, we reduce the time complexity to O(n^3) and storage requirements to O(n^2). We extend existing work in the GP literature to encompass all models under hierarchical ANOVA decomposition kernels. Additionally, we address issues related to incomplete grids and various missingness mechanisms.
We illustrate the practical application of our proposed methodologies using both simulated and real-world spatio-temporal and longitudinal data.
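As a rough illustration of why Kronecker structure helps on gridded data, the sketch below solves a GP linear system on a complete two-dimensional grid without ever forming the full covariance matrix. It is a generic textbook-style trick, not the thesis's method: it assumes a plain product kernel (no ANOVA decomposition), a complete grid, and Gaussian noise. The names `rbf_kernel` and `kron_solve` are hypothetical. For an n1 x n2 grid the two eigendecompositions cost O(n1^3 + n2^3), compared with O((n1 n2)^3) for a dense solve.

```python
import numpy as np


def rbf_kernel(x, lengthscale=1.0):
    """Squared-exponential kernel matrix for 1-D inputs x."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def kron_solve(K1, K2, y, noise_var):
    """Solve (kron(K1, K2) + noise_var * I) a = y using only the factors.

    y is ordered row-major over the grid: index i * n2 + j.
    """
    w1, Q1 = np.linalg.eigh(K1)  # eigendecomposition of each small factor
    w2, Q2 = np.linalg.eigh(K2)
    n1, n2 = K1.shape[0], K2.shape[0]
    Y = y.reshape(n1, n2)
    # Rotate into the joint eigenbasis: kron(Q1, Q2).T @ y == vec(Q1.T @ Y @ Q2)
    T = Q1.T @ Y @ Q2
    # Eigenvalues of kron(K1, K2) are the pairwise products w1[i] * w2[j].
    T = T / (np.outer(w1, w2) + noise_var)
    # Rotate back to the original basis.
    return (Q1 @ T @ Q2.T).reshape(-1)


# Small synthetic grid, for illustration only.
rng = np.random.default_rng(0)
K1 = rbf_kernel(np.linspace(0.0, 1.0, 4), lengthscale=0.3)
K2 = rbf_kernel(np.linspace(0.0, 1.0, 5), lengthscale=0.5)
y = rng.normal(size=4 * 5)
alpha = kron_solve(K1, K2, y, noise_var=0.1)
```

Incomplete grids and additive (sum-of-Kronecker) kernels, both mentioned in the abstract, break the simple product structure exploited here and need additional machinery that this sketch does not attempt.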
Data Science in Healthcare
Data science is an interdisciplinary field that applies numerous techniques, such as machine learning, neural networks, and deep learning, to create value based on extracting knowledge and insights from available data. Advances in data science have a significant impact on healthcare. While advances in the sharing of medical information result in better and earlier diagnoses as well as more patient-tailored treatments, information management is also affected by trends such as increased patient centricity (with shared decision making), self-care (e.g., using wearables), and integrated care delivery. The delivery of health services is being revolutionized through the sharing and integration of health data across organizational boundaries. Via data science, researchers can deliver new approaches to merge, analyze, and process complex data and gain more actionable insights, understanding, and knowledge at the individual and population levels. This Special Issue focuses on how data science is used in healthcare (e.g., through predictive modeling) and on related topics, such as data sharing and data management.
A review of machine learning applications in wildfire science and management
Artificial intelligence has been applied in wildfire science and management
since the 1990s, with early applications including neural networks and expert
systems. Since then the field has rapidly progressed congruently with the wide
adoption of machine learning (ML) in the environmental sciences. Here, we
present a scoping review of ML in wildfire science and management. Our
objective is to improve awareness of ML among wildfire scientists and managers,
as well as illustrate the challenging range of problems in wildfire science
available to data scientists. We first present an overview of popular ML
approaches used in wildfire science to date, and then review their use in
wildfire science within six problem domains: 1) fuels characterization, fire
detection, and mapping; 2) fire weather and climate change; 3) fire occurrence,
susceptibility, and risk; 4) fire behavior prediction; 5) fire effects; and 6)
fire management. We also discuss the advantages and limitations of various ML
approaches and identify opportunities for future advances in wildfire science
and management within a data science context. We identified 298 relevant
publications, where the most frequently used ML methods included random
forests, MaxEnt, artificial neural networks, decision trees, support vector
machines, and genetic algorithms. There exist opportunities to apply more
current ML methods (e.g., deep learning and agent based learning) in wildfire
science. However, despite the ability of ML models to learn on their own,
expertise in wildfire science is necessary to ensure realistic modelling of
fire processes across multiple scales, while the complexity of some ML methods
requires sophisticated knowledge for their application. Finally, we stress that
the wildfire research and management community plays an active role in
providing relevant, high quality data for use by practitioners of ML methods. Comment: 83 pages, 4 figures, 3 tables