Sequence-to-Sequence Imputation of Missing Sensor Data
Although the sequence-to-sequence (encoder-decoder) model is considered the
state-of-the-art in deep learning sequence models, there is little research
into using this model for recovering missing sensor data. The key challenge is
that the missing sensor data problem typically comprises three sequences (a
sequence of observed samples, followed by a sequence of missing samples,
followed by another sequence of observed samples), whereas the
sequence-to-sequence model only considers two sequences (an input sequence and
an output sequence). We address this problem by formulating a
sequence-to-sequence model in a novel way. A forward RNN encodes the data
observed before the missing sequence and a backward RNN encodes the data
observed after the missing sequence. A decoder then combines the outputs of the
two encoders in a novel way to predict the missing data. We demonstrate that
this model produces the lowest errors in 12% more cases than the current
state-of-the-art.
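The two-encoder idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the plain tanh RNN cell, the linear readout used as a "decoder", the hidden size, and the names `rnn_encode` and `impute_gap` are all assumptions made for illustration, and the weights are random and untrained, so only the data flow is meaningful.

```python
import numpy as np

def rnn_encode(x, w_in, w_h):
    """Run a plain tanh RNN over a 1-D sequence and return the final hidden state."""
    h = np.zeros(w_h.shape[0])
    for value in x:
        h = np.tanh(w_in * value + w_h @ h)
    return h

def impute_gap(before, after, gap_len, hidden=8, seed=0):
    """Predict `gap_len` values from the samples observed before and after a gap.

    Assumption: random, untrained weights and a linear readout stand in for a
    trained decoder. The output shows the shape of the computation only.
    """
    rng = np.random.default_rng(seed)
    w_in_f = rng.normal(scale=0.5, size=hidden)            # forward encoder
    w_h_f = rng.normal(scale=0.5, size=(hidden, hidden))
    w_in_b = rng.normal(scale=0.5, size=hidden)            # backward encoder
    w_h_b = rng.normal(scale=0.5, size=(hidden, hidden))
    w_out = rng.normal(scale=0.5, size=(gap_len, 2 * hidden))

    h_fwd = rnn_encode(np.asarray(before, dtype=float), w_in_f, w_h_f)
    # The backward encoder reads the later samples in reverse time order,
    # so its final state summarizes the data closest to the gap last.
    h_bwd = rnn_encode(np.asarray(after, dtype=float)[::-1], w_in_b, w_h_b)

    context = np.concatenate([h_fwd, h_bwd])  # both contexts feed the decoder
    return w_out @ context
```

In a trained model the readout would be replaced by a learned decoder and all weights would be fitted on sequences with artificially removed segments.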
Ensemble learning of model hyperparameters and spatiotemporal data for calibration of low-cost PM2.5 sensors.
The PM2.5 air quality index (AQI) measurements from government-built supersites are accurate but cannot provide dense coverage of monitoring areas. Low-cost PM2.5 sensors can be deployed as a fine-grained internet-of-things (IoT) network to complement government facilities. Calibration of low-cost sensors by reference to high-accuracy supersites is thus essential. Moreover, the imputation of missing values in the training data may affect the calibration result, the best calibration performance requires hyperparameter optimization, and the factors affecting PM2.5 concentrations, such as climate, geographical landscape, and anthropogenic activities, are uncertain in both spatial and temporal dimensions. In this paper, an ensemble learning approach for imputation method selection, calibration model hyperparameterization, and spatiotemporal training data composition is proposed. Three government supersites in central Taiwan are chosen for the deployment of low-cost sensors, and hourly PM2.5 measurements are collected for 60 days for the experiments. Three optimizers, the Sobol sequence, Nelder-Mead, and particle swarm optimization (PSO), are compared by evaluating their performance with various versions of ensembles. The best calibration results are obtained with PSO, and the improvement ratios with respect to R2, RMSE, and NME are 4.92%, 52.96%, and 56.85%, respectively.
DropIn: Making Reservoir Computing Neural Networks Robust to Missing Inputs by Dropout
The paper presents a novel, principled approach to train recurrent neural
networks from the Reservoir Computing family that are robust to missing part of
the input features at prediction time. By building on the ensembling properties
of Dropout regularization, we propose a methodology, named DropIn, which
efficiently trains a neural model as a committee machine of subnetworks, each
capable of predicting with a subset of the original input features. We discuss
the application of the DropIn methodology in the context of Reservoir Computing
models and targeting applications characterized by input sources that are
unreliable or prone to be disconnected, such as in pervasive wireless sensor
networks and ambient intelligence. We provide an experimental assessment using
real-world data from such application domains, showing how the DropIn
methodology makes it possible to maintain predictive performance comparable to
that of a model without missing features, even when 20%-50% of the inputs are
not available.
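The core idea of dropping input features during training can be sketched as follows. This is a hedged illustration, not the DropIn implementation: the abstract does not say whether inputs are dropped per time step or per sequence, so dropping per time step is an assumption here, as are the names `drop_inputs` and `reservoir_states` and the simple tanh reservoir update.

```python
import numpy as np

def drop_inputs(x, p, rng):
    """Zero each input feature independently with probability p.

    Returns the masked inputs and the boolean keep-mask.
    """
    keep = rng.random(x.shape) >= p
    return x * keep, keep

def reservoir_states(x, w_in, w_res):
    """Collect the states of a fixed (untrained) tanh reservoir over a sequence.

    x has shape (time, features). Only a readout on top of these states
    would be trained in a Reservoir Computing model.
    """
    h = np.zeros(w_res.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(w_in @ x_t + w_res @ h)
        states.append(h)
    return np.array(states)
```

Training the readout on states computed from many differently masked copies of the inputs exposes it to the same kind of missing features it may encounter at prediction time.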
Novel methods for imputing missing values in water level monitoring data
Hydrological data are collected automatically from remote water level monitoring stations and then transmitted to the national water management centre via a telemetry system. However, the data received at the centre can be incomplete or anomalous due to instrument issues such as power and sensor failures. Usually, the detected anomalies or missing data are simply eliminated from the data, which could lead to inaccurate analysis or even false alarms. Therefore, it is very helpful to identify missing values and correct them as accurately as possible. In this paper, we introduce a new approach, Full Subsequence Matching (FSM), for imputing missing values in telemetry water level data. The FSM first identifies a sequence of missing values and replaces them with constant values to create a dummy complete sequence. It then searches for the most similar subsequence in the historical data. Finally, the identified subsequence is adapted to fit the missing part based on their similarity. The imputation accuracy of the FSM was evaluated with telemetry water level data and compared to some well-established methods (interpolation, k-NN, and MissForest) and to a leading deep learning method, the Long Short-Term Memory (LSTM) technique. Experimental results show that the FSM technique can produce more precise imputations, particularly for data with strong periodic patterns.
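The three steps described above (fill the gap with a constant, find the most similar historical subsequence, adapt it to the gap) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the published FSM algorithm: the Euclidean distance, the use of the observed mean as the constant, the mean-offset adaptation, and the name `fsm_impute` are all choices made here for illustration.

```python
import numpy as np

def fsm_impute(history, window):
    """Fill the NaN run in `window` using the best-matching slice of `history`.

    history: 1-D array of complete past data, at least as long as `window`.
    window:  1-D array containing the missing values as NaN.
    """
    history = np.asarray(history, dtype=float)
    window = np.asarray(window, dtype=float)
    missing = np.isnan(window)
    observed = ~missing

    # Step 1: build a dummy complete sequence by filling the gap with a constant
    # (assumption: the mean of the observed samples in the window).
    dummy = np.where(missing, window[observed].mean(), window)

    # Step 2: slide over the history and keep the closest subsequence.
    n = len(dummy)
    best_start, best_dist = 0, np.inf
    for start in range(len(history) - n + 1):
        dist = np.linalg.norm(history[start:start + n] - dummy)
        if dist < best_dist:
            best_start, best_dist = start, dist
    match = history[best_start:best_start + n]

    # Step 3: adapt the match by shifting it so that its observed part has the
    # same mean as the window's observed part, then copy it into the gap.
    offset = window[observed].mean() - match[observed].mean()
    result = window.copy()
    result[missing] = match[missing] + offset
    return result
```

On strongly periodic data, the best match tends to be the same phase of an earlier cycle, which is consistent with the abstract's remark that the method performs best on periodic patterns.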
- …