
    Extending time series forecasting methods using functional principal components analysis

    Traffic volume forecasts are used by many transportation analysis and management systems to better characterize and react to fluctuating traffic patterns. Most current forecasting methods do not take advantage of the underlying functional characteristics of the time series when making predictions. This paper presents a methodology that uses Functional Principal Components Analysis (FPCA) to create smooth and differentiable daily traffic forecasts. The methodology is validated with a data set of 1,813 days of 15-minute aggregated traffic volume time series. Both the FPCA-based forecasts and the associated prediction intervals outperform traditional Seasonal Autoregressive Integrated Moving Average (SARIMA)-based methods. --Abstract, page iii
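
    A minimal numpy sketch of the general idea, using synthetic volumes in place of the paper's detector data, a discrete PCA as a stand-in for smoothed FPCA, and an AR(1) fit rather than the paper's forecasting models: decompose the daily curves into principal components, forecast each component score one day ahead, and rebuild the forecast curve.

        import numpy as np

        # Hypothetical input: one row per day, 96 columns (15-minute volumes).
        rng = np.random.default_rng(0)
        days, bins = 200, 96
        base = 400 + 300 * np.sin(np.linspace(0, 2 * np.pi, bins))
        X = base + rng.normal(0, 25, size=(days, bins))  # placeholder data

        # PCA on the daily curves (a discrete stand-in for FPCA; real FPCA
        # would first smooth each day with a basis expansion).
        mean_curve = X.mean(axis=0)
        U, S, Vt = np.linalg.svd(X - mean_curve, full_matrices=False)
        k = 3                              # retained components
        scores = U[:, :k] * S[:k]          # daily scores, shape (days, k)

        # Forecast each score series one day ahead with a simple AR(1) fit,
        # then rebuild a smooth forecast curve from the components.
        next_scores = np.empty(k)
        for j in range(k):
            s = scores[:, j]
            phi = np.dot(s[1:], s[:-1]) / np.dot(s[:-1], s[:-1])  # lag-1 coef
            next_scores[j] = phi * s[-1]

        forecast = mean_curve + next_scores @ Vt[:k]
        print(forecast[:4])  # first hour of the forecast day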

    A Comprehensive Survey on Generative Diffusion Models for Structured Data

    In recent years, generative diffusion models have driven a rapid paradigm shift in deep generative modelling by showing groundbreaking performance across various applications. Meanwhile, structured data, encompassing tabular and time series data, has received comparatively limited attention from the deep learning research community despite its omnipresence and extensive applications. Consequently, the literature on structured data modelling via diffusion models, and reviews of it, remains sparse compared to other data modalities such as visual and textual data. To address this gap, we present a comprehensive review of recently proposed diffusion models in the field of structured data. First, this survey provides a concise overview of score-based diffusion model theory, then proceeds to technical descriptions of the majority of pioneering works that apply diffusion models to structured data, in both data-driven general tasks and domain-specific applications. Thereafter, we analyse and discuss the limitations and challenges of existing works and suggest potential research directions. We hope this review serves as a catalyst for the research community, promoting developments in generative diffusion models for structured data. Comment: 20 pages, 1 figure, 2 tables
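
    The score-based theory the survey summarizes reduces, in its discrete DDPM form, to progressively noising the data and training a network to predict the injected noise. A toy PyTorch sketch of one such training step on synthetic tabular rows (the schedule, network, and data below are illustrative placeholders, not any surveyed model):

        import torch
        import torch.nn as nn

        # Toy "tabular" batch: 64 rows with 8 numeric features (placeholder).
        x0 = torch.randn(64, 8)

        # Standard DDPM-style noise schedule over T steps.
        T = 100
        betas = torch.linspace(1e-4, 0.02, T)
        alpha_bar = torch.cumprod(1.0 - betas, dim=0)

        # Tiny MLP that predicts the injected noise from (noisy row, timestep).
        net = nn.Sequential(nn.Linear(8 + 1, 64), nn.ReLU(), nn.Linear(64, 8))
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)

        # One noise-prediction training step.
        t = torch.randint(0, T, (x0.size(0),))            # random step per row
        a = alpha_bar[t].unsqueeze(1)                     # cumulative signal level
        eps = torch.randn_like(x0)                        # Gaussian noise
        xt = a.sqrt() * x0 + (1 - a).sqrt() * eps         # forward diffusion q(x_t | x_0)
        pred = net(torch.cat([xt, t.float().unsqueeze(1) / T], dim=1))
        loss = ((pred - eps) ** 2).mean()                 # denoising objective
        loss.backward()
        opt.step()
        print(float(loss))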

    Imputation of Rainfall Data Using the Sine Cosine Function Fitting Neural Network

    Missing rainfall data reduce the quality of hydrological data analysis because they are an essential input for hydrological modeling. Much research has focused on rainfall data imputation. However, the compatibility of precipitation (rainfall) and non-precipitation (meteorological) input data has received less attention. First, we propose a novel pre-processing mechanism for non-precipitation data using principal component analysis (PCA). Before imputation, PCA is used to extract the most relevant features from the meteorological data. The final output of the PCA is combined with the rainfall data from the nearest-neighbor gauging stations and then used as the input to the neural network for missing data imputation. Second, a sine cosine algorithm is presented to optimize the neural network for infilling the missing rainfall data. The proposed sine cosine function fitting neural network (SC-FITNET) was compared with the sine cosine feedforward neural network (SC-FFNN), feedforward neural network (FFNN) and long short-term memory (LSTM) approaches. The results showed that the proposed SC-FITNET outperformed LSTM, SC-FFNN and FFNN imputation in terms of mean absolute error (MAE), root mean square error (RMSE) and correlation coefficient (R), with an average accuracy of 90.9%. This study revealed that as the percentage of missingness increased, the precision of the four imputation methods decreased. It also revealed that PCA has potential for pre-processing meteorological data into a format suitable for missing data imputation.
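
    A rough scikit-learn sketch of the two-step pipeline on synthetic data; a standard Adam-trained MLPRegressor stands in for the paper's sine-cosine-optimized network, and all feature shapes below are invented for illustration.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(1)
        n = 500
        meteo = rng.normal(size=(n, 6))                   # placeholder meteorological features
        neighbor_rain = rng.gamma(2.0, 3.0, size=(n, 3))  # nearest-gauge rainfall
        target_rain = neighbor_rain.mean(axis=1) + meteo[:, 0] + rng.normal(0, 0.5, n)

        # Step 1: compress the meteorological block with PCA.
        meteo_pcs = PCA(n_components=2).fit_transform(meteo)

        # Step 2: combine PCs with neighbor rainfall and fit a neural network.
        # (The paper tunes the network with a sine cosine algorithm; a
        #  standard MLP stands in for that here.)
        X = np.hstack([meteo_pcs, neighbor_rain])
        mask = rng.random(n) > 0.2        # pretend 20% of the target is missing
        model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
        model.fit(X[mask], target_rain[mask])

        imputed = model.predict(X[~mask])  # fill the gaps
        print(imputed[:5])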

    Deep learning for the early detection of harmful algal blooms and improving water quality monitoring

    Climate change will affect how water sources are managed and monitored. The frequency of algal blooms will increase with climate change, as it creates favourable conditions for the reproduction of phytoplankton. During monitoring, sensory failures in monitoring systems result in partially filled data, which may affect critical systems. Imputation therefore becomes necessary to decrease error and increase data quality. This work investigates two issues in water quality data analysis, improving data quality and anomaly detection, and consists of three main topics: data imputation, early algal bloom detection using in-situ data, and early algal bloom detection using multiple modalities.

    The data imputation problem is addressed by experimenting with various methods on a water quality dataset covering four locations around the North Sea and the Irish Sea with different characteristics and high miss rates, testing model generalisability. A novel neural network architecture with self-attention is proposed in which imputation is done in a single pass, reducing execution time. The self-attention components increase the interpretability of the imputation process at each stage of the network, providing knowledge to domain experts.

    After data curation, algal activity is predicted 1 to 7 days ahead using transformer networks, and the importance of the input with regard to the output of the prediction model is explained using SHAP, aiming to explain model behaviour to domain experts, which is overlooked in previous approaches. The prediction model improves bloom detection performance by 5% on average, and the explanation summarizes the complex structure of the model into input-output relationships. The initial unimodal bloom detection model is further improved by incorporating multiple modalities into the detection process, which were previously used only for validation. The problem of missing data is also tackled by using coordinated representations, replacing low-quality in-situ data with satellite data and vice versa, instead of imputation, which may produce biased results.
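
    A toy PyTorch sketch of single-pass self-attention imputation on synthetic multivariate series; the layer sizes and mask handling below are illustrative guesses, not the thesis's architecture.

        import torch
        import torch.nn as nn

        # Toy water-quality panel: 16 sequences, 48 time steps, 4 variables.
        x = torch.randn(16, 48, 4)
        missing = torch.rand(16, 48, 4) < 0.3        # synthetic missingness mask
        x_in = torch.where(missing, torch.zeros_like(x), x)

        # Single-pass imputer: embed (values + mask), one self-attention
        # layer, then project back to the variable space.
        embed = nn.Linear(8, 32)
        attn = nn.MultiheadAttention(32, num_heads=4, batch_first=True)
        head = nn.Linear(32, 4)

        h = embed(torch.cat([x_in, missing.float()], dim=-1))
        h_attn, weights = attn(h, h, h)  # weights hint at which steps informed each fill
        recon = head(h_attn)

        # Loss over observed entries only (what training would minimize);
        # imputations are read off the reconstructed missing entries.
        loss = ((recon - x)[~missing] ** 2).mean()
        imputed = torch.where(missing, recon, x)
        print(float(loss), imputed.shape)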

    Enhancing Missing Data Imputation of Non-stationary Signals with Harmonic Decomposition

    Dealing with time series with missing values, including those afflicted by low quality or over-saturation, presents a significant signal processing challenge. The task of recovering these missing values, known as imputation, has led to the development of several algorithms. However, we have observed that the efficacy of these algorithms tends to diminish when the time series exhibit non-stationary oscillatory behavior. In this paper, we introduce a novel algorithm, coined Harmonic Level Interpolation (HaLI), which enhances the performance of existing imputation algorithms for oscillatory time series. After running any chosen imputation algorithm, HaLI leverages a harmonic decomposition of the initial imputation, based on the adaptive nonharmonic model, to improve imputation accuracy. Experimental assessments conducted on synthetic and real signals consistently show that HaLI enhances the performance of existing imputation algorithms. The algorithm is made publicly available as readily employable MATLAB code for other researchers to use.
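
    A simplified numpy sketch of the two-stage idea: run any baseline imputation, then refit the gap with a harmonic model. A fixed fundamental with two harmonics stands in for HaLI's adaptive nonharmonic model, and the signal below is synthetic.

        import numpy as np

        rng = np.random.default_rng(2)
        t = np.linspace(0, 10, 1000)
        x = np.cos(2 * np.pi * 1.3 * t) + 0.4 * np.cos(2 * np.pi * 2.6 * t)
        obs = np.ones_like(t, dtype=bool)
        obs[400:500] = False                  # a gap of missing samples

        # Step 1: any baseline imputation (plain linear interpolation here).
        x0 = x.copy()
        x0[~obs] = np.interp(t[~obs], t[obs], x[obs])

        # Step 2: harmonic refinement. Estimate the dominant frequency from
        # the initial imputation, then least-squares fit a small bank of
        # harmonics on the observed samples.
        spec = np.abs(np.fft.rfft(x0 - x0.mean()))
        f0 = np.fft.rfftfreq(t.size, d=t[1] - t[0])[spec.argmax()]
        cols = [np.ones_like(t)]
        for k in (1, 2):
            cols += [np.cos(2 * np.pi * k * f0 * t), np.sin(2 * np.pi * k * f0 * t)]
        A = np.stack(cols, axis=1)
        coef, *_ = np.linalg.lstsq(A[obs], x[obs], rcond=None)

        # Replace the gap with the harmonic model's prediction.
        x_hali = x0.copy()
        x_hali[~obs] = (A @ coef)[~obs]
        print(np.max(np.abs(x_hali[~obs] - x[~obs])))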

    Novel Computationally Intelligent Machine Learning Algorithms for Data Mining and Knowledge Discovery

    This thesis addresses three major issues in data mining: feature subset selection in high-dimensionality domains, plausible reconstruction of incomplete data in cross-sectional applications, and forecasting univariate time series. For the automated selection of an optimal subset of features in real time, we present an improved hybrid algorithm, SAGA. SAGA combines the ability of Simulated Annealing to avoid becoming trapped in local minima with the very high convergence rate of the crossover operator of Genetic Algorithms, the strong local search ability of greedy algorithms, and the high computational efficiency of generalized regression neural networks (GRNNs). For imputing missing values and forecasting univariate time series, we propose a homogeneous neural network ensemble: a committee of GRNNs trained on different subsets of features generated by SAGA, whose predictions are combined by a fusion rule. This approach makes it possible to discover all important interrelations between the values of the target variable and the input features. The proposed ensemble scheme has two innovative features that make it stand out among ensemble learning algorithms: (1) the ensemble makeup is optimized automatically by SAGA; and (2) GRNNs are used both as base classifiers and as the top-level combiner. Because of the GRNN, the proposed ensemble is a dynamic weighting scheme, in contrast to existing ensemble approaches, which rely on simple voting or static weighting. The basic idea of the dynamic weighting procedure is to give a higher reliability weight to training scenarios that are similar to the new ones. Simulation results demonstrate the validity of the proposed ensemble model.
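
    A compact numpy sketch of the core GRNN mechanic, kernel-weighted averaging over training scenarios, used here both as base learner and as combiner; the feature subsets below are invented stand-ins for SAGA's selections.

        import numpy as np

        def grnn_predict(X_train, y_train, X_query, sigma=0.5):
            """Generalized regression neural network: a kernel-weighted
            average of training targets, so training scenarios similar to
            the query get higher reliability weight (dynamic weighting)."""
            d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
            w = np.exp(-d2 / (2 * sigma ** 2))
            return (w @ y_train) / w.sum(axis=1)

        rng = np.random.default_rng(3)
        X = rng.normal(size=(200, 5))
        y = X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.1, 200)

        # Base learners on different feature subsets (illustrative stand-ins
        # for SAGA's picks), then a GRNN combiner over their predictions.
        subsets = [[0, 1], [1, 2, 3], [0, 4]]
        base_preds = np.stack([grnn_predict(X[:, s], y, X[:, s]) for s in subsets], axis=1)
        final = grnn_predict(base_preds, y, base_preds)
        print(np.mean((final - y) ** 2))  # in-sample fit, for illustration only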

    Data-Driven Copy-Paste Imputation for Energy Time Series

    Smart meters are a cornerstone of the worldwide transition to smart grids. They typically collect and provide energy time series that are vital for various applications, such as grid simulations, fault detection, load forecasting, load analysis, and load management. Unfortunately, these time series are often characterized by missing values that must be handled before the data can be used. A common approach to handling missing values in time series is imputation. However, existing imputation methods are designed for power time series and do not take into account the total energy of gaps, resulting in jumps or constant shifts when imputing energy time series. To overcome these issues, the present paper introduces the new Copy-Paste Imputation (CPI) method for energy time series. The CPI method copies data blocks with similar properties and pastes them into gaps of the time series while preserving the total energy of each gap. The new method is evaluated on a real-world dataset containing six shares of artificially inserted missing values between 1% and 30%. It far outperforms the three benchmark imputation methods selected for comparison. The comparison furthermore shows that the CPI method uses matching patterns and preserves the total energy of each gap while requiring only a moderate run-time. Comment: 8 pages, 7 figures, submitted to IEEE Transactions on Smart Grid; the first two authors contributed equally to this work
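
    A bare-bones numpy sketch of the copy-paste idea on synthetic smart-meter data: find a similar complete day, paste its block into the gap, and rescale it to preserve the gap's estimated total energy. The similarity criterion and energy estimate below are crude placeholders for the method's actual day-type matching and meter-reading-based gap energy.

        import numpy as np

        rng = np.random.default_rng(4)
        days, bins = 30, 96
        profile = 2.0 + np.sin(np.linspace(0, 2 * np.pi, bins)) ** 2
        E = profile * rng.uniform(0.8, 1.2, size=(days, 1))  # kWh per 15 min

        gap_day = 12
        observed = E.copy()
        observed[gap_day] = np.nan                           # a full-day gap

        # Step 1: pick the donor day whose neighbors best match the gap's
        # neighbors (a crude stand-in for CPI's matching of similar days).
        candidates = [d for d in range(1, days - 1) if abs(d - gap_day) > 1]
        context = np.concatenate([observed[gap_day - 1], observed[gap_day + 1]])
        donor = min(candidates,
                    key=lambda d: np.sum((np.concatenate([observed[d - 1], observed[d + 1]]) - context) ** 2))

        # Step 2: paste the donor block, rescaled so its total matches the
        # gap's estimated energy (approximated here by the mean daily energy;
        # CPI derives it from the meter's cumulative energy readings).
        target_energy = np.nansum(observed) / (days - 1)
        patch = E[donor] * (target_energy / E[donor].sum())
        observed[gap_day] = patch
        print(patch.sum(), target_energy)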