    Hydrologic prediction using pattern recognition and soft-computing techniques

    Several studies indicate that data-driven models are potentially useful tools in hydrological modeling. Nevertheless, it is a common perception among researchers and practitioners that the usefulness of system-theoretic models is limited to forecast applications, and that they cannot be used as a tool for scientific investigation. System-theoretic models are also believed to be less reliable because they characterize hydrological processes by learning the input-output patterns embedded in the dataset rather than from a strong physical understanding of the system. These concerns need to be addressed before data-driven models can gain wider acceptance among researchers and practitioners. In this research, different methods and tools that can be adopted to promote transparency in data-driven models are probed, with the objective of extending the usefulness of data-driven models beyond forecast applications to tools for scientific investigation, by providing additional insight into the underlying input-output patterns on which the models base their decisions. In this regard, the utility of self-organizing networks (competitive learning and self-organizing maps) in learning the patterns in the input space is evaluated by developing a novel neural network model called the spiking modular neural network (SMNN). The performance of SMNNs is evaluated based on their ability to characterize streamflows and the actual evapotranspiration process. The utility of self-organizing algorithms, namely genetic programming (GP), is also evaluated with regard to their ability to promote transparency in data-driven models. The robustness of GP in evolving its own model structure with relevant parameters is illustrated by applying GP to characterize the actual evapotranspiration process. 
The results from this research indicate that self-organization in learning, both in terms of self-organizing networks and self-organizing algorithms, could be adopted to promote transparency in data-driven models. In pursuit of improving the reliability of data-driven models, different methods for incorporating uncertainty estimates into the data-driven model building exercise are evaluated in this research. Local-scale models are shown to be more reliable than global-scale models in characterizing the saturated hydraulic conductivity of soils. In addition, the importance of model structure uncertainty in geophysical modeling is emphasized by developing a framework to account for it. The contribution of model structure uncertainty to the predictive uncertainty of the model is shown to be larger than the uncertainty associated with the model parameters. It is also demonstrated that increasing model complexity may lead to a better fit of the function, but at the cost of an increasing level of uncertainty. It is recommended that the effect of model structure uncertainty be considered when developing reliable hydrological models.

    Artificial Intelligence and Machine Learning Approaches to Energy Demand-Side Response: A Systematic Review

    Recent years have seen increasing interest in Demand Response (DR) as a means to provide flexibility, and hence improve the reliability of energy systems in a cost-effective way. Yet the high complexity of the tasks associated with DR, combined with their use of large-scale data and the frequent need for near real-time decisions, means that Artificial Intelligence (AI) and Machine Learning (ML), a branch of AI, have recently emerged as key technologies for enabling demand-side response. AI methods can be used to tackle various challenges, ranging from selecting the optimal set of consumers to respond, learning their attributes and preferences, dynamic pricing, and scheduling and control of devices, to learning how to incentivise participants in DR schemes and how to reward them in a fair and economically efficient way. This work provides an overview of AI methods utilised for DR applications, based on a systematic review of over 160 papers, 40 companies and commercial initiatives, and 21 large-scale projects. The papers are classified with regard to both the AI/ML algorithm(s) used and the application area in energy DR. Next, commercial initiatives (including both start-ups and established companies) and large-scale innovation projects in which AI methods have been used for energy DR are presented. The paper concludes with a discussion of the advantages and potential limitations of the reviewed AI techniques for different DR tasks, and outlines directions for future research in this fast-growing area.

    Clustering, prediction and ordinal classification for time series using machine learning techniques: applications

    In recent years, a growing number of fields have improved their standard processes by using machine learning (ML) techniques. The main reason is that the vast amount of data generated by these processes is difficult for humans to process. Therefore, the development of automatic methods to process and extract relevant information from these data is greatly needed, given that such approaches could increase the economic benefit of enterprises or reduce the workload of some existing jobs. Concretely, in this Thesis, ML approaches are applied to problems concerning time series data. A time series is a special kind of data in which data points are collected chronologically. Time series are present in a wide variety of fields, such as atmospheric events or engineering applications. Depending on the main objective to be satisfied, different tasks are applied to time series in the literature. This Thesis focuses mainly on some of them: clustering, classification, prediction and, in general, analysis. Generally, the amount of data to be processed is huge, which creates the need for methods able to reduce the dimensionality of time series without decreasing the amount of information. In this sense, applying time series segmentation procedures that divide the time series into different subsequences is a good option, given that each segment defines a specific behaviour. Once the different segments are obtained, using statistical features to characterise them is an excellent way to maximise the information retained from the time series while considerably reducing their dimensionality. In the case of time series clustering, the objective is to find groups of similar time series with the idea of discovering interesting patterns in time series datasets. In this Thesis, we have developed a novel time series clustering technique. 
The aim of this proposal is twofold: to reduce the dimensionality as much as possible and to develop a time series clustering approach able to outperform current state-of-the-art techniques. For the first objective, the time series are segmented in order to divide them into subsequences with different behaviours. Then, these segments are projected into a vector of statistical features, aiming to reduce the dimensionality of the time series. Once this preprocessing step is done, the clustering of the time series is carried out with a significantly lower computational load. This novel approach has been tested on all the time series datasets available in the University of East Anglia and University of California Riverside (UEA/UCR) time series classification (TSC) repository. Regarding time series classification, two main paths can be differentiated. The first is nominal TSC, a well-known field involving a wide variety of proposals and transformations applied to time series. Concretely, one of the most popular transformations is the shapelet transform (ST), which has been widely used in this field. The original method extracts shapelets from the original time series and uses them for classification purposes. Nevertheless, the full enumeration of all possible shapelets is very time consuming. Therefore, in this Thesis, we have developed a hybrid method that starts with the best shapelets extracted by the original approach under a time constraint and then tunes these shapelets by using a convolutional neural network (CNN) model. The second path is time series ordinal classification (TSOC), a previously unexplored field that begins with this Thesis. Here, we have adapted the original ST to the ordinal classification (OC) paradigm by proposing several shapelet quality measures that take advantage of the ordinal information of the time series. This methodology leads to better results than the state-of-the-art TSC techniques on those ordinal time series datasets. 
All these proposals have been tested on all the time series datasets available in the UEA/UCR TSC repository. Time series prediction is based on estimating the next value or values of the time series by considering the previous ones. In this Thesis, several different approaches have been considered depending on the problem to be solved. Firstly, the prediction of low-visibility events produced by fog conditions is carried out by means of hybrid autoregressive (AR) models combining fixed-size and dynamic windows, which adapt to the dynamics of the time series. Secondly, the prediction of convective cloud formation (a highly imbalanced problem, given that the number of convective cloud events is much lower than that of non-convective situations) is performed in two completely different ways: 1) tackling the problem as a multi-objective classification task by the use of multi-objective evolutionary artificial neural networks (MOEANNs), in which the two conflicting objectives are the accuracy of the minority class and the global accuracy, and 2) tackling the problem from the OC point of view, in which, in order to reduce the degree of imbalance, an oversampling approach is proposed along with the use of OC techniques. Thirdly, the prediction of solar radiation is carried out by means of evolutionary artificial neural networks (EANNs) with different combinations of basis functions in the hidden and output layers. Finally, the last challenging problem is the prediction of energy flux from waves and tides. For this, a multitask EANN has been proposed, aiming to predict the energy flux at several prediction time horizons (from 6 h to 48 h). All these proposals and techniques have been corroborated and discussed according to physical and atmospheric models. The work developed in this Thesis is supported by 11 JCR-indexed papers in international journals (7 Q1, 3 Q2, 1 Q3), 11 papers in international conferences, and 4 papers in national conferences.
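
The segment-then-summarise preprocessing described in this abstract can be illustrated with a minimal sketch. The abstract does not say which segmentation procedure or which statistical features the thesis uses, so the equal-length split, the three features (mean, variance, slope) and the function name `segment_features` are all assumptions made for illustration only.

```python
import numpy as np


def segment_features(series, n_segments):
    """Map a 1-D time series to a short vector of per-segment statistics.

    Assumption: equal-length contiguous segments; the thesis may use a
    data-driven segmentation instead.
    """
    values = np.asarray(series, dtype=float)
    if n_segments < 1 or n_segments > len(values):
        raise ValueError("n_segments must be between 1 and len(series)")

    features = []
    for segment in np.array_split(values, n_segments):
        # Slope of a least-squares line through the segment (0 for one point).
        if len(segment) > 1:
            slope = np.polyfit(np.arange(len(segment)), segment, 1)[0]
        else:
            slope = 0.0
        # Assumed feature set: mean, variance and linear trend per segment.
        features.extend([segment.mean(), segment.var(), slope])
    return np.array(features)


# A 12-point series is reduced to 3 segments x 3 features = 9 values.
reduced = segment_features(np.arange(12), n_segments=3)
```

Any standard clustering algorithm could then be run on these fixed-length feature vectors instead of on the raw series, which is where the lower computational load mentioned above would come from.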

    Modeling and analysis of actual evapotranspiration using data driven and wavelet techniques

    Large-scale mining practices have disturbed many natural watersheds in northern Alberta, Canada. To restore disturbed landscapes and ecosystem functions, reconstruction strategies have been adopted with the aim of establishing sustainable reclaimed lands. The success of the reconstruction process depends on the design of reconstruction strategies, which can be optimized by improving the understanding of the controlling hydrological processes in the reconstructed watersheds. Evapotranspiration is one of the important components of the hydrological cycle; its estimation and analysis are crucial for better assessment of the reconstructed landscape hydrology, and for more efficient design. The complexity of the evapotranspiration process and its variability in time and space have imposed some limitations on previously developed evapotranspiration estimation models. The vast majority of the available models estimate the rate of potential evapotranspiration, which occurs under an unlimited water supply. However, the rate of actual evapotranspiration (AET) depends on the available soil moisture, which makes its physical modeling more complicated than that of potential evapotranspiration. The main objective of this study is to estimate and analyze the AET process in a reconstructed landscape. Data-driven techniques can model the process without requiring a complete understanding of its physics. In this study, three data-driven models, genetic programming (GP), artificial neural networks (ANNs), and multilinear regression (MLR), were developed and compared for estimating the hourly eddy covariance (EC)-measured AET using meteorological variables. The AET was modeled as a function of five meteorological variables: net radiation (Rn), ground temperature (Tg), air temperature (Ta), relative humidity (RH), and wind speed (Ws) in a reconstructed landscape located in northern Alberta, Canada. 
Several ANN models were evaluated using two training algorithms, Levenberg-Marquardt and Bayesian regularization. The GP technique was employed to generate mathematical equations correlating AET to the five meteorological variables. Furthermore, the available data were statistically analyzed to obtain MLR models and to identify the meteorological variables that have a significant effect on the evapotranspiration process. The utility of the investigated data-driven models was also compared with that of the HYDRUS-1D model, a physically based model that makes use of the conventional Penman-Monteith (PM) method for the prediction of AET. The HYDRUS-1D model was examined for estimating AET using meteorological variables, leaf area index, and soil moisture information. Furthermore, wavelet analysis (WA), as a multiresolution signal processing tool, was examined to improve the understanding of the temporal variations of the available time series by identifying the significant cyclic features, and to explore the possible correlation between AET and the meteorological signals. WA was used to determine the inputs of the AET models a priori. The results of this study indicated that all three proposed data-driven models were able to approximate the AET reasonably well; however, the GP and MLR models had better generalization ability than the ANN model. The GP models demonstrated that the complex process of hourly AET can be efficiently modeled as simple semi-linear functions of a few meteorological variables. The results of the HYDRUS-1D model showed that a physically based model, such as HYDRUS-1D, might perform on par with or even worse than the data-driven models in terms of overall prediction accuracy. The developed equation-based models, GP and MLR, revealed the larger contribution of net radiation and ground temperature, compared to the other variables, to the estimation of AET. 
It was also found that the interaction effects of meteorological variables are important for AET modeling. The results of wavelet analysis demonstrated the presence of both small-scale (2 to 8 hours) and larger-scale (e.g. diurnal) cyclic features in most of the investigated time series. Larger-scale cyclic features were found to be the dominant source of temporal variations in the AET and most of the meteorological variables. The results of cross wavelet analysis indicated that the cause-and-effect relationship between AET and the meteorological variables might vary with the time-scale of variation under consideration. At small time-scales, significant linear correlations were observed between AET and the Rn, RH, and Ws time series, while at larger time-scales significant linear correlations were observed between AET and the Rn, RH, Tg, and Ta time series.

    Data-Driven Methods for Demand-Side Flexibility in Energy Systems

    Enhancing statistical wind speed forecasting models : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Engineering at Massey University, Manawatū Campus, New Zealand

    In recent years, wind speed forecasting models have seen significant development and growth. In particular, hybrid models have been emerging over the last decade. Hybrid models combine two or more techniques from several categories, with each technique contributing its distinct strengths. Data-driven models, which include statistical and Artificial Intelligence/Machine Learning (AI/ML) models, are mainly deployed in hybrid models for shorter forecasting time horizons (< 6 h). Literature studies show that machine learning models have shown enormous potential owing to their accuracy and robustness. On the other hand, only a handful of studies are available on the performance enhancement of statistical models, despite the fact that hybrid models are incomplete without statistical models. To address this knowledge gap, this thesis identified the shortcomings of traditional statistical models while enhancing their prediction accuracy. Three statistical models are considered for analysis: the Grey Model [GM(1,1)], Markov Chain, and Holt’s Double Exponential Smoothing models. Initially, the problems that limit the forecasting models' applicability are highlighted. Such issues include negative wind speed predictions, failure to meet predetermined accuracy levels, non-optimal estimates, and additional computational cost with limited performance. To address these concerns, improved forecasting models are proposed using wind speed data from Palmerston North, New Zealand. Several methodologies have been developed to improve the model performance and fulfill the necessary and sufficient conditions. These approaches include an adjusting dynamic moving window, a self-adaptive state categorization algorithm, an approach similar to the leave-one-out method, and a mixed initialization method. Keeping in view the application of hybrid methods, novel MODWT-ARIMA-Markov and AGO-HDES models are further proposed as secondary objectives. 
Also, a comprehensive analysis is presented by comparing sixteen models from three categories, each for four case studies, three rolling windows, and three forecasting horizons. Overall, the improved models showed higher accuracy than their traditional counterparts. Finally, future directions that need further research to improve forecasting performance are highlighted.
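
For readers unfamiliar with one of the baseline models named above, the following is a minimal sketch of the textbook form of Holt's double exponential smoothing, not the improved model developed in the thesis. The initialisation (first value as level, first difference as trend), the function name `holt_forecast` and the `floor` argument are assumptions; the floor is only a naive way to show why negative wind speed predictions arise and how they might be avoided.

```python
def holt_forecast(series, alpha, beta, horizon, floor=0.0):
    """Forecast `horizon` steps ahead with Holt's linear (double) smoothing.

    alpha smooths the level and beta smooths the trend, both in (0, 1].
    Assumption: forecasts are clamped at `floor`, since wind speed
    cannot be negative; this clamp is not taken from the thesis.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")

    # Assumed initialisation: level = first value, trend = first difference.
    level = float(series[0])
    trend = float(series[1]) - float(series[0])

    for y in series[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend

    # The h-step forecast extrapolates the last level along the last trend.
    return [max(floor, level + h * trend) for h in range(1, horizon + 1)]


# A steadily falling series would extrapolate below zero without the floor.
forecasts = holt_forecast([4.0, 3.0, 2.0, 1.0], alpha=0.5, beta=0.5, horizon=3)
```

Because the forecast is a straight-line extrapolation, a falling trend eventually crosses zero; the unclamped version of this recursion is one way the negative predictions mentioned in the abstract can occur.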
