Fuzzy jump wavelet neural network based on rule induction for dynamic nonlinear system identification with real data applications
Aim
The fuzzy wavelet neural network (FWNN) has proven to be a promising strategy for the identification of nonlinear systems. The network captures both global and local properties and handles the imprecision present in sensory data, achieving the desired accuracy. In this paper, we propose a new FWNN model, named the "Fuzzy Jump Wavelet Neural Network" (FJWNN), for identifying dynamic nonlinear-linear systems, especially in practical applications.
Methods
The proposed FJWNN is a fuzzy neural network of the Takagi-Sugeno-Kang type, in which the consequent part of each fuzzy rule is a linear combination of input regressors and dominant wavelet neurons forming a sub-jump wavelet neural network (sub-JWNN). Each fuzzy rule can therefore locally model both the linear and the nonlinear properties of a system: the linear relationship between the inputs and the output is learned by neurons with linear activation functions, whereas the nonlinear relationship is locally modeled by wavelet neurons. The orthogonal least squares (OLS) method and a genetic algorithm (GA) are used, respectively, to select the dominant wavelets for each sub-JWNN. Fuzzy rule induction further improves the structure of the proposed model, leading to fewer fuzzy rules, fewer inputs per rule, and fewer model parameters. The real-world gas furnace and real electromyographic (EMG) signal modeling problems are employed in our study. In the same vein, piecewise single-variable function approximation, nonlinear dynamic system modeling, and Mackey-Glass time series prediction confirm the superiority of this method. The proposed FJWNN model is compared with state-of-the-art models using performance indices such as RMSE, RRSE, Rel ERR%, and VAF%.
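A rough sketch, in Python, of what one such rule could look like: a Takagi-Sugeno-Kang rule with Gaussian antecedent memberships whose consequent adds wavelet-neuron outputs to a linear regressor term. The Mexican-hat wavelet, the Gaussian memberships, and all parameter names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def mexican_hat(x, translation, dilation):
    """Mexican-hat mother wavelet, shifted and scaled (a common choice
    for wavelet neurons; the paper's wavelet family may differ)."""
    z = (x - translation) / dilation
    return (1.0 - z**2) * np.exp(-0.5 * z**2)

def tsk_rule_consequent(x, linear_w, bias, wavelet_params, wavelet_w):
    """Consequent of one fuzzy rule: linear regressors plus wavelet neurons."""
    linear_part = np.dot(linear_w, x) + bias
    wavelet_part = sum(
        w * np.prod([mexican_hat(xi, t, d) for xi, t, d in zip(x, ts, ds)])
        for w, (ts, ds) in zip(wavelet_w, wavelet_params)
    )
    return linear_part + wavelet_part

def fjwnn_output(x, rules):
    """Normalized TSK combination over all fuzzy rules; Gaussian
    membership functions are assumed for the antecedents."""
    firings, consequents = [], []
    for rule in rules:
        mu = np.prod(np.exp(-0.5 * ((x - rule["centers"]) / rule["sigmas"])**2))
        firings.append(mu)
        consequents.append(tsk_rule_consequent(
            x, rule["linear_w"], rule["bias"],
            rule["wavelet_params"], rule["wavelet_w"]))
    firings = np.array(firings)
    return np.dot(firings, consequents) / firings.sum()
```

With a single rule, the normalization makes the network output coincide with that rule's consequent, which is a quick sanity check on the combination step.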
Results
The proposed FJWNN model yielded the following results: RRSE (mean±std) of 10e-5±6e-5 for piecewise single-variable function approximation, RMSE (mean±std) of 2.6e-4±2.6e-4 for the first nonlinear dynamic system modeling, RRSE (mean±std) of 1.59e-3±0.42e-3 for Mackey-Glass time series prediction, RMSE of 0.3421 for gas furnace modeling, and VAF% (mean±std) of 98.24±0.71 for EMG modeling across all trial signals, indicating a significant improvement over previous methods.
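The performance indices quoted above follow standard definitions; a minimal sketch of the usual formulas (not code from the paper):

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def rrse(y, y_hat):
    """Root relative squared error: error relative to a mean predictor."""
    return np.sqrt(np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))

def vaf_percent(y, y_hat):
    """Variance accounted for, in percent."""
    return (1.0 - np.var(y - y_hat) / np.var(y)) * 100.0
```

A perfect prediction gives RMSE = 0, RRSE = 0, and VAF% = 100, while predicting the mean of the targets gives RRSE = 1.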
Conclusions
The FJWNN demonstrated promising accuracy and generalization while moderating network complexity. This improvement is due to applying the main useful wavelets in combination with linear regressors and using fuzzy rule induction. Compared to state-of-the-art models, the proposed FJWNN yielded better performance and can therefore be considered a novel tool for nonlinear system identification.
Flood Forecasting Using Machine Learning Methods
This book is a printed edition of the Special Issue Flood Forecasting Using Machine Learning Methods that was published in Water.
Training Multilayer Perceptron with Genetic Algorithms and Particle Swarm Optimization for Modeling Stock Price Index Prediction
Forecasting of uv-vis spectrometry time series for online water quality monitoring in operating urban sewer systems
The monitoring of pollutants in urban sewer systems is generally conducted through sampling campaigns, and the resulting samples must be transported, stored, and analyzed in a laboratory. However, developments in optics and electronics have enabled their fusion into UV-Vis spectrometry. UV-Vis probes are used to determine the dynamics of organic matter loads (i.e., Chemical Oxygen Demand (COD) and Biochemical Oxygen Demand (BOD5)), nitrates, nitrites, and Total Suspended Solids (TSS). In addition to the methods used for probe calibration and for the analysis of UV-Vis absorbance spectra time series, it is necessary to develop forecasting methods for use in online, real-time monitoring and control. The information from the collected data can also be used for decision making and for real-time control applications. Forecasting is important for decision-making processes.
Therefore, the objective of this research work was to develop one or more forecasting methods applicable to UV-Vis spectrometry time series for online water quality monitoring in operating urban sewer systems. Five UV-Vis absorbance time series collected at different online measurement sites were used, for a total of 5705 UV-Vis absorbance spectra: four sites in Colombia (the El-Salitre Wastewater Treatment Plant (WWTP), the San Fernando WWTP, the Gibraltar sewage pumping station, and a constructed-wetland/reservoir-tank (CWRT) system) and one site in Austria (the Graz-West R05 catchment outlet). The complete process proposed for the UV-Vis absorbance time series has several stages: (i) inputs, the UV-Vis absorbance time series; (ii) time series pre-processing, comprising outlier analysis, completion of missing values, and dimensionality reduction; and (iii) forecasting procedures and evaluation of results. The proposed methodology, applied to time series with different (absorbance) characteristics, consists of Winsorising as the outlier-removal step and the Discrete Fourier Transform (DFT) to complete missing values. The new values replacing either outliers or missing values preserve the same, or almost the same, shape as the original time series, granting a macro-level view of the time series coherence. Dimensionality reduction of the multivariate absorbance time series yields fewer variables to process: a PCA linear transformation captures more than 97% of the variability of each time series (with one to six principal components, depending on the absorbance time series behavior), and a clustering process (k-means) is combined with Markov chains. Forecasting procedures based on periodic signals, such as the DFT, Chebyshev and Legendre polynomials, and polynomial regression, were applied, and they can capture the dynamic behavior of the time series.
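A minimal sketch of the two pre-processing steps just described, Winsorising for outliers and a DFT-based reconstruction for missing values; the percentile thresholds, harmonic count, and function names are illustrative assumptions, not the thesis's exact procedure:

```python
import numpy as np

def winsorise(series, lower_pct=5, upper_pct=95):
    """Clip extreme values to the given percentiles (one common form of
    Winsorising; the thesis's windowed variant may differ)."""
    lo, hi = np.percentile(series, [lower_pct, upper_pct])
    return np.clip(series, lo, hi)

def dft_fill_missing(series, n_harmonics=5):
    """Fill NaNs with a low-order DFT reconstruction of the series,
    so the imputed values follow the series' dominant periodic shape."""
    filled = np.array(series, dtype=float)
    missing = np.isnan(filled)
    filled[missing] = np.nanmean(series)          # crude initial guess
    spectrum = np.fft.rfft(filled)
    spectrum[n_harmonics:] = 0.0                  # keep dominant harmonics only
    reconstruction = np.fft.irfft(spectrum, n=len(filled))
    filled[missing] = reconstruction[missing]     # replace only the gaps
    return filled
```

On a periodic signal such as a sampled sine wave, the reconstructed value at a gap stays close to the true underlying value, which matches the stated goal of preserving the shape of the original time series.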
Several machine learning techniques were tested, and it was possible to capture the behavior of the time series at the calibration stage; the forecast values can follow the general behavior of the observed values (with the exception of ANFIS, GA, and the Kalman filter). ANN and SVM show good forecasting performance for the first part of the forecasting horizon (2 hours). The evaluation of each forecasting methodology was done using four statistical indicators: Absolute Percentage Error (APE), Extended Uncertainty (EU), the set of observed values within the Confidence Interval (CI), and the sum of EU and the set of observed values within the CI. The performance indicators provided valuable information about the multivariate forecasting results, allowing the forecasting time to be estimated and evaluated for a given methodology and indicating which methodology is best suited to different wavelength ranges (absorbance spectra) of each study site's UV-Vis absorbance time series. The comparison of the several forecasting methodologies highlights that no single best methodology can be identified, because all of the proposed methodologies provide a wide range of forecast values that complement each other at different forecasting time steps and spectral ranges (UV and/or Vis). Therefore, a hybrid system based on seven forecasting methodologies is proposed. The forecasted absorbance spectra were then transformed into Water Quality Indicators (WQI) for practical use. The multivariate forecasting results show lower APE values than the univariate forecasting results obtained directly from the observed WQI. These results are probably obtained because multivariate forecasting includes the correlation present across the whole absorbance spectra range (capturing all, or at least a great part, of the time series variability), as one wavelength interferes with one or more other wavelengths.
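Of the four indicators, APE has a standard definition; a minimal sketch (the thesis's exact EU and CI-coverage computations depend on uncertainty estimates not reproduced here):

```python
import numpy as np

def ape_percent(observed, forecast):
    """Absolute percentage error per time step, in percent."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.abs(observed - forecast) / np.abs(observed)
```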
Finally, the results obtained for the constructed-wetland/reservoir-tank system show that it is possible to obtain valuable forecasting results in terms of detection times for some rainfall events. In addition, the inclusion of runoff variables (water level in this case) substantially improves the water quality forecasting results.
Intelligent data mining using artificial neural networks and genetic algorithms : techniques and applications
Data Mining (DM) refers to the analysis of observational datasets to find
relationships and to summarize the data in ways that are both understandable
and useful. Many DM techniques exist. Compared with other DM techniques,
Intelligent Systems (ISs) based approaches, which include Artificial Neural
Networks (ANNs), fuzzy set theory, approximate reasoning, and derivative-free
optimization methods such as Genetic Algorithms (GAs), are tolerant of
imprecision, uncertainty, partial truth, and approximation. They provide
flexible information processing capability for handling real-life situations. This
thesis is concerned with the ideas behind design, implementation, testing and
application of a novel ISs based DM technique. The unique contribution of this
thesis is in the implementation of a hybrid IS DM technique (Genetic Neural
Mathematical Method, GNMM) for solving novel practical problems, the
detailed description of this technique, and the illustrations of several
applications solved by this novel technique.
GNMM consists of three steps: (1) GA-based input variable selection, (2) Multi-
Layer Perceptron (MLP) modelling, and (3) mathematical programming based
rule extraction. In the first step, GAs are used to evolve an optimal set of MLP
inputs. An adaptive method based on the average fitness of successive
generations is used to adjust the mutation rate, and hence the
exploration/exploitation balance. In addition, GNMM uses the elite group and
appearance percentage to minimize the randomness associated with GAs. In
the second step, MLP modelling serves as the core DM engine in performing
classification/prediction tasks. An Independent Component Analysis (ICA)
based weight initialization algorithm is used to determine optimal weights
before the commencement of training algorithms. The Levenberg-Marquardt
(LM) algorithm is used to achieve a second-order speedup compared to
conventional Back-Propagation (BP) training. In the third step, mathematical
programming based rule extraction is not only used to identify the premises of
multivariate polynomial rules, but also to explore features from the extracted
rules based on data samples associated with each rule. Therefore, the
methodology can provide regression rules and features not only in the
polyhedrons with data instances, but also in the polyhedrons without data
instances.
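The adaptive mutation scheme in step (1) can be sketched as follows; this is one plausible reading of adjusting the mutation rate from the average fitness of successive generations, with the update factor and bounds chosen arbitrarily for illustration:

```python
import random

def adapt_mutation_rate(rate, prev_avg_fitness, curr_avg_fitness,
                        factor=0.9, lo=0.001, hi=0.3):
    """If average fitness improved (maximization assumed), reduce mutation
    to exploit; otherwise raise it to explore. Clamped to [lo, hi]."""
    if curr_avg_fitness > prev_avg_fitness:
        rate *= factor          # improving: exploit
    else:
        rate /= factor          # stagnating: explore
    return min(max(rate, lo), hi)

def mutate(bitstring, rate, rng=random):
    """Bit-flip mutation over a 0/1 mask selecting MLP input variables."""
    return [b ^ (rng.random() < rate) for b in bitstring]
```

Shrinking the mutation rate on improvement and growing it on stagnation is one standard way to manage the exploration/exploitation balance the text refers to.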
A total of six datasets from environmental and medical disciplines were used
as case study applications. These datasets involve the prediction of
longitudinal dispersion coefficient, classification of electrocorticography
(ECoG)/Electroencephalogram (EEG) data, eye bacteria Multisensor Data
Fusion (MDF), and diabetes classification (denoted by Data I through to Data VI). GNMM was applied to all these six datasets to explore its effectiveness,
but the emphasis is different for different datasets. For example, the emphasis
of Data I and II was to give a detailed illustration of how GNMM works; Data III
and IV aimed to show how to deal with difficult classification problems; the
aim of Data V was to illustrate the averaging effect of GNMM; and finally Data
VI was concerned with the GA parameter selection and benchmarking GNMM
with other IS DM techniques such as Adaptive Neuro-Fuzzy Inference System
(ANFIS), Evolving Fuzzy Neural Network (EFuNN), Fuzzy ARTMAP, and
Cartesian Genetic Programming (CGP). In addition, datasets obtained from
published works (i.e. Data II & III) or public domains (i.e. Data VI) where
previous results were present in the literature were also used to benchmark
GNMM’s effectiveness.
As a closely integrated system, GNMM has the merit that it needs little human
interaction. With some predefined parameters, such as the GA's crossover
probability and the shape of the ANNs' activation functions, GNMM is able to
process raw data until human-interpretable rules are extracted. This is
an important feature in terms of practice as quite often users of a DM system
have little or no need to fully understand the internal components of such a
system. Through case study applications, it has been shown that the GA-based
variable selection stage is capable of: filtering out irrelevant and noisy
variables, improving the accuracy of the model; making the ANN structure less
complex and easier to understand; and reducing the computational complexity
and memory requirements. Furthermore, rule extraction ensures that the MLP
training results are easily understandable and transferable.
A Review of Hybrid Soft Computing and Data Pre-Processing Techniques to Forecast Freshwater Quality’s Parameters: Current Trends and Future Directions
Water quality has a significant influence on human health. As a result, water quality parameter modelling is one of the most challenging problems in the water sector, and accuracy is the major factor in choosing an appropriate prediction model. This research analyses hybrid techniques and data pre-processing methods in freshwater quality modelling and forecasting. Hybrid approaches have generally been seen as a potential way of improving the accuracy of water quality modelling and forecasting compared with individual models; consequently, recent studies have focused on using hybrid models to enhance forecasting accuracy, with the modelling of dissolved oxygen receiving the most attention. From a review of the relevant articles, it is clear that hybrid techniques are viable and precise methods for water quality prediction. Additionally, this paper presents future research directions to help researchers predict freshwater quality variables.
Estimating the concentration of physico-chemical parameters in hydroelectric power plant reservoirs
The United Nations Educational, Scientific and Cultural Organization (UNESCO) defines
the Amazon region and adjacent areas, such as the Pantanal, as world heritage territories, since
they possess unique flora and fauna and great biodiversity. Unfortunately, these regions have
increasingly been suffering from anthropogenic impacts. One of the main anthropogenic impacts
in the last decades has been the construction of hydroelectric power plants.
As a result, dramatic altering of these ecosystems has been observed, including changes in
water levels, decreased oxygenation and loss of downstream organic matter, with consequent
intense land use and population influxes after the filling and operation of these reservoirs. This,
in turn, leads to extreme loss of biodiversity in these areas, due to the large-scale deforestation.
The fishing industry in place before construction of dams and reservoirs, for example, has become
much more intense, attracting large populations in search of work, employment and income.
Environmental monitoring is fundamental for reservoir management, and several studies
around the world have been performed in order to evaluate the water quality of these ecosystems.
The Brazilian Amazon, in particular, goes through well-defined annual hydrological cycles, whose
study is very important since it aids in monitoring anthropogenic environmental impacts
and can inform policy and decision making with regard to the environmental management of this
area. The water quality of Amazon reservoirs is greatly influenced by this defined hydrological
cycle, which, in turn, causes variations of microbiological, physical and chemical characteristics.
Eutrophication, one of the main processes leading to water deterioration in lentic environments,
is mostly caused by anthropogenic activities, such as the releases of industrial and domestic
effluents into water bodies.
Physico-chemical water parameters typically related to eutrophication are, among others,
chlorophyll-a levels, transparency and total suspended solids, which can, thus, be used to assess
the eutrophic state of water bodies.
Usually, these parameters must be investigated by going out to the field, manually
measuring water transparency with a Secchi disk, and taking water samples to the
laboratory in order to obtain chlorophyll-a and total suspended solid concentrations. These
processes are time-consuming and require trained personnel. We have therefore proposed
techniques for environmental monitoring studies which do not require fieldwork, such as remote
sensing and computational intelligence.
Simulations in different reservoirs were performed to determine a relationship between these
physico-chemical parameters and the spectral response. Based on the in situ measurements,
empirical models were established to relate the reflectance of the reservoir measured by the
satellites. The images were calibrated and corrected atmospherically.
Statistical analysis using error estimation was used to evaluate the most accurate methodology.
The neural networks were trained by hydrological cycle and were useful to estimate the
physical-chemical parameters of the water from the reflectance of the visible and NIR bands of
satellite images, with better results for periods with few clouds in the regions analyzed.
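A toy version of such an empirical reflectance-to-concentration model, fitting log-concentration against band reflectances by least squares on synthetic data; the log-linear form and all names are illustrative assumptions, not the study's calibrated models:

```python
import numpy as np

def fit_empirical_model(reflectance, concentration):
    """Least-squares fit of log-concentration (e.g. chlorophyll-a) against
    per-pixel band reflectances, plus an intercept term."""
    X = np.column_stack([reflectance, np.ones(len(reflectance))])
    coeffs, *_ = np.linalg.lstsq(X, np.log(concentration), rcond=None)
    return coeffs

def predict(coeffs, reflectance):
    """Apply the fitted coefficients and undo the log transform."""
    X = np.column_stack([reflectance, np.ones(len(reflectance))])
    return np.exp(X @ coeffs)
```

In practice the model would be fitted on in situ measurements paired with atmospherically corrected satellite reflectances, as the surrounding text describes; neural networks replace the linear form when the relationship is more complex.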
The present study shows the application of a wavelet neural network to estimate water quality
parameters using the concentrations of water samples collected in the Amazon reservoir and the
Cefni reservoir, UK. Satellite images from Landsat and Sentinel-2 were used to train the ANN by
hydrological cycle. The trained ANNs demonstrated good agreement between observed and
estimated values after atmospheric correction of the satellite images, and are useful to estimate
these concentrations using remote sensing and the wavelet transform for image processing.
Therefore, the techniques proposed and applied in the present study are noteworthy since
they can aid in evaluating important physico-chemical parameters, which, in turn, allows for identification of possible anthropogenic impacts, being relevant in environmental management
and policy decision-making processes.
The test results showed that the predicted values have good accuracy, improving the efficiency
of monitoring water quality parameters and confirming the reliability and accuracy of the
approaches proposed for monitoring water reservoirs.
This thesis contributes to the evaluation of the accuracy of different methods for estimating
physical-chemical parameters from satellite images and artificial neural networks. For future
work, the accuracy of the results can be improved by adding more satellite images and testing
new neural networks in applications to new water reservoirs.
Intelligent data mining using artificial neural networks and genetic algorithms : techniques and applications
Data Mining (DM) refers to the analysis of observational datasets to find relationships and to summarize the data in ways that are both understandable and useful. Many DM techniques exist. Compared with other DM techniques, Intelligent Systems (ISs) based approaches, which include Artificial Neural Networks (ANNs), fuzzy set theory, approximate reasoning, and derivative-free optimization methods such as Genetic Algorithms (GAs), are tolerant of imprecision, uncertainty, partial truth, and approximation. They provide flexible information processing capability for handling real-life situations. This thesis is concerned with the ideas behind design, implementation, testing and application of a novel ISs based DM technique. The unique contribution of this thesis is in the implementation of a hybrid IS DM technique (Genetic Neural Mathematical Method, GNMM) for solving novel practical problems, the detailed description of this technique, and the illustrations of several applications solved by this novel technique. GNMM consists of three steps: (1) GA-based input variable selection, (2) Multi- Layer Perceptron (MLP) modelling, and (3) mathematical programming based rule extraction. In the first step, GAs are used to evolve an optimal set of MLP inputs. An adaptive method based on the average fitness of successive generations is used to adjust the mutation rate, and hence the exploration/exploitation balance. In addition, GNMM uses the elite group and appearance percentage to minimize the randomness associated with GAs. In the second step, MLP modelling serves as the core DM engine in performing classification/prediction tasks. An Independent Component Analysis (ICA) based weight initialization algorithm is used to determine optimal weights before the commencement of training algorithms. The Levenberg-Marquardt (LM) algorithm is used to achieve a second-order speedup compared to conventional Back-Propagation (BP) training. 
In the third step, mathematical programming based rule extraction is used not only to identify the premises of multivariate polynomial rules, but also to explore features from the extracted rules based on the data samples associated with each rule. The methodology can therefore provide regression rules and features not only in polyhedrons with data instances, but also in polyhedrons without them. A total of six datasets from environmental and medical disciplines were used as case-study applications. These datasets involve the prediction of longitudinal dispersion coefficient, classification of electrocorticography (ECoG)/electroencephalogram (EEG) data, eye-bacteria Multisensor Data Fusion (MDF), and diabetes classification (denoted Data I through Data VI). GNMM was applied to all six datasets to explore its effectiveness, with a different emphasis for each. For example, Data I and II give a detailed illustration of how GNMM works; Data III and IV show how to deal with difficult classification problems; Data V illustrates the averaging effect of GNMM; and Data VI concerns GA parameter selection and benchmarks GNMM against other IS DM techniques such as the Adaptive Neuro-Fuzzy Inference System (ANFIS), Evolving Fuzzy Neural Network (EFuNN), Fuzzy ARTMAP, and Cartesian Genetic Programming (CGP). In addition, datasets obtained from published works (i.e., Data II and III) or public domains (i.e., Data VI), for which previous results were available in the literature, were used to benchmark GNMM's effectiveness. As a closely integrated system, GNMM has the merit of needing little human interaction: given some predefined parameters, such as the GA's crossover probability and the shape of the ANNs' activation functions, GNMM is able to process raw data until human-interpretable rules are extracted.
This is an important feature in practice, as users of a DM system often have little or no need to fully understand the internal components of such a system. Through the case-study applications, it has been shown that the GA-based variable selection stage is capable of filtering out irrelevant and noisy variables, improving the accuracy of the model, making the ANN structure less complex and easier to understand, and reducing the computational complexity and memory requirements. Furthermore, rule extraction ensures that the MLP training results are easily understandable and transferable.
EThOS - Electronic Theses Online Service. University of Warwick. Overseas Research Students Awards Scheme. United Kingdom.
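The GA-based input-variable selection step described above (an adaptive mutation rate driven by the average fitness of successive generations, plus an elite group that survives intact) can be sketched as follows. This is a minimal illustration, not the thesis's actual GNMM implementation: the toy fitness function, rates, and population sizes are all assumptions made for the example.

```python
import random

def fitness(mask, relevant=frozenset({0, 3, 5})):
    # Toy fitness: reward selecting "relevant" inputs, penalize extras.
    # (Real GNMM scores a subset by the resulting MLP's performance.)
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & relevant) - 0.2 * len(chosen - relevant)

def evolve(n_inputs=8, pop_size=20, generations=40, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_inputs)] for _ in range(pop_size)]
    mutation_rate = 0.1
    prev_avg = None
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        avg = sum(fitness(m) for m in pop) / pop_size
        # Adaptive mutation: if average fitness stalls between successive
        # generations, raise the rate (more exploration); if it improves,
        # lower it (more exploitation).
        if prev_avg is not None:
            mutation_rate = (min(0.5, mutation_rate * 1.5) if avg <= prev_avg
                             else max(0.01, mutation_rate * 0.7))
        prev_avg = avg
        elite = scored[: pop_size // 4]          # elite group survives intact
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_inputs)
            child = a[:cut] + b[cut:]            # one-point crossover
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return [i for i, bit in enumerate(best) if bit]

print(evolve())   # indices of the selected MLP inputs
```

The selected index list would then feed step 2, where only those inputs are presented to the MLP.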
Multi-agent system for flood forecasting in Tropical River Basin
As is well known, the problems related to the generation, control, and management of floods have been treated with traditional hydrologic modeling tools focused on the study and analysis of the precipitation-runoff relationship, a physical process driven by the hydrological cycle and the climate regime and directly proportional to the generation of floodwaters. Within the hydrological discipline, these traditional modeling tools are classified into three principal groups: empirical, trial-and-error models (the so-called "black-box models"); conceptual models, subdivided into "lumped", "semi-lumped", and "semi-distributed" according to their spatial distribution; and models based on physical processes, known as "white-box models" or "distributed models". In engineering applications, on the other hand, the models used in streamflow forecasting are classified, with respect to the measurements and variables they require, as "physically based models" or "data-driven models".
Physically based models give an in-depth account of the dynamics of the physical processes that occur internally among the different systems of a given hydrographic basin. However, besides being laborious to implement, they rely entirely on mathematical algorithms, and understanding these interactions requires the abstraction of mathematical concepts and the conceptualization of the physical processes intertwined among these systems. Data-driven models, by contrast, require no a-priori understanding of the physical laws governing the process within the system; they rely solely on empirical formulations, which demand large amounts of numerical data and on-site calibration. The two model families therefore differ markedly in their data requirements and in how they represent physical phenomena. Although considerable progress has been made in hydrologic modeling for flood forecasting, several significant setbacks remain unresolved: given the stochastic nature of hydrological phenomena, there is the challenge of implementing user-friendly, re-usable, robust, and reliable forecasting systems, and the amount of uncertainty such systems must handle when trying to solve the flood forecasting problem. In the past decades, however, with the growth of the artificial intelligence (AI) field, a few researchers have attempted to address the stochastic nature of hydrologic events with these techniques.
Given the setbacks to hydrologic flood forecasting described above, this thesis research aims to integrate physics-based hydrologic, hydraulic, and data-driven models under the paradigm of multi-agent systems by designing and developing a multi-agent system (MAS) framework for flood forecasting events within the scope of tropical watersheds.
With the emergence of agent technologies, "agent-based modeling" and "multi-agent systems" simulation methods have been applied in several areas of water-resources management, such as flood protection, planning, control, management, mitigation, and forecasting, to combat the shocks floods produce on society. These efforts, however, have focused on evacuation drills, and the forecasting work has not been aimed at tropical river basins, whose hydrological regime is highly distinctive.
In this catchment modeling approach, multi-agent systems were applied as a surrogate for the conventional hydrologic model in order to build a system that operates at the catchment level, deployed with hydrometric stations. The system uses the data (e.g., rainfall, river stage, river flow) captured by networks of hydrometric sensors, stored and administered by an organization of interacting agents whose main aim is to perform flow forecasting and awareness, thereby enhancing the policy-making process at the watershed level.
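The organization of interacting agents described above (sensor agents capturing readings, a coordinating agent storing and administering the series) can be sketched minimally. The class names and the message format here are illustrative assumptions, not the thesis's actual platform.

```python
class CoordinatorAgent:
    """Stores and administers the hydrometric series reported by sensors."""

    def __init__(self):
        self.series = {}                         # variable -> list of readings

    def receive(self, message):
        variable, value = message                # message: (variable, value)
        self.series.setdefault(variable, []).append(value)

class SensorAgent:
    """Captures one hydrometric variable and reports it to the coordinator."""

    def __init__(self, variable, coordinator):
        self.variable = variable                 # e.g. "rainfall", "river_stage"
        self.coordinator = coordinator

    def report(self, value):
        self.coordinator.receive((self.variable, value))

hub = CoordinatorAgent()
rain = SensorAgent("rainfall", hub)
stage = SensorAgent("river_stage", hub)
rain.report(12.5)    # mm
stage.report(3.2)    # m
print(hub.series)    # {'rainfall': [12.5], 'river_stage': [3.2]}
```

A forecasting agent would then read the accumulated series from the coordinator rather than from the physical sensors directly.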
Section one of this document surveys the state of current research in hydrologic modeling for the flood forecasting task. It is a journey through the background of concerns related to the hydrological process, flood ontologies, management, and forecasting. The section covers, to a certain extent, the techniques, methods, and theoretical aspects of hydrological modeling and its types, from conventional models to present-day artificial intelligence prototypes, with special emphasis on multi-agent systems as the most recent modeling methodology in the hydrological sciences. It should be underlined, however, that the section is not an all-inclusive review; rather, its purpose is to serve as a framework for this sort of work and to highlight its significant aspects.
Section two of the document details the conceptual framework of the proposed multi-agent system in support of flood forecasting. To accomplish this, several tasks were carried out, such as the design and implementation of the system's framework with the Belief-Desire-Intention (BDI) architecture for flood forecasting events in the context of a tropical river basin. Contributions of the proposed architecture include the replacement of conventional hydrologic modeling with multi-agent systems, which speeds up the administration of hydrometric time-series data and the modeling of the precipitation-runoff process that leads to floods in a river course. Another advantage is the user-friendly graphical interface of the proposed multi-agent platform: the real-time generation of graphs, charts, and monitors with information on the event taking place in the catchment makes it easy for viewers with little or no background in data analysis to get a visual grasp of the flood-awareness information at hand.
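The Belief-Desire-Intention architecture mentioned above can be illustrated with a minimal forecasting-agent loop: perceive, revise beliefs, deliberate, act. This is a hedged sketch only; the alert threshold, the naive persistence forecast, and all names are assumptions for illustration, not the thesis's actual design.

```python
class ForecastAgent:
    """Minimal BDI-style loop: perceive -> revise beliefs -> deliberate -> act."""

    def __init__(self, alert_stage=4.0):
        self.alert_stage = alert_stage           # hypothetical alarm level (m)
        self.beliefs = {"stages": []}            # beliefs: recent river stages
        self.desires = ["keep_basin_aware"]      # desires: high-level goals
        self.intentions = []                     # intentions: committed actions

    def perceive(self, reading):
        # Belief revision: keep the latest few sensor-agent readings.
        self.beliefs["stages"] = (self.beliefs["stages"] + [reading])[-3:]

    def deliberate(self):
        # The desire becomes an intention when the one-step linear-trend
        # forecast of river stage crosses the alert threshold.
        s = self.beliefs["stages"]
        if len(s) >= 2:
            forecast = s[-1] + (s[-1] - s[-2])
            if forecast >= self.alert_stage:
                self.intentions.append(("raise_alert", round(forecast, 2)))

    def act(self):
        # Execute and clear the committed intentions.
        done, self.intentions = self.intentions, []
        return done

agent = ForecastAgent()
for stage in [2.0, 2.8, 3.6]:    # rising river stage (m)
    agent.perceive(stage)
    agent.deliberate()
print(agent.act())               # [('raise_alert', 4.4)]
```

In a full MAS the perceived readings would come as messages from the sensor agents, and `act` would dispatch alerts to the monitoring interface.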
The agents developed in this multi-agent system modeling framework for flood forecasting were trained, tested, and validated through a series of experimental tasks, using the hydrometric series of rainfall, river stage, and streamflow data collected
by the hydrometric sensor agents from the hydrometric sensors.
Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid. Committee: President: María Araceli Sanchis de Miguel; Secretary: Juan Gómez Romero; Member: Juan Carlos Corrale