387 research outputs found

    Fuzzy jump wavelet neural network based on rule induction for dynamic nonlinear system identification with real data applications

    Aim: Fuzzy wavelet neural networks (FWNNs) have proven to be a promising strategy for the identification of nonlinear systems. The network considers both global and local properties and deals with the imprecision present in sensory data, leading to the desired precision. In this paper, we propose a new FWNN model, named the “Fuzzy Jump Wavelet Neural Network” (FJWNN), for identifying dynamic nonlinear systems, especially in practical applications. Methods: The proposed FJWNN is a fuzzy neural network model of the Takagi-Sugeno-Kang type whose consequent part of each fuzzy rule is a linear combination of input regressors and dominant wavelet neurons, forming a sub-jump wavelet neural network (sub-JWNN). Each fuzzy rule can thus locally model both the linear and nonlinear properties of a system. The linear relationship between the inputs and the output is learned by neurons with linear activation functions, whereas the nonlinear relationship is locally modeled by wavelet neurons. The orthogonal least squares (OLS) method and a genetic algorithm (GA) are used, respectively, to select the dominant wavelets for each sub-JWNN. Fuzzy rule induction improves the structure of the proposed model, leading to fewer fuzzy rules, fewer inputs per rule, and fewer model parameters. The real-world gas furnace and real electromyographic (EMG) signal modeling problems are employed in our study. In the same vein, piecewise single-variable function approximation, nonlinear dynamic system modeling, and Mackey–Glass time series prediction confirm the method's superiority. The proposed FJWNN model is compared with state-of-the-art models using performance indices such as RMSE, RRSE, Rel ERR%, and VAF%.
Results: The proposed FJWNN model yielded the following results: RRSE (mean±std) of 10e-5±6e-5 for piecewise single-variable function approximation, RMSE (mean±std) of 2.6e-4±2.6e-4 for the first nonlinear dynamic system modelling, RRSE (mean±std) of 1.59e-3±0.42e-3 for Mackey–Glass time series prediction, RMSE of 0.3421 for gas furnace modelling, and VAF% (mean±std) of 98.24±0.71 for the EMG modelling of all trial signals, indicating a significant enhancement over previous methods. Conclusions: The FJWNN demonstrated promising accuracy and generalization while moderating network complexity. This improvement is due to applying the main useful wavelets in combination with linear regressors and using fuzzy rule induction. Compared to the state-of-the-art models, the proposed FJWNN yielded better performance and can therefore be considered a novel tool for nonlinear system identification. Peer Reviewed. Postprint (published version).
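The consequent structure described above, a linear combination of input regressors plus dominant wavelet neurons, can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the function names and parameters are invented, and the Mexican-hat mother wavelet is assumed as one common choice for wavelet neurons.

```python
import math

def mexican_hat(t):
    # Mexican-hat (Ricker) mother wavelet, a common choice for wavelet neurons
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def jwnn_consequent(x, linear_w, bias, wavelets):
    """Consequent of one fuzzy rule: linear regressors + wavelet neurons.

    x        : list of input regressors
    linear_w : linear weights, one per input (the linear part of the rule)
    bias     : constant term
    wavelets : list of (weight, dilation_a, translation_b) per wavelet neuron;
               here each neuron acts on the norm of the shifted/scaled input.
    """
    out = bias + sum(w * xi for w, xi in zip(linear_w, x))
    for w, a, b in wavelets:
        t = math.sqrt(sum((xi - b) ** 2 for xi in x)) / a
        out += w * mexican_hat(t)
    return out

# One rule with two inputs and one wavelet neuron (toy numbers)
y = jwnn_consequent([0.5, -0.2], [1.0, 0.3], 0.1, [(0.8, 1.0, 0.0)])
```

In the full model, each fuzzy rule would carry its own consequent of this form, and the rule firing strengths would weight the local outputs into the global prediction.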

    Flood Forecasting Using Machine Learning Methods

    This book is a printed edition of the Special Issue Flood Forecasting Using Machine Learning Methods that was published in Water.

    Forecasting of UV-Vis spectrometry time series for online water quality monitoring in operating urban sewer systems

    The monitoring of pollutants in urban sewer systems is generally conducted through sampling campaigns, and the resulting samples must be transported, stored and analyzed in a laboratory. However, developments in optics and electronics have enabled their merger in UV-Vis spectrometry. UV-Vis probes are used to determine the dynamics of loads of organic matter (i.e. Chemical Oxygen Demand (COD) and Biochemical Oxygen Demand (BOD5)), nitrates, nitrites and Total Suspended Solids (TSS). In addition to the methods used for the calibration of the probes and the analysis of the time series of UV-Vis absorbance spectra, it is necessary to develop forecasting methods in order to support online monitoring and control in real time. The information from the collected data can also be used for decision-making purposes and for real-time control applications. Forecasting is important for decision-making processes.
Therefore, the objective of this research work was to develop one or more forecasting methods applicable to UV-Vis spectrometry time series for online water quality monitoring in operating urban sewer systems. Five UV-Vis absorbance time series collected at different online measurement sites were used, for a total of 5705 UV-Vis absorbance spectra: four sites in Colombia (the El-Salitre Wastewater Treatment Plant (WWTP), the San Fernando WWTP, the Gibraltar sewage Pumping Station (PS), and a constructed-wetland/reservoir-tank (CWRT)) and one site in Austria (the Graz-West R05 catchment outlet). The complete proposed process applied to the UV-Vis absorbance time series has several stages: (i) inputs, the UV-Vis absorbance time series; (ii) time series pre-processing, comprising outlier analysis, completion of missing values and dimensionality reduction; and (iii) forecasting procedures and evaluation of results. The proposed methodology was applied to time series with different characteristics (absorbance); it consists of Winsorising as the outlier-removal step and the application of the Discrete Fourier Transform (DFT) to complete missing values. The new values replacing either outliers or missing values preserve the same, or almost the same, shape as the original time series, granting a macro view of the time series' coherence. Dimensionality reduction of the multivariate absorbance time series leaves fewer variables to be processed: a PCA linear transformation captures more than 97% of the variability of each time series (with one to six principal components, depending on the absorbance time series' behaviour), and a clustering process (k-means) is combined with Markov chains. Forecasting procedures based on periodic signals, such as the DFT, Chebyshev and Legendre polynomials, and polynomial regression, were applied, and they can capture the dynamic behaviour of the time series.
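The pre-processing chain described above, Winsorising for outliers followed by a DFT-based reconstruction for missing values, might be sketched as follows. This is a rough illustration only: the percentile bounds and the number of harmonics kept are assumptions for the example, not the thesis's calibrated settings.

```python
import cmath

def winsorise(series, lo_pct=5, hi_pct=95):
    # Clip extreme values to the chosen percentile bounds (simple outlier removal)
    s = sorted(series)
    lo = s[int(len(s) * lo_pct / 100)]
    hi = s[min(int(len(s) * hi_pct / 100), len(s) - 1)]
    return [min(max(v, lo), hi) for v in series]

def dft_fill(series, n_harmonics=3):
    """Fill None gaps: seed gaps with the mean, keep the dominant DFT
    harmonics, and take the reconstruction's value at the gap positions."""
    obs = [v for v in series if v is not None]
    mean = sum(obs) / len(obs)
    x = [v if v is not None else mean for v in series]
    n = len(x)
    # Direct O(n^2) DFT, adequate for a short illustrative series
    coeffs = [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
              for k in range(n)]
    # Keep the DC term plus the largest-magnitude harmonics, drop the rest
    order = sorted(range(1, n), key=lambda k: -abs(coeffs[k]))[:n_harmonics]
    keep = {0} | set(order)
    recon = [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in keep).real / n for t in range(n)]
    # Observed values are kept untouched; only gaps receive reconstructed values
    return [series[t] if series[t] is not None else recon[t] for t in range(n)]
```

Because the reconstruction uses only the dominant harmonics, the filled values follow the overall shape of the series rather than any single noisy neighbour, which matches the "same, or almost the same, shape" behaviour described above.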
Several machine learning techniques were tested, and it was possible to capture the behaviour of the time series at the calibration stage; the forecast values can follow the general behaviour of the observed values (with the exception of ANFIS, GA and the Kalman Filter). ANN and SVM show good forecasting performance for the first part of the forecasting horizon (2 hours). The evaluation of each forecasting methodology was done using four statistical indicators: Absolute Percentage Error (APE), Extended Uncertainty (EU), the set of observed values within the Confidence Interval (CI), and the sum of EU and the set of observed values within the CI. The performance indicators provide valuable information about the multivariate forecasting results, making it possible to estimate and evaluate the forecasting time for a given methodology and to determine which methodology is best suited to different wavelength ranges (absorbance spectra) of each study site's UV-Vis absorbance time series. The comparison of the forecasting methodologies highlights that no single best methodology can be identified, because all of the proposed methodologies can provide a wide range of forecast values that complement each other for different forecasting time steps and spectral ranges (UV and/or Vis). Therefore, a hybrid system based on the seven forecasting methodologies is proposed. The forecast absorbance spectra were then transformed into the corresponding Water Quality Indicators (WQI) for practical use. The multivariate forecasting results show lower APE values than the univariate forecasting results obtained directly from the observed WQI values. These results were probably obtained because multivariate forecasting includes the correlation present across the whole absorbance spectra range (capturing all, or at least a great part, of the time series' variability), as one wavelength interferes with one or more other wavelengths.
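Two of the evaluation indicators named above have standard forms and can be sketched directly; the exact definitions of EU and the combined indicator are not given in this summary, so they are omitted rather than guessed.

```python
def ape(observed, forecast):
    # Absolute Percentage Error per time step, in percent
    return [abs(o - f) / abs(o) * 100.0 for o, f in zip(observed, forecast)]

def within_ci(observed, lower, upper):
    # Fraction of observed values falling inside the forecast confidence band
    hits = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return hits / len(observed)

errs = ape([10.0, 20.0], [9.0, 22.0])                       # both steps are about 10%
cov = within_ci([10.0, 20.0], [8.0, 21.0], [12.0, 23.0])    # one of two observations covered
```

Computing APE per time step, rather than a single aggregate, is what allows the forecasting time (how far ahead a methodology stays acceptable) to be estimated for each method and wavelength range.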
Finally, the results obtained for the constructed-wetland/reservoir-tank system show that it is possible to obtain valuable forecasting results in terms of detection times for some rainfall events. In addition, the inclusion of runoff variables (water level in this case) substantially improves the water quality forecasting results. Doctoral thesis (Doctor en Ingeniería).

    Intelligent data mining using artificial neural networks and genetic algorithms : techniques and applications

    Data Mining (DM) refers to the analysis of observational datasets to find relationships and to summarize the data in ways that are both understandable and useful. Many DM techniques exist. Compared with other DM techniques, Intelligent Systems (IS) based approaches, which include Artificial Neural Networks (ANNs), fuzzy set theory, approximate reasoning, and derivative-free optimization methods such as Genetic Algorithms (GAs), are tolerant of imprecision, uncertainty, partial truth, and approximation. They provide flexible information processing capability for handling real-life situations. This thesis is concerned with the ideas behind the design, implementation, testing and application of a novel IS-based DM technique. The unique contribution of this thesis is in the implementation of a hybrid IS DM technique (Genetic Neural Mathematical Method, GNMM) for solving novel practical problems, the detailed description of this technique, and the illustrations of several applications solved by this novel technique. GNMM consists of three steps: (1) GA-based input variable selection, (2) Multi-Layer Perceptron (MLP) modelling, and (3) mathematical programming-based rule extraction. In the first step, GAs are used to evolve an optimal set of MLP inputs. An adaptive method based on the average fitness of successive generations is used to adjust the mutation rate, and hence the exploration/exploitation balance. In addition, GNMM uses the elite group and appearance percentage to minimize the randomness associated with GAs. In the second step, MLP modelling serves as the core DM engine in performing classification/prediction tasks. An Independent Component Analysis (ICA) based weight initialization algorithm is used to determine optimal weights before the commencement of training algorithms. The Levenberg-Marquardt (LM) algorithm is used to achieve a second-order speedup compared to conventional Back-Propagation (BP) training.
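The adaptive mutation idea in step (1), raising exploration when average fitness stagnates across generations and lowering it when fitness improves, might look roughly like the sketch below. The update factor, rate bounds, and function name are illustrative assumptions, not the thesis's actual scheme.

```python
def adapt_mutation_rate(rate, prev_avg_fitness, curr_avg_fitness,
                        factor=1.5, min_rate=0.001, max_rate=0.2):
    """Increase mutation (more exploration) when average fitness stalls or
    drops between generations; decrease it (more exploitation) when the
    population's average fitness is improving. Rate stays within bounds."""
    if curr_avg_fitness <= prev_avg_fitness:
        rate *= factor
    else:
        rate /= factor
    return min(max(rate, min_rate), max_rate)

r = 0.05
r = adapt_mutation_rate(r, prev_avg_fitness=0.8, curr_avg_fitness=0.8)  # stalled, rate grows
```

Called once per generation inside the GA loop, such a rule keeps the search from converging prematurely on a fixed subset of MLP inputs while still exploiting good regions when fitness is climbing.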
In the third step, mathematical programming-based rule extraction is used not only to identify the premises of multivariate polynomial rules, but also to explore features from the extracted rules based on the data samples associated with each rule. Therefore, the methodology can provide regression rules and features not only in the polyhedrons with data instances, but also in the polyhedrons without data instances. A total of six datasets from environmental and medical disciplines were used as case study applications. These datasets involve the prediction of longitudinal dispersion coefficient, classification of electrocorticography (ECoG)/Electroencephalogram (EEG) data, eye bacteria Multisensor Data Fusion (MDF), and diabetes classification (denoted by Data I through to Data VI). GNMM was applied to all six datasets to explore its effectiveness, but the emphasis differed between datasets. For example, the emphasis of Data I and II was to give a detailed illustration of how GNMM works; Data III and IV aimed to show how to deal with difficult classification problems; the aim of Data V was to illustrate the averaging effect of GNMM; and finally Data VI was concerned with GA parameter selection and benchmarking GNMM against other IS DM techniques such as the Adaptive Neuro-Fuzzy Inference System (ANFIS), Evolving Fuzzy Neural Network (EFuNN), Fuzzy ARTMAP, and Cartesian Genetic Programming (CGP). In addition, datasets obtained from published works (i.e. Data II & III) or public domains (i.e. Data VI), where previous results were present in the literature, were also used to benchmark GNMM's effectiveness. As a closely integrated system, GNMM has the merit that it needs little human interaction. With some predefined parameters, such as the GA's crossover probability and the shape of the ANNs' activation functions, GNMM is able to process raw data until human-interpretable rules are extracted.
This is an important feature in practice, as users of a DM system quite often have little or no need to fully understand the internal components of such a system. Through the case study applications, it has been shown that the GA-based variable selection stage is capable of: filtering out irrelevant and noisy variables, improving the accuracy of the model; making the ANN structure less complex and easier to understand; and reducing the computational complexity and memory requirements. Furthermore, rule extraction ensures that the MLP training results are easily understandable and transferable. EThOS - Electronic Theses Online Service. University of Warwick. Overseas Research Students Awards Scheme (United Kingdom).

    A Review of Hybrid Soft Computing and Data Pre-Processing Techniques to Forecast Freshwater Quality’s Parameters: Current Trends and Future Directions

    Water quality has a significant influence on human health. As a result, water quality parameter modelling is one of the most challenging problems in the water sector. Therefore, the major factor in choosing an appropriate prediction model is accuracy. This research aims to analyse hybrid techniques and data pre-processing methods in freshwater quality modelling and forecasting. Hybrid approaches have generally been seen as a potential way of improving the accuracy of water quality modelling and forecasting compared with individual models. Consequently, recent studies have focused on using hybrid models to enhance forecasting accuracy. The modelling of dissolved oxygen is receiving more attention. From a review of relevant articles, it is clear that hybrid techniques are viable and precise methods for water quality prediction. Additionally, this paper presents future research directions to help researchers predict freshwater quality variables.

    Estimating the concentration of physico-chemical parameters in hydroelectric power plant reservoirs

    The United Nations Educational, Scientific and Cultural Organization (UNESCO) defines the Amazon region and adjacent areas, such as the Pantanal, as world heritage territories, since they possess unique flora and fauna and great biodiversity. Unfortunately, these regions have increasingly been suffering from anthropogenic impacts. One of the main anthropogenic impacts in recent decades has been the construction of hydroelectric power plants. As a result, dramatic alteration of these ecosystems has been observed, including changes in water levels, decreased oxygenation and loss of downstream organic matter, with consequent intense land use and population influxes after the filling and operation of these reservoirs. This, in turn, leads to extreme loss of biodiversity in these areas, due to large-scale deforestation. The fishing industry in place before the construction of dams and reservoirs, for example, has become much more intense, attracting large populations in search of work, employment and income. Environmental monitoring is fundamental for reservoir management, and several studies around the world have been performed in order to evaluate the water quality of these ecosystems. The Brazilian Amazon, in particular, goes through well-defined annual hydrological cycles, which are very important since their study aids in monitoring anthropogenic environmental impacts and can inform policy and decision making with regard to the environmental management of this area. The water quality of Amazon reservoirs is greatly influenced by this defined hydrological cycle, which, in turn, causes variations in microbiological, physical and chemical characteristics. Eutrophication, one of the main processes leading to water deterioration in lentic environments, is mostly caused by anthropogenic activities, such as the release of industrial and domestic effluents into water bodies.
Physico-chemical water parameters typically related to eutrophication are, among others, chlorophyll-a levels, transparency and total suspended solids, which can thus be used to assess the eutrophic state of water bodies. Usually, these parameters must be investigated by going out to the field, manually measuring water transparency with a Secchi disk, and taking water samples to the laboratory in order to obtain chlorophyll-a and total suspended solid concentrations. These processes are time-consuming and require trained personnel. However, we have proposed other techniques for environmental monitoring studies which do not require fieldwork, such as remote sensing and computational intelligence. Simulations in different reservoirs were performed to determine a relationship between these physico-chemical parameters and the spectral response. Based on the in situ measurements, empirical models were established to relate them to the reflectance of the reservoir measured by the satellites. The images were calibrated and atmospherically corrected. Statistical analysis using error estimation was used to evaluate the most accurate methodology. The neural networks were trained by hydrological cycle and were useful for estimating the physical-chemical parameters of the water from the reflectance of the visible and NIR bands of satellite images, with better results for the period with few clouds in the regions analyzed. The present study shows the application of a wavelet neural network to estimate water quality parameters using the concentrations of water samples collected in the Amazon reservoir and the Cefni reservoir, UK. Satellite images from Landsat and Sentinel-2 were used to train the ANN by hydrological cycle. The trained ANNs demonstrated good agreement between observed and estimated values after atmospheric correction of the satellite images.
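An empirical model of the kind described, relating an in situ water quality parameter to satellite-measured reflectance, can be illustrated with a single-band least-squares fit. The band choice and all numbers below are invented for illustration; the study itself uses neural networks over several visible and NIR bands.

```python
def fit_linear(reflectance, concentration):
    """Ordinary least squares fit of concentration = a * reflectance + b."""
    n = len(reflectance)
    mx = sum(reflectance) / n
    my = sum(concentration) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(reflectance, concentration))
    sxx = sum((x - mx) ** 2 for x in reflectance)
    a = sxy / sxx          # slope: concentration change per unit reflectance
    b = my - a * mx        # intercept
    return a, b

# Hypothetical red-band reflectance vs chlorophyll-a samples (toy data)
a, b = fit_linear([0.02, 0.04, 0.06, 0.08], [5.0, 9.0, 13.0, 17.0])
estimate = a * 0.05 + b    # predicted concentration at reflectance 0.05
```

Once such a per-band relationship is established from in situ samples, every pixel of a calibrated, atmospherically corrected image can be mapped to an estimated concentration; the neural networks in the study generalize this idea to nonlinear, multi-band relationships.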
The ANNs presented in the results are useful for estimating these concentrations using remote sensing and the wavelet transform for image processing. Therefore, the techniques proposed and applied in the present study are noteworthy since they can aid in evaluating important physico-chemical parameters, which, in turn, allows for the identification of possible anthropogenic impacts, making them relevant in environmental management and policy decision-making processes. The test results showed that the predicted values have good accuracy, improving the efficiency of monitoring water quality parameters and confirming the reliability and accuracy of the approaches proposed for monitoring water reservoirs. This thesis contributes to the evaluation of the accuracy of different methods for the estimation of physical-chemical parameters from satellite images and artificial neural networks. For future work, the accuracy of the results can be improved by adding more satellite images and testing new neural networks in new water reservoirs.

    Intelligent data mining using artificial neural networks and genetic algorithms : techniques and applications

    Get PDF
    Data Mining (DM) refers to the analysis of observational datasets to find relationships and to summarize the data in ways that are both understandable and useful. Many DM techniques exist. Compared with other DM techniques, Intelligent Systems (ISs) based approaches, which include Artificial Neural Networks (ANNs), fuzzy set theory, approximate reasoning, and derivative-free optimization methods such as Genetic Algorithms (GAs), are tolerant of imprecision, uncertainty, partial truth, and approximation. They provide flexible information processing capability for handling real-life situations. This thesis is concerned with the ideas behind design, implementation, testing and application of a novel ISs based DM technique. The unique contribution of this thesis is in the implementation of a hybrid IS DM technique (Genetic Neural Mathematical Method, GNMM) for solving novel practical problems, the detailed description of this technique, and the illustrations of several applications solved by this novel technique. GNMM consists of three steps: (1) GA-based input variable selection, (2) Multi- Layer Perceptron (MLP) modelling, and (3) mathematical programming based rule extraction. In the first step, GAs are used to evolve an optimal set of MLP inputs. An adaptive method based on the average fitness of successive generations is used to adjust the mutation rate, and hence the exploration/exploitation balance. In addition, GNMM uses the elite group and appearance percentage to minimize the randomness associated with GAs. In the second step, MLP modelling serves as the core DM engine in performing classification/prediction tasks. An Independent Component Analysis (ICA) based weight initialization algorithm is used to determine optimal weights before the commencement of training algorithms. The Levenberg-Marquardt (LM) algorithm is used to achieve a second-order speedup compared to conventional Back-Propagation (BP) training. 
In the third step, mathematical programming based rule extraction is used not only to identify the premises of multivariate polynomial rules but also to explore features of the extracted rules based on the data samples associated with each rule. The methodology can therefore provide regression rules and features not only in polyhedrons containing data instances but also in polyhedrons without them. A total of six datasets from environmental and medical disciplines were used as case-study applications. These datasets involve the prediction of the longitudinal dispersion coefficient, classification of electrocorticography (ECoG)/electroencephalogram (EEG) data, eye-bacteria Multisensor Data Fusion (MDF), and diabetes classification (denoted Data I through Data VI). GNMM was applied to all six datasets to explore its effectiveness, with a different emphasis for each. The emphasis of Data I and II was a detailed illustration of how GNMM works; Data III and IV showed how to deal with difficult classification problems; Data V illustrated the averaging effect of GNMM; and Data VI concerned GA parameter selection and benchmarking GNMM against other IS DM techniques such as the Adaptive Neuro-Fuzzy Inference System (ANFIS), the Evolving Fuzzy Neural Network (EFuNN), Fuzzy ARTMAP, and Cartesian Genetic Programming (CGP). In addition, datasets obtained from published works (i.e., Data II and III) or public domains (i.e., Data VI), for which previous results were available in the literature, were used to benchmark GNMM's effectiveness. As a closely integrated system, GNMM has the merit of needing little human interaction. Given a few predefined parameters, such as the GA's crossover probability and the shape of the ANNs' activation functions, GNMM is able to process raw data until human-interpretable rules are extracted. 
This is an important practical feature, as users of a DM system often have little or no need to understand its internal components. Through the case-study applications, it has been shown that the GA-based variable selection stage is capable of filtering out irrelevant and noisy variables, improving model accuracy; making the ANN structure less complex and easier to understand; and reducing computational complexity and memory requirements. Furthermore, rule extraction ensures that the MLP training results are easily understandable and transferable.
EThOS - Electronic Theses Online Service. University of Warwick. Overseas Research Students Awards Scheme. United Kingdom (GB)
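As a rough illustration of the kind of regression rules step (3) produces, the sketch below fits one linear rule per input region (IF x is in a region THEN y follows a local regression). The thesis identifies the rule premises (polyhedra) by mathematical programming; here, purely for brevity, the region boundaries are fixed by hand, and all names are illustrative.

```python
def fit_linear_rule(points):
    """Closed-form least-squares line y = a*x + b through (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def extract_rules(data, boundaries):
    """One regression rule per region: IF lo <= x < hi THEN y = a*x + b."""
    rules = []
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        region = [(x, y) for x, y in data if lo <= x < hi]
        if len(region) >= 2:          # region (polyhedron) holds data instances
            a, b = fit_linear_rule(region)
            rules.append(((lo, hi), a, b))
    return rules

# piecewise-linear toy data: slope 1 below x = 5, slope -2 at and above
data = [(x, x if x < 5 else 10 - 2 * (x - 5)) for x in range(10)]
rules = extract_rules(data, boundaries=[0, 5, 10])
```

On this toy data the two recovered rules are exact: slope 1, intercept 0 on [0, 5) and slope -2, intercept 20 on [5, 10).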


    Multi-agent system for flood forecasting in Tropical River Basin

    It is well known that problems related to the generation of floods, and to their control and management, have been treated with traditional hydrologic modeling tools focused on the study and analysis of the precipitation-runoff relationship, a physical process driven by the hydrological cycle and the climate regime and directly proportional to the generation of floodwaters. Within the hydrological discipline, these traditional modeling tools are classified into three principal groups: empirical, trial-and-error models (the "black-box" models); conceptual models, subdivided into "lumped", "semi-lumped", and "semi-distributed" according to their spatial distribution; and models based on physical processes, the so-called "white-box" or "distributed" models. In engineering applications, on the other hand, streamflow forecasting models are classified, according to the measurements and variables they require, as either physically based models or data-driven models. Physically based models give an in-depth account of the dynamics of the physical processes occurring among the different systems of a given hydrographic basin. However, besides being laborious to implement, they rely heavily on mathematical algorithms, and understanding the interactions they describe requires the abstraction of mathematical concepts and the conceptualization of the physical processes intertwined among these systems. Data-driven models, by contrast, require no a-priori understanding of the physical laws controlling the process; they rest on empirical mathematical formulations that need large amounts of numeric information for field adjustment. 
These two model types therefore differ markedly in their data requirements and in how they interpret physical phenomena. Although hydrologic modeling for flood forecasting has made considerable progress, several significant setbacks remain unresolved: the stochastic nature of hydrological phenomena, the challenge of implementing user-friendly, re-usable, robust, and reliable forecasting systems, and the amount of uncertainty such systems must handle when tackling the flood forecasting problem. In recent decades, with the growth of the artificial intelligence (AI) field, researchers have only rarely attempted to address the stochastic nature of hydrologic events with these techniques. Given the setbacks described above, this thesis aims to integrate physics-based hydrologic, hydraulic, and data-driven models under the multi-agent systems paradigm by designing and developing a multi-agent system (MAS) framework for flood forecasting within the scope of tropical watersheds. With the emergence of agent technologies, agent-based modeling and multi-agent system simulation methods have been applied to several areas of water management, such as flood protection, planning, control, mitigation, and forecasting, to combat the shocks floods inflict on society; however, these efforts have focused on evacuation drills, and none has targeted the tropical river basin, whose hydrological regime is highly distinctive. 
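The contrast drawn above between physically based (conceptual) and data-driven models can be made concrete with a linear reservoir, one of the simplest conceptual rainfall-runoff models: storage obeys dS/dt = R - Q with outflow Q = S/k. This toy sketch is illustrative only and does not come from the thesis.

```python
def linear_reservoir(rainfall, k=3.0, s0=0.0):
    """Conceptual rainfall-runoff model: storage S obeys dS/dt = R - Q
    with outflow Q = S / k, discretized with a unit time step."""
    s, flows = s0, []
    for r in rainfall:
        s += r          # rainfall fills the reservoir
        q = s / k       # outflow is proportional to storage
        s -= q          # storage depleted by the outflow
        flows.append(q)
    return flows

# a short storm (rain at steps 1-2), then recession with no rain
hydrograph = linear_reservoir([0, 5, 10, 0, 0, 0])
```

The simulated hydrograph rises with the storm, peaks just after it, and then recedes exponentially, which is the qualitative behavior a conceptual model encodes directly in its equations, whereas a data-driven model would have to learn it from observed series.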
In this catchment modeling environment, the multi-agent systems approach is applied as a surrogate for the conventional hydrologic model, building a system that operates at the catchment level over deployed hydrometric stations. Data from the hydrometric sensor networks (e.g., rainfall, river stage, river flow) are captured, stored, and administered by an organization of interacting agents whose main aim is to perform flow forecasting and raise awareness, thereby enhancing the policy-making process at the watershed level. Section one of this document surveys the current research in hydrologic modeling for the flood forecasting task, reviewing the background of the hydrological process, flood ontologies, management, and forecasting. The section covers the techniques, methods, and theoretical aspects of hydrological modeling and its types, from conventional models to present-day artificial intelligence prototypes, with special emphasis on multi-agent systems as the most recent modeling methodology in the hydrological sciences. It is not an all-inclusive review; rather, its purpose is to serve as a framework for this sort of work and to underline its significant aspects. Section two details the conceptual framework of the proposed multi-agent system for flood forecasting. Accomplishing this required, among other work, sketching and implementing the system's framework on the Belief-Desire-Intention (BDI) architecture for flood forecasting events within the context of the tropical river basin. 
The contributions of this architecture are the replacement of conventional hydrologic modeling with multi-agent systems, which speeds up the administration of hydrometric time-series data and the modeling of the precipitation-runoff process that leads to flooding in a river course. Another advantage is the user-friendly environment provided by the platform's graphical interface: graphs, charts, and monitors generated in real time with information on the event taking place in the catchment give a viewer with little or no background in data analysis a visual grasp of the flood situation. The agents developed in this multi-agent modeling framework for flood forecasting have been trained, tested, and validated in a series of experimental tasks, using the hydrometric series of rainfall, river stage, and streamflow data collected by the hydrometric sensor agents from the field sensors.
Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid. Committee: María Araceli Sanchis de Miguel (chair), Juan Gómez Romero (secretary), Juan Carlos Corrale (member)
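The agent organization described above (sensor agents feeding a forecasting agent, whose output drives awareness) can be caricatured as a toy message-passing system. This is not the thesis's BDI architecture: the classes, the moving-average forecast, and the threshold alert are illustrative assumptions only.

```python
import statistics

class SensorAgent:
    """Wraps one hydrometric sensor and publishes its readings as messages."""
    def __init__(self, name, readings):
        self.name, self.readings = name, readings
    def broadcast(self):
        return [{"from": self.name, "value": v} for v in self.readings]

class ForecastAgent:
    """Naive stand-in for the forecasting agent: predicts the next river
    stage as the mean of the last `window` received readings."""
    def __init__(self, window=3):
        self.window, self.history = window, []
    def receive(self, msg):
        self.history.append(msg["value"])
    def forecast(self):
        return statistics.mean(self.history[-self.window:])

class AlertAgent:
    """Raises a flood warning when the forecast crosses a stage threshold."""
    def __init__(self, threshold):
        self.threshold = threshold
    def assess(self, forecast):
        return "FLOOD WARNING" if forecast >= self.threshold else "normal"

stage = SensorAgent("river-stage", [1.0, 1.2, 2.5, 3.8, 4.9])
forecaster, alerter = ForecastAgent(), AlertAgent(threshold=3.0)
for msg in stage.broadcast():
    forecaster.receive(msg)
status = alerter.assess(forecaster.forecast())
```

Even in this caricature the division of labor matches the text: sensing, forecasting, and awareness are separate agents coupled only through messages, so any one of them can be replaced (e.g., the moving average by a trained model) without touching the others.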