1,185 research outputs found

    Clustering ensemble method

    A clustering ensemble aims to combine multiple clustering models to produce a better result than that of the individual clustering algorithms in terms of consistency and quality. In this paper, we propose a clustering ensemble algorithm with a novel consensus function named Adaptive Clustering Ensemble. It employs two similarity measures, cluster similarity and a newly defined membership similarity, and works adaptively through three stages. The first stage transforms the initial clusters into a binary representation, and the second aggregates the initial clusters that are most similar according to the cluster similarity measure. This aggregation iterates adaptively until the intended candidate clusters are produced. The third stage further refines the clusters by dealing with uncertain objects to produce an improved final clustering result with the desired number of clusters. Our proposed method is tested on various real-world benchmark datasets and its performance is compared with other state-of-the-art clustering ensemble methods, including the Co-association method and the Meta-Clustering Algorithm. The experimental results indicate that on average our method is more accurate and more efficient.
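The first two stages described in this abstract can be illustrated with a minimal sketch. The abstract does not define its cluster similarity measure, so Jaccard similarity is used here purely as an assumption, and the function names (`to_binary_clusters`, `jaccard`, `most_similar_pair`, `merge`) are hypothetical rather than taken from the paper.

```python
from itertools import combinations


def to_binary_clusters(labelings):
    """Stage 1 (sketch): turn each base clustering, given as a list of labels,
    into one binary membership vector per cluster."""
    clusters = []
    for labels in labelings:
        for label in sorted(set(labels)):
            clusters.append([1 if item == label else 0 for item in labels])
    return clusters


def jaccard(a, b):
    """Assumed cluster similarity: shared members divided by members in either cluster."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 0.0


def most_similar_pair(clusters):
    """Stage 2 (sketch): index pair of the two most similar clusters."""
    return max(
        combinations(range(len(clusters)), 2),
        key=lambda p: jaccard(clusters[p[0]], clusters[p[1]]),
    )


def merge(a, b):
    """Aggregate two clusters by taking the union of their members."""
    return [x | y for x, y in zip(a, b)]


# Two base clusterings of four objects that agree up to label names.
clusters = to_binary_clusters([[0, 0, 1, 1], [1, 1, 0, 0]])
i, j = most_similar_pair(clusters)
merged = merge(clusters[i], clusters[j])
```

Repeating the select-and-merge step until a target number of candidate clusters remains would mimic the adaptive iteration; the refinement of uncertain objects in the third stage is not sketched here.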

    Multi-tier framework for the inferential measurement and data-driven modeling

    A framework for inferential measurement and data-driven modeling has been proposed and assessed in several real-world application domains. The architecture of the framework has been structured in multiple tiers to facilitate extensibility and the integration of new components. Each of the four proposed tiers has been assessed in an uncoupled way to verify its suitability. The first tier, dealing with exploratory data analysis, has been assessed with the characterization of the chemical space related to the biodegradation of organic chemicals. This analysis has established relationships between physicochemical variables and biodegradation rates that have been used for model development. At the preprocessing level, a novel method for feature selection based on dissimilarity measures between Self-Organizing Maps (SOM) has been developed and assessed. The proposed method selects more features than others published in the literature but leads to models with improved predictive power. Single and multiple data imputation techniques based on the SOM have also been used to recover missing data in a Waste Water Treatment Plant benchmark. A new dynamic method to adjust the centers and widths in Radial Basis Function networks has been proposed to predict water quality. The proposed method outperformed other neural networks. The proposed modeling components have also been assessed in the development of prediction and classification models for biodegradation rates in different media. The results obtained proved the suitability of this approach to develop data-driven models when the complex dynamics of the process prevents the formulation of mechanistic models. The use of rule generation algorithms and Bayesian dependency models has been preliminarily screened to provide the framework with interpretation capabilities. 
Preliminary results obtained from the classification of Modes of Toxic Action (MOA) indicate that using MOAs as proxy indicators of the human health effects of chemicals could be a promising approach. Finally, the complete framework has been applied to three different modeling scenarios. A virtual sensor system, capable of inferring product quality indices from primary process variables, has been developed and assessed. The system was integrated with the control system in a real chemical plant, outperforming the multi-linear correlation models usually adopted by chemical manufacturers. A model to predict carcinogenicity from molecular structure for a set of aromatic compounds has been developed and tested. The application of the SOM-dissimilarity feature selection method yielded better results than models published in the literature. Finally, the framework has been used to facilitate a new approach for environmental modeling and risk management within geographical information systems (GIS). The SOM has been successfully used to characterize exposure scenarios and to provide estimations of missing data through geographic interpolation. The combination of SOM and Gaussian Mixture models facilitated the formulation of a new probabilistic risk assessment approach.

    Metaheuristic design of feedforward neural networks: a review of two decades of research

    Over the past two decades, feedforward neural network (FNN) optimization has been a key interest among researchers and practitioners of multiple disciplines. FNN optimization is often viewed from various perspectives: the optimization of weights, network architecture, activation nodes, learning parameters, learning environment, etc. Researchers adopted such different viewpoints mainly to improve the FNN's generalization ability. Gradient-descent algorithms such as backpropagation have been widely applied to optimize FNNs. Their success is evident from the FNN's application to numerous real-world problems. However, due to the limitations of gradient-based optimization methods, metaheuristic algorithms, including evolutionary algorithms, swarm intelligence, etc., are still being widely explored by researchers aiming to obtain a generalized FNN for a given problem. This article attempts to summarize a broad spectrum of FNN optimization methodologies, including conventional and metaheuristic approaches. This article also tries to connect various research directions that have emerged from FNN optimization practices, such as evolving neural network (NN), cooperative coevolution NN, complex-valued NN, deep learning, extreme learning machine, quantum NN, etc. Additionally, it provides interesting research challenges for future research to cope with the present information processing era.

    Evolving machine learning and deep learning models using evolutionary algorithms

    Despite their great success in data mining, machine learning and deep learning models still face substantial obstacles when tackling real-life challenges, such as feature selection, initialization sensitivity, and hyperparameter optimization. The prevalence of these obstacles has severely constrained conventional machine learning and deep learning methods from fulfilling their potential. In this research, three evolving machine learning models and one evolving deep learning model are proposed to eliminate the above bottlenecks, i.e. improving model initialization, enhancing feature representation, and optimizing model configuration, respectively, through hybridization between advanced evolutionary algorithms and conventional ML and DL methods. First, two Firefly Algorithm based evolutionary clustering models are proposed to optimize cluster centroids in K-means and overcome initialization sensitivity as well as local stagnation. Secondly, a Particle Swarm Optimization based evolving feature selection model is developed for automatic identification of the most effective feature subset and reduction of feature dimensionality for tackling classification problems. Lastly, a Grey Wolf Optimizer based evolving Convolutional Neural Network-Long Short-Term Memory method is devised for automatic generation of the optimal topological and learning configurations for Convolutional Neural Network-Long Short-Term Memory networks to undertake multivariate time series prediction problems. Moreover, a variety of tailored search strategies are proposed to eliminate the intrinsic limitations embedded in the search mechanisms of the three employed evolutionary algorithms, i.e. the dominance of the global best signal in Particle Swarm Optimization, the constraint of the diagonal movement in Firefly Algorithm, and the acute contraction of search territory in Grey Wolf Optimizer, respectively. 
The remedy strategies include the diversification of guiding signals, adaptive nonlinear search parameters, hybrid position updating mechanisms, and the enhancement of population leaders. As such, the enhanced Particle Swarm Optimization, Firefly Algorithm, and Grey Wolf Optimizer variants are more likely to attain global optimality on complex search landscapes embedded in data mining problems, owing to the elevated search diversity as well as the achievement of advanced trade-offs between exploration and exploitation.

    A Survey on Particle Swarm Optimization for Association Rule Mining

    Association rule mining (ARM) is one of the core techniques of data mining to discover potentially valuable association relationships from mixed datasets. In the current research, various heuristic algorithms have been introduced into ARM to address the high computation time of traditional ARM. Although a more detailed review of the heuristic algorithms based on ARM is available, this paper differs from the existing reviews in that it aims to provide a more comprehensive and multi-faceted survey of emerging research, which could provide a reference for researchers in the field to help them understand the state-of-the-art PSO-based ARM algorithms. In this paper, we review the existing research results. Heuristic algorithms for ARM are divided into three main groups: biologically inspired, physically inspired, and other algorithms. Additionally, different types of ARM and their evaluation metrics are described in this paper, and the current status of the improvement in PSO algorithms is discussed in stages, including swarm initialization, algorithm parameter optimization, optimal particle update, and velocity and position updates. Furthermore, we discuss the applications of PSO-based ARM algorithms and propose further research directions by exploring the existing problems.
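The PSO stages named in this abstract (swarm initialization, best-particle update, velocity and position updates) can be sketched with the standard continuous PSO update rule. This is a generic illustration on a toy objective, not an ARM-specific variant from the survey; the function `pso_minimize`, the `sphere` objective, and the parameter values (`w`, `c1`, `c2`, `bound`) are assumptions chosen for the example.

```python
import random


def pso_minimize(f, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    """Minimise f with a basic particle swarm (illustrative parameter values)."""
    rng = random.Random(seed)
    # Swarm initialization: random positions, zero velocities.
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update.
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            # Personal-best and global-best updates.
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso_minimize(sphere, dim=2, iters=50, seed=1)
```

For rule mining, a particle would instead encode a candidate rule and `f` would score it with measures such as support and confidence; that encoding is left out here because the abstract does not specify one.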

    A Multi-Agent Architecture for the Design of Hierarchical Interval Type-2 Beta Fuzzy System

    This paper presents a new methodology for building and evolving hierarchical fuzzy systems. For the system design, a tree-based encoding method is adopted to hierarchically link low-dimensional fuzzy systems. Such a tree-structured representation is flexible by nature, offering more adjustable and modifiable structures. The proposed hierarchical structure employs a type-2 beta fuzzy system to cope with the uncertainties encountered, and the resulting system is called the Hierarchical Interval Type-2 Beta Fuzzy System (HT2BFS). For the system optimization, two main tasks are applied: structure learning and parameter tuning. The structure learning phase aims to evolve and learn the structures of a population of HT2BFSs in a multi-objective context, taking into account the optimization of both the accuracy and the interpretability metrics. The parameter tuning phase is applied to refine and adjust the parameters of the system. To accomplish these two tasks optimally and quickly, we further employ a multi-agent architecture to provide both a distributed and a cooperative management of the optimization tasks. Agents are divided into two different types based on their functions: a structure agent and a parameter agent. The main function of the structure agent is to perform a multi-objective evolutionary structure learning step by means of the Multi-Objective Immune Programming algorithm (MOIP). The parameter agents have the function of managing different hierarchical structures simultaneously to refine their parameters by means of the Hybrid Harmony Search algorithm (HHS). In this architecture, agents use cooperation and communication concepts to create high-performance HT2BFSs. The performance of the proposed system is evaluated by several comparisons with various state-of-the-art approaches on noise-free and noisy time series prediction data sets and regression problems. 
The results clearly demonstrate a great improvement in the accuracy rate, the convergence speed and the number of used rules as compared with other existing approaches.

    Development of a sustainable groundwater management strategy and sequential compliance monitoring to control saltwater intrusion in coastal aquifers

    The coastal areas of the world are characterized by high population densities, an abundance of food, and increased economic activities. These increasing human settlements, and the subsequent increases in agricultural development and economic activities, demand an increasing quantity of freshwater supplies to different sectors. Groundwater in coastal aquifers is one of the most important sources of freshwater supplies. Overexploitation of this coastal groundwater resource results in seawater intrusion and subsequent deterioration of groundwater quality in coastal aquifers. In addition, climate change induced sea level rise, in combination with the effect of excessive groundwater extraction, can accelerate the seawater intrusion. Adequate supply of good quality water to different sectors in coastal areas can be ensured by adoption of a proper management strategy for groundwater extraction. Optimal use of the coastal groundwater resource is one of the best management options, which can be achieved by employing a properly developed optimal groundwater extraction strategy. Coupled simulation-optimization (S-O) approaches are essential tools to obtain the optimal groundwater extraction patterns. This study proposes approaches for developing multiple objective management of coastal aquifers with the aid of barrier extraction wells as a hydraulic control measure against saltwater intrusion in multilayered coastal aquifer systems. Therefore, two conflicting objectives of management policy are considered in this research, i.e. maximizing total groundwater extraction for advantageous purposes, and minimizing the total amount of water abstraction from barrier extraction wells. The study also proposes an adaptive management strategy for coastal aquifers by developing a three-dimensional (3-D) monitoring network design. 
The performance of the proposed methodologies is evaluated by using both an illustrative multilayered coastal aquifer system and a real life coastal aquifer study area. Coupled S-O approach is used as the basic tool to develop a saltwater intrusion management model to obtain the optimal groundwater extraction rates from a combination of feasible solutions on the Pareto optimal front. Simulation of saltwater intrusion processes requires solution of density dependent coupled flow and solute transport numerical simulation models that are computationally intensive. Therefore, computational efficiency in the coupled S-O approach is achieved by using an approximate emulator of the accompanying physical processes of coastal aquifers. These emulators, often known as surrogate models or meta-models, can replace the computationally intensive numerical simulation model in a coupled S-O approach for achieving computational efficiency. A number of meta-models have been developed and compared in this study for integration with the optimization algorithm in order to develop saltwater intrusion management model. Fuzzy Inference System (FIS), Adaptive Neuro Fuzzy Inference System (ANFIS), Multivariate Adaptive Regression Spline (MARS), and Gaussian Process Regression (GPR) based meta-models are developed in the present study for approximating coastal aquifer responses to groundwater extraction. Properly trained and tested meta-models are integrated with a Controlled Elitist Multiple Objective Genetic Algorithm (CEMOGA) within a coupled S-O approach. In each iteration of the optimization algorithm, the meta-models are used to compute the corresponding salinity concentrations for a set of candidate pumping patterns generated by the optimization algorithm. Upon convergence, the non-dominated global optimal solutions are obtained as the Pareto optimal front, which represents a trade-off between the two conflicting objectives of the pumping management problem. 
It is observed from the solutions of the meta-model based coupled S-O approach that the considered meta-models are capable of producing a Pareto optimal set of solutions quite accurately. However, each meta-modelling approach has distinct advantages over the others when utilized within the integrated S-O approach. Uncertainties in estimating complex flow and solute transport processes in coastal aquifers demand incorporation of the uncertainties related to some of the model parameters. Multidimensional heterogeneity of aquifer properties such as hydraulic conductivity, compressibility, and bulk density are considered as major sources of uncertainty in groundwater modelling system. Other sources of uncertainty are associated with spatial and temporal variability of hydrologic as well as human interventions, e.g. aquifer recharge and transient groundwater extraction patterns. Different realizations of these uncertain model parameters are obtained from different statistical distributions. FIS based meta-models are advanced to a Genetic Algorithm (GA) tuned hybrid FIS model (GA-FIS), to emulate physical processes of coastal aquifers and to evaluate responses of the coastal aquifers to groundwater extraction under groundwater parameter uncertainty. GA is used to tune the FIS parameters in order to obtain the optimal FIS structure. The GA-FIS models thus obtained are linked externally to the CEMOGA in order to derive an optimal pumping management strategy using the coupled S-O approach. The evaluation results show that the proposed saltwater intrusion management model is able to derive reliable optimal groundwater extraction strategies to control saltwater intrusion for the illustrative multilayered coastal aquifer system. The optimal management strategies obtained as solutions of GA-FIS based management models are shown to be reliable and accurate within the specified ranges of values for different realizations of uncertain groundwater parameters. 
One of the major concerns of the meta-model based integrated S-O approach is the uncertainty associated with the meta-model predictions. These prediction uncertainties, if not addressed properly, may propagate to the optimization procedures, and may deteriorate the optimality of the solutions. A standalone meta-model, when used within an optimal management model, may cause the optimization routine to produce suboptimal solutions that undermine the optimality of the groundwater extraction strategies. Therefore, this study proposes an ensemble approach to address the prediction uncertainties of meta-models. An ensemble is an approach to combining multiple similar or different algorithms or base learners (emulators). The basic idea of an ensemble lies in developing a more reliable and robust prediction tool that incorporates each individual emulator's unique characteristics in order to predict future scenarios. Each individual member of the ensemble contains different input-output mapping functions. Based on their own mapping functions, these individual emulators provide varied predictions of the response variable. Therefore, the combined prediction of the ensemble is likely to be less biased and more robust, reliable, and accurate than that of any of the individual members of the ensemble. Performance of the ensemble meta-models is evaluated using an illustrative coastal aquifer study area. The results indicate that the meta-model based ensemble modelling approach is able to provide reliable solutions for a multilayered coastal aquifer management problem. Relative sea level rise, providing an additional saline water head at the seaside, has a significant impact on an increase in the salinization process of the coastal aquifers. 
Although excessive groundwater withdrawal is considered as the major cause of saltwater intrusion, relative sea level rise, in combination with the effect of excessive groundwater pumping, can exacerbate the already vulnerable coastal aquifers. This study incorporates the effects of relative sea level rise on the optimized groundwater extraction values for the specified management period. Variation of water concentrations in the tidal river and seasonal fluctuation of river water stage are also incorporated. Three meta-models are developed from the solution results of the numerical simulation model that simulates the coupled flow and solute transport processes in a coastal aquifer system. The results reveal that the proposed meta-models are capable of predicting density dependent coupled flow and solute transport patterns quite accurately. Based on the comparison results, the best meta-model is selected as a computationally cheap substitute of the simulation model in the coupled S-O based saltwater intrusion management model. The performance of the proposed methodology is evaluated for an illustrative multilayered coastal aquifer system in which the effect of climate change induced sea level rise is incorporated for the specified management period. The results show that the proposed saltwater intrusion management model provides acceptable, accurate, and reliable solutions while significantly improving computational efficiency in the coupled S-O methodology. The success of the developed management strategy largely depends on how accurately the prescribed management policy is implemented in real life situations. The actual implementation of a prescribed management strategy often differs from the prescribed planned strategy due to various uncertainties in predicting the consequences, as well as practical constraints, including noncompliance with the prescribed strategy. This results in actual consequences of a management strategy differing from the intended results. 
To bring the management consequences closer to the intended results, adaptive management strategies can be sequentially modified at different stages of the management horizon using feedback measurements from a designed monitoring network. This feedback information can be the actual spatial and temporal concentrations resulting from the implementation of the actual management strategy. Therefore, field-scale compliance of the developed coastal aquifer management strategy is a crucial aspect of an optimally designed groundwater extraction policy. A 3-D compliance monitoring network design methodology is proposed in this study in order to develop an adaptive and sequentially modified management policy, which aims to improve optimal and justifiable use of groundwater resources in coastal aquifers. In the first step, an ensemble meta-model based multiple objective prescriptive model is developed using a coupled S-O approach in order to derive a set of Pareto optimal groundwater extraction strategies. Prediction uncertainty of meta-models is addressed by utilizing a weighted average ensemble using Set Pair Analysis. In the second step, a monitoring network is designed for evaluating the compliance of the implemented strategies with the prescribed management goals, given possible uncertainties associated with field-scale application of the proposed management policy. Optimal monitoring locations are obtained by maximizing Shannon's entropy between the saltwater concentrations at the selected potential locations. Performance of the proposed 3-D sequential compliance monitoring network design is assessed for an illustrative multilayered coastal aquifer study area. The performance evaluations show that sequential improvements of the optimal management strategy are possible by utilizing saltwater concentration measurements at the proposed optimal compliance monitoring locations. 
The integrated S-O approach is used to develop a saltwater intrusion management model for a real world coastal aquifer system in the Barguna district of southern Bangladesh. The aquifer processes are simulated by using a 3-D finite element based combined flow and solute transport numerical code. The modelling and management of seawater intrusion processes are performed based on very limited hydrogeological data. The model is calibrated with respect to hydraulic heads for a period of five years from April 2010 to April 2014. The calibrated model is validated for the next three-year period from April 2015 to April 2017. The calibrated and partially validated model is then used within the integrated S-O approach to develop optimal groundwater abstraction patterns to control saltwater intrusion in the study area. Computational efficiency of the management model is achieved by using a MARS based meta-model approximately emulating the combined flow and solute transport processes of the study area. This limited evaluation demonstrates that a planned transient groundwater abstraction strategy, acquired as solution results of a meta-model based integrated S-O approach, is a useful management strategy for optimized water abstraction and saltwater intrusion control. This study shows the capability of the MARS meta-model based integrated S-O approach to solve real-life complex management problems in an efficient manner
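The Pareto optimal front mentioned in this abstract, a trade-off between maximizing beneficial extraction and minimizing barrier-well abstraction, can be illustrated with a small non-dominated filter. This is only a sketch of the dominance test, not the CEMOGA algorithm used in the study; the function names and the candidate values are hypothetical.

```python
def dominates(a, b):
    """Each solution is (beneficial_extraction, barrier_abstraction).
    a dominates b if it extracts at least as much, abstracts no more from
    barrier wells, and is strictly better in at least one objective."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Made-up candidate pumping strategies, for illustration only.
candidates = [(10, 5), (8, 3), (9, 6), (7, 3), (10, 4)]
front = pareto_front(candidates)
```

In a full simulation-optimization loop, each candidate would also need to satisfy salinity constraints evaluated by the meta-model before entering this comparison.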

    Reducing the number of membership functions in linguistic variables

    Dissertation presented at Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia in fulfilment of the requirements for the Masters degree in Mathematics and Applications, specialization in Actuarial Sciences, Statistics and Operations Research. The purpose of this thesis was to develop algorithms to reduce the number of membership functions in a fuzzy linguistic variable. Groups of similar membership functions to be merged were found using clustering algorithms. By "summarizing" the information given by a group of similar membership functions into a new membership function, we obtain a smaller set of membership functions representing the same concept as the initial linguistic variable. The complexity of clustering problems makes it difficult for exact methods to solve them in practical time. Heuristic methods were therefore used to find good quality solutions. A Scatter Search clustering algorithm was implemented in Matlab and compared to a variation of the K-Means algorithm. Computational results on two data sets are discussed. A case study is also presented, with linguistic variables belonging to a fuzzy inference system automatically constructed from data collected by sensors while drilling in different scenarios. With these systems already constructed, the task was to reduce the number of membership functions in their linguistic variables without losing performance. A hierarchical clustering algorithm relying on performance measures for the inference system was implemented in Matlab. It was possible not only to simplify the inference system by reducing the number of membership functions in each linguistic variable but also to improve its performance.

    Dynamic protein classification: Adaptive models based on incremental learning strategies

    One of the major problems in computational biology is the inability of existing classification models to incorporate expanding and new domain knowledge. This problem of static classification models is addressed in this thesis by the introduction of incremental learning for problems in bioinformatics. The tools which have been developed are applied to the problem of classifying proteins into a number of primary and putative families. This type of classification is of particular relevance due to its role in drug discovery programs and the benefit it lends to this process in terms of cost and time saving. As a secondary problem, multi-class classification is also addressed. The standard approach to protein family classification is based on the creation of committees of binary classifiers. This one-vs-all approach is not ideal, and the classification systems presented here consist of classifiers that are able to do all-vs-all classification. Two incremental learning techniques are presented. The first is a novel algorithm based on the fuzzy ARTMAP classifier and an evolutionary strategy. The second technique applies the incremental learning algorithm Learn++. The two systems are tested using three datasets: data from the Structural Classification of Proteins (SCOP) database, the G-Protein Coupled Receptors (GPCR) database and Enzymes from the Protein Data Bank. The results show that the two techniques are comparable with each other, giving classification abilities comparable to those of single batch-trained classifiers, with the added ability of incremental learning. 
Both techniques are shown to be useful for the problem of protein family classification, but they are also applicable to problems outside this area, with applications in proteomics including the prediction of functions and of secondary and tertiary structures, and applications in genomics such as promoter and splice site prediction and classification of gene microarrays.