    Fuzzy ARTMAP Ensemble Based Decision Making and Application

    Because the performance of a single fuzzy ARTMAP (FAM) network is affected by the order in which samples are presented during offline training, a FAM ensemble approach based on an improved Bayesian belief method is proposed to improve classification accuracy. The training samples are presented to a committee of FAMs in different orders, the outputs of these FAMs are combined, and the final decision is derived by the improved Bayesian belief method. The experimental results show that the proposed FAM ensemble classifies the different categories reliably and achieves better classification performance than a single FAM.
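
    The abstract does not specify what makes the Bayesian belief method "improved". As a hedged sketch only, the classic confusion-matrix-based Bayesian belief rule for combining crisp classifier outputs is written below in Python. The ensemble members are assumed to be already-trained FAMs (not implemented here), each trained on its own shuffled presentation order; the names `presentation_orders`, `bayesian_belief` and `ensemble_decision`, the smoothing constant `eps`, and the confusion matrices in the usage lines are all hypothetical.

```python
import random
from typing import List, Sequence

# confusions[k][i][j]: validation samples of true class i that member k labelled j
Confusion = Sequence[Sequence[float]]


def presentation_orders(n_samples: int, n_members: int, seed: int = 0) -> List[List[int]]:
    """One shuffled training order per ensemble member (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.sample(range(n_samples), n_samples) for _ in range(n_members)]


def bayesian_belief(confusions: Sequence[Confusion], outputs: Sequence[int],
                    eps: float = 1e-6) -> List[float]:
    """Classic Bayesian belief combination of crisp classifier outputs.

    Bel(i) is proportional to the product over members k of
    P(true class = i | member k said outputs[k]), estimated from the
    column of member k's confusion matrix for the label it produced.
    """
    n_classes = len(confusions[0])
    beliefs = [1.0] * n_classes
    for conf, j in zip(confusions, outputs):
        column_total = sum(conf[i][j] for i in range(n_classes))
        for i in range(n_classes):
            # additive smoothing (an assumption) keeps one zero count from vetoing a class
            beliefs[i] *= (conf[i][j] + eps) / (column_total + n_classes * eps)
    total = sum(beliefs)
    return [b / total for b in beliefs]


def ensemble_decision(confusions: Sequence[Confusion], outputs: Sequence[int]) -> int:
    """Final label: the class with the highest combined belief."""
    beliefs = bayesian_belief(confusions, outputs)
    return max(range(len(beliefs)), key=beliefs.__getitem__)


# Usage with three hypothetical, already-trained members on a two-class problem
confusions = [
    [[45, 5], [10, 40]],
    [[40, 10], [5, 45]],
    [[48, 2], [20, 30]],
]
outputs = [1, 1, 0]  # labels the three members assign to one test sample
print(bayesian_belief(confusions, outputs), ensemble_decision(confusions, outputs))
```

    Combining through confusion-matrix columns lets a member that is often wrong when it outputs a given label carry less weight than a simple majority vote would give it.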

    Multi-tier framework for the inferential measurement and data-driven modeling

    A framework for inferential measurement and data-driven modeling has been proposed and assessed in several real-world application domains. The architecture of the framework is structured in multiple tiers to facilitate extensibility and the integration of new components. Each of the four proposed tiers has been assessed independently to verify its suitability. The first tier, dealing with exploratory data analysis, has been assessed by characterizing the chemical space related to the biodegradation of organic chemicals. This analysis established relationships between physicochemical variables and biodegradation rates that were then used for model development. At the preprocessing level, a novel feature selection method based on dissimilarity measures between Self-Organizing Maps (SOMs) has been developed and assessed. Although the proposed method generally selected more features than other methods published in the literature, it led to models with improved predictive power. Single and multiple data imputation techniques based on the SOM have also been used to recover missing data in a wastewater treatment plant benchmark. A new dynamic method to adjust the centers and widths of Radial Basis Function (RBF) networks has been proposed to predict water quality; it outperformed other neural network architectures. The proposed modeling components have also been assessed in the development of prediction and classification models for biodegradation rates in different media. The results demonstrate that this approach is suitable for developing data-driven models when the complex dynamics of the process prevent the formulation of mechanistic models. Rule generation algorithms and Bayesian dependency models have been preliminarily investigated to provide the framework with interpretation capabilities. Preliminary results from the classification of Modes of Toxic Action (MOAs) indicate that using MOAs as proxy indicators of the human health effects of chemicals could be a promising approach.

    Finally, the complete framework has been applied to three different modeling scenarios. First, a virtual sensor system capable of inferring product quality indices from primary process variables has been developed and assessed. The system was integrated with the control system of a real chemical plant, outperforming the multilinear correlation models usually adopted by chemical manufacturers. Second, a model to predict carcinogenicity from molecular structure for a set of aromatic compounds has been developed and tested; applying the SOM-dissimilarity feature selection method yielded better results than models published in the literature. Third, the framework has been used to support a new approach to environmental modeling and risk management within geographical information systems (GIS). The SOM has been successfully used to characterize exposure scenarios and to estimate missing data through geographic interpolation. The combination of SOMs and Gaussian mixture models enabled the formulation of a new probabilistic risk assessment approach.
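
    The SOM-based single imputation mentioned above can be sketched in Python as follows, under stated assumptions: a small rectangular SOM with a Gaussian neighbourhood and linearly decaying learning rate and radius, and gaps filled from the best-matching unit (BMU) found using only the observed components. The map size, schedules, function names (`train_som`, `som_impute`) and the synthetic data are hypothetical; the thesis's actual SOM configuration and its multiple-imputation variant are not described in the abstract.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Node = Tuple[int, int]


def sq_dist(w: Sequence[float], x: Sequence[Optional[float]], dims: Sequence[int]) -> float:
    """Squared Euclidean distance restricted to the given dimensions."""
    return sum((w[k] - x[k]) ** 2 for k in dims)


def train_som(data: List[List[float]], rows: int = 2, cols: int = 2,
              epochs: int = 100, lr0: float = 0.5, seed: int = 0) -> Dict[Node, List[float]]:
    """Online SOM with a Gaussian neighbourhood and linearly decaying rates."""
    rng = random.Random(seed)
    dims = range(len(data[0]))
    sigma0 = max(rows, cols) / 2.0
    # initialise each prototype as a copy of a randomly chosen sample
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            bmu = min(weights, key=lambda n: sq_dist(weights[n], x, dims))
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in dims:
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def som_impute(weights: Dict[Node, List[float]], x: Sequence[Optional[float]]) -> List[float]:
    """Single imputation: fill gaps from the BMU found on observed components only."""
    observed = [k for k, v in enumerate(x) if v is not None]
    bmu = min(weights.values(), key=lambda w: sq_dist(w, x, observed))
    return [v if v is not None else bmu[k] for k, v in enumerate(x)]


# Usage on synthetic data: two clusters along the diagonal of a 3-D space
rng = random.Random(1)
data = [[c + rng.uniform(-0.5, 0.5) for _ in range(3)] for c in (0.0, 10.0) for _ in range(20)]
som = train_som(data)
print(som_impute(som, [10.0, None, 10.0]))
```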

    Linear feature selection and classification using PNN and SFAM neural networks for a nearly online diagnosis of bearing naturally progressing degradations.

    In this work, an effort is made to characterize seven bearing states based on the energy entropy of the Intrinsic Mode Functions (IMFs) obtained from Empirical Mode Decomposition (EMD). Three run-to-failure bearing vibration signals, representing different defects (either degradation stages or failures of different components: roller, inner race and outer race), together with the healthy state, yield the seven bearing states under study. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are used for feature reduction. Six classification scenarios are then processed with a Probabilistic Neural Network (PNN) and a Simplified Fuzzy Adaptive Resonance Theory Map (SFAM) neural network: each of the three extracted feature databases (EMD, PCA and LDA features) is processed first with SFAM alone and then with a combined PNN-SFAM. The classification accuracy and scattering criterion computed for each scenario show that the EMD-LDA-PNN-SFAM combination is the most suitable strategy for online bearing fault diagnosis. The proposed methodology shows better generalization capability than previous works and is validated through online bearing fault diagnosis. The proposed strategy can be applied to decision making for various assets.
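
    To make the feature and classifier stages concrete, here is a minimal Python sketch, assuming the IMFs have already been produced by some EMD implementation (not shown). It computes the normalised IMF energies p_i = E_i / sum(E) and the energy entropy H = -sum p_i ln p_i, then classifies with a basic probabilistic neural network (a Gaussian Parzen estimate per class). The smoothing width `sigma`, the helper names and the two-IMF "decompositions" in the usage lines are hypothetical; the PCA/LDA reduction and the SFAM stage of the paper are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def energy_entropy_features(imfs: Sequence[Sequence[float]]) -> List[float]:
    """Normalised IMF energies p_i = E_i / sum(E) followed by H = -sum p_i ln p_i."""
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    p = [e / total for e in energies]
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return p + [entropy]


def pnn_predict(train: Sequence[Tuple[Sequence[float], str]], x: Sequence[float],
                sigma: float = 0.5) -> str:
    """Probabilistic neural network: Gaussian Parzen estimate per class, then argmax."""
    scores: Dict[str, List[float]] = {}
    for features, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(features, x))
        # pattern layer: one Gaussian unit per training sample
        scores.setdefault(label, []).append(math.exp(-d2 / (2.0 * sigma ** 2)))
    # summation layer averages each class's activations; output layer picks the max
    return max(scores, key=lambda c: sum(scores[c]) / len(scores[c]))


# Usage with hypothetical two-IMF decompositions (real IMFs would come from an EMD step)
train = [
    (energy_entropy_features([[2.0, 2.0], [0.2, 0.2]]), "healthy"),
    (energy_entropy_features([[2.0, 1.8], [0.1, 0.3]]), "healthy"),
    (energy_entropy_features([[1.0, 1.0], [1.0, 1.0]]), "outer race"),
    (energy_entropy_features([[1.0, 1.2], [0.9, 1.1]]), "outer race"),
]
query = energy_entropy_features([[1.0, 1.0], [0.9, 0.9]])
print(pnn_predict(train, query))
```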

    Data mining using intelligent systems : an optimized weighted fuzzy decision tree approach

    Data mining aims to analyze observational datasets to find relationships and to present the data in ways that are both understandable and useful. In this thesis, existing intelligent systems techniques such as Self-Organizing Maps, Fuzzy C-Means and decision trees are used to analyze several datasets, providing flexible information processing capabilities for handling real-life situations. The thesis is concerned with the design, implementation, testing and application of these techniques to those datasets. It also introduces a hybrid intelligent systems technique, the Optimized Weighted Fuzzy Decision Tree (OWFDT), which aims to improve Fuzzy Decision Trees (FDTs) and solve practical problems. The OWFDT uses Fuzzy C-Means to fuzzify the input instances while keeping the expected labels crisp. This leads to a different output-layer activation function and different weight connections in the neural network (NN) structure obtained by mapping the FDT to an NN. A momentum term is also introduced into the learning process that trains the weight connections, to avoid oscillation or divergence. A new reasoning mechanism is proposed to combine the constructed tree with the weights optimized during learning. The thesis also compares the OWFDT with two benchmark algorithms, Fuzzy ID3 and the weighted FDT. Six datasets ranging from materials science to medical and civil engineering are used as case studies. These involve the classification of composite material failure mechanisms, the classification of electrocorticography (ECoG)/electroencephalography (EEG) signals, eye bacteria prediction and wave overtopping prediction.
Different intelligent systems techniques were used to cluster the patterns and predict the classes, while OWFDT was used to design classifiers for all the datasets. In the materials dataset, Self-Organizing Maps and Fuzzy C-Means were used to cluster the acoustic event signals and assign those events to different failure mechanisms; OWFDT was then used to design a classifier for the acoustic event signals. For the eye bacteria dataset, we used bagging to improve the classification accuracy of Multilayer Perceptrons and decision trees. Applying bootstrap aggregating (bagging) to decision trees also helped identify the most important sensors (features), so that the dimension of the data could be reduced. These most important features were then used to grow the OWFDT, addressing the curse of dimensionality. The last dataset, concerned with wave overtopping, was used to benchmark OWFDT against other intelligent systems techniques, such as the Adaptive Neuro-Fuzzy Inference System (ANFIS), the Evolving Fuzzy Neural Network (EFuNN), the Genetic Neural Mathematical Method (GNMM) and Fuzzy ARTMAP. The analysis of these datasets shows that combining these intelligent systems techniques makes it possible to find patterns and classify the data. OWFDT has also demonstrated its efficiency and effectiveness compared with a conventional fuzzy decision tree and a weighted fuzzy decision tree.
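
    The fuzzification step described above (Fuzzy C-Means applied to the inputs, crisp labels kept as-is) can be illustrated with a standard FCM loop in Python. This is a sketch under assumptions: the fuzzifier m = 2, random initial memberships and the stopping tolerance are illustrative choices, the function names are hypothetical, and the FDT-to-NN mapping and momentum training of the OWFDT are not shown.

```python
import math
import random
from typing import List, Sequence, Tuple


def fcm_memberships(x: Sequence[float], centers: Sequence[Sequence[float]],
                    m: float = 2.0) -> List[float]:
    """u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)) for one input instance."""
    d = [math.dist(x, c) for c in centers]
    zeros = [j for j, dj in enumerate(d) if dj == 0.0]
    if zeros:  # the instance sits exactly on one or more centres
        return [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]


def fuzzy_c_means(data: Sequence[Sequence[float]], n_clusters: int, m: float = 2.0,
                  max_iter: int = 200, tol: float = 1e-6,
                  seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Alternate centre and membership updates until memberships stop changing."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(n_clusters)]
        s = sum(row)
        u.append([v / s for v in row])
    centers: List[List[float]] = []
    for _ in range(max_iter):
        # centres are membership-weighted means with weights u_ij^m
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            centers.append([sum(w[i] * data[i][k] for i in range(n)) / sum(w)
                            for k in range(dim)])
        new_u = [fcm_memberships(x, centers, m) for x in data]
        shift = max(abs(a - b) for row_new, row_old in zip(new_u, u)
                    for a, b in zip(row_new, row_old))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Usage: fuzzify a 1-D input feature into two fuzzy sets
data = [[0.0], [0.2], [0.1], [9.9], [10.0], [10.1]]
centers, u = fuzzy_c_means(data, n_clusters=2)
print(sorted(c[0] for c in centers), fcm_memberships([5.0], centers))
```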

    Computational intelligence techniques for missing data imputation

    Despite considerable advances in missing data imputation techniques over the last three decades, the problem of missing data remains largely unsolved. Many techniques have emerged in the literature as candidate solutions, including Expectation Maximisation (EM) and the combination of autoassociative neural networks and genetic algorithms (NN-GA). The merits of both techniques have been discussed at length in the literature, but they have never been compared with each other. This thesis contributes to knowledge in several ways. First, it conducts a comparative study of these two techniques and presents the significance of the difference in their performance. Second, it presents predictive analysis methods suitable for the missing data problem; this analysis aims to determine whether the data in question are predictable and hence to guide the choice of estimation technique. Third, it presents a novel treatment of missing data for online condition monitoring problems, in which an ensemble of three autoencoders together with hybrid Genetic Algorithms (GA) and Fast Simulated Annealing (FSA) is used to approximate missing data. Several significant insights were deduced from the simulation results. When computational intelligence approaches are applied to the missing data problem, the choice of optimisation method plays a significant role in prediction. Although hybrid GA and FSA were observed to converge to the same region of the search space and to almost the same values, they differ significantly in run time. This contribution demonstrates that particular attention must be paid to the choice of optimisation techniques and their decision boundaries. Another contribution of this work is to demonstrate that dynamic programming is not only applicable to the missing data problem but also efficient in addressing it.
An NN-GA model was built to impute missing data using the principle of dynamic programming. This approach makes it possible to modularise the missing data problem for maximum efficiency; with advances in parallel computing, the modules could be solved by different processors working in parallel. Furthermore, a method is proposed for imputing missing data in non-stationary time series that learns incrementally even in the presence of concept drift. It detects concept drift by measuring heteroskedasticity and uses an online learning technique. This method opens a new research direction in which missing data can be estimated for non-stationary applications; other methods still need to be developed so that they can be compared with the approach proposed in this thesis. Another novel technique for dealing with missing data in online condition monitoring problems is also presented and studied. It addresses classification in the presence of missing data without attempting to recover the missing values, and the problem domain is then extended to regression. The proposed technique performs better than the NN-GA approach in both accuracy and time efficiency during testing. Its advantage is that it eliminates the need to find the best estimate of the missing data, and hence saves time. Lastly, instead of using complicated techniques to estimate missing values, an imputation approach based on rough sets is explored. Empirical results obtained on both real and synthetic data provide valuable and promising insight into the problem of missing data. The work confirms that rough sets can be reliable for missing data estimation in larger, real-world databases.
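
    The NN-GA idea (search for the missing entries that make a record most consistent with a trained autoassociative model, i.e. minimise ||x - f(x)||^2) can be sketched in Python as below. Two loud assumptions: the trained autoencoder is replaced by a hypothetical linear projection (`make_linear_autoencoder`), and the GA uses illustrative operator choices (tournament selection, arithmetic crossover, shrinking Gaussian mutation, two-member elitism). The ensemble of autoencoders, the hybrid GA-FSA optimiser and the dynamic-programming modularisation are not shown.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Model = Callable[[List[float]], List[float]]


def make_linear_autoencoder(direction: Sequence[float]) -> Model:
    """Hypothetical stand-in for a trained autoassociative network:
    reconstructs x as its projection onto the line spanned by `direction`."""
    norm = math.sqrt(sum(v * v for v in direction))
    unit = [v / norm for v in direction]

    def reconstruct(x: List[float]) -> List[float]:
        s = sum(a * b for a, b in zip(x, unit))
        return [s * u for u in unit]

    return reconstruct


def ga_impute(record: Sequence[Optional[float]], model: Model,
              bounds: Tuple[float, float] = (-10.0, 10.0), pop_size: int = 40,
              generations: int = 150, seed: int = 0) -> List[float]:
    """Search the missing entries that minimise ||x - model(x)||^2."""
    rng = random.Random(seed)
    missing = [i for i, v in enumerate(record) if v is None]
    lo, hi = bounds

    def fill(genes: Sequence[float]) -> List[float]:
        x = list(record)
        for i, g in zip(missing, genes):
            x[i] = g
        return x

    def error(genes: Sequence[float]) -> float:
        x = fill(genes)
        return sum((a - b) ** 2 for a, b in zip(x, model(x)))

    pop = [[rng.uniform(lo, hi) for _ in missing] for _ in range(pop_size)]
    for gen in range(generations):
        pop.sort(key=error)
        sigma = 0.1 * (hi - lo) * (1.0 - gen / generations) + 1e-3  # shrinking mutation
        children = [list(p) for p in pop[:2]]  # elitism: keep the two best
        while len(children) < pop_size:
            p1 = min(rng.sample(pop, 3), key=error)  # tournament selection
            p2 = min(rng.sample(pop, 3), key=error)
            a = rng.random()  # arithmetic crossover weight
            child = [a * g1 + (1.0 - a) * g2 for g1, g2 in zip(p1, p2)]
            children.append([min(hi, max(lo, g + rng.gauss(0.0, sigma))) for g in child])
        pop = children
    return fill(min(pop, key=error))


# Usage: complete records lie on the line (t, 2t, 3t); the middle value is missing
model = make_linear_autoencoder([1.0, 2.0, 3.0])
print(ga_impute([1.0, None, 3.0], model))
```

    For this toy model the reconstruction error is a quadratic in the missing value with its minimum at 2.0, so the GA result can be checked by hand.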

    Combining semantic web technologies with evolving fuzzy classifier eClass for EHR-based phenotyping : a feasibility study

    In parallel with nationwide efforts to set up shared electronic health records (EHRs) across healthcare settings, several large-scale national and international projects are developing, validating and deploying EHR-oriented phenotype algorithms that aim at the large-scale use of EHR data for genomic studies. A current bottleneck in obtaining computable phenotypes from EHR data is transforming the raw EHR data into clinically relevant features. This study proposes a novel combination of Semantic Web technologies with the online evolving fuzzy classifier eClass to obtain and validate EHR-driven computable phenotypes derived from 1956 clinical statements from EHRs. The evaluation performed with clinicians demonstrates the feasibility and practical acceptability of the proposed approach.
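
    As a rough illustration of the "evolving" part only, the Python sketch below is a simplified one-pass prototype classifier in the spirit of eClass0, not the published algorithm. Each class keeps focal points chosen by a Cauchy-type density, 1 / (1 + mean squared distance to the class samples seen so far), computed from running sums so no past samples are stored; a sample becomes a new focal point when it is denser than all existing ones, and prediction is winner-takes-all. The class name, the omission of eClass's prototype-replacement rules and rule consequents, and the numeric features (assumed to be already extracted from EHR statements, for example via the Semantic Web layer, which is not shown) are all hypothetical.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Sequence


class EvolvingPrototypeClassifier:
    """Hypothetical one-pass classifier: per-class focal points chosen by density."""

    def __init__(self) -> None:
        self.count: DefaultDict[str, int] = defaultdict(int)
        self.sum_vec: Dict[str, List[float]] = {}
        self.sum_sq: DefaultDict[str, float] = defaultdict(float)
        self.prototypes: DefaultDict[str, List[List[float]]] = defaultdict(list)

    def _density(self, z: Sequence[float], label: str) -> float:
        """1 / (1 + mean squared distance from z to all samples of the class so far),
        using mean ||z - z_l||^2 = ||z||^2 - 2 z.mean + mean(||z_l||^2)."""
        k = self.count[label]
        mean = [s / k for s in self.sum_vec[label]]
        mean_sq_dist = (sum(v * v for v in z)
                        - 2.0 * sum(a * b for a, b in zip(z, mean))
                        + self.sum_sq[label] / k)
        return 1.0 / (1.0 + max(mean_sq_dist, 0.0))

    def learn(self, z: Sequence[float], label: str) -> None:
        z = list(z)
        self.sum_vec.setdefault(label, [0.0] * len(z))
        # update the running statistics of this class with the new sample
        self.count[label] += 1
        self.sum_vec[label] = [s + v for s, v in zip(self.sum_vec[label], z)]
        self.sum_sq[label] += sum(v * v for v in z)
        protos = self.prototypes[label]
        # a sample becomes a new focal point if it is denser than every existing one
        new_density = self._density(z, label)
        if all(new_density > self._density(p, label) for p in protos):
            protos.append(z)

    def predict(self, z: Sequence[float]) -> Optional[str]:
        """Winner-takes-all; with equal-width memberships this is the nearest prototype."""
        best_label, best_dist = None, math.inf
        for label, protos in self.prototypes.items():
            for p in protos:
                d = math.dist(z, p)
                if d < best_dist:
                    best_label, best_dist = label, d
        return best_label


# Usage on a hypothetical stream of (features, phenotype label) pairs
clf = EvolvingPrototypeClassifier()
stream = [([0.9, 1.1], "case"), ([0.1, 0.0], "control"), ([1.0, 1.0], "case"),
          ([0.0, 0.2], "control"), ([1.2, 0.9], "case")]
for features, label in stream:
    clf.learn(features, label)
print(clf.predict([1.1, 1.0]), clf.predict([0.1, 0.1]))
```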