5 research outputs found

    Heterogeneous neural networks: theory and applications

    This work presents a class of functions that serve as generalized neuron models for artificial neural networks. They are cast into the common framework of computing a similarity function, a flexible definition of a neuron as a pattern recognizer. The similarity view gives the model a clear conceptual basis and unifies many existing neural models, including those classically used in the MultiLayer Perceptron (MLP) and most of those used in Radial Basis Function (RBF) networks. These families of models are conceptually unified and their relation is clarified.
The possibilities of deriving new instances are explored, and several neuron models, each representative of its family, are proposed. The similarity view naturally leads to extensions of the models that handle heterogeneous information, that is, information coming from sources radically different in character, including continuous and discrete (ordinal) numerical quantities, nominal (categorical) quantities, and fuzzy quantities. Missing data are also explicitly considered. A neuron of this class is called a heterogeneous neuron, and any neural structure making use of such neurons is a Heterogeneous Neural Network (HNN), regardless of the specific architecture or learning algorithm. This work concentrates on feed-forward networks as the initial focus of study. The learning procedures may include a great variety of techniques, broadly divided into derivative-based methods (such as the conjugate gradient) and evolutionary ones (such as variants of genetic algorithms extended to work with heterogeneous information). The thesis also explores a number of directions towards the construction of better neuron models, within an integrating framework, that are better adapted to the problems they are meant to solve. It describes how a certain generic class of heterogeneous models leads to satisfactory performance, comparable to and often better than that of classical neural models, especially in the presence of heterogeneous information and imprecise or incomplete data, in a wide range of domains, most of them corresponding to real-world problems. Postprint (published version)
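The abstract does not give the similarity measures used in the thesis, so the following is only a minimal sketch of the general idea of a similarity-based neuron over mixed data. It assumes a Gower-style average of per-feature similarities (range-scaled distance for continuous features, equality for nominal ones, missing values skipped) followed by a logistic squashing function. The function names, the `kinds` and `ranges` parameters, and the `steepness` constant are all hypothetical.

```python
import math


def heterogeneous_similarity(x, w, kinds, ranges):
    """Gower-style similarity in [0, 1] between an input x and a weight vector w.

    Assumption: this averaging scheme is illustrative, not the thesis' measure.
    kinds[i] is "continuous" or "nominal"; ranges[i] is the positive value range
    of a continuous feature (ignored for nominal ones); None marks a missing value.
    """
    scores = []
    for xi, wi, kind, rng in zip(x, w, kinds, ranges):
        if xi is None or wi is None:
            continue  # missing value: the feature contributes nothing
        if kind == "continuous":
            # Range-scaled absolute difference, clipped so the score stays in [0, 1].
            scores.append(1.0 - min(abs(xi - wi) / rng, 1.0))
        elif kind == "nominal":
            # Categories either match or they do not.
            scores.append(1.0 if xi == wi else 0.0)
        else:
            raise ValueError(f"unknown feature kind: {kind!r}")
    if not scores:
        return 0.0  # assumption: nothing comparable counts as no similarity
    return sum(scores) / len(scores)


def heterogeneous_neuron(x, w, kinds, ranges, steepness=8.0):
    """Squash the similarity with a logistic function centred at 0.5."""
    s = heterogeneous_similarity(x, w, kinds, ranges)
    return 1.0 / (1.0 + math.exp(-steepness * (s - 0.5)))


# One continuous feature, one nominal feature, and one missing value.
x = [1.0, "red", None]
w = [3.0, "red", 5.0]
kinds = ["continuous", "nominal", "continuous"]
ranges = [4.0, None, 10.0]
print(heterogeneous_similarity(x, w, kinds, ranges))  # 0.75
```

In this sketch the weight vector plays the role of a stored pattern, and the neuron responds more strongly the closer the input is to it, which is one way to read the "pattern recognizer" description above.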

    Dynamics analysis and applications of neural networks

    Ph.D. (Doctor of Philosophy)

    Using function approximation to analyze the sensitivity of MLP with antisymmetric squashing activation function

    No full text

    Application of Deep Learning in Chemical Processes: Explainability, Monitoring and Observability

    The last decade has seen remarkable advances in speech, image, and language recognition tools that have been made available to the public through computer and mobile device applications. Most of these improvements were achieved by Artificial Intelligence (AI) and deep learning (DL) algorithms (Hinton et al., 2006), a term that generally refers to a set of novel neural network architectures and algorithms such as long short-term memory (LSTM) units, convolutional neural networks (CNN), autoencoders (AE), and t-distributed stochastic neighbor embedding (t-SNE). Although neural networks are not new, a combination of relatively recent improvements in training methods and the availability of increasingly powerful computers means that one can now model much more complex nonlinear dynamic behaviour by using more complex structures of neurons, i.e. more layers of neurons, than ever before (Goodfellow et al., 2016). However, it is recognized that training neural networks with such complex structures requires a vast amount of data. In this sense, manufacturing processes are good candidates for deep learning applications, since they rely on computers and information systems for monitoring and control and thus generate massive amounts of data. This is especially true in pharmaceutical companies such as Sanofi Pasteur, the industrial collaborator for the current study, where large data sets are routinely stored for monitoring and regulatory purposes. Although novel DL algorithms have been applied with great success in image analysis, speech recognition, and language translation, their applications to chemical processes, and to pharmaceutical processes in particular, are scarce. The current work investigates deep learning in process systems engineering for three main areas of application: (i) developing a deep learning classification model for profit-based operating regions; (ii) developing both supervised and unsupervised process monitoring algorithms;
(iii) observability analysis. It is recognized that most empirical or black-box models, including DL models, have good generalization capabilities but are difficult to interpret. For example, with these methods it is difficult to understand how a particular decision is made or which input variable or feature most strongly influences the decision made by the DL model. Such understanding is expected to shed light on why biased results can be obtained or why a wrong class is predicted with a higher probability in classification problems. Hence, a key goal of the current work is to derive process insights from DL models. To this end, the work proposes both supervised and unsupervised learning approaches to identify regions of process inputs that result in corresponding regions, i.e. ranges of values, of process profit. Furthermore, it is shown that the ability to better interpret the model by identifying the most informative inputs can be used to reduce over-fitting. A neural network (NN) pruning algorithm is developed that provides important physical insights into the system, indicating which inputs have a positive or negative effect on the profit function, and that helps to detect significant changes in process phenomena. It is shown that pruning of input variables significantly reduces the number of parameters to be estimated and improves the classification test accuracy for both case studies: the Tennessee Eastman Process (TEP) and an industrial vaccine manufacturing process. The ability to store large amounts of data has permitted the use of deep learning (DL) and optimization algorithms in the process industries. To meet high levels of product quality, efficiency, and reliability, a process monitoring system is needed. Two aspects of Statistical Process Control (SPC) are fault detection and fault diagnosis (FDD). Many multivariate statistical methods, such as PCA and PLS and their dynamic variants, have been used extensively for fault detection.
However, the inherent non-linearities in the process pose challenges when using these linear models. Numerous deep learning FDD approaches have also been developed in the literature, but contribution plots for identifying the root cause of a fault have not been derived from Deep Neural Networks (DNNs). To this end, the supervised fault detection problem in the current work is formulated as a binary classification problem, while the supervised fault diagnosis problem is formulated as a multi-class classification problem that identifies the type of fault. The concept of explainability of DNNs is then explored, with particular application to the FDD problem. The developed methodology is demonstrated on the TEP with non-incipient faults. Incipient faults are faulty conditions in which the signal-to-noise ratio is small, and they have not been widely studied in the literature. To address this gap, a hierarchical dynamic deep learning algorithm is developed specifically for the detection and diagnosis of incipient faults. One of the major drawbacks of both methods described above is their reliance on labeled data, i.e. data from both normal and faulty operation. In an industrial setting, especially for biochemical processes, most data are obtained during normal operation, and faulty data may be unavailable or insufficient. Hence, an unsupervised DL approach for process monitoring is also developed. It involves a novel objective function and an NN architecture tailored to detect faults effectively. The idea is to learn the distribution of normal operation data so that fault conditions can be distinguished from it. To demonstrate the advantages of the proposed methodology for fault detection, systematic comparisons are conducted with Multiway Principal Component Analysis (MPCA) and Multiway Partial Least Squares (MPLS) on an industrial-scale Penicillin Simulator.
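The abstract does not specify the objective function or network architecture of the unsupervised approach. As a minimal sketch of the underlying idea only (learn what normal operation looks like, then flag samples that deviate from it), the example below swaps in a simple per-variable statistical model instead of a neural network. The function names, the `quantile` parameter, and the toy data are hypothetical.

```python
import statistics


def anomaly_score(sample, means, stds):
    """Sum of squared standardized deviations from the normal-operation mean."""
    return sum(((x - m) / s) ** 2 for x, m, s in zip(sample, means, stds))


def fit_normal_model(normal_data, quantile=0.99):
    """Fit per-variable mean and std on normal data and pick a score threshold.

    Assumption: variables are treated independently, which a real process
    monitor (MPCA, MPLS, or a deep model) would not do.
    """
    columns = list(zip(*normal_data))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard against zero spread
    scores = sorted(anomaly_score(row, means, stds) for row in normal_data)
    # Empirical quantile of the scores seen during normal operation.
    threshold = scores[min(int(quantile * len(scores)), len(scores) - 1)]
    return means, stds, threshold


def is_faulty(sample, model):
    """Flag a sample whose score exceeds the normal-operation threshold."""
    means, stds, threshold = model
    return anomaly_score(sample, means, stds) > threshold


# Toy normal-operation data: two process variables, five samples.
normal_data = [[1.0, 10.0], [1.1, 10.2], [0.9, 9.8], [1.0, 10.1], [1.05, 9.9]]
model = fit_normal_model(normal_data)
print(is_faulty([1.0, 10.0], model), is_faulty([5.0, 20.0], model))
```

Only normal-operation data are needed to fit this model, which is the property that motivates the unsupervised formulation described above.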
Past investigations reported that the variability in productivity in Sanofi's Pertussis Vaccine Manufacturing process may be highly correlated with biological phenomena, such as oxidative stress, that are not routinely monitored by the company. Although the company monitors and stores a large amount of fermentation data, these data may not be sufficiently informative about the underlying phenomena affecting the level of productivity. Furthermore, since adding new sensors to pharmaceutical processes requires extensive and expensive validation and certification procedures, it is important to assess the potential ability of a sensor to observe relevant phenomena before it is adopted in the manufacturing environment. This motivates the study of the observability of the phenomena from available data. An algorithm is proposed to check observability for the classification task from the observed data (measurements). The proposed methodology uses a supervised AE to reduce the dimensionality of the inputs. A criterion on the distance between samples is then used to calculate the percentage of overlap between the defined classes. The proposed algorithm is tested on the benchmark Tennessee Eastman process and then applied to the industrial vaccine manufacturing process.
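The distance criterion itself is not stated in the abstract. The sketch below assumes one plausible reading: a sample counts as overlapping when its nearest neighbour (Euclidean distance) belongs to a different class, and the overlap percentage is the share of such samples. The inputs stand in for low-dimensional codes that a supervised autoencoder would produce; the autoencoder is not implemented here, and the function name and toy points are hypothetical.

```python
import math


def overlap_percentage(points, labels):
    """Percentage of samples whose nearest neighbour has a different class label.

    Assumption: nearest-neighbour disagreement is used as the overlap criterion;
    the original work may use a different distance-based rule.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    overlapping = 0
    for i in range(n):
        # Index of the closest other sample in the reduced space.
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        if labels[nearest] != labels[i]:
            overlapping += 1
    return 100.0 * overlapping / n


# Two well-separated classes, then a case where one sample sits next to the other class.
separated = overlap_percentage([(0, 0), (0, 1), (5, 5), (5, 6)], [0, 0, 1, 1])
mixed = overlap_percentage([(0, 0), (0.1, 0), (5, 5), (5, 6)], [0, 1, 1, 1])
print(separated, mixed)  # 0.0 50.0
```

A low percentage suggests that the available measurements separate the classes, while a high percentage suggests that the phenomenon of interest is poorly observable from them.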