89 research outputs found

    Latent variable modeling approaches to assist the implementation of quality-by-design paradigms in pharmaceutical development and manufacturing

    With the introduction of the Quality-by-Design (QbD) initiative, the U.S. Food and Drug Administration and the other pharmaceutical regulatory agencies aimed to change the traditional approaches to pharmaceutical development and manufacturing. Pharmaceutical companies have been encouraged to use systematic, science-based tools for the design and control of their processes, in order to demonstrate a full understanding of the driving forces acting on them. From an engineering perspective, this initiative can be seen as a call to apply modeling tools in pharmaceutical development and manufacturing activities. The aim of this Dissertation is to show how statistical modeling, and in particular latent variable models (LVMs), can be used to support the practical implementation of QbD paradigms: to streamline and accelerate product and process design in the pharmaceutical industry, and to provide better understanding and control of pharmaceutical manufacturing processes. Three main research areas in which LVMs can support the practical implementation of the QbD paradigms are explored: process understanding, product and process design, and process monitoring and control. General methodologies are proposed to guide the use of LVMs in different applications, and their effectiveness is demonstrated on industrial, laboratory, and simulated case studies. With respect to process understanding, a general methodology is proposed for using LVMs to aid the development of continuous manufacturing systems. The methodology is tested on an industrial process for the continuous manufacturing of tablets. It is shown how LVMs can jointly model data referring to different raw materials and different units in the production line, making it possible to identify the most important driving forces in each unit and the most critical units in the line. The results show how raw materials and process parameters affect intermediate and final product quality, and identify the paths along which the process moves depending on its settings. This provides a tool to support quality risk assessment and the development of the process control strategy. In the area of product and process design, a general framework is proposed for the use of LVM inversion to support the development of new products and processes. The objective of model inversion is to estimate the best set of inputs (e.g., raw material properties, process parameters) that ensures a desired set of outputs (e.g., product quality attributes). Since the inversion of an LVM may have an infinite number of solutions, which form the so-called null space, an optimization framework in which the most suitable objectives and constraints can be assigned is used to select the optimal solution. The effectiveness of the framework is demonstrated on an industrial particle engineering problem: designing the raw material properties needed to produce granules with the desired characteristics from a high-shear wet granulation process. The results show how the framework can be used to design experiments for the development of new products. The analogy between the null space and the regulatory agencies' definition of design space is also demonstrated, and a strategy is provided to estimate the uncertainties in the design and in the determination of the null space.
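    To make the inversion and null-space ideas concrete, the following is a minimal sketch assuming a PLS model fitted with scikit-learn; the data, dimensions, and quality target are illustrative placeholders, not the industrial case study.

```python
# A minimal sketch of latent-variable model inversion and the null space,
# assuming a PLS model relating input properties X (e.g., raw materials)
# to quality attributes Y. All data and targets are placeholders.
import numpy as np
from scipy.linalg import null_space
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))                                        # 50 batches, 6 inputs
Y = X @ rng.normal(size=(6, 2)) + 0.05 * rng.normal(size=(50, 2))   # 2 quality attributes

pls = PLSRegression(n_components=4, scale=False).fit(X, Y)
Q = pls.y_loadings_                        # (2, 4): latent scores -> centered outputs

# Invert the model: find scores t that reproduce a desired (centered) quality target.
y_des = np.array([[0.5, -0.2]]) - Y.mean(axis=0)
t_des = y_des @ np.linalg.pinv(Q).T        # minimum-norm inverse solution

# Null space: score directions that leave the predicted quality unchanged,
# analogous (in latent space) to a design space.
N = null_space(Q)                          # (4, 2) here: 4 LVs minus 2 outputs
t_alt = t_des + (N @ rng.normal(size=(N.shape[1], 1))).T
assert np.allclose(t_des @ Q.T, t_alt @ Q.T)   # same predicted quality

x_des = pls.inverse_transform(t_des)       # candidate input (raw material) profile
```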
The proposed framework for LVM inversion is also applied to the design of the formulation for a new product, namely the selection of the best excipient type and amount to mix with a given active pharmaceutical ingredient (API) to obtain a blend of desired properties. The optimization framework is extended to include constraints on the material selection, the API dose, and the final tablet weight. A user-friendly interface is developed to help formulators specify the constraints and objectives of the problem. Experiments performed industrially on the formulation designed in silico confirm that model predictions are in good agreement with the experimental values. LVM inversion is also shown to be useful for product transfer problems, namely transferring the manufacturing of a product from a source plant, where most of the experimentation has been carried out, to a target plant, which may differ in size, layout, or the units involved. An experimental process for the production of pharmaceutical nanoparticles is used as a test bed. An LVM built on data from different plants is inverted to estimate the most suitable process conditions in a target plant to produce nanoparticles of the desired mean size. Experiments designed on the basis of the proposed LVM inversion procedure show that the desired nanoparticle sizes are obtained, within experimental uncertainty. Furthermore, the null space concept is validated experimentally. Finally, with respect to process monitoring and control, the problem of transferring monitoring models between different plants is studied. The objective is to monitor a process in a target plant where production is being started (e.g., a production plant) by exploiting the data available from a source plant (e.g., a pilot plant). A general framework is proposed for using LVMs to solve this problem. Several scenarios are identified on the basis of the available information, the source of the data, and the type of variables to include in the model. Data from the different plants are related through subsets of variables (common variables) measured in both plants, or through plant-independent variables obtained from conservation balances (e.g., dimensionless numbers). The framework is applied to define the process monitoring model for an industrial large-scale spray-drying process, using data available from a pilot-scale process. The effectiveness of the transfer is evaluated in terms of monitoring performance in the detection of a real fault occurring in the target process. The proposed methodologies are then extended to batch systems, considering a simulated penicillin fermentation process. In both cases, the results demonstrate that transferring knowledge from the source plant yields better monitoring performance than using only the data available from the target plant.
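    As an illustration of the monitoring-transfer setting described above, the following minimal sketch builds a PCA monitoring model on source-plant data restricted to variables common to both plants, and screens new target-plant observations with Hotelling's T² and SPE statistics; all data, dimensions, and names are illustrative assumptions.

```python
# A minimal sketch of PCA-based process monitoring for plant-to-plant
# transfer: the model is built on source-plant data (common variables only)
# and new target-plant observations are screened with T^2 and SPE.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X_source = rng.normal(size=(200, 8))      # source-plant data, 8 common variables
mu, sd = X_source.mean(axis=0), X_source.std(axis=0)
Z = (X_source - mu) / sd

pca = PCA(n_components=3).fit(Z)

def monitor(x_new):
    """Return (T2, SPE) for one observation of the common variables."""
    z = (x_new - mu) / sd
    t = pca.transform(z[None, :])[0]
    T2 = np.sum(t**2 / pca.explained_variance_)        # Hotelling's T^2
    residual = z - pca.inverse_transform(t[None, :])[0]
    SPE = residual @ residual                          # squared prediction error
    return T2, SPE

x_target = rng.normal(size=8)             # a new target-plant observation
T2, SPE = monitor(x_target)
# In practice, control limits for T2 and SPE are derived from the source
# data (e.g., F- and chi-squared-based limits); exceeding them flags a fault.
```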

    Multivariate statistical analysis of Hall-Héroult reduction cells: investigation and monitoring of factors affecting performance

    The electrolysis cells used for aluminum production are subject to variations in raw material quality and to various disturbances incurred during production or start-up. These disturbances are known to affect cell life as well as production, metallurgical, and energy efficiency. Improving performance necessarily requires a better understanding of the sources of variation. Several studies have so far examined the relationships between individual factors and performance through univariate analyses. In those studies, however, cell behavior is not examined in a multivariate way, which precludes studying the interactions between variables. This thesis investigates the factors affecting the performance of electrolysis cells, specifically cell life, Faraday (current) efficiency, and energy consumption, using multivariate statistical methods (PCA and PLS). First, it is shown that 72% of the variation in cell life can be explained using information from preheating, start-up, and transient operation, demonstrating the effect of these stages on cell life. This study is followed by an analysis of the factors affecting current efficiency and energy consumption: the effects of alumina quality, anode quality, manipulated variables, and cell state variables explain 50% of the variation in performance. This study demonstrates the importance of controlling bath height, so an in-depth study of the factors affecting bath height is carried out. The composition of the anode cover material has a major impact on bath height. Unfortunately, it is currently impossible to adequately monitor and control this composition, since only a few samples are analyzed daily. To address this gap, this thesis presents a new approach, based on image analysis, for predicting the composition of the anode cover material. This application would facilitate monitoring and control of the composition, thereby improving bath height control and, in turn, cell performance.
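    As an illustration of the kind of PLS analysis described above, the following minimal sketch regresses a synthetic cell performance indicator (standing in for pot life) on start-up descriptors and reports the fraction of variance explained; all data and variable names are placeholders, not measurements from the thesis.

```python
# A minimal sketch of a PLS analysis of cell performance: a response
# (placeholder for pot life) is regressed on preheating/start-up/transition
# descriptors and the explained variance is reported.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 10))                      # start-up descriptors (synthetic)
y = X[:, :3].sum(axis=1) + 0.5 * rng.normal(size=120)   # synthetic "pot life"

pls = PLSRegression(n_components=3).fit(X, y)
print("R2 =", r2_score(y, pls.predict(X)))          # analogous to the reported 72%
# The weights (pls.x_weights_) indicate which start-up variables drive
# the latent directions, i.e., which stages most affect cell life.
```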

    N2O emissions and aeration efficiency in wastewater treatment: improved monitoring, mechanistic modelling and data mining

    Effect of curing conditions and harvesting stage of maturity on Ethiopian onion bulb drying properties

    The study was conducted to investigate the impact of curing conditions and harvesting stage on the drying quality of onion bulbs. The onion bulbs (Bombay Red cultivar) were harvested at three maturity stages (early, optimum, and late) and cured at three temperatures (30, 40, and 50 °C) and three relative humidities (30, 50, and 70%). The results revealed that curing temperature, relative humidity, and maturity stage had significant effects on all measured attributes except total soluble solids.
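    As an illustration of the factorial design implied above, the following minimal sketch runs a three-factor ANOVA (curing temperature, relative humidity, maturity stage) on a synthetic drying attribute; the data frame only illustrates the layout, and the response values are placeholders.

```python
# A minimal sketch of a three-factor factorial ANOVA of the kind implied
# above, on synthetic data laid out as a 3 x 3 x 3 design with replicates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)
temps, rhs = [30, 40, 50], [30, 50, 70]
stages = ["early", "optimum", "late"]
rows = [(T, RH, s, 10 + 0.1 * T - 0.05 * RH + rng.normal())
        for T in temps for RH in rhs for s in stages for _ in range(3)]
df = pd.DataFrame(rows, columns=["temp", "rh", "stage", "weight_loss"])

model = smf.ols("weight_loss ~ C(temp) * C(rh) * C(stage)", data=df).fit()
print(anova_lm(model, typ=2))   # significance of each factor and interaction
```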

    Manifold learning techniques and statistical approaches applied to the disruption prediction in tokamaks

    Nuclear fusion stands out as a clean energy source potentially capable of meeting the energy needs of the entire world in the future. At present, several experimental fusion devices are operating to optimize the fusion process, confining the plasma by means of magnetic fields. Magnetic confinement of the plasma can be achieved with linear cylindrical configurations or toroidal configurations, e.g., the stellarator, the reversed-field pinch, or the tokamak. Among the magnetic confinement concepts explored, the tokamak configuration is to date considered the most reliable. Unfortunately, the tokamak is vulnerable to instabilities that, in the most severe cases, can lead to the loss of magnetic confinement; this phenomenon is called a disruption. Disruptions are dangerous and irreversible events during which the plasma energy is suddenly released onto the first-wall components and the vacuum vessel, causing runaway electrons, large mechanical forces, and intense thermal loads, which may severely damage the vessel wall and the plasma-facing components. Present devices are designed to withstand disruptive events, so disruptions are today generally tolerable; indeed, one of the aims of present experiments is the investigation of disruptive boundaries in the operational space. On future devices such as ITER, however, which must operate at high density and high plasma current, only a limited number of disruptions will be tolerable. For these reasons, disruptions in tokamaks must be avoided and, when a disruption is unavoidable, its severity must be minimized. Finding appropriate mitigating actions to reduce damage to the reactor components is therefore accepted as a fundamental objective in the fusion community. The physical phenomena that lead a plasma to disrupt are non-linear and very complex. The present understanding of disruption physics does not yet provide an analytical model describing the onset of these instabilities, and the main effort has been devoted to developing data-based methods. In this thesis, the development of a reliable disruption prediction system is investigated using several data-based approaches, starting from the strengths and drawbacks of the methods proposed in the literature. The literature reports numerous studies of disruption prediction using data-based models, such as neural networks. Even if the results are encouraging, they are not sufficient to explain the intrinsic structure of the data used to describe the complex behavior of the plasma. Recent studies have demonstrated the urgency of developing sophisticated control schemes that allow the operating limits of the tokamak to be explored in order to increase reactor performance. For this reason, one of the goals of this thesis is to identify and develop tools for the visualization and analysis of multidimensional data from the numerous plasma diagnostics available in the machine database. Identifying the boundaries of the disruption-free plasma parameter space would increase our knowledge of disruptions. A viable approach to understanding disruptive events is to identify the intrinsic structure of the data used to describe the plasma operational space. Manifold learning algorithms attempt to identify these structures in order to find a low-dimensional representation of the data. The data for this thesis come from ASDEX Upgrade (AUG). 
ASDEX Upgrade is a medium-size tokamak experiment located at IPP, the Max-Planck-Institut für Plasmaphysik, Garching bei München (Germany); at present it is the largest tokamak in Germany. Among the available methods, attention has mainly been devoted to data clustering techniques. Data clustering consists of grouping a set of data in such a way that data in the same group (cluster) are more similar to each other than to those in other groups. Owing to its inherent suitability for visualization, the most popular and widely used clustering technique, the Self-Organizing Map (SOM), was investigated first. The SOM extracts information from the multidimensional operational space of AUG using 7 plasma parameters from successfully terminated (safe) and disruption-terminated (disrupted) pulses. The data used to train and test the SOM were extracted from AUG experiments performed between July 2002 and November 2009. The SOM made it possible to display the AUG operational space and to identify regions with a high risk of disruption (disruptive regions) and regions with a low risk of disruption (safe regions). Beyond visualizing the space, the SOM can also be used to monitor the time evolution of the discharges during an experiment. The SOM was therefore used as a disruption predictor by introducing a suitable criterion based on the trend of the trajectories on the map through the different regions: when a plasma configuration with a high risk of disruption is recognized, a disruption alarm is triggered, allowing disruption avoidance or mitigation actions to be performed. Data-based models such as the SOM are affected by the so-called "ageing effect", i.e., the degradation of predictor performance over time. It arises because, during the operation of the predictor, new data may come from experiments different from those used for training. To reduce this effect, retraining of the predictor is proposed: a new training run in which the plasma configurations from more recent experimental campaigns are added to the training set, supplying novel information to the model in order to increase its prediction performance. Another drawback of the SOM, common to all data-based models proposed in the literature, is the need for a dedicated set of experiments terminated with a disruption in order to build the predictive model. Future fusion devices such as ITER will tolerate only a limited number of disruptive events, so such a disruption database will not be available. To overcome this shortcoming, a disruption prediction system for AUG built using only input signals from safe pulses has been implemented. The predictor is based on a Fault Detection and Isolation (FDI) approach. FDI is an important and active research field concerned with monitoring a system and determining when a fault occurs. The majority of model-based FDI procedures rely on a statistical analysis of residuals: given an empirical model identified on a reference dataset obtained under Normal Operating Conditions (NOC), the discrepancies between new observations and the NOC-model estimates (the residuals) are calculated. The residuals are treated as a random process with known statistical properties; if a fault occurs, a change in these properties is detected. 
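As a concrete illustration of the SOM-based predictor described above, the following is a minimal sketch using the MiniSom package: a map is trained on plasma-parameter samples, each map cell is labeled with the fraction of disrupted samples it captures, and a discharge trajectory raises an alarm when it enters a high-risk cell. The data, map size, and alarm threshold are illustrative assumptions, not the thesis settings.

```python
# A minimal sketch of a SOM-based disruption predictor: train a map on
# plasma parameters, label cells by disruption risk, and raise an alarm
# when a pulse trajectory enters a disruptive region.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 7))            # 7 plasma parameters, as in the thesis
is_disr = rng.random(1000) < 0.3          # placeholder safe/disrupted labels

som = MiniSom(15, 15, 7, sigma=1.5, learning_rate=0.5, random_seed=0)
som.random_weights_init(X)
som.train_random(X, 5000)

# Label each map cell by its fraction of disrupted samples.
hits = np.zeros((15, 15))
disr = np.zeros((15, 15))
for x, d in zip(X, is_disr):
    i, j = som.winner(x)
    hits[i, j] += 1
    disr[i, j] += d
risk = np.divide(disr, hits, out=np.zeros_like(disr), where=hits > 0)

def alarm(trajectory, threshold=0.7):
    """Trigger when the pulse trajectory enters a high-risk cell."""
    return any(risk[som.winner(x)] > threshold for x in trajectory)
```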
In this thesis, the safe pulses are taken as the normal operating conditions of the process, and disruptions are treated as fault states; thus, only safe pulses are used to train the NOC model. To obtain a graphical representation of the pulse trajectories, only three plasma parameters are used to build the NOC model. By monitoring the time evolution of the residuals and introducing an alarm criterion based on a suitable threshold on the residual values, the NOC model properly identifies an incoming disruption. The data for training and testing the NOC model were extracted from AUG experiments performed between July 2002 and November 2009. Determining a specific disruptive phase for each disrupted discharge is an important issue in understanding disruptive events. Until now, disruption precursors at AUG have been assumed to appear within a fixed time window, the last 45 ms of every disrupted discharge. Such a fixed temporal window can limit prediction performance, because it generates ambiguous information for disruptions whose disruptive phase differs from 45 ms. In this thesis, the Mahalanobis distance is used to define a specific disruptive phase for each disruption: a different length of the disruptive phase is selected for each disrupted pulse in the training set by labeling each sample as safe or disruptive depending on its Mahalanobis distance from the set of safe discharges. With this new training set, the operational space of AUG is then mapped using Generative Topographic Mapping (GTM). The GTM is inspired by the SOM algorithm and aims to overcome its limitations. The GTM is investigated to identify regions with a high risk of disruption and regions with a low risk; for comparison, a second SOM is built, and both GTM and SOM are tested as disruption predictors. The data for training and testing the SOM and the GTM were extracted from AUG experiments performed from May 2007 to November 2012. The last method studied and applied in this thesis is the logistic regression model (Logit). Logistic regression is a well-known statistical method for analyzing problems with dichotomous dependent variables; here, the Logit models the probability that a generic sample belongs to the non-disruptive or the disruptive phase. The time evolution of the Logit model output (LMO) is used as a disruption proximity index by introducing a suitable threshold. The data for training and testing the Logit models were extracted from AUG experiments performed from May 2007 to November 2012, with disruptive samples selected through the Mahalanobis distance criterion. Finally, to interpret the behavior of the data-based predictors, a manual classification of the disruptions occurring from May 2007 to November 2012 was performed by visual analysis of several plasma parameters for each disruption. The specific chains of events were identified and used to classify the disruptions and, where possible, the same classes introduced for JET were adopted.
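The following minimal sketch illustrates the two steps just described: samples of a disrupted pulse are labeled disruptive according to their Mahalanobis distance from the safe-operation distribution, and a logistic regression then provides a disruption proximity index. The data, the distance threshold, and the alarm level are illustrative assumptions.

```python
# A minimal sketch of Mahalanobis-distance labeling followed by a
# logistic-regression disruption proximity index (LMO-like output).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X_safe = rng.normal(size=(500, 7))                  # safe-pulse samples
mu = X_safe.mean(axis=0)
Sinv = np.linalg.inv(np.cov(X_safe, rowvar=False))

def mahalanobis(x):
    d = x - mu
    return np.sqrt(d @ Sinv @ d)

# A disrupted pulse drifting away from the safe distribution (synthetic).
X_pulse = rng.normal(size=(200, 7)) + np.linspace(0, 2, 200)[:, None]
labels = np.array([mahalanobis(x) > 3.0 for x in X_pulse])  # disruptive if far from safe set

logit = LogisticRegression().fit(X_pulse, labels)
proximity = logit.predict_proba(X_pulse)[:, 1]      # disruption proximity index
alarm_time = np.argmax(proximity > 0.5)             # first sample above the alarm threshold
```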

    Comparative Analysis of Student Learning: Technical, Methodological and Result Assessing of PISA-OECD and INVALSI-Italian Systems.

    PISA is the most extensive international survey promoted by the OECD in the field of education; it measures the skills of fifteen-year-old students from more than 80 participating countries every three years. The INVALSI tests are written assessments taken every year by all Italian students at key points in the school cycle, to evaluate levels of fundamental skills in Italian, Mathematics, and English. Our comparison covers the period up to 2018, the year of the most recent PISA-OECD survey, although the most recent INVALSI edition was carried out in 2022. Our analysis focuses on the common part of the reference populations, namely fifteen-year-old students in the second year of upper secondary school, for whom both sources give a similar picture of the students.