453 research outputs found

    Multi-tier framework for the inferential measurement and data-driven modeling

    A framework for inferential measurement and data-driven modeling has been proposed and assessed in several real-world application domains. The architecture of the framework is structured in multiple tiers to facilitate extensibility and the integration of new components. Each of the four proposed tiers has been assessed independently to verify its suitability. The first tier, dealing with exploratory data analysis, has been assessed through the characterization of the chemical space related to the biodegradation of organic chemicals. This analysis established relationships between physicochemical variables and biodegradation rates that were then used for model development. At the preprocessing level, a novel method for feature selection based on dissimilarity measures between Self-Organizing Maps (SOM) has been developed and assessed. The proposed method selects more features than other methods published in the literature but leads to models with improved predictive power. Single and multiple data imputation techniques based on the SOM have also been used to recover missing data in a wastewater treatment plant benchmark. A new dynamic method to adjust the centers and widths in Radial Basis Function networks has been proposed to predict water quality. The proposed method outperformed other neural networks. The proposed modeling components have also been assessed in the development of prediction and classification models for biodegradation rates in different media. The results obtained proved the suitability of this approach for developing data-driven models when the complex dynamics of the process prevent the formulation of mechanistic models. The use of rule generation algorithms and Bayesian dependency models has been preliminarily screened to provide the framework with interpretation capabilities. 
Preliminary results obtained from the classification of Modes of Toxic Action (MOA) indicate that using MOAs as proxy indicators of the human health effects of chemicals could be a promising approach. Finally, the complete framework has been applied to three different modeling scenarios. A virtual sensor system, capable of inferring product quality indices from primary process variables, has been developed and assessed. The system was integrated with the control system in a real chemical plant, where it outperformed the multi-linear correlation models usually adopted by chemical manufacturers. A model to predict carcinogenicity from molecular structure for a set of aromatic compounds has been developed and tested. Applying the SOM-dissimilarity feature selection method yielded better results than models published in the literature. Finally, the framework has been used to facilitate a new approach for environmental modeling and risk management within geographical information systems (GIS). The SOM has been successfully used to characterize exposure scenarios and to provide estimations of missing data through geographic interpolation. The combination of SOM and Gaussian Mixture models facilitated the formulation of a new probabilistic risk assessment approach.
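    The abstract above mentions Radial Basis Function networks whose centers and widths are adjusted dynamically, but it does not describe the adjustment rule. As a rough illustration only, the sketch below shows the forward pass of a generic Gaussian RBF network so that the role of the centers and widths is visible. The function name `rbf_network_output`, the Gaussian form of the basis function, and all parameter values are assumptions for illustration, not the thesis method.

```python
import math


def rbf_network_output(x, centers, widths, weights, bias=0.0):
    """Forward pass of a generic Gaussian RBF network (illustrative sketch).

    x       : input vector (list of floats)
    centers : one center vector per hidden unit (assumed given)
    widths  : one positive width (sigma) per hidden unit (assumed given)
    weights : one output weight per hidden unit
    """
    activations = []
    for center, sigma in zip(centers, widths):
        # Squared Euclidean distance between the input and this unit's center.
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        # Gaussian basis function: 1.0 at the center, decaying with distance.
        activations.append(math.exp(-sq_dist / (2.0 * sigma * sigma)))
    # Linear output layer over the hidden activations.
    return bias + sum(w * a for w, a in zip(weights, activations))
```

    A smaller width makes a unit respond only to inputs very close to its center, which is why any scheme for tuning centers and widths strongly affects how well such a network fits the data.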

    Interpretability-oriented data-driven modelling of bladder cancer via computational intelligence

    Machine learning in critical care: state-of-the-art and a sepsis case study

    Background: Like other scientific fields, such as cosmology, high-energy physics, or even the life sciences, medicine and healthcare face the challenge of an extremely quick transformation into data-driven sciences. This challenge entails the daunting task of extracting usable knowledge from these data using algorithmic methods. In the medical context this may, for instance, be realized through the design of medical decision support systems for diagnosis, prognosis and patient management. The intensive care unit (ICU), and by extension the whole area of critical care, is becoming one of the most data-driven clinical environments. Results: The increasing availability of complex and heterogeneous data at the point of patient attention in critical care environments makes the development of fresh approaches to data analysis almost compulsory. Computational Intelligence (CI) and Machine Learning (ML) methods can provide such approaches and have already shown their usefulness in addressing problems in this context. The current study has a dual goal. The first is to review the state of the art in the use and application of such methods in the field of critical care. This review is presented from the viewpoint of the different subfields of critical care, but also from the viewpoint of the different available ML and CI techniques. The second goal is to present a collection of results that illustrate the breadth of possibilities opened by ML and CI methods using a single problem, the investigation of septic shock at the ICU. Conclusion: We have presented a structured state-of-the-art review that illustrates the broad-ranging ways in which ML and CI methods can make a difference in problems affecting the manifold areas of critical care. The potential of ML and CI has been illustrated in detail through an example concerning the sepsis pathology. 
The new definitions of sepsis and the relevance of using the systemic inflammatory response syndrome (SIRS) in its diagnosis have been considered. Conditional independence models have been used to address this problem, showing that SIRS depends on both organ dysfunction, measured through the Sequential Organ Failure Assessment (SOFA) score, and the ICU outcome, thus concluding that SIRS should still be considered in the study of the pathophysiology of sepsis. Current assessment of the risk of death at the ICU lacks specificity. ML and CI techniques are shown to improve the assessment using both indicators already in place and other clinical variables that are routinely measured. Kernel methods in particular are shown to provide the best performance balance while being amenable to representation through graphical models, which increases their interpretability and, with it, their likelihood of being accepted in medical practice.

    Sleep Stage Classification: A Deep Learning Approach

    Sleep occupies a significant part of human life, and the diagnosis of sleep-related disorders is of great importance. To record specific physical and electrical activities of the brain and body, a multi-parameter test called polysomnography (PSG) is normally used. The visual process of sleep stage classification is time-consuming, subjective and costly. To improve the accuracy and efficiency of sleep stage classification, automatic classification algorithms were developed. In this research work, we focused on the pre-processing (filtering boundaries and de-noising algorithms) and classification steps of automatic sleep stage classification. The main motivation for this work was to develop a pre-processing and classification framework that cleans the input EEG signal without manipulating the original data, thus enhancing the learning stage of deep learning classifiers. For pre-processing EEG signals, a lossless adaptive artefact removal method was proposed. Unlike other works that used artificial noise, we used real EEG data contaminated with EOG and EMG to evaluate the proposed method. The proposed adaptive algorithm led to a significant enhancement in the overall classification accuracy. In the classification area, we evaluated the performance of the most common sleep stage classifiers using a comprehensive set of features extracted from PSG signals. Considering the challenges and limitations of conventional methods, we proposed two deep learning-based methods for the classification of sleep stages, based on a Stacked Sparse AutoEncoder (SSAE) and a Convolutional Neural Network (CNN). The proposed methods performed more efficiently by eliminating the need for conventional feature selection and feature extraction steps, respectively. Moreover, although our systems were trained with fewer samples than similar studies, they were able to achieve state-of-the-art accuracy and higher overall sensitivity.

    Novel analysis–forecast system based on multi-objective optimization for air quality index

    © 2018 Elsevier Ltd. The air quality index (AQI) is an important indicator of air quality. Owing to the randomness and non-stationarity inherent in AQI, establishing a reasonable analysis–forecast system for AQI remains a challenging task. Previous studies primarily focused on enhancing either forecasting accuracy or stability and failed to improve both aspects simultaneously, leading to unsatisfactory results. In this study, a novel analysis–forecast system is proposed that consists of complexity analysis, data preprocessing, and optimize–forecast modules and addresses the problems of air quality monitoring. The proposed system first performs a complexity analysis of the original series based on sample entropy. It then preprocesses the data using a novel feature selection model that integrates a decomposition technique and an optimization algorithm to remove noise and select the optimal input structure. Finally, it forecasts the hourly AQI series with a modified least squares support vector machine optimized by a multi-objective multi-verse optimization algorithm. Experiments based on datasets from eight major cities in China demonstrated that the proposed system can simultaneously obtain high accuracy and strong stability and is thus efficient and reliable for air quality monitoring.
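    The complexity analysis in the abstract above relies on sample entropy. The abstract does not give the parameters or the exact variant used, so the following is only a minimal sketch of the standard definition, SampEn(m, r) = -ln(A / B), where B counts pairs of length-m templates within tolerance r (Chebyshev distance, self-matches excluded) and A counts the same for length m + 1. The function name `sample_entropy`, the defaults for `m` and `r`, the use of an absolute tolerance, and the `<=` comparison are assumptions made for illustration.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy of a 1-D series (illustrative sketch).

    m : template length (assumed default)
    r : absolute tolerance (assumed; often set relative to the series' std)
    Returns math.inf when no matches are found at either length.
    """
    n = len(series)

    def count_matches(length):
        # The same number of templates (n - m) is used for both lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i skips self-matches
                # Chebyshev distance: largest coordinate-wise difference.
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf
    return -math.log(a / b)
```

    A perfectly regular series gives a value of zero, and less predictable series give larger values, which is what makes the measure usable as a complexity indicator before forecasting. The pairwise loop is quadratic in the series length, so long hourly series would call for a faster implementation.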

    Advances in transfer learning methods based on computational intelligence

    Traditional machine learning and data mining have made tremendous progress in many knowledge-based areas, such as clustering, classification, and regression. However, the primary assumption in all of these areas is that the training and testing data should be in the same domain and have the same distribution. This assumption is difficult to satisfy in real-world applications due to the limited availability of labeled data. Associated data in different domains can be used to expand the availability of prior knowledge about future target data. In recent years, transfer learning has been used to address such cross-domain learning problems by using information from data in a related domain and transferring it to the target task. The transfer learning methodology is utilized in this work with unsupervised and supervised learning methods. For unsupervised learning, a novel transfer-learning possibilistic c-means (TLPCM) algorithm is proposed to handle the PCM clustering problem in a domain that has insufficient data. Moreover, TLPCM overcomes the problem of differing numbers of clusters between the source and target domains. The proposed algorithm employs the historical cluster centers of the source data as a reference to guide the clustering of the target data. The experimental studies presented here demonstrate the advantages of TLPCM on both synthetic and real-world transfer datasets. For supervised learning, a transfer learning (TL) technique is used to pre-train a CNN model on posture data and then fine-tune it on the sleep stage data. We used a ballistocardiography (BCG) bed sensor to collect both posture and sleep stage data to provide a non-invasive, in-home monitoring system that tracks changes in the subjects' health over time. The quality of sleep has a significant impact on health and life. 
This study adopts hierarchical and non-hierarchical classification structures to develop an automatic sleep stage classification system using ballistocardiogram (BCG) signals. A leave-one-subject-out cross-validation (LOSO-CV) procedure is used for testing classification performance in most of the experiments. Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Deep Neural Networks (DNNs) are complementary in their modeling capabilities: CNNs have the advantage of reducing frequency variations, while LSTMs are good at temporal modeling. Polysomnography (PSG) data from a sleep lab was used as the ground truth for sleep stages, with the emphasis on three sleep stages, specifically awake, rapid eye movement (REM), and non-REM (NREM) sleep. Moreover, a transfer learning approach is employed with supervised learning to address the cross-resident training problem in predicting early illness. We validate our method by conducting a retrospective study on three residents from TigerPlace, a retirement community in Columbia, MO, where apartments are fitted with wireless networks of motion and bed sensors. Predicting the early signs of illness in older adults by using a continuous, unobtrusive nursing home monitoring system has been shown to increase the quality of life and decrease care costs. Illness prediction is based on sensor data and uses algorithms such as support vector machine (SVM) and k-nearest neighbors (kNN). One of the most significant challenges in developing prediction algorithms for sensor networks is using knowledge from previous residents to predict the behavior of new ones. Each day, the presence or absence of illness was manually evaluated using nursing visit reports from a homegrown electronic medical record (EMR) system. In this work, the transfer learning SVM approach outperformed three other methods, i.e., regular SVM, one-class SVM, and one-class kNN. Includes bibliographical references (pages 114-127).
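    The leave-one-subject-out cross-validation (LOSO-CV) procedure named in the abstract above can be sketched as follows: every fold holds out all samples from one subject for testing and trains on the remaining subjects, so no subject appears on both sides of a split. The function name `leave_one_subject_out` and the toy subject labels are hypothetical, and the abstract does not describe how the authors implemented their splits.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subject_ids : sequence giving the subject each sample belongs to
    """
    # Sorting makes the fold order deterministic.
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Toy usage: five samples recorded from three hypothetical subjects.
folds = list(leave_one_subject_out(["a", "a", "b", "c", "b"]))
for subject, train_idx, test_idx in folds:
    print(subject, train_idx, test_idx)
```

    Splitting by subject rather than by sample avoids leaking a person's signal characteristics from training into testing, which matters for physiological data such as BCG. In scikit-learn, `LeaveOneGroupOut` provides the same kind of split.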