
    Qluster: An easy-to-implement generic workflow for robust clustering of health data

    Exploring health data with clustering algorithms makes it possible to better describe populations of interest by identifying the sub-profiles that compose them. This reinforces medical knowledge, whether about a disease or about a targeted population in real life. Nevertheless, in contrast to conventional biostatistical methods, for which numerous guidelines exist, the standardization of data science approaches in clinical research remains little discussed. This results in significant variability in the execution of data science projects, whether in terms of the algorithms used or the reliability and credibility of the designed approach. Taking the path of a parsimonious and judicious choice of both algorithms and implementations at each stage, this article proposes Qluster, a practical workflow for performing clustering tasks. This workflow strikes a compromise between (1) genericity of applications (e.g. usable on small or big data, on continuous, categorical or mixed variables, on databases of high dimensionality or not), (2) ease of implementation (need for few packages, few algorithms, few parameters, ...), and (3) robustness (e.g. use of proven algorithms and robust packages, evaluation of the stability of clusters, management of noise and multicollinearity). This workflow can be easily automated and/or routinely applied to a wide range of clustering projects. It can be useful both for data scientists with little experience in the field, to make data clustering easier and more robust, and for more experienced data scientists who are looking for a straightforward and reliable solution to routinely perform preliminary data mining. A synthesis of the literature on data clustering and the scientific rationale supporting the proposed workflow are also provided. Finally, a detailed application of the workflow to a concrete use case is provided, along with a practical discussion for data scientists.
An implementation on the Dataiku platform is available from the authors upon request.

    Patching Weak Convolutional Neural Network Models through Modularization and Composition

    Despite great success in many applications, deep neural networks are not always robust in practice. For instance, a convolutional neural network (CNN) model for classification tasks often performs unsatisfactorily in classifying some particular classes of objects. In this work, we are concerned with patching the weak part of a CNN model instead of improving it through the costly retraining of the entire model. Inspired by the fundamental concepts of modularization and composition in software engineering, we propose a compressed modularization approach, CNNSplitter, which decomposes a strong CNN model for N-class classification into N smaller CNN modules. Each module is a sub-model containing a part of the convolution kernels of the strong model. To patch a weak CNN model that performs unsatisfactorily on a target class (TC), we compose the weak CNN model with the corresponding module obtained from a strong CNN model. The ability of the weak CNN model to recognize the TC can thus be improved through patching. Moreover, the ability to recognize non-TCs is also improved, as samples previously misclassified as the TC can be correctly classified as non-TCs. Experimental results with two representative CNNs on three widely used datasets show that the average improvements on the TC in terms of precision and recall are 12.54% and 2.14%, respectively. Moreover, patching improves the accuracy on non-TCs by 1.18%. The results demonstrate that CNNSplitter can patch a weak CNN model through modularization and composition, thus providing a new solution for developing robust CNN models. Comment: Accepted at ASE'2

    SC-VAE: Sparse Coding-based Variational Autoencoder

    Learning rich data representations from unlabeled data is a key challenge in applying deep learning algorithms to downstream supervised tasks. Several variants of variational autoencoders (VAEs) have been proposed to learn compact data representations by encoding high-dimensional data in a lower-dimensional space. Two main classes of VAE methods may be distinguished, depending on the characteristics of the meta-priors that are enforced in the representation learning step. The first class of methods derives a continuous encoding by assuming a static prior distribution in the latent space. The second class of methods instead learns a discrete latent representation using vector quantization (VQ) along with a codebook. However, both classes of methods suffer from certain challenges, which may lead to suboptimal image reconstruction results. The first class of methods suffers from posterior collapse, whereas the second class of methods suffers from codebook collapse. To address these challenges, we introduce a new VAE variant, termed SC-VAE (sparse coding-based VAE), which integrates sparse coding within the variational autoencoder framework. Instead of learning a continuous or discrete latent representation, the proposed method learns a sparse data representation that consists of a linear combination of a small number of learned atoms. The sparse coding problem is solved using a learnable version of the iterative shrinkage thresholding algorithm (ISTA). Experiments on two image datasets demonstrate that our model can achieve improved image reconstruction results compared to state-of-the-art methods. Moreover, the use of learned sparse code vectors allows us to perform downstream tasks such as coarse image segmentation through clustering of image patches. Comment: 15 pages, 11 figures, and 3 tables
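
    The abstract above refers to a learnable version of ISTA. As a point of reference only, the following is a minimal sketch of the classical, non-learned ISTA iteration for the sparse coding problem min_z 0.5 * ||y - D z||^2 + lam * ||z||_1. It is not the authors' implementation; the function names, the dictionary `D`, the penalty `lam`, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage operator: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(D, y, lam=0.1, n_iter=200):
    """Classical ISTA for min_z 0.5 * ||y - D z||^2 + lam * ||z||_1.

    D: (m, k) dictionary whose columns are atoms (hypothetical input).
    y: (m,) signal to encode.
    Returns a (k,) sparse code vector.
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient,
    # i.e. the squared spectral norm of D. Assumes D is not all zeros.
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)                    # gradient of the quadratic term
        z = soft_threshold(z - grad / L, lam / L)   # gradient step, then shrinkage
    return z
```

    In a learned variant, quantities such as the step size and threshold would typically become trainable parameters; the sketch keeps them fixed.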

    Transcriptional networks of transient cell states during human prefrontal cortex development

    The human brain is divided into various anatomical regions that control and coordinate unique functions. The prefrontal cortex (PFC) is a large brain region that comprises a range of neuronal and non-neuronal cell types, shares extensive interconnections with subcortical areas, and plays a critical role in cognition and memory. The timely appearance of distinct cell types during embryonic development is crucial for an anatomically perfect and functional brain. Direct tracing of cell fate development in the human brain is not possible, but single-cell transcriptome sequencing (scRNA-seq) datasets provide the opportunity to dissect cellular heterogeneity and its molecular regulators. Here, using scRNA-seq data of human PFC from fetal stages, we elucidate distinct transient cell states during PFC development and their underlying gene regulatory circuitry. We further identified that distinct intermediate cell states consist of specific gene regulatory modules essential to reach terminal fate using discrete developmental paths. Moreover, using in silico gene knock-out and over-expression analysis, we validated crucial gene regulatory components during the lineage specification of oligodendrocyte progenitor cells. Our study illustrates unique intermediate states and specific gene interaction networks that warrant further investigation for their functional contribution to typical brain development, and discusses how this knowledge can be harvested for therapeutic intervention in challenging neurodevelopmental disorders.

    Augmented classification for electrical coil winding defects

    A green revolution has accelerated over recent decades, with a view to replacing existing transportation power solutions with greener electrical alternatives. In parallel, the digitisation of manufacturing has enabled progress in the tracking and traceability of processes and improvements in fault detection and classification. This paper explores electrical machine manufacture and the challenges faced in identifying failure modes during this life cycle, through the demonstration of state-of-the-art machine vision methods for the classification of electrical coil winding defects. We demonstrate how recent generative adversarial networks can be used to augment the training of these models to further improve their accuracy on this challenging task. Our approach utilises pre-processing and dimensionality reduction to boost the performance of the model over a standard convolutional neural network (CNN), leading to a significant increase in accuracy.

    Modeling Uncertainty for Reliable Probabilistic Modeling in Deep Learning and Beyond

    This thesis is framed at the intersection between modern machine learning techniques, such as deep neural networks, and reliable probabilistic modeling. In many machine learning applications, we care not only about the prediction made by a model (e.g. this lung image presents cancer) but also about how confident the model is in making this prediction (e.g. this lung image presents cancer with 67% probability). In such applications, the model assists the decision-maker (in this case a doctor) in making the final decision.
As a consequence, the probabilities provided by a model must reflect the true proportions of outcomes in the set to which those probabilities are assigned; otherwise the model is useless in practice. When the probabilities do reflect these proportions, we say that the model is perfectly calibrated. In this thesis, three ways of obtaining better-calibrated models are explored. First, it is shown how to implicitly calibrate models that have been decalibrated by data augmentation techniques. A cost function is introduced that resolves this decalibration, taking as a starting point ideas derived from decision making with Bayes' rule. Second, it is shown how to calibrate models using a post-calibration stage implemented with a Bayesian neural network. Finally, based on the limitations studied in the Bayesian neural network, which we hypothesize stem from a misspecified prior, a new stochastic process is introduced that serves as a prior distribution in a Bayesian inference problem. Maroñas Molano, J. (2022). Modeling Uncertainty for Reliable Probabilistic Modeling in Deep Learning and Beyond [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181582 TESI
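
    The notion of calibration described above (predicted probabilities matching observed proportions) can be quantified in several ways. The following is a minimal sketch of one common diagnostic, the expected calibration error with equal-width confidence bins. It is a generic illustration, not a method taken from the thesis; the function name, the binning scheme, and the default of 10 bins are assumptions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: predicted probabilities in [0, 1] for the predicted class.
    correct: 1 where the prediction was right, 0 where it was wrong.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so bin ids run from 0 to n_bins - 1 and a
    # confidence of exactly 1.0 lands in the last bin.
    bin_ids = np.digitize(confidences, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece
```

    For example, if three predictions are each made with confidence 0.67 and two of them are correct, the observed proportion (2/3) nearly matches the stated confidence, so the error is close to zero.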

    Predictive Maintenance of Critical Equipment for Floating Liquefied Natural Gas Liquefaction Process

    Meeting global energy demand is a massive challenge, especially given the growing preference for sustainable and cleaner energy. Natural gas is viewed as a bridge fuel to renewable energy. LNG, a processed form of natural gas, is the fastest-growing and cleanest form of fossil fuel. Recently, the unprecedented increase in LNG demand has pushed its exploration and processing offshore as Floating LNG (FLNG). Offshore topside gas processing and liquefaction has been identified as one of the great challenges of FLNG. Maintaining topside liquefaction process assets such as gas turbines is critical to the profitability, reliability, and availability of the process facilities. Given the setbacks of the widely used reactive and preventive time-based maintenance approaches, and to meet the optimal reliability and availability requirements of oil and gas operators, this thesis presents a framework driven by AI-based learning approaches for predictive maintenance. The framework aims to leverage the value of condition-based maintenance (CBM) to minimise the failures and downtime of critical FLNG equipment (aeroderivative gas turbines). In this study, gas turbine thermodynamics are introduced, as well as some factors affecting gas turbine modelling. Some important considerations in modelling gas turbine systems, such as modelling objectives, modelling methods, and approaches to modelling gas turbines, were investigated. These provide the basis and mathematical background for developing a simulated gas turbine model. The behaviour of a simple-cycle HDGT was simulated using thermodynamic laws and operational data based on the Rowen model. A Simulink model was created using experimental data based on Rowen’s model, with the aim of exploring the transient behaviour of an industrial gas turbine.
The results show the capability of the Simulink model to capture the nonlinear dynamics of the gas turbine system, although its use in further condition monitoring studies is constrained by the lack of some relevant correlated features required by the model. AI-based models were found to perform well in predicting gas turbine failures. These capabilities were investigated in this thesis and validated using experimental data obtained from a gas turbine engine facility. The dynamic behaviour of gas turbines changes when they are exposed to different varieties of fuel. Diagnostics-based AI models were developed to diagnose different gas turbine engine failures associated with exposure to various types of fuel. Principal Component Analysis (PCA) was used to reduce the dimensionality of the dataset and extract good features for diagnostic model development. Signal processing-based techniques (time-domain, frequency-domain, and time-frequency-domain) were also used as feature extraction tools; they added significantly more correlations to the dataset and influenced the prediction results obtained. Signal processing played a vital role in extracting good features for the diagnostic models when compared with PCA. The overall results obtained from both the PCA-based and signal processing-based models demonstrated the capabilities of neural network-based models in predicting gas turbine failures. Further, a deep learning-based LSTM model was developed, which extracts features directly from the time series dataset and hence does not require any feature extraction tool. The LSTM model achieved the highest performance and prediction accuracy compared with both the PCA-based and signal processing-based models.
In summary, this thesis concludes that, despite some challenges in fully integrating the gas turbine Simulink model into gas turbine condition monitoring studies, data-driven models have shown strong potential and excellent performance in gas turbine CBM diagnostics. The models developed in this thesis can be used for design and manufacturing purposes on gas turbines applied to FLNG, especially for condition monitoring and fault detection of gas turbines. The results obtained should provide valuable understanding and helpful guidance for researchers and practitioners implementing robust predictive maintenance models that will enhance the reliability and availability of critical FLNG equipment. Petroleum Technology Development Funds (PTDF) Nigeria
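
    The abstract above mentions PCA as a dimensionality reduction step before diagnostic modelling. The following is a minimal, generic sketch of PCA computed through the singular value decomposition. It is not the thesis's pipeline; the function name `pca_reduce`, the input matrix `X` (rows as samples, columns as sensor features), and the choice of `n_components` are assumptions for illustration.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its leading principal components.

    X: (n_samples, n_features) array (hypothetical sensor features).
    Returns (scores, explained), where scores has shape
    (n_samples, n_components) and explained holds the fraction of total
    variance captured by each retained component.
    Assumes X has non-zero variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # principal directions (rows)
    scores = Xc @ components.T                   # reduced representation
    explained = (S[:n_components] ** 2) / np.sum(S ** 2)
    return scores, explained
```

    The reduced `scores` would then be passed to a downstream classifier; how many components to keep is a modelling choice that the abstract does not specify.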

    Innate immunity and metabolism in the bovine ovarian follicle

    Postpartum uterine disease in dairy cows is associated with reduced fertility. One of the first and most prevalent bacteria associated with uterine disease is Escherichia coli. The bacterial endotoxin, lipopolysaccharide (LPS), accumulates in the ovarian follicular fluid of animals with uterine disease. The granulosa cells of the ovarian follicle respond to LPS by secreting pro-inflammatory cytokines, such as interleukin (IL)-1α, IL-1β and IL-8, and oocyte health is perturbed. Dairy cows also experience metabolic energy stress in the postpartum period, which is associated with an increased risk of developing uterine disease and ovarian dysfunction. This thesis explored the crosstalk between innate immunity and metabolic energy stress in bovine granulosa cells and the cumulus-oocyte complex. Firstly, we found that glycolysis, AMP-activated protein kinase and the mechanistic target of rapamycin regulate the innate immune responses to LPS in granulosa cells isolated from bovine ovarian follicles. Activation of AMP-activated protein kinase decreased the LPS-induced secretion of IL-1α, IL-1β and IL-8, and was associated with a shortened duration of ERK1/2 and JNK phosphorylation. Next, we found that decreasing the availability of cholesterol or inhibiting cholesterol biosynthesis using short-interfering RNA impaired the LPS-induced secretion of IL-1α and IL-1β by granulosa cells. Furthermore, metabolic energy stress or inhibiting cholesterol biosynthesis in the bovine cumulus-oocyte complex modulated the innate immune responses to LPS, and perturbed meiotic progression during in vitro maturation. Finally, we explored an in vivo model of uterine disease in heifers, using RNAseq to investigate alterations to the transcriptome of the reproductive tract.
We found that uterine disease altered the transcriptome of the endometrium, oviduct, granulosa cells and oocyte several months after bacterial infusion; these changes were most evident in the granulosa cells and oocyte of the ovarian follicle. The findings from this thesis imply that there is crosstalk between innate immunity and metabolism in the bovine ovarian follicle.

    Annals [...].

    Pedometrics: innovation in tropics; Legacy data: how turn it useful?; Advances in soil sensing; Pedometric guidelines to systematic soil surveys. Online event. Coordinated by: Waldir de Carvalho Junior, Helena Saraiva Koenow Pinheiro, Ricardo Simão Diniz Dalmolin.