
    Predicting financial distress of JSE-Listed companies using Bayesian networks

    This study aims to test the suitability of Bayesian probabilistic models for predicting the bankruptcy of JSE-listed companies. A sample of 132 companies is considered, with fourteen years of financial statement information and macroeconomic indicators used as predictor variables. Various permutations of Bayesian models are tested, relating to different learning algorithms, intervals of discretisation and scoring metrics. In contrast to previous research, we explore a variety of evaluation measures, and it is found that predictive accuracy for bankrupt firms does not exceed 70% in any model permutation. In comparison with other popular models such as the Altman Z-score and the logit model, Bayesian networks are found to produce marginally better predictive accuracy. Furthermore, a comparison with previous research on the same subject is carried out and reasons for significantly different results are considered. Finally, the reasons for the low predictive accuracies are considered, with issues relating specifically to South Africa discussed.
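    As a rough illustration of the kind of pipeline the abstract describes (discretise predictor variables, fit a Bayesian classifier, and look at class-specific accuracy rather than overall accuracy alone), the sketch below uses synthetic data and scikit-learn, with a naive Bayes classifier standing in for the various Bayesian-network learners that were tested; the feature count, bin count and variable names are illustrative assumptions, not details taken from the study.

```python
# Minimal sketch, not the study's code: synthetic firm-year data, equal-frequency
# discretisation, and a naive Bayes classifier as a simple Bayesian-network stand-in.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
n = 132 * 14                                   # firm-years, mirroring 132 firms x 14 years
X = rng.normal(size=(n, 6))                    # hypothetical financial ratios + macro indicators
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=n) < -2).astype(int)  # 1 = distressed

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# One "interval of discretisation" choice: five equal-frequency bins per variable.
disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
Xd_train = disc.fit_transform(X_train)
Xd_test = disc.transform(X_test)

model = CategoricalNB().fit(Xd_train, y_train)
pred = model.predict(Xd_test)

# Overall accuracy can look healthy while recall on the rare distressed class stays low,
# which is the evaluation point the abstract makes.
print("overall accuracy:", accuracy_score(y_test, pred))
print("recall on distressed firms:", recall_score(y_test, pred))
```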

    Making the most of machine learning and freely available datasets: a deforestation case study


    Predictive Maintenance of an External Gear Pump using Machine Learning Algorithms

    Predictive Maintenance is critical for engineering industries such as manufacturing, aerospace and energy. Unexpected failures cause unpredictable downtime, which can be disruptive and incur high costs due to reduced productivity. This forces industries to ensure the reliability of their equipment. In order to increase the reliability of equipment, maintenance actions such as repairs, replacements, equipment updates and corrective actions are employed. These actions affect the flexibility, quality of operation and manufacturing time. It is therefore essential to plan maintenance before failure occurs.

    Traditional maintenance techniques rely on checks conducted routinely based on the running hours of the machine. The drawback of this approach is that maintenance is sometimes performed before it is required. Conducting maintenance based on the actual condition of the equipment is therefore the optimal solution. This requires collecting real-time data on the condition of the equipment, using sensors to detect events and send information to a computer processor. Predictive Maintenance uses these types of techniques or analytics to inform about the current and future state of the equipment. In the last decade, with the introduction of the Internet of Things (IoT), Machine Learning (ML), cloud computing and Big Data analytics, the manufacturing industry has moved towards implementing Predictive Maintenance, resulting in increased uptime and quality control, optimisation of maintenance routes, improved worker safety and greater productivity.

    The present thesis describes a novel computational strategy of Predictive Maintenance (fault diagnosis and fault prognosis) with ML and Deep Learning applications for an FG304 series external gear pump, also known as a domino pump. In the absence of a comprehensive set of experimental data, synthetic data generation techniques are implemented for Predictive Maintenance by perturbing the frequency content of time series generated using high-fidelity computational techniques. In addition, various types of feature extraction methods are considered to extract the most discriminatory information from the data. For fault diagnosis, three types of ML classification algorithms are employed, namely Multilayer Perceptron (MLP), Support Vector Machine (SVM) and Naive Bayes (NB) algorithms. For prognosis, ML regression algorithms, such as MLP and SVM, are utilised. Although significant work has been reported by previous authors, it remains difficult to optimise the choice of hyper-parameters (parameters whose values control the learning process) for each specific ML algorithm, for instance the type of SVM kernel function, or the selection of the MLP activation function and the optimum number of hidden layers (and neurons).

    It is widely understood that the reliability of ML algorithms is strongly dependent upon the existence of a sufficiently large quantity of high-quality training data. In the present thesis, due to the unavailability of experimental data, a novel high-fidelity in-silico dataset is generated via a Computational Fluid Dynamics (CFD) model, which is used for the training of the underlying ML metamodel. In addition, a large number of scenarios are recreated, ranging from healthy to faulty ones (e.g. clogging, radial gap variations, axial gap variations, viscosity variations, speed variations). Furthermore, the high-fidelity dataset is extended using degradation functions to predict the remaining useful life (fault prognosis) of an external gear pump.

    The thesis explores and compares the performance of the MLP, SVM and NB algorithms for fault diagnosis, and of MLP and SVM for fault prognosis. In order to enable fast training and reliable testing of the MLP algorithm, some predefined network architectures, such as 2n neurons per hidden layer, are used to speed up the identification of the precise number of neurons (shown to be useful when the sample dataset is sufficiently large). Finally, a series of benchmark tests is presented, showing that for fault diagnosis the use of wavelet features with an MLP algorithm provides the best accuracy, and that the MLP algorithm also provides the best prediction results for fault prognosis. In addition, benchmark examples are simulated to demonstrate mesh convergence for the CFD model, while quantification analysis and the influence of noise on the training data are examined for the ML algorithms.
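    A rough, hedged sketch of the fault-diagnosis part of this workflow is given below: synthetic signals with fault-dependent frequency content stand in for the CFD-generated pump data, wavelet energies stand in for the feature-extraction step, and scikit-learn's MLP, SVM and Naive Bayes classifiers mirror the algorithm comparison. The libraries (PyWavelets, scikit-learn), fault labels, signal model and network sizes are illustrative assumptions, not details taken from the thesis.

```python
# Illustrative sketch of wavelet-feature fault classification (not the thesis code).
import numpy as np
import pywt
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 1000, endpoint=False)     # 1 s of signal at 1 kHz (assumed)

def simulate(fault):
    """Synthetic signal: each (hypothetical) fault class shifts the dominant frequency."""
    f0 = 50 + 15 * fault                        # fault = 0 (healthy), 1, 2
    return np.sin(2 * np.pi * f0 * t) + 0.3 * rng.normal(size=t.size)

def wavelet_energy(sig, wavelet="db4", level=4):
    """Relative energy per wavelet decomposition level as a feature vector."""
    coeffs = pywt.wavedec(sig, wavelet, level=level)
    energy = np.array([np.sum(c ** 2) for c in coeffs])
    return energy / energy.sum()

y = rng.integers(0, 3, 600)
X = np.array([wavelet_energy(simulate(f)) for f in y])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "MLP": MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "NB": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "accuracy:", accuracy_score(y_te, model.predict(X_te)))
```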

    Application of Bayesian network including Microcystis morphospecies for microcystin risk assessment in three cyanobacterial bloom-plagued lakes, China

    Microcystis spp., which occur as colonies of different sizes under natural conditions, have expanded in temperate and tropical freshwater ecosystems and caused serious environmental and ecological problems. In the current study, a Bayesian network (BN) framework was developed to assess the probability of microcystin (MC) risk in large shallow eutrophic lakes in China, namely Taihu Lake, Chaohu Lake and Dianchi Lake. Using a knowledge-supported approach, physicochemical factors, Microcystis morphospecies and MCs were integrated into different network structures. The sensitivity analysis illustrated that Microcystis aeruginosa biomass was overall the best predictor of MC risk, and that its high biomass relied on the combined condition that water temperature exceeded 24 °C and total phosphorus was above 0.2 mg/L. Simulated scenarios suggested that the probability of hazardous MCs (≥1.0 μg/L) was higher under the interactive effect of temperature increase and nutrient (nitrogen and phosphorus) imbalance than under warming alone. Likewise, data-driven model development using a naïve Bayes classifier and equal frequency discretization resulted in substantial technical performance (CCI = 0.83, K = 0.60), but the performance decreased significantly when the model excluded species-specific biomasses from the input variables (CCI = 0.76, K = 0.40). The BN framework provides a useful screening tool for evaluating cyanotoxins in the three studied lakes in China, and it can also be used in other lakes suffering from cyanobacterial blooms dominated by Microcystis.
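    The data-driven variant mentioned at the end of the abstract (equal frequency discretization feeding a naïve Bayes classifier, scored with CCI and kappa) can be sketched as follows. The data are synthetic placeholders rather than lake observations, and the variable names, bin count and toy thresholds are illustrative assumptions only.

```python
# Hedged sketch: equal-frequency discretization + naive Bayes, scored with CCI and kappa.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(2)
n = 400
water_temp = rng.uniform(10, 32, n)            # water temperature, deg C (synthetic)
total_p = rng.uniform(0.02, 0.4, n)            # total phosphorus, mg/L (synthetic)
biomass = rng.lognormal(0, 1, n) * ((water_temp > 24) & (total_p > 0.2)) + rng.uniform(0, 0.01, n)
X = np.column_stack([water_temp, total_p, biomass])
mc_risk = (biomass + rng.normal(scale=0.5, size=n) > 1.0).astype(int)  # 1 if MCs >= 1.0 ug/L (toy rule)

# Equal-frequency (quantile) bins, then a naive Bayes classifier on the binned inputs.
Xd = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile").fit_transform(X)
pred = cross_val_predict(CategoricalNB(min_categories=4), Xd, mc_risk, cv=5)
print("CCI  :", accuracy_score(mc_risk, pred))
print("kappa:", cohen_kappa_score(mc_risk, pred))
```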

    Bayesian correlated clustering to integrate multiple datasets

    Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct – but often complementary – information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets. Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI’s performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques – as well as to non-integrative approaches – demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods.
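    MDI itself is a joint Bayesian model, but the underlying idea (the same items clustered in several datasets, with the degree of agreement between the datasets quantified) can be illustrated loosely as below. This sketch is not MDI: each synthetic dataset is clustered independently with a Dirichlet-process Gaussian mixture and the agreement is measured post hoc with the adjusted Rand index, whereas MDI fits the datasets jointly and learns the agreement parameters within the model.

```python
# Loose illustration only (not MDI): independent mixture-model clusterings of several
# synthetic datasets over the same items, with pairwise agreement scored afterwards.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(3)
n_genes = 300
true_labels = rng.integers(0, 4, n_genes)          # 4 shared clusters (synthetic)

def make_dataset(noise):
    """One synthetic 'dataset' (e.g. expression, ChIP-chip) sharing the cluster structure."""
    centres = rng.normal(scale=3.0, size=(4, 5))
    return centres[true_labels] + rng.normal(scale=noise, size=(n_genes, 5))

datasets = [make_dataset(noise) for noise in (0.5, 1.0, 2.0)]

clusterings = []
for data in datasets:
    mix = BayesianGaussianMixture(n_components=10,
                                  weight_concentration_prior_type="dirichlet_process",
                                  random_state=0).fit(data)
    clusterings.append(mix.predict(data))

# Pairwise agreement between the inferred clusterings (higher = more shared structure).
for i in range(len(clusterings)):
    for j in range(i + 1, len(clusterings)):
        ari = adjusted_rand_score(clusterings[i], clusterings[j])
        print(f"datasets {i} and {j}: adjusted Rand index = {ari:.2f}")
```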

    Issues in predictive modeling of individual customer behavior : applications in targeted marketing and consumer credit scoring


    Bayesian model selection for exponential random graph models via adjusted pseudolikelihoods

    Models with intractable likelihood functions arise in areas including network analysis and spatial statistics, especially those involving Gibbs random fields. Posterior parameter estimation in these settings is termed a doubly-intractable problem because both the likelihood function and the posterior distribution are intractable. The comparison of Bayesian models is often based on the statistical evidence, the integral of the un-normalised posterior distribution over the model parameters, which is rarely available in closed form. For doubly-intractable models, estimating the evidence adds another layer of difficulty. Consequently, the selection of the model that best describes an observed network among a collection of exponential random graph models for network analysis is a daunting task. Pseudolikelihoods offer a tractable approximation to the likelihood but should be treated with caution because they can lead to unreasonable inference. This paper specifies a method to adjust pseudolikelihoods in order to obtain a reasonable, yet tractable, approximation to the likelihood. This allows implementation of widely used computational methods for evidence estimation and the pursuit of Bayesian model selection of exponential random graph models for the analysis of social networks. Empirical comparisons to existing methods show that our procedure yields similar evidence estimates, but at a lower computational cost.
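    For context, the (unadjusted) pseudolikelihood that the paper's correction builds on treats each dyad of the network as an independent Bernoulli outcome whose log-odds are the change statistics multiplied by the ERGM parameters, so maximising it reduces to a logistic regression. The sketch below illustrates this for a toy ERGM with edge and triangle statistics on a random graph; the graph, statistics and use of scikit-learn are illustrative assumptions, and the paper's adjustment of the pseudolikelihood is not shown.

```python
# Hedged sketch: maximum pseudolikelihood for a toy ERGM (edge + triangle terms)
# via logistic regression on dyadic change statistics.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 30
A = (rng.random((n, n)) < 0.15).astype(int)
A = np.triu(A, 1) + np.triu(A, 1).T             # symmetric adjacency matrix, no self-loops

X, y = [], []
for i in range(n):
    for j in range(i + 1, n):
        d_edges = 1.0                             # change in edge count if (i, j) is toggled on
        d_triangles = float(np.sum(A[i] * A[j]))  # new triangles = common neighbours of i and j
        X.append([d_edges, d_triangles])
        y.append(A[i, j])

# Logistic regression on the change statistics gives the maximum pseudolikelihood estimate
# (a large C makes the fit effectively unpenalised).
mple = LogisticRegression(fit_intercept=False, C=1e6).fit(np.array(X), np.array(y))
print("pseudolikelihood estimates (edge, triangle):", mple.coef_.ravel())
```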