
    Predicting financial distress of JSE-Listed companies using Bayesian networks

    This study aims to test the suitability of Bayesian probabilistic models for predicting the bankruptcy of JSE-listed companies. A sample of 132 companies is considered, with fourteen years of financial statement information and macroeconomic indicators used as predictor variables. Various permutations of Bayesian models are tested, relating to different learning algorithms, intervals of discretisation and scoring metrics. In contrast to previous research, we explore a variety of evaluation measures, and it is found that predictive accuracy for bankrupt firms does not exceed 70% in any model permutation. In comparison with other popular models such as the Altman Z-score and the logit model, Bayesian networks are found to produce marginally better predictive accuracy. Furthermore, a comparison with previous research on the same subject is carried out and reasons for significantly different results are considered. Finally, the reasons for the low predictive accuracies are considered, with issues relating specifically to South Africa discussed.
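    As a rough illustration of the kind of pipeline the abstract describes (discretise predictor variables, fit a Bayesian classifier, and look at class-specific accuracy rather than overall accuracy alone), the sketch below uses synthetic data and scikit-learn, with a naive Bayes classifier standing in for the various Bayesian-network learners that were tested; the feature count, bin count and variable names are illustrative assumptions, not details taken from the study.

```python
# Minimal sketch, not the study's code: synthetic firm-year data, equal-frequency
# discretisation, and a naive Bayes classifier as a simple Bayesian-network stand-in.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
n = 132 * 14                                   # firm-years, mirroring 132 firms x 14 years
X = rng.normal(size=(n, 6))                    # hypothetical financial ratios + macro indicators
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=n) < -2).astype(int)  # 1 = distressed

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# One "interval of discretisation" choice: five equal-frequency bins per variable.
disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
Xd_train = disc.fit_transform(X_train)
Xd_test = disc.transform(X_test)

model = CategoricalNB().fit(Xd_train, y_train)
pred = model.predict(Xd_test)

# Overall accuracy can look healthy while recall on the rare distressed class stays low,
# which is the evaluation point the abstract makes.
print("overall accuracy:", accuracy_score(y_test, pred))
print("recall on distressed firms:", recall_score(y_test, pred))
```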

    Making the most of machine learning and freely available datasets: a deforestation case study


    Predictive Maintenance of an External Gear Pump using Machine Learning Algorithms

    Predictive Maintenance is critical for engineering industries such as manufacturing, aerospace and energy. Unexpected failures cause unpredictable downtime, which can be disruptive and incur high costs due to reduced productivity. This forces industries to ensure the reliability of their equipment. In order to increase the reliability of equipment, maintenance actions such as repairs, replacements, equipment updates and corrective actions are employed. These actions affect the flexibility, quality of operation and manufacturing time. It is therefore essential to plan maintenance before failure occurs.

    Traditional maintenance techniques rely on checks conducted routinely based on the running hours of the machine. The drawback of this approach is that maintenance is sometimes performed before it is required. Conducting maintenance based on the actual condition of the equipment is therefore the optimal solution. This requires collecting real-time data on the condition of the equipment, using sensors to detect events and send information to a computer processor. Predictive Maintenance uses these types of techniques or analytics to inform about the current and future state of the equipment. In the last decade, with the introduction of the Internet of Things (IoT), Machine Learning (ML), cloud computing and Big Data analytics, the manufacturing industry has moved towards implementing Predictive Maintenance, resulting in increased uptime and quality control, optimisation of maintenance routes, improved worker safety and greater productivity.

    The present thesis describes a novel computational strategy of Predictive Maintenance (fault diagnosis and fault prognosis) with ML and Deep Learning applications for an FG304 series external gear pump, also known as a domino pump. In the absence of a comprehensive set of experimental data, synthetic data generation techniques are implemented for Predictive Maintenance by perturbing the frequency content of time series generated using high-fidelity computational techniques. In addition, various types of feature extraction methods are considered to extract the most discriminatory information from the data. For fault diagnosis, three types of ML classification algorithms are employed, namely Multilayer Perceptron (MLP), Support Vector Machine (SVM) and Naive Bayes (NB) algorithms. For prognosis, ML regression algorithms, such as MLP and SVM, are utilised. Although significant work has been reported by previous authors, it remains difficult to optimise the choice of hyper-parameters (parameters whose values control the learning process) for each specific ML algorithm, for instance the type of SVM kernel function, or the selection of the MLP activation function and the optimum number of hidden layers (and neurons).

    It is widely understood that the reliability of ML algorithms is strongly dependent upon the existence of a sufficiently large quantity of high-quality training data. In the present thesis, due to the unavailability of experimental data, a novel high-fidelity in-silico dataset is generated via a Computational Fluid Dynamics (CFD) model, which is used for the training of the underlying ML metamodel. In addition, a large number of scenarios are recreated, ranging from healthy to faulty ones (e.g. clogging, radial gap variations, axial gap variations, viscosity variations, speed variations). Furthermore, the high-fidelity dataset is extended using degradation functions to predict the remaining useful life (fault prognosis) of an external gear pump.

    The thesis explores and compares the performance of the MLP, SVM and NB algorithms for fault diagnosis, and of MLP and SVM for fault prognosis. In order to enable fast training and reliable testing of the MLP algorithm, some predefined network architectures, such as 2n neurons per hidden layer, are used to speed up the identification of the precise number of neurons (shown to be useful when the sample dataset is sufficiently large). Finally, a series of benchmark tests is presented, showing that for fault diagnosis the use of wavelet features with an MLP algorithm provides the best accuracy, and that the MLP algorithm also provides the best prediction results for fault prognosis. In addition, benchmark examples are simulated to demonstrate mesh convergence for the CFD model, while quantification analysis and the influence of noise on the training data are examined for the ML algorithms.
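    A rough, hedged sketch of the fault-diagnosis part of this workflow is given below: synthetic signals with fault-dependent frequency content stand in for the CFD-generated pump data, wavelet energies stand in for the feature-extraction step, and scikit-learn's MLP, SVM and Naive Bayes classifiers mirror the algorithm comparison. The libraries (PyWavelets, scikit-learn), fault labels, signal model and network sizes are illustrative assumptions, not details taken from the thesis.

```python
# Illustrative sketch of wavelet-feature fault classification (not the thesis code).
import numpy as np
import pywt
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 1000, endpoint=False)     # 1 s of signal at 1 kHz (assumed)

def simulate(fault):
    """Synthetic signal: each (hypothetical) fault class shifts the dominant frequency."""
    f0 = 50 + 15 * fault                        # fault = 0 (healthy), 1, 2
    return np.sin(2 * np.pi * f0 * t) + 0.3 * rng.normal(size=t.size)

def wavelet_energy(sig, wavelet="db4", level=4):
    """Relative energy per wavelet decomposition level as a feature vector."""
    coeffs = pywt.wavedec(sig, wavelet, level=level)
    energy = np.array([np.sum(c ** 2) for c in coeffs])
    return energy / energy.sum()

y = rng.integers(0, 3, 600)
X = np.array([wavelet_energy(simulate(f)) for f in y])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "MLP": MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "NB": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "accuracy:", accuracy_score(y_te, model.predict(X_te)))
```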

    Application of Bayesian network including Microcystis morphospecies for microcystin risk assessment in three cyanobacterial bloom-plagued lakes, China

    Microcystis spp., which occur as colonies of different sizes under natural conditions, have expanded in temperate and tropical freshwater ecosystems and caused serious environmental and ecological problems. In the current study, a Bayesian network (BN) framework was developed to assess the probability of microcystin (MC) risk in large shallow eutrophic lakes in China, namely Taihu Lake, Chaohu Lake and Dianchi Lake. Using a knowledge-supported approach, physicochemical factors, Microcystis morphospecies and MCs were integrated into different network structures. The sensitivity analysis illustrated that Microcystis aeruginosa biomass was overall the best predictor of MC risk, and that its high biomass relied on the combined condition that water temperature exceeded 24 °C and total phosphorus was above 0.2 mg/L. Simulated scenarios suggested that the probability of hazardous MCs (≥1.0 μg/L) was higher under the interactive effect of temperature increase and nutrient (nitrogen and phosphorus) imbalance than under warming alone. Likewise, data-driven model development using a naïve Bayes classifier and equal frequency discretization resulted in substantial technical performance (CCI = 0.83, K = 0.60), but the performance decreased significantly when the model excluded species-specific biomasses from the input variables (CCI = 0.76, K = 0.40). The BN framework provides a useful screening tool for evaluating cyanotoxins in the three studied lakes in China, and it can also be used in other lakes suffering from cyanobacterial blooms dominated by Microcystis.
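    The data-driven variant mentioned at the end of the abstract (equal frequency discretization feeding a naïve Bayes classifier, scored with CCI and kappa) can be sketched as follows. The data are synthetic placeholders rather than lake observations, and the variable names, bin count and toy thresholds are illustrative assumptions only.

```python
# Hedged sketch: equal-frequency discretization + naive Bayes, scored with CCI and kappa.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(2)
n = 400
water_temp = rng.uniform(10, 32, n)            # water temperature, deg C (synthetic)
total_p = rng.uniform(0.02, 0.4, n)            # total phosphorus, mg/L (synthetic)
biomass = rng.lognormal(0, 1, n) * ((water_temp > 24) & (total_p > 0.2)) + rng.uniform(0, 0.01, n)
X = np.column_stack([water_temp, total_p, biomass])
mc_risk = (biomass + rng.normal(scale=0.5, size=n) > 1.0).astype(int)  # 1 if MCs >= 1.0 ug/L (toy rule)

# Equal-frequency (quantile) bins, then a naive Bayes classifier on the binned inputs.
Xd = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile").fit_transform(X)
pred = cross_val_predict(CategoricalNB(min_categories=4), Xd, mc_risk, cv=5)
print("CCI  :", accuracy_score(mc_risk, pred))
print("kappa:", cohen_kappa_score(mc_risk, pred))
```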

    Bayesian correlated clustering to integrate multiple datasets

    Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct – but often complementary – information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets. Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI’s performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques – as well as to non-integrative approaches – demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods.
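    MDI itself is a joint Bayesian model, but the underlying idea (the same items clustered in several datasets, with the degree of agreement between the datasets quantified) can be illustrated loosely as below. This sketch is not MDI: each synthetic dataset is clustered independently with a Dirichlet-process Gaussian mixture and the agreement is measured post hoc with the adjusted Rand index, whereas MDI fits the datasets jointly and learns the agreement parameters within the model.

```python
# Loose illustration only (not MDI): independent mixture-model clusterings of several
# synthetic datasets over the same items, with pairwise agreement scored afterwards.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(3)
n_genes = 300
true_labels = rng.integers(0, 4, n_genes)          # 4 shared clusters (synthetic)

def make_dataset(noise):
    """One synthetic 'dataset' (e.g. expression, ChIP-chip) sharing the cluster structure."""
    centres = rng.normal(scale=3.0, size=(4, 5))
    return centres[true_labels] + rng.normal(scale=noise, size=(n_genes, 5))

datasets = [make_dataset(noise) for noise in (0.5, 1.0, 2.0)]

clusterings = []
for data in datasets:
    mix = BayesianGaussianMixture(n_components=10,
                                  weight_concentration_prior_type="dirichlet_process",
                                  random_state=0).fit(data)
    clusterings.append(mix.predict(data))

# Pairwise agreement between the inferred clusterings (higher = more shared structure).
for i in range(len(clusterings)):
    for j in range(i + 1, len(clusterings)):
        ari = adjusted_rand_score(clusterings[i], clusterings[j])
        print(f"datasets {i} and {j}: adjusted Rand index = {ari:.2f}")
```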

    Issues in predictive modeling of individual customer behavior : applications in targeted marketing and consumer credit scoring


    Bayesian model selection for exponential random graph models via adjusted pseudolikelihoods

    Models with intractable likelihood functions arise in areas including network analysis and spatial statistics, especially those involving Gibbs random fields. Posterior parameter estimation in these settings is termed a doubly-intractable problem because both the likelihood function and the posterior distribution are intractable. The comparison of Bayesian models is often based on the statistical evidence, the integral of the un-normalised posterior distribution over the model parameters, which is rarely available in closed form. For doubly-intractable models, estimating the evidence adds another layer of difficulty. Consequently, the selection of the model that best describes an observed network among a collection of exponential random graph models for network analysis is a daunting task. Pseudolikelihoods offer a tractable approximation to the likelihood but should be treated with caution because they can lead to unreasonable inference. This paper specifies a method to adjust pseudolikelihoods in order to obtain a reasonable, yet tractable, approximation to the likelihood. This allows implementation of widely used computational methods for evidence estimation and the pursuit of Bayesian model selection of exponential random graph models for the analysis of social networks. Empirical comparisons to existing methods show that our procedure yields similar evidence estimates, but at a lower computational cost.
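    For context, the (unadjusted) pseudolikelihood that the paper's correction builds on treats each dyad of the network as an independent Bernoulli outcome whose log-odds are the change statistics multiplied by the ERGM parameters, so maximising it reduces to a logistic regression. The sketch below illustrates this for a toy ERGM with edge and triangle statistics on a random graph; the graph, statistics and use of scikit-learn are illustrative assumptions, and the paper's adjustment of the pseudolikelihood is not shown.

```python
# Hedged sketch: maximum pseudolikelihood for a toy ERGM (edge + triangle terms)
# via logistic regression on dyadic change statistics.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 30
A = (rng.random((n, n)) < 0.15).astype(int)
A = np.triu(A, 1) + np.triu(A, 1).T             # symmetric adjacency matrix, no self-loops

X, y = [], []
for i in range(n):
    for j in range(i + 1, n):
        d_edges = 1.0                             # change in edge count if (i, j) is toggled on
        d_triangles = float(np.sum(A[i] * A[j]))  # new triangles = common neighbours of i and j
        X.append([d_edges, d_triangles])
        y.append(A[i, j])

# Logistic regression on the change statistics gives the maximum pseudolikelihood estimate
# (a large C makes the fit effectively unpenalised).
mple = LogisticRegression(fit_intercept=False, C=1e6).fit(np.array(X), np.array(y))
print("pseudolikelihood estimates (edge, triangle):", mple.coef_.ravel())
```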