2 research outputs found

    Identification of Data Structure with Machine Learning: From Fisher to Bayesian networks

    This thesis proposes a theoretical framework for analysing the structure of a dataset in terms of (a) metric, (b) density and (c) feature associations. For the first aspect, Fisher's metric learning algorithms are the foundation of a novel manifold based on the information and complexity of a classification model. For the density aspect, Probabilistic Quantum Clustering, a Bayesian version of the original Quantum Clustering, is proposed. Its clustering results depend on local density variations, which is a desirable feature when dealing with heteroscedastic data. For the third aspect, the constraint-based PC algorithm, the starting point of many structure learning algorithms, is used to find feature associations by means of conditional independence tests. These associations are then used to select Bayesian networks based on a regularized likelihood score. All three topics of data structure analysis were tested on synthetic examples and real cases, which allowed us to identify and discuss the advantages and limitations of the algorithms. One of the biggest challenges was applying these methods to a Big Data dataset, analysed in collaboration with a large UK retailer, where the interest was in identifying the data structure underlying customer shopping baskets.
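The abstract does not say which conditional independence test the thesis uses. As an illustration only, the sketch below shows one common choice for continuous, roughly Gaussian data: a partial-correlation test with Fisher's z-transform, the kind of test a PC-style algorithm applies to each variable pair. The function names and the synthetic data are assumptions made for this example and are not taken from the thesis.

```python
import math
import random
from statistics import NormalDist


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """Correlation of x and y after conditioning on a single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, cond_size):
    """Two-sided p-value for H0: (partial) correlation is zero."""
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    stat = math.sqrt(n - cond_size - 3) * 0.5 * math.log((1 + r) / (1 - r))
    return 2 * (1 - NormalDist().cdf(abs(stat)))


# Synthetic data (hypothetical): x and y are both driven by z,
# so they are correlated marginally but independent given z.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_marginal = pearson(x, y)
r_partial = partial_corr(x, y, z)
p_marginal = fisher_z_pvalue(r_marginal, len(x), 0)
p_partial = fisher_z_pvalue(r_partial, len(x), 1)
print(f"marginal r={r_marginal:.3f}, p={p_marginal:.3g}")
print(f"partial  r={r_partial:.3f}, p={p_partial:.3g}")
```

In a PC-style search, a large p-value for the conditional test would lead to removing the edge between x and y, while the small marginal p-value alone would have kept it.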

    Artificial Intelligence for Detecting Preterm Uterine Activity in Gynecology and Obstetric Care

    Preterm birth brings considerable emotional and economic costs to families and society. However, despite extensive research into understanding the risk factors, the prediction of patient mechanisms and improvements to obstetrical practice, the UK National Health Service still spends more than £2.95 billion annually on this issue. Diagnosis of labour in normal pregnancies is important for minimizing unnecessary hospitalisations, interventions and expenses. Moreover, accurate identification of spontaneous preterm labour would allow clinicians to start necessary treatments early in women with true labour, and to avert unnecessary treatment and hospitalisation for women who are simply having preterm contractions but are not in true labour. In this research, electrohysterography signals have been used to detect preterm births, because they provide a strong basis for objective prediction and diagnosis of preterm birth. This has been achieved using an open dataset, which contains 262 records for women who delivered at term and 38 for women who delivered prematurely. Three different machine learning algorithms were used to classify these records. The results show that the Random Forest performed best, with a sensitivity of 97%, a specificity of 85%, an area under the receiver operating characteristic curve (AUROC) of 94% and a mean square error rate of 14%.
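For readers unfamiliar with the reported metrics, the sketch below shows how sensitivity and specificity are computed from true and predicted class labels. The labels are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data, and the helper name is an assumption made for this example.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary classifier.

    Sensitivity is the fraction of positive cases (here: preterm)
    that are detected; specificity is the fraction of negative cases
    (here: term) that are correctly left unflagged.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 1 = preterm, 0 = term.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With a class split as uneven as 262 term to 38 preterm records, reporting both metrics matters: a classifier that labels every record as term would score high on accuracy while having zero sensitivity.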