1,433 research outputs found

    Data-efficient machine learning for design and optimisation of complex systems


    A sparse multinomial probit model for classification

    A recent development in penalized probit modelling using a hierarchical Bayesian approach has led to a sparse binomial (two-class) probit classifier that can be trained via an EM algorithm. A key advantage of the formulation is that no tuning of hyperparameters relating to the penalty is needed, thus simplifying the model selection process. The resulting model demonstrates excellent classification performance and a high degree of sparsity when used as a kernel machine. It is, however, restricted to the binary classification problem and can only be used in the multinomial situation via a one-against-all or one-against-many strategy. To overcome this, we apply the idea to the multinomial probit model. This leads to a direct multi-classification approach and is shown to give a sparse solution with accuracy and sparsity comparable with the current state-of-the-art. Comparative numerical benchmark examples are used to demonstrate the method.

    Inferential stability in systems biology

    The modern biological sciences are fraught with statistical difficulties. Biomolecular stochasticity, experimental noise, and the “large p, small n” problem all contribute to the challenge of data analysis. Nevertheless, we routinely seek to draw robust, meaningful conclusions from observations. In this thesis, we explore methods for assessing the effects of data variability upon downstream inference, in an attempt to quantify and promote the stability of the inferences we make. We start with a review of existing methods for addressing this problem, focusing upon the bootstrap and similar methods. The key requirement for all such approaches is a statistical model that approximates the data generating process. We move on to consider biomarker discovery problems. We present a novel algorithm for proposing putative biomarkers on the strength of both their predictive ability and the stability with which they are selected. In a simulation study, we find our approach to perform favourably in comparison to strategies that select on the basis of predictive performance alone. We then consider the real problem of identifying protein peak biomarkers for HAM/TSP, an inflammatory condition of the central nervous system caused by HTLV-1 infection. We apply our algorithm to a set of SELDI mass spectral data, and identify a number of putative biomarkers. Additional experimental work, together with known results from the literature, provides corroborating evidence for the validity of these putative biomarkers. Having focused on static observations, we then make the natural progression to time course data sets. We propose a (Bayesian) bootstrap approach for such data, and then apply our method in the context of gene network inference and the estimation of parameters in ordinary differential equation models. We find that the inferred gene networks are relatively unstable, and demonstrate the importance of finding distributions of ODE parameter estimates, rather than single point estimates.

    Current overview and way forward for the use of machine learning in the field of petroleum gas hydrates

    Gas hydrates represent one of the main flow assurance challenges in the oil and gas industry as they can lead to plugging of pipelines and process equipment. In this paper we present a literature study performed to evaluate the current state of the use of machine learning methods within the field of gas hydrates with specific focus on the oil chemistry. A common analysis technique for crude oils is Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR MS), which could be a good approach to achieving a better understanding of the chemical composition of hydrates, and the use of machine learning in the field of FT-ICR MS was therefore also examined. Several machine learning methods were identified as promising, their use in the literature was reviewed, and a text analysis study was performed to identify the main topics within the publications. The literature search revealed that only one publication combines FT-ICR MS, machine learning and gas hydrates. Most of the work on gas hydrates is related to thermodynamics, while FT-ICR MS is mostly used for chemical analysis of oils. However, with the combination of FT-ICR MS and machine learning to evaluate samples related to gas hydrates, it could be possible to improve the understanding of the composition of hydrates and thereby identify hydrate active compounds responsible for the differences between oils forming plugging hydrates and oils forming transportable hydrates.

    Automatic Identification of Different Types of Consumer Configurations by Using Harmonic Current Measurements

    Power quality (PQ) is an increasing concern in the distribution networks of modern industrialized countries. The PQ monitoring activities of distribution system operators (DSO), and consequently the amount of PQ measurement data, continuously increase, so new and automated tools are required for efficient PQ analysis. Time characteristics of PQ parameters (e.g., harmonics) usually show characteristic daily and weekly cycles, mainly caused by the usage behaviour of electric devices. In this paper, methods are proposed for the classification of harmonic emission profiles for typical consumer configurations in public low voltage (LV) networks using a binary decision tree in combination with support vector machines. The performance of the classification was evaluated based on 40 different measurement sites in German public LV grids. This method can support network operators in the identification of consumer configurations and the early detection of fundamental changes in harmonic emission behaviour. This enables, for example, assistance in resolving customer complaints or supporting network planning by managing PQ levels using typical harmonic emission profiles.

    Line-Field Based Adaptive Image Model for Blind Deblurring

    Ph.D., Doctor of Philosophy