1,433 research outputs found
A sparse multinomial probit model for classification
A recent development in penalized probit modelling using a hierarchical Bayesian approach has led to a sparse binomial (two-class) probit classifier that can be trained via an EM algorithm. A key advantage of the formulation is that no tuning of hyperparameters relating to the penalty is needed thus simplifying the model selection process. The resulting model demonstrates excellent classification performance and a high degree of sparsity when used as a kernel machine. It is, however, restricted to the binary classification problem and can only be used in the multinomial situation via a one-against-all or one-against-many strategy. To overcome this, we apply the idea to the multinomial probit model. This leads to a direct multi-classification approach and is shown to give a sparse solution with accuracy and sparsity comparable with the current state-of-the-art. Comparative numerical benchmark examples are used to demonstrate the method
Inferential stability in systems biology
The modern biological sciences are fraught with statistical difficulties. Biomolecular
stochasticity, experimental noise, and the “large p, small n” problem all contribute to
the challenge of data analysis. Nevertheless, we routinely seek to draw robust, meaningful
conclusions from observations. In this thesis, we explore methods for assessing
the effects of data variability upon downstream inference, in an attempt to quantify and
promote the stability of the inferences we make.
We start with a review of existing methods for addressing this problem, focusing upon the
bootstrap and similar methods. The key requirement for all such approaches is a statistical
model that approximates the data generating process.
We move on to consider biomarker discovery problems. We present a novel algorithm for
proposing putative biomarkers on the strength of both their predictive ability and the stability
with which they are selected. In a simulation study, we find our approach to perform
favourably in comparison to strategies that select on the basis of predictive performance
alone.
We then consider the real problem of identifying protein peak biomarkers for HAM/TSP,
an inflammatory condition of the central nervous system caused by HTLV-1 infection.
We apply our algorithm to a set of SELDI mass spectral data, and identify a number of
putative biomarkers. Additional experimental work, together with known results from the
literature, provides corroborating evidence for the validity of these putative biomarkers.
Having focused on static observations, we then make the natural progression to time
course data sets. We propose a (Bayesian) bootstrap approach for such data, and then
apply our method in the context of gene network inference and the estimation of parameters
in ordinary differential equation models. We find that the inferred gene networks
are relatively unstable, and demonstrate the importance of finding distributions of ODE
parameter estimates, rather than single point estimates
Current overview and way forward for the use of machine learning in the field of petroleum gas hydrates
Gas hydrates represent one of the main flow assurance challenges in the oil and gas industry as they can lead to plugging of pipelines and process equipment. In this paper we present a literature study performed to evaluate the current state of the use of machine learning methods within the field of gas hydrates with specific focus on the oil chemistry. A common analysis technique for crude oils is Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR MS) which could be a good approach to achieving a better understanding of the chemical composition of hydrates, and the use of machine learning in the field of FT-ICR MS was therefore also examined. Several machine learning methods were identified as promising, their use in the literature was reviewed and a text analysis study was performed to identify the main topics within the publications. The literature search revealed that the publications on the combination of FT-ICR MS, machine learning and gas hydrates is limited to one. Most of the work on gas hydrates is related to thermodynamics, while FT-ICR MS is mostly used for chemical analysis of oils. However, with the combination of FT-ICR MS and machine learning to evaluate samples related to gas hydrates, it could be possible to improve the understanding of the composition of hydrates and thereby identify hydrate active compounds responsible for the differences between oils forming plugging hydrates and oils forming transportable hydrates.Current overview and way forward for the use of machine learning in the field of petroleum gas hydratespublishedVersio
Automatic Identification of Different Types of Consumer Configurations by Using Harmonic Current Measurements
Power quality (PQ) is an increasing concern in the distribution networks of modern industrialized countries. The PQ monitoring activities of distribution system operators (DSO), and consequently the amount of PQ measurement data, continuously increase, and consequently new and automated tools are required for efficient PQ analysis. Time characteristics of PQ parameters (e.g., harmonics) usually show characteristic daily and weekly cycles, mainly caused by the usage behaviour of electric devices. In this paper, methods are proposed for the classification of harmonic emission profiles for typical consumer configurations in public low voltage (LV) networks using a binary decision tree in combination with support vector machines. The performance of the classification was evaluated based on 40 different measurement sites in German public LV grids. Thismethod can support network operators in the identification of consumer configurations and the early detection of fundamental changes in harmonic emission behaviour. This enables, for example, assistance in resolving customer complaints or supporting network planning by managing PQ levels using typical harmonic emission profiles
- …