On Practical Machine Learning and Data Analysis
This thesis discusses and addresses some of the difficulties
associated with practical machine learning and data
analysis. Introducing data-driven methods in, e.g., industrial and
business applications can lead to large gains in productivity and
efficiency, but the cost and complexity are often
overwhelming. Creating machine learning applications in practice often
involves a large amount of manual labour, which typically must be
performed by an experienced analyst who nevertheless lacks significant
experience with the application area. Here we discuss some of the
hurdles faced in a typical analysis project and suggest measures and
methods to simplify the process.
One of the most important issues when applying machine learning
methods to complex data, such as that found in industrial
applications, is that the processes generating the data are modelled
in an appropriate way. Relevant aspects have to be formalised and
represented in a way that allows us to perform our calculations
efficiently. We
present a statistical modelling framework, Hierarchical Graph
Mixtures, based on a combination of graphical models and mixture
models. It allows us to create consistent, expressive statistical
models that simplify the modelling of complex systems. Using a
Bayesian approach, we allow for encoding of prior knowledge and make
the models applicable in situations when relatively little data are
available.
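
As a rough, self-contained sketch of the underlying idea (not the
Hierarchical Graph Mixtures framework itself), the following Python
fragment fits a small Bayesian mixture whose components are simple
factorised graphical models over discrete variables; the component
count, variable cardinalities, and Dirichlet pseudo-count are
illustrative assumptions:

    import numpy as np

    K = 2          # number of mixture components (assumed)
    V = [3, 4]     # cardinalities of two discrete variables x0, x1 (assumed)
    ALPHA = 1.0    # Dirichlet pseudo-count standing in for prior knowledge

    rng = np.random.default_rng(0)
    X = np.column_stack([rng.integers(0, V[0], 200), rng.integers(0, V[1], 200)])

    # Each component factorises P(x0, x1) = P(x0) * P(x1): an edgeless graph.
    weights = np.full(K, 1.0 / K)
    tables = [[np.full(v, 1.0 / v) for v in V] for _ in range(K)]

    for _ in range(20):                         # a few EM iterations
        # E-step: responsibility of each component for each data point
        resp = np.tile(weights, (len(X), 1))
        for k in range(K):
            for j, v in enumerate(V):
                resp[:, k] *= tables[k][j][X[:, j]]
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step with Dirichlet smoothing: this is where the prior enters
        weights = resp.sum(axis=0) / resp.sum()
        for k in range(K):
            for j, v in enumerate(V):
                counts = ALPHA + np.bincount(X[:, j], weights=resp[:, k], minlength=v)
                tables[k][j] = counts / counts.sum()

    print("mixture weights:", weights)
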
Detecting structure in data, such as clusters and dependency
structure, is very important both for understanding an application
area and for specifying the structure of, e.g., a hierarchical graph
mixture. We discuss how this structure can be extracted from
sequential data. By exploiting the inherent dependency structure of
sequential data, we construct an information-theoretic measure of
correlation that does not suffer from the problems most common
correlation measures have with this type of data.
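
For a flavour of how such a measure can respect the sequential nature
of the data, the sketch below estimates the mutual information between
one discrete sequence and a time-shifted version of another; it is a
generic illustration under assumed variable names and lags, not the
specific measure developed in the thesis:

    import numpy as np

    def lagged_mutual_information(x, y, lag):
        """Empirical mutual information between x[t] and y[t + lag]."""
        x, y = np.asarray(x), np.asarray(y)
        if lag > 0:
            x, y = x[:-lag], y[lag:]
        joint = np.zeros((x.max() + 1, y.max() + 1))
        for a, b in zip(x, y):
            joint[a, b] += 1.0
        joint /= joint.sum()
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

    rng = np.random.default_rng(1)
    x = rng.integers(0, 3, size=1000)
    y = np.roll(x, 2)      # y lags x by two steps, so lag 2 shows the dependence
    print([round(lagged_mutual_information(x, y, lag), 3) for lag in range(4)])
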
In many diagnosis situations it is desirable to perform
classification in an iterative and interactive manner. The matter is
often complicated by very limited amounts of knowledge and examples
when a new system to be diagnosed is initially brought into use. We
describe how to create an incremental classification system based on
a statistical model that is trained from empirical data, and show how
the limited background information that is available can still be
used initially to provide a functioning diagnosis system.
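
A minimal sketch of such a system, with assumed class and feature
sizes and a uniform prior standing in for the background information,
might look as follows; it illustrates the incremental idea rather than
the statistical model used in the thesis:

    import numpy as np

    N_CLASSES, N_FEATURES, N_VALUES = 3, 4, 5   # illustrative sizes

    # Background knowledge encoded as prior pseudo-counts (here: uniform).
    class_counts = np.ones(N_CLASSES)
    feat_counts = np.ones((N_CLASSES, N_FEATURES, N_VALUES))

    def update(x, label):
        """Incorporate one labelled example into the model."""
        class_counts[label] += 1
        for j, v in enumerate(x):
            feat_counts[label, j, v] += 1

    def classify(x):
        """Posterior over classes for one example (log-space for stability)."""
        log_post = np.log(class_counts / class_counts.sum())
        for j, v in enumerate(x):
            cond = feat_counts[:, j, v] / feat_counts[:, j, :].sum(axis=1)
            log_post += np.log(cond)
        post = np.exp(log_post - log_post.max())
        return post / post.sum()

    # Interactive use: classify, observe the true fault class, then update.
    example = [0, 2, 4, 1]
    print("before:", classify(example))
    update(example, label=1)
    print("after: ", classify(example))
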
To minimise the effort required to achieve results in data analysis
projects, we need to address not only the models used but also the
methodology and applications that can help simplify the process. We
present a methodology for data preparation and a software library
intended for rapid analysis, prototyping, and deployment.
Finally, we study a few example applications, presenting tasks in
classification, prediction, and anomaly detection. The examples
include demand prediction for supply chain management, approximating
complex simulators to speed up parameter optimisation, and fraud
detection and classification within a media-on-demand system.
Runtime Monitoring for Uncertain Times
In Runtime Verification (RV), monitors check programs for correct operation at execution time. Also called Runtime Monitoring, RV offers advantages over other approaches to program verification. Efficient monitoring is possible for programs where static checking is cost-prohibitive. Runtime monitors may test for execution faults like hardware failure, as well as logical faults. Unlike simple log checking, monitors are typically constructed using formal languages and methods that precisely define expectations and guarantees. Despite the advantages of RV, however, adoption remains low.
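
As a concrete, deliberately tiny illustration of what a runtime monitor does, the following sketch checks an assumed property, "no command is issued after shutdown", over an event stream; the property and event names are invented for illustration and are not tied to any particular RV tool:

    class ShutdownMonitor:
        """Online monitor: flags any 'command' event observed after 'shutdown'."""
        def __init__(self):
            self.shut_down = False
            self.verdict = "ok"

        def observe(self, event):
            if event == "shutdown":
                self.shut_down = True
            elif event == "command" and self.shut_down:
                self.verdict = "violation"
            return self.verdict

    monitor = ShutdownMonitor()
    for event in ["command", "command", "shutdown", "command"]:
        print(event, "->", monitor.observe(event))
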
Applying Runtime Monitoring techniques to real systems requires addressing practical concerns that have garnered little attention from researchers. System operators need monitors that provide immediate diagnostic information before and after failures, that are simple to operate over distributed systems, and that remain reliable when communication is not. These challenges are solvable, and solving them is a necessary step towards widespread RV deployment.
This thesis provides solutions to these and other barriers to practical Runtime Monitoring. We address the need for reporting diagnostic information from monitored programs with nfer, a language and system for event stream abstraction. Nfer supports the automatic extraction of the structure of real-time software and includes integrations with popular programming languages. We also provide for the operation of nfer and other monitoring tools over distributed systems with Palisade, a framework built for low-latency detection of embedded system anomalies. Finally, we supply a method to ensure that program properties may be monitored despite unreliable communication channels. We classify monitorable properties over general unreliable conditions and define an algorithm for when more specific conditions are known.
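
To give a feel for what event stream abstraction means here, the sketch below collapses matching pairs of low-level timestamped events into higher-level intervals; it mimics the spirit of what is described for nfer but does not use nfer's actual rule language or API:

    def abstract_intervals(events, start_name, end_name, interval_name):
        """Turn matching start/end events into (name, begin, end) intervals."""
        intervals, pending_start = [], None
        for name, timestamp in events:
            if name == start_name:
                pending_start = timestamp
            elif name == end_name and pending_start is not None:
                intervals.append((interval_name, pending_start, timestamp))
                pending_start = None
        return intervals

    trace = [("BOOT_START", 10), ("BOOT_END", 42), ("BOOT_START", 100), ("BOOT_END", 150)]
    print(abstract_intervals(trace, "BOOT_START", "BOOT_END", "BOOT"))
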
An Algorithmic Interpretation of Quantum Probability
The Everett (or relative-state, or many-worlds) interpretation of quantum mechanics has come under fire for inadequately dealing with the Born rule (the formula for calculating quantum probabilities). Numerous attempts have been made to derive this rule from the perspective of observers within the quantum wavefunction. These are not really analytic proofs, but rather attempts to derive the Born rule as a synthetic a priori necessity, given the nature of human observers (a fact not fully appreciated even by all of those who have attempted such proofs). I show why existing attempts are unsuccessful or only partly successful, and postulate that Solomonoff's algorithmic approach to the interpretation of probability theory could clarify the problems with these approaches. The Sleeping Beauty probability puzzle is used as a springboard from which to deduce an objectivist, yet synthetic a priori, framework for quantum probabilities that properly frames the role of self-location and self-selection (anthropic) principles in probability theory. I call this framework "algorithmic synthetic unity" (or ASU). I offer no new formal proof of the Born rule, largely because I feel that existing proofs (particularly that of Gleason) are already adequate, and as close to being a formal proof as one should expect or want. Gleason's one unjustified assumption, known as noncontextuality, is, I will argue, completely benign when considered within the algorithmic framework that I propose. I will also argue that, to the extent the Born rule can be derived within ASU, there is no reason to suppose that we could not also derive all the other fundamental postulates of quantum theory as well. There is nothing special here about the Born rule, and I suggest that a completely successful Born rule proof might only be possible once all the other postulates become part of the derivation. As a start towards this end, I show how we can already derive the essential content of the fundamental postulates of quantum mechanics, at least in outline, and especially if we allow some educated and well-motivated guesswork along the way. The result is some steps towards a coherent and consistent algorithmic interpretation of quantum mechanics.
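
For reference, the Born rule discussed throughout can be stated in its standard textbook form; the notation below is the usual one and is not specific to this thesis:

    % Probability of outcome i for a pure state |psi> measured in an
    % orthonormal basis {|i>}:
    P(i) = \lvert \langle i \mid \psi \rangle \rvert^{2}
    % Equivalently, for a density operator rho and projection \Pi_i
    % (the form addressed by Gleason's theorem):
    P(i) = \operatorname{Tr}(\rho \, \Pi_i)
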