On Practical Machine Learning and Data Analysis
This thesis discusses and addresses some of the difficulties
associated with practical machine learning and data
analysis. Introducing data-driven methods in e.g. industrial and
business applications can lead to large gains in productivity and
efficiency, but the cost and complexity are often
overwhelming. Creating machine learning applications in practice
involves a large amount of manual labour, which often needs to be
performed by an experienced analyst who lacks significant experience
with the application area. We will here discuss some of the hurdles
faced in a typical analysis project and suggest measures and methods
to simplify the process.
One of the most important issues when applying machine learning
methods to complex data, such as data from industrial applications, is that
the processes generating the data are modelled in an appropriate
way. Relevant aspects have to be formalised and represented in a way
that allows us to perform our calculations in an efficient manner. We
present a statistical modelling framework, Hierarchical Graph
Mixtures, based on a combination of graphical models and mixture
models. It allows us to create consistent, expressive statistical
models that simplify the modelling of complex systems. Using a
Bayesian approach, we allow for encoding of prior knowledge and make
the models applicable in situations when relatively little data are
available.
Detecting structures in data, such as clusters and dependency
structure, is very important both for understanding an application
area and for specifying the structure of e.g. a hierarchical graph
mixture. We will discuss how this structure can be extracted for
sequential data. By using the inherent dependency structure of
sequential data, we construct an information-theoretic measure of
correlation that does not suffer from the problems most common
correlation measures have with this type of data.
In many diagnosis situations it is desirable to perform a
classification in an iterative and interactive manner. The matter is
often complicated by very limited amounts of knowledge and examples
when a new system to be diagnosed is initially brought into use. We
describe how to create an incremental classification system based on a
statistical model that is trained from empirical data, and show how
the limited available background information can still be used
initially for a functioning diagnosis system.
To minimise the effort with which results are achieved within data
analysis projects, we need to address not only the models used, but
also the methodology and applications that can help simplify the
process. We present a methodology for data preparation and a software
library intended for rapid analysis, prototyping, and deployment.
Finally, we will study a few example applications, presenting tasks
within classification, prediction and anomaly detection. The examples
include demand prediction for supply chain management, approximating
complex simulators for increased speed in parameter optimisation, and
fraud detection and classification within a media-on-demand system.
A Survey of Bayesian Statistical Approaches for Big Data
The modern era is characterised as an era of information or Big Data. This
has motivated a huge literature on new methods for extracting information and
insights from these data. A natural question is how these approaches differ
from those that were available prior to the advent of Big Data. We present a
review of published studies that present Bayesian statistical approaches
specifically for Big Data and discuss the reported and perceived benefits of
these approaches. We conclude by addressing the question of whether focusing
only on improving computational algorithms and infrastructure will be enough to
face the challenges of Big Data.