6,812 research outputs found
Autonomous Deep Learning: Continual Learning Approach for Dynamic Environments
The feasibility of deep neural networks (DNNs) to address data stream
problems still requires intensive study because of the static and offline
nature of conventional deep learning approaches. A deep continual learning
algorithm, namely autonomous deep learning (ADL), is proposed in this paper.
Unlike traditional deep learning methods, ADL features a flexible structure
where its network structure can be constructed from scratch with the absence of
an initial network structure via the self-constructing network structure. ADL
specifically addresses catastrophic forgetting by having a different-depth
structure which is capable of achieving a trade-off between plasticity and
stability. Network significance (NS) formula is proposed to drive the hidden
nodes growing and pruning mechanism. Drift detection scenario (DDS) is put
forward to signal distributional changes in data streams which induce the
creation of a new hidden layer. The maximum information compression index
(MICI) method plays an important role as a complexity reduction module
eliminating redundant layers. The efficacy of ADL is numerically validated
under the prequential test-then-train procedure in lifelong environments using
nine popular data stream problems. The numerical results demonstrate that ADL
consistently outperforms recent continual learning methods while characterizing
the automatic construction of network structures
Discrete MDL Predicts in Total Variation
The Minimum Description Length (MDL) principle selects the model that has the
shortest code for data plus model. We show that for a countable class of
models, MDL predictions are close to the true distribution in a strong sense.
The result is completely general. No independence, ergodicity, stationarity,
identifiability, or other assumption on the model class need to be made. More
formally, we show that for any countable class of models, the distributions
selected by MDL (or MAP) asymptotically predict (merge with) the true measure
in the class in total variation distance. Implications for non-i.i.d. domains
like time-series forecasting, discriminative learning, and reinforcement
learning are discussed.Comment: 15 LaTeX page
Learning from Data Streams with Randomized Forests
Non-stationary streaming data poses a familiar challenge in machine learning: the need to
obtain fast and accurate predictions. A data stream is a continuously generated sequence of
data, with data typically arriving rapidly. They are often characterised by a non-stationary
generative process, with concept drift occurring as the process changes. Such processes are
commonly seen in the real world, such as in advertising, shopping trends, environmental
conditions, electricity monitoring and traffic monitoring.
Typical stationary algorithms are ill-suited for use with concept drifting data, thus necessitating
more targeted methods. Tree-based methods are a popular approach to this problem,
traditionally focussing on the use of the Hoeffding bound in order to guarantee performance
relative to a stationary scenario. However, there are limited single learners available for
regression scenarios, and those that do exist often struggle to choose between similarly
discriminative splits, leading to longer training times and worse performance. This limited
pool of single learners in turn hampers the performance of ensemble approaches in which
they act as base learners.
In this thesis we seek to remedy this gap in the literature, developing methods which
focus on increasing randomization to both improve predictive performance and reduce the
training times of tree-based ensemble methods. In particular, we have chosen to investigate
the use of randomization as it is known to be able to improve generalization error in
ensembles, and is also expected to lead to fast training times, thus being a natural method
of handling the problems typically experienced by single learners.
We begin in a regression scenario, introducing the Adaptive Trees for Streaming with
Extreme Randomization (ATSER) algorithm; a partially randomized approach based on
the concept of Extremely Randomized (extra) trees. The ATSER algorithm incrementally
trains trees, using the Hoeffding bound to select the best of a random selection of splits.
Simultaneously, the trees also detect and adapt to changes in the data stream. Unlike many
traditional streaming algorithms ATSER trees can easily be extended to include nominal
features. We find that compared to other contemporary methods ensembles of ATSER
trees lead to improved predictive performance whilst also reducing run times.
We then demonstrate the Adaptive Categorisation Trees for Streaming with Extreme
Randomization (ACTSER) algorithm, an adaption of the ATSER algorithm to the more
traditional categorization scenario, again showing improved predictive performance and
reduced runtimes. The inclusion of nominal features is particularly novel in this setting
since typical categorization approaches struggle to handle them.
Finally we examine a completely randomized scenario, where an ensemble of trees is generated
prior to having access to the data stream, while also considering multivariate splits
in addition to the traditional axis-aligned approach. We find that through the combination
of a forgetting mechanism in linear models and dynamic weighting for ensemble members,
we are able to avoid explicitly testing for concept drift. This leads to fast ensembles
with strong predictive performance, whilst also requiring fewer parameters than other
contemporary methods.
For each of the proposed methods in this thesis, we demonstrate empirically that they are
effective over a variety of different non-stationary data streams, including on multiple
types of concept drift. Furthermore, in comparison to other contemporary data streaming
algorithms, we find the biggest improvements in performance are on noisy data streams.Engineers Gat
Bitcoin Volatility Forecasting with a Glimpse into Buy and Sell Orders
In this paper, we study the ability to make the short-term prediction of the
exchange price fluctuations towards the United States dollar for the Bitcoin
market. We use the data of realized volatility collected from one of the
largest Bitcoin digital trading offices in 2016 and 2017 as well as order
information. Experiments are performed to evaluate a variety of statistical and
machine learning approaches.Comment: Full version of the paper published at IEEE International Conference
on Data Mining (ICDM), 201
Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning
Intrinsically motivated spontaneous exploration is a key enabler of
autonomous lifelong learning in human children. It enables the discovery and
acquisition of large repertoires of skills through self-generation,
self-selection, self-ordering and self-experimentation of learning goals. We
present an algorithmic approach called Intrinsically Motivated Goal Exploration
Processes (IMGEP) to enable similar properties of autonomous or self-supervised
learning in machines. The IMGEP algorithmic architecture relies on several
principles: 1) self-generation of goals, generalized as fitness functions; 2)
selection of goals based on intrinsic rewards; 3) exploration with incremental
goal-parameterized policy search and exploitation of the gathered data with a
batch learning algorithm; 4) systematic reuse of information acquired when
targeting a goal for improving towards other goals. We present a particularly
efficient form of IMGEP, called Modular Population-Based IMGEP, that uses a
population-based policy and an object-centered modularity in goals and
mutations. We provide several implementations of this architecture and
demonstrate their ability to automatically generate a learning curriculum
within several experimental setups including a real humanoid robot that can
explore multiple spaces of goals with several hundred continuous dimensions.
While no particular target goal is provided to the system, this curriculum
allows the discovery of skills that act as stepping stone for learning more
complex skills, e.g. nested tool use. We show that learning diverse spaces of
goals with intrinsic motivations is more efficient for learning complex skills
than only trying to directly learn these complex skills
- …