Feature Reinforcement Learning: Part I: Unstructured MDPs
General-purpose, intelligent, learning agents cycle through sequences of
observations, actions, and rewards that are complex, uncertain, unknown, and
non-Markovian. On the other hand, reinforcement learning is well-developed for
small finite state Markov decision processes (MDPs). Up to now, extracting the
right state representations out of bare observations, that is, reducing the
general agent setup to the MDP framework, is an art that involves significant
effort by designers. The primary goal of this work is to automate the reduction
process and thereby significantly expand the scope of many existing
reinforcement learning algorithms and the agents that employ them. Before we
can think of mechanizing this search for suitable MDPs, we need a formal
objective criterion. The main contribution of this article is to develop such a
criterion. I also integrate the various parts into one learning algorithm.
Extensions to more realistic dynamic Bayesian networks are developed in Part
II. The role of POMDPs is also considered there.
Comment: 24 LaTeX pages, 5 diagrams
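As background for the small finite-state MDP setting the abstract contrasts with the general agent setup, here is a minimal illustrative sketch of tabular Q-learning on a toy chain MDP. This is not the paper's algorithm; the environment, constants, and names are all our own for illustration.

```python
import random

# Toy finite MDP: a 5-state chain; action 0 moves left, action 1 moves right.
# Reaching the rightmost state yields reward 1 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# The learned greedy policy should move right from every non-terminal state.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)
```

Once observations have been reduced to such a finite state space, standard algorithms of this kind apply directly; the paper's contribution is a criterion for finding that reduction automatically.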
Estimating Dependency, Monitoring and Knowledge Discovery in High-Dimensional Data Streams
Data Mining, known as the process of extracting knowledge from massive data sets, leads to phenomenal impacts on our society and now affects nearly every aspect of our lives: from the layout of our local grocery store, to the ads and product recommendations we receive, the availability of treatments for common diseases, the prevention of crime, or the efficiency of industrial production processes.
However, Data Mining remains difficult when (1) data is high-dimensional, i.e., has many attributes, and when (2) data comes as a stream. Extracting knowledge from high-dimensional data streams is challenging because one must cope with two orthogonal sets of challenges. On the one hand, the effects of the so-called "curse of dimensionality" bog down the performance of statistical methods and give rise to increasingly complex Data Mining problems. On the other hand, the statistical properties of data streams may evolve in unexpected ways, a phenomenon known in the community as "concept drift". Thus, one needs to update one's knowledge about data over time, i.e., to monitor the stream.
While previous work addresses high-dimensional data sets and data streams to some extent, the intersection of both has received much less attention. Nevertheless, extracting knowledge in this setting is advantageous for many industrial applications: identifying patterns from high-dimensional data streams in real-time may lead to larger production volumes, or reduce operational costs. The goal of this dissertation is to bridge this gap.
We first focus on dependency estimation, a fundamental task of Data Mining. Typically, one estimates dependency by quantifying the strength of statistical relationships. We identify the requirements for dependency estimation in high-dimensional data streams and propose a new estimation framework, Monte Carlo Dependency Estimation (MCDE), that fulfils them all. We show that MCDE leads to efficient dependency monitoring.
Then, we generalise the task of monitoring by introducing the Scaling Multi-Armed Bandit (S-MAB) algorithms, extending the Multi-Armed Bandit (MAB) model. We show that our algorithms can efficiently monitor statistics by leveraging user-specific criteria.
Finally, we describe applications of our contributions to Knowledge Discovery. We propose an algorithm, Streaming Greedy Maximum Random Deviation (SGMRD), which exploits our new methods to extract patterns, e.g., outliers, in high-dimensional data streams. We also present a new approach, kj-Nearest Neighbours (kj-NN), to detect outlying documents within massive text corpora.
We support our algorithmic contributions with theoretical guarantees, as well as extensive experiments against both synthetic and real-world data. We demonstrate the benefits of our methods against real-world use cases. Overall, this dissertation establishes fundamental tools for Knowledge Discovery in high-dimensional data streams, which help with many applications in the industry, e.g., anomaly detection, or predictive maintenance.
To facilitate the application of our results and future research, we publicly release our implementations, experiments, and benchmark data via open-source platforms.
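To make the Monte Carlo idea behind dependency estimation concrete, the following is a loose sketch, not the published MCDE implementation: repeatedly condition on a random slice of one attribute, compare the conditional distribution of the other attribute against its marginal with a statistical test, and average the test statistics. All function names and parameters here are illustrative.

```python
import numpy as np

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (maximum gap between empirical CDFs)."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

def mc_dependency(x, y, n_iter=200, slice_frac=0.3, rng=None):
    """Average KS distance between y restricted to random slices of x
    and y overall: near 0 under independence, larger under dependency."""
    rng = rng or np.random.default_rng(0)
    order = np.argsort(x)          # slices are contiguous windows in x-order
    w = int(len(x) * slice_frac)
    stats = []
    for _ in range(n_iter):
        start = rng.integers(0, len(x) - w)
        inside = y[order[start:start + w]]
        stats.append(ks_stat(inside, y))
    return float(np.mean(stats))

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 1000)
y_indep = rng.uniform(-1, 1, 1000)            # independent of x
y_dep = x**2 + rng.normal(0, 0.05, 1000)      # nonlinear dependency on x
print(mc_dependency(x, y_indep), mc_dependency(x, y_dep))
```

Because each iteration only needs a random slice and one cheap test, such an estimator can be refreshed incrementally as a stream evolves, which is what makes this family of methods suitable for dependency monitoring.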
Putting bandits into context: How function learning supports decision making
The authors introduce the contextual multi-armed bandit task as a framework to investigate learning and decision making in uncertain environments. In this novel paradigm, participants repeatedly choose between multiple options in order to maximize their rewards. The options are described by a number of contextual features which are predictive of the rewards through initially unknown functions. From their experience with choosing options and observing the consequences of their decisions, participants can learn about the functional relation between contexts and rewards and improve their decision strategy over time. In three experiments, the authors explore participants' behavior in such learning environments. They predict participants' behavior by context-blind (mean-tracking, Kalman filter) and contextual (Gaussian process and linear regression) learning approaches combined with different choice strategies. Participants are mostly able to learn about the context-reward functions, and their behavior is best described by a Gaussian process learning strategy which generalizes previous experience to similar instances. In a relatively simple task with binary features, they seem to combine this learning with a probability of improvement decision strategy which focuses on alternatives that are expected to lead to an improvement upon a current favorite option. In a task with continuous features that are linearly related to the rewards, participants seem to more explicitly balance exploration and exploitation. Finally, in a difficult learning environment where the relation between features and rewards is nonlinear, some participants are again well-described by a Gaussian process learning strategy, whereas others revert to context-blind strategies.
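The contextual bandit setup described above can be sketched as a small simulation. This is not the authors' experimental code or their Gaussian process model; it is a hedged illustration of the linear-context condition, pairing a simple contextual learner (one online ridge-regression model per option) with epsilon-greedy choice. All constants and names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, n_features, n_rounds, eps = 4, 3, 2000, 0.1

# Hidden linear context-reward functions, one weight vector per option.
true_w = rng.normal(size=(n_arms, n_features))

# Per-arm online ridge regression: A = X'X + I, b = X'y, w_hat = A^-1 b.
A = np.stack([np.eye(n_features) for _ in range(n_arms)])
b = np.zeros((n_arms, n_features))

total, oracle = 0.0, 0.0
for t in range(n_rounds):
    ctx = rng.normal(size=n_features)                  # contextual features this round
    w_hat = np.array([np.linalg.solve(A[k], b[k]) for k in range(n_arms)])
    preds = w_hat @ ctx                                # predicted reward per option
    arm = rng.integers(n_arms) if rng.random() < eps else int(np.argmax(preds))
    reward = true_w[arm] @ ctx + rng.normal(0, 0.1)    # noisy observed reward
    A[arm] += np.outer(ctx, ctx)                       # update chosen arm's model
    b[arm] += reward * ctx
    total += reward
    oracle += np.max(true_w @ ctx)                     # best achievable this round

print(total, oracle)
```

As the per-option models sharpen, the learner's cumulative reward approaches the oracle's, mirroring how participants who learn the context-reward functions improve their choices over trials.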
The Role of the Noradrenergic System in the Exploration-Exploitation Trade-Off: A Psychopharmacological Study
Animal research and computational modeling have indicated an important role for the neuromodulatory locus coeruleus-norepinephrine (LC-NE) system in the control of behavior. According to the adaptive gain theory, the LC-NE system is critical for optimizing behavioral performance by regulating the balance between exploitative and exploratory control states. However, crucial direct empirical tests of this theory in human subjects have been lacking. We used a pharmacological manipulation of the LC-NE system to test predictions of this theory in humans. In a double-blind parallel-groups design (N = 52), participants received 4 mg reboxetine (a selective norepinephrine reuptake inhibitor), 30 mg citalopram (a selective serotonin reuptake inhibitor), or placebo. The adaptive gain theory predicted that the increased tonic NE levels induced by reboxetine would promote task disengagement and exploratory behavior. We assessed the effects of reboxetine on performance in two cognitive tasks designed to examine task (dis)engagement and exploitative versus exploratory behavior: a diminishing-utility task and a gambling task with a non-stationary pay-off structure. In contrast to predictions of the adaptive gain theory, we did not find differences in task (dis)engagement or exploratory behavior between the three experimental groups, despite demonstrable effects of the two drugs on non-specific central and autonomic nervous system parameters. Our findings suggest that the LC-NE system may not be involved in the regulation of the exploration-exploitation trade-off in humans, at least not within the context of a single task. It remains to be examined whether the LC-NE system is involved in random exploration exceeding the current task context.
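A gambling task with a non-stationary pay-off structure of the kind mentioned above is often modeled as a "restless" bandit. The sketch below is purely illustrative, not the study's task or analysis: pay-off means drift as a random walk while a softmax chooser trades off exploiting high estimates against exploring alternatives. All parameter values are assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, n_trials = 4, 1000
mu = rng.normal(0, 1, n_arms)        # true mean pay-offs (will drift over time)
est = np.zeros(n_arms)               # learner's running pay-off estimates
lr, temp = 0.3, 0.2                  # learning rate and softmax temperature

chosen = np.zeros(n_arms, dtype=int)
for t in range(n_trials):
    p = np.exp(est / temp)
    p /= p.sum()                     # softmax: higher estimates chosen more often
    arm = int(rng.choice(n_arms, p=p))
    reward = mu[arm] + rng.normal(0, 0.3)
    est[arm] += lr * (reward - est[arm])      # error-driven update of the chosen arm
    mu += rng.normal(0, 0.05, n_arms)         # random-walk drift: pay-offs are non-stationary
    chosen[arm] += 1

print(chosen)
```

Because the pay-offs drift, purely exploitative behavior eventually tracks a stale favorite; the softmax temperature controls how much residual exploration the chooser retains, which is the behavioral signature such tasks are designed to measure.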
Why we need biased AI -- How including cognitive and ethical machine biases can enhance AI systems
This paper stresses the importance of biases in the field of artificial
intelligence (AI) in two regards. First, in order to foster efficient
algorithmic decision-making in complex, unstable, and uncertain real-world
environments, we argue for the structurewise implementation of human cognitive
biases in learning algorithms. Second, we argue that in order to achieve
ethical machine behavior, filter mechanisms have to be applied for selecting
biased training stimuli that represent social or behavioral traits that are
ethically desirable. We use insights from cognitive science as well as ethics
and apply them to the AI field, combining theoretical considerations with seven
case studies depicting tangible bias implementation scenarios. Ultimately, this
paper is a first tentative step toward explicitly re-evaluating the ethical
significance of machine biases, as well as toward putting forth the idea of
implementing cognitive biases in machines.