Model-Based Bayesian Exploration
Reinforcement learning systems are often concerned with balancing exploration
of untested actions against exploitation of actions that are known to be good.
The benefit of exploration can be estimated using the classical notion of Value
of Information - the expected improvement in future decision quality arising
from the information acquired by exploration. Estimating this quantity requires
an assessment of the agent's uncertainty about its current value estimates for
states. In this paper we investigate ways of representing and reasoning about
this uncertainty in algorithms where the system attempts to learn a model of
its environment. We explicitly represent uncertainty about the parameters of
the model and build probability distributions over Q-values based on these.
These distributions are used to compute a myopic approximation to the value of
information for each action and hence to select the action that best balances
exploration and exploitation.
Comment: Appears in Proceedings of the Fifteenth Conference on Uncertainty in
Artificial Intelligence (UAI1999)
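The myopic value-of-information rule described in the abstract above can be sketched roughly as follows. This is an illustrative sketch, not the paper's implementation: it assumes each action's Q-value distribution is available as a list of samples (the hypothetical `q_samples` argument), and the names `myopic_vpi` and `select_action` are invented here. The gain from learning an action's true value is taken to be nonzero only when that knowledge would change the greedy choice.

```python
def myopic_vpi(q_samples):
    """Estimate a myopic value of information per action.

    q_samples: dict mapping action -> list of sampled Q-values
    (an assumed representation of the agent's uncertainty).
    Requires at least two actions.
    """
    means = {a: sum(xs) / len(xs) for a, xs in q_samples.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    best, second = ranked[0], ranked[1]

    vpi = {}
    for a, xs in q_samples.items():
        if a == best:
            # Learning that the current best action is worse than the
            # runner-up would let us switch to the runner-up.
            gains = [max(means[second] - x, 0.0) for x in xs]
        else:
            # Learning that another action beats the current best
            # would let us switch to it.
            gains = [max(x - means[best], 0.0) for x in xs]
        vpi[a] = sum(gains) / len(gains)
    return means, vpi


def select_action(q_samples):
    """Pick the action maximizing expected Q-value plus estimated VPI."""
    means, vpi = myopic_vpi(q_samples)
    return max(means, key=lambda a: means[a] + vpi[a])
```

With samples `{"a": [1.0, 1.0], "b": [-2.0, 3.0]}`, action `a` has the higher mean, but the uncertainty about `b` gives it a positive exploration bonus, so the rule selects `b`.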
Decision Making for Rapid Information Acquisition in the Reconnaissance of Random Fields
Research into several aspects of robot-enabled reconnaissance of random
fields is reported. The work has two major components: the underlying theory of
information acquisition in the exploration of unknown fields and the results of
experiments on how humans use sensor-equipped robots to perform a simulated
reconnaissance exercise.
The theoretical framework reported herein extends work on robotic exploration
that has been reported by ourselves and others. Several new figures of merit
for evaluating exploration strategies are proposed and compared. Using concepts
from differential topology and information theory, we develop the theoretical
foundation of search strategies aimed at rapid discovery of topological
features (locations of critical points and critical level sets) of a priori
unknown differentiable random fields. The theory enables study of efficient
reconnaissance strategies in which the tradeoff between speed and accuracy can
be understood. The proposed approach to rapid discovery of topological features
has led in a natural way to the creation of parsimonious reconnaissance
routines that do not rely on any prior knowledge of the environment. The design
of topology-guided search protocols uses a mathematical framework that
quantifies the relationship between what is discovered and what remains to be
discovered. The quantification rests on an information-theory-inspired model
whose properties allow us to treat search as a problem in optimal information
acquisition. A central theme in this approach is that "conservative" and
"aggressive" search strategies can be precisely defined, and search decisions
regarding "exploration" vs. "exploitation" choices are informed by the rate at
which the information metric is changing.
Comment: 34 pages, 20 figures
rdf:SynopsViz - A Framework for Hierarchical Linked Data Visual Exploration and Analysis
The purpose of data visualization is to offer intuitive ways for information
perception and manipulation, especially for non-expert users. The Web of Data
has made a huge number of datasets available. However, the volume
and heterogeneity of available information make it difficult for humans to
manually explore and analyse large datasets. In this paper, we present
rdf:SynopsViz, a tool for hierarchical charting and visual exploration of
Linked Open Data (LOD). Hierarchical LOD exploration is based on the creation
of multiple levels of hierarchically related groups of resources based on the
values of one or more properties. The adopted hierarchical model provides
effective information abstraction and summarization. It also allows efficient
on-the-fly statistical computations, using aggregations over the hierarchy
levels.
Comment: 11th Extended Semantic Web Conference (ESWC '14)
Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching
We study the neural-linear bandit model for solving sequential
decision-making problems with high dimensional side information. Neural-linear
bandits leverage the representation power of deep neural networks and combine
it with efficient exploration mechanisms, designed for linear contextual
bandits, on top of the last hidden layer. Since the representation is being
optimized during learning, information regarding exploration with "old"
features is lost. Here, we propose the first limited memory neural-linear
bandit that is resilient to this phenomenon, which we term catastrophic
forgetting. We evaluate our method on a variety of real-world data sets,
including regression, classification, and sentiment analysis, and observe that
our algorithm is resilient to catastrophic forgetting and achieves superior
performance.
Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems
Learning-based control algorithms require data collection with abundant supervision for training. Safe exploration algorithms ensure the safety of this data collection process even when only partial knowledge is available. We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained stochastic optimal control with dynamics learning and feedback control. We derive an iterative convex optimization algorithm that solves an Information-cost Stochastic Nonlinear Optimal Control problem (Info-SNOC). The optimization objective encodes control cost for performance and exploration cost for learning, and safety is incorporated as distributionally robust chance constraints. The dynamics are predicted from a robust regression model that is learned from data. The Info-SNOC algorithm is used to compute a sub-optimal pool of safe motion plans that aid in exploration for learning unknown residual dynamics under safety constraints. A stable feedback controller is used to execute the motion plan and collect data for model learning. We prove the safety of rollout from our exploration method and reduction in uncertainty over epochs, thereby guaranteeing the consistency of our learning method. We validate the effectiveness of Info-SNOC by designing and implementing a pool of safe trajectories for a planar robot. We demonstrate that our approach has a higher success rate in ensuring safety when compared to a deterministic trajectory optimization approach.
The Journey is the Reward: Unsupervised Learning of Influential Trajectories
Unsupervised exploration and representation learning become increasingly
important when learning in diverse and sparse environments. The
information-theoretic principle of empowerment formalizes an unsupervised
exploration objective through an agent trying to maximize its influence on the
future states of its environment. Previous approaches carry certain limitations
in that they either do not employ closed-loop feedback or do not have an
internal state. As a consequence, a privileged final state is taken as an
influence measure, rather than the full trajectory. We provide a model-free
method which takes into account the whole trajectory while still offering the
benefits of option-based approaches. We successfully apply our approach to
settings with large action spaces, where discovery of meaningful action
sequences is particularly difficult.
Comment: ICML'19 ERL Workshop
An Expectation Conditional Maximization approach for Gaussian graphical models
Bayesian graphical models are a useful tool for understanding dependence
relationships among many variables, particularly in situations with external
prior information. In high-dimensional settings, the space of possible graphs
becomes enormous, rendering even state-of-the-art Bayesian stochastic search
computationally infeasible. We propose a deterministic alternative to estimate
Gaussian and Gaussian copula graphical models using an Expectation Conditional
Maximization (ECM) algorithm, extending the EM approach from Bayesian variable
selection to graphical model estimation. We show that the ECM approach enables
fast posterior exploration under a sequence of mixture priors, and can
incorporate multiple sources of information.
Freshness-Aware Thompson Sampling
To keep up with the changing nature of the user's content, researchers have
recently started to model interactions between users and Context-Aware
Recommender Systems (CARS) as a bandit problem in which the system must deal
with the exploration-exploitation dilemma. In this sense, we propose to study
the freshness of the user's content in CARS through the bandit problem. We
introduce in this paper an algorithm named Freshness-Aware Thompson Sampling
(FA-TS) that manages the recommendation of fresh documents according to the
risk of the user's situation. The intensive evaluation and the detailed
analysis of the experimental results reveal several important discoveries in
the exploration/exploitation (exr/exp) behaviour.
Comment: 21st International Conference on Neural Information Processing. arXiv
admin note: text overlap with arXiv:1409.772
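For readers unfamiliar with the base technique, standard Thompson sampling for a Bernoulli bandit can be sketched as below. This is the generic Beta-Bernoulli form, not the freshness-aware FA-TS variant from the abstract, whose details are not given here. The function names `thompson_step` and `run`, the uniform Beta(1, 1) prior, and the simulated click rates are all assumptions made for illustration.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample a plausible reward rate per arm and play the largest.

    Each arm's rate has a Beta(successes + 1, failures + 1) posterior,
    which corresponds to an assumed uniform Beta(1, 1) prior.
    """
    samples = [
        rng.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run(true_rates, rounds, seed=0):
    """Simulate a bandit with hypothetical per-arm reward rates."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated Bernoulli reward for the chosen arm.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Calling `run([0.0, 1.0], 500)` returns per-arm success and failure counts; arms whose posterior samples are rarely the largest are played less and less often, which is how the method trades exploration for exploitation.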
Faceted Search of Heterogeneous Geographic Information for Dynamic Map Projection
This paper proposes a faceted information exploration model that supports
coarse-grained and fine-grained focusing of geographic maps by offering a
graphical representation of data attributes within interactive widgets. The
proposed approach enables (i) a multi-category projection of long-lasting
geographic maps, based on the proposal of efficient facets for data exploration
in sparse and noisy datasets, and (ii) an interactive representation of the
search context based on widgets that support data visualization, faceted
exploration, category-based information hiding and transparency of results at
the same time. The integration of our model with a semantic representation of
geographical knowledge supports the exploration of information retrieved from
heterogeneous data sources, such as Public Open Data and OpenStreetMap. We
evaluated our model with users in the OnToMap collaborative Web GIS. The
experimental results show that, when working on geographic maps populated with
multiple data categories, it outperforms simple category-based map projection
and traditional faceted search tools, such as checkboxes, in both user
performance and experience.
Signed Link Prediction with Sparse Data: The Role of Personality Information
Predicting signed links in social networks often faces the problem of signed
link data sparsity, i.e., only a small percentage of signed links are given.
The problem is exacerbated when the number of negative links is much smaller
than that of positive links. Boosting signed link prediction necessitates
additional information to compensate for data sparsity. According to psychology
theories, one rich source of such information is a user's personality, such as
optimism or pessimism, which can help determine their propensity for
establishing positive and negative links. In this study, we investigate how
personality information can be obtained, and whether it can help alleviate
the data sparsity problem for signed link prediction. We propose a novel signed
link prediction model that enables empirical exploration of user personality
via social media data. We evaluate our proposed model on two datasets of
real-world signed link networks. The results demonstrate the complementary role
of personality information in the signed link prediction problem. Experimental
results also indicate the effectiveness of different levels of personality
information for the signed link data sparsity problem.
Comment: Companion Proceedings of the 2019 World Wide Web Conference