    Model-Based Bayesian Exploration

    Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information: the expected improvement in future decision quality arising from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper we investigate ways of representing and reasoning about this uncertainty in algorithms where the system attempts to learn a model of its environment. We explicitly represent uncertainty about the parameters of the model and build probability distributions over Q-values based on these. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation. Comment: Appears in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI1999)
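The myopic value-of-information rule described in this abstract can be illustrated with a small Monte Carlo sketch. It assumes that samples from each action's Q-value distribution are already available (the abstract does not say how they are drawn), and it uses one common formulation of the gain: information about an action is only valuable if it would change which action looks best. The function names `myopic_vpi` and `select_action` and the sample data are hypothetical, not taken from the paper.

```python
from statistics import mean


def myopic_vpi(q_samples):
    """Estimate expected Q-values and myopic value of information.

    q_samples maps each action to a list of sampled Q-values
    (a hypothetical input format; at least two actions are required).
    """
    means = {a: mean(s) for a, s in q_samples.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    best, second = ranked[0], ranked[1]

    vpi = {}
    for action, samples in q_samples.items():
        if action == best:
            # Learning that the best-looking action is actually worse than
            # the runner-up would make us switch to the runner-up.
            gains = [max(means[second] - q, 0.0) for q in samples]
        else:
            # Learning that another action beats the current best
            # would make us switch to that action.
            gains = [max(q - means[best], 0.0) for q in samples]
        vpi[action] = mean(gains)
    return means, vpi


def select_action(q_samples):
    """Pick the action maximizing expected value plus value of information."""
    means, vpi = myopic_vpi(q_samples)
    return max(q_samples, key=lambda a: means[a] + vpi[a])


# Illustrative samples: "safe" is certain, "risky" is uncertain.
samples = {
    "safe": [1.0, 1.0, 1.0, 1.0],
    "risky": [-1.0, -1.0, 4.0, 0.0],
}
means, vpi = myopic_vpi(samples)
print(means)                   # {'safe': 1.0, 'risky': 0.5}
print(vpi)                     # {'safe': 0.0, 'risky': 0.75}
print(select_action(samples))  # risky
```

In this toy case the uncertain action has the lower expected value but is still selected, because the chance that it turns out to be better than the current best outweighs the expected loss from trying it.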

    Decision Making for Rapid Information Acquisition in the Reconnaissance of Random Fields

    Research into several aspects of robot-enabled reconnaissance of random fields is reported. The work has two major components: the underlying theory of information acquisition in the exploration of unknown fields and the results of experiments on how humans use sensor-equipped robots to perform a simulated reconnaissance exercise. The theoretical framework reported herein extends work on robotic exploration that has been reported by ourselves and others. Several new figures of merit for evaluating exploration strategies are proposed and compared. Using concepts from differential topology and information theory, we develop the theoretical foundation of search strategies aimed at rapid discovery of topological features (locations of critical points and critical level sets) of a priori unknown differentiable random fields. The theory enables study of efficient reconnaissance strategies in which the tradeoff between speed and accuracy can be understood. The proposed approach to rapid discovery of topological features has led in a natural way to the creation of parsimonious reconnaissance routines that do not rely on any prior knowledge of the environment. The design of topology-guided search protocols uses a mathematical framework that quantifies the relationship between what is discovered and what remains to be discovered. The quantification rests on an information-theory-inspired model whose properties allow us to treat search as a problem in optimal information acquisition. A central theme in this approach is that "conservative" and "aggressive" search strategies can be precisely defined, and search decisions regarding "exploration" vs. "exploitation" choices are informed by the rate at which the information metric is changing. Comment: 34 pages, 20 figures

    rdf:SynopsViz - A Framework for Hierarchical Linked Data Visual Exploration and Analysis

    The purpose of data visualization is to offer intuitive ways for information perception and manipulation, especially for non-expert users. The Web of Data has made a huge number of datasets available. However, the volume and heterogeneity of available information make it difficult for humans to manually explore and analyse large datasets. In this paper, we present rdf:SynopsViz, a tool for hierarchical charting and visual exploration of Linked Open Data (LOD). Hierarchical LOD exploration is based on the creation of multiple levels of hierarchically related groups of resources based on the values of one or more properties. The adopted hierarchical model provides effective information abstraction and summarization. It also allows efficient on-the-fly statistical computations, using aggregations over the hierarchy levels. Comment: 11th Extended Semantic Web Conference (ESWC '14)

    Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

    We study the neural-linear bandit model for solving sequential decision-making problems with high-dimensional side information. Neural-linear bandits leverage the representation power of deep neural networks and combine it with efficient exploration mechanisms, designed for linear contextual bandits, on top of the last hidden layer. Since the representation is being optimized during learning, information regarding exploration with "old" features is lost. Here, we propose the first limited-memory neural-linear bandit that is resilient to this phenomenon, which we term catastrophic forgetting. We evaluate our method on a variety of real-world data sets, including regression, classification, and sentiment analysis, and observe that our algorithm is resilient to catastrophic forgetting and achieves superior performance.

    Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems

    Learning-based control algorithms require data collection with abundant supervision for training. Safe exploration algorithms ensure the safety of this data collection process even when only partial knowledge is available. We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained stochastic optimal control with dynamics learning and feedback control. We derive an iterative convex optimization algorithm that solves an Information-cost Stochastic Nonlinear Optimal Control problem (Info-SNOC). The optimization objective encodes control cost for performance and exploration cost for learning, and safety is incorporated as distributionally robust chance constraints. The dynamics are predicted from a robust regression model that is learned from data. The Info-SNOC algorithm is used to compute a sub-optimal pool of safe motion plans that aid in exploration for learning unknown residual dynamics under safety constraints. A stable feedback controller is used to execute the motion plan and collect data for model learning. We prove the safety of rollout from our exploration method and reduction in uncertainty over epochs, thereby guaranteeing the consistency of our learning method. We validate the effectiveness of Info-SNOC by designing and implementing a pool of safe trajectories for a planar robot. We demonstrate that our approach has a higher success rate in ensuring safety when compared to a deterministic trajectory optimization approach.

    The Journey is the Reward: Unsupervised Learning of Influential Trajectories

    Unsupervised exploration and representation learning become increasingly important when learning in diverse and sparse environments. The information-theoretic principle of empowerment formalizes an unsupervised exploration objective through an agent trying to maximize its influence on the future states of its environment. Previous approaches carry certain limitations in that they either do not employ closed-loop feedback or do not have an internal state. As a consequence, a privileged final state is taken as an influence measure, rather than the full trajectory. We provide a model-free method which takes into account the whole trajectory while still offering the benefits of option-based approaches. We successfully apply our approach to settings with large action spaces, where discovery of meaningful action sequences is particularly difficult. Comment: ICML'19 ERL Workshop

    An Expectation Conditional Maximization approach for Gaussian graphical models

    Bayesian graphical models are a useful tool for understanding dependence relationships among many variables, particularly in situations with external prior information. In high-dimensional settings, the space of possible graphs becomes enormous, rendering even state-of-the-art Bayesian stochastic search computationally infeasible. We propose a deterministic alternative to estimate Gaussian and Gaussian copula graphical models using an Expectation Conditional Maximization (ECM) algorithm, extending the EM approach from Bayesian variable selection to graphical model estimation. We show that the ECM approach enables fast posterior exploration under a sequence of mixture priors, and can incorporate multiple sources of information.

    Freshness-Aware Thompson Sampling

    To follow the dynamics of the user's content, researchers have recently started to model interactions between users and Context-Aware Recommender Systems (CARS) as a bandit problem in which the system needs to deal with the exploration/exploitation dilemma. In this sense, we propose to study the freshness of the user's content in CARS through the bandit problem. We introduce in this paper an algorithm named Freshness-Aware Thompson Sampling (FA-TS) that manages the recommendation of fresh documents according to the risk of the user's situation. The intensive evaluation and the detailed analysis of the experimental results reveal several important discoveries in the exploration/exploitation (exr/exp) behaviour. Comment: 21st International Conference on Neural Information Processing. arXiv admin note: text overlap with arXiv:1409.772
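For readers unfamiliar with the base algorithm, the following is a minimal sketch of standard Beta-Bernoulli Thompson sampling. It is not FA-TS: the abstract does not specify how freshness or situation risk enter the algorithm, so neither is modelled here. The function name `thompson_sampling`, the click-through rates, and the simulated rewards are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Run standard Beta-Bernoulli Thompson sampling on simulated arms.

    true_rates holds the (hypothetical) success probability of each arm,
    which the learner does not observe directly. Returns pulls per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    wins = [0] * n_arms    # observed successes per arm
    losses = [0] * n_arms  # observed failures per arm
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Draw one plausible success rate per arm from its Beta posterior,
        # starting from a uniform Beta(1, 1) prior.
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1)
                 for i in range(n_arms)]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=draws.__getitem__)
        pulls[arm] += 1

        # Simulate a Bernoulli reward and update that arm's posterior counts.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.8], rounds=2000)
print(pulls)  # the second arm should receive most of the pulls
```

Exploration comes from the posterior draws themselves: an arm with few observations has a wide posterior, so it occasionally produces the highest sample and gets played, while well-observed poor arms are played less and less often.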

    Faceted Search of Heterogeneous Geographic Information for Dynamic Map Projection

    This paper proposes a faceted information exploration model that supports coarse-grained and fine-grained focusing of geographic maps by offering a graphical representation of data attributes within interactive widgets. The proposed approach enables (i) a multi-category projection of long-lasting geographic maps, based on the proposal of efficient facets for data exploration in sparse and noisy datasets, and (ii) an interactive representation of the search context based on widgets that support data visualization, faceted exploration, category-based information hiding and transparency of results at the same time. The integration of our model with a semantic representation of geographical knowledge supports the exploration of information retrieved from heterogeneous data sources, such as Public Open Data and OpenStreetMap. We evaluated our model with users in the OnToMap collaborative Web GIS. The experimental results show that, when working on geographic maps populated with multiple data categories, it outperforms simple category-based map projection and traditional faceted search tools, such as checkboxes, in both user performance and experience.

    Signed Link Prediction with Sparse Data: The Role of Personality Information

    Predicting signed links in social networks often faces the problem of signed link data sparsity, i.e., only a small percentage of signed links are given. The problem is exacerbated when the number of negative links is much smaller than that of positive links. Boosting signed link prediction necessitates additional information to compensate for data sparsity. According to psychology theories, one rich source of such information is a user's personality, such as optimism or pessimism, which can help determine her propensity to establish positive and negative links. In this study, we investigate how personality information can be obtained, and whether personality information can help alleviate the data sparsity problem for signed link prediction. We propose a novel signed link prediction model that enables empirical exploration of user personality via social media data. We evaluate our proposed model on two datasets of real-world signed link networks. The results demonstrate the complementary role of personality information in the signed link prediction problem. Experimental results also indicate the effectiveness of different levels of personality information for the signed link data sparsity problem. Comment: Companion Proceedings of the 2019 World Wide Web Conference