116 research outputs found

    Efficient Communication via Reinforcement Learning

    Why do languages partition mental concepts into words the way they do? Recent work has taken an information-theoretic view of human language and suggested that it is shaped by the need for efficient communication. This means that human language is shaped by simultaneous pressures to be informative and to be simple, in order to minimize cognitive load. In this thesis we combine the information-theoretic perspective on language with recent advances in deep multi-agent reinforcement learning. We explore how efficient communication emerges between two artificial agents in a signaling game as a by-product of maximizing a shared reward signal. This is tested in the domains of colors and numeral systems, two domains in which human languages tend to support efficient communication. We find that the communication developed by the artificial agents in these domains shares characteristics with human languages in terms of efficiency and the structure of semantic partitions, even though the agents lack the full perceptual and linguistic architecture of humans. Our results offer a computational learning perspective that may complement the information-theoretic view on the structure of human languages. The results also suggest that reinforcement learning is a powerful and flexible framework that can be used to test and generate hypotheses in silico.

    Learning Approximate and Exact Numeral Systems via Reinforcement Learning

    Recent work (Xu et al., 2020) has suggested that numeral systems in different languages are shaped by a functional need for efficient communication in an information-theoretic sense. Here we take a learning-theoretic approach and show how efficient communication emerges via reinforcement learning. In our framework, two artificial agents play a Lewis signaling game where the goal is to convey a numeral concept. The agents gradually learn to communicate using reinforcement learning, and the resulting numeral systems are shown to be efficient in the information-theoretic framework of Regier et al. (2015) and Gibson et al. (2017). They are also shown to be similar to human numeral systems of the same type. Our results thus provide a mechanistic explanation via reinforcement learning of the recent results in Xu et al. (2020) and can potentially be generalized to other semantic domains.
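
    The signaling-game setup described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the tabular policies, the REINFORCE-style update, the distance-based reward, and all names and parameter values (`train_signaling_game`, `n_words`, `lr`, and so on) are assumptions chosen only to show the general idea of a sender and a listener learning from a shared reward.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def train_signaling_game(n_numbers=10, n_words=4, episodes=20000, lr=0.1, seed=0):
    """Train a tabular sender and listener with a REINFORCE-style update.

    All sizes, the reward shape and the learning rate are illustrative
    assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    sender = [[0.0] * n_words for _ in range(n_numbers)]    # logits: number -> word
    listener = [[0.0] * n_numbers for _ in range(n_words)]  # logits: word -> guess
    baseline = 0.0  # running average reward, used to reduce variance

    for _ in range(episodes):
        target = rng.randrange(n_numbers)
        p_word = softmax(sender[target])
        word = sample(p_word, rng)
        p_guess = softmax(listener[word])
        guess = sample(p_guess, rng)

        # Shared reward: 1 for an exact guess, decreasing linearly with distance
        # (an assumed reward shape).
        reward = 1.0 - abs(target - guess) / (n_numbers - 1)
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)

        # REINFORCE: d log softmax(a) / d logit_k = 1[k == a] - p_k
        for k in range(n_words):
            sender[target][k] += lr * advantage * ((k == word) - p_word[k])
        for k in range(n_numbers):
            listener[word][k] += lr * advantage * ((k == guess) - p_guess[k])

    return sender, listener


def lexicon(sender):
    """Most likely word for each number under the learned sender policy."""
    return [max(range(len(row)), key=row.__getitem__) for row in sender]
```

    Calling `lexicon(train_signaling_game()[0])` gives one word index per number, so the way the numbers are partitioned among words can be inspected directly.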

    Thompson Sampling for Bandits with Clustered Arms

    We propose algorithms based on a multi-level Thompson sampling scheme for the stochastic multi-armed bandit, and for its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to standard Thompson sampling. For the stochastic multi-armed bandit, we give upper bounds on the expected cumulative regret that show how it depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
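
    One way to read "multi-level Thompson sampling" is sketched below for Bernoulli rewards: a Beta posterior is kept for each cluster and for each arm, a cluster is chosen by posterior sampling, and an arm is then chosen by posterior sampling within that cluster only. This is a minimal sketch under those assumptions, not necessarily the algorithm in the paper; the function name `clustered_thompson_sampling`, the Beta(1, 1) priors, and the example probabilities are hypothetical.

```python
import random


def clustered_thompson_sampling(clusters, horizon=5000, seed=0):
    """Two-level Thompson sampling for Bernoulli arms grouped into clusters.

    `clusters` is a list of lists of true success probabilities (hypothetical
    test data). Returns pull counts with the same nested shape.
    """
    rng = random.Random(seed)
    # Beta(1, 1) priors stored as [alpha, beta] per cluster and per arm.
    cluster_post = [[1, 1] for _ in clusters]
    arm_post = [[[1, 1] for _ in arms] for arms in clusters]
    pulls = [[0] * len(arms) for arms in clusters]

    for _ in range(horizon):
        # Level 1: sample a mean for each cluster and pick the largest sample.
        c = max(range(len(clusters)),
                key=lambda i: rng.betavariate(*cluster_post[i]))
        # Level 2: sample a mean for each arm in the chosen cluster only.
        a = max(range(len(clusters[c])),
                key=lambda j: rng.betavariate(*arm_post[c][j]))

        # Simulated Bernoulli reward from the (hypothetical) true probability.
        reward = 1 if rng.random() < clusters[c][a] else 0

        # Update both levels with the same observed reward.
        cluster_post[c][0] += reward
        cluster_post[c][1] += 1 - reward
        arm_post[c][a][0] += reward
        arm_post[c][a][1] += 1 - reward
        pulls[c][a] += 1

    return pulls
```

    For example, `clustered_thompson_sampling([[0.1, 0.2], [0.8, 0.9]])` returns the number of pulls per arm. Each round draws one sample per cluster plus one per arm in the chosen cluster, rather than one per arm overall, which is one plausible source of the computational savings the abstract mentions.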

    Towards Learning Abstractions via Reinforcement Learning

    In this paper we take the first steps in studying a new approach to the synthesis of efficient communication schemes in multi-agent systems trained via reinforcement learning. We combine symbolic methods with machine learning, in what is referred to as a neuro-symbolic system. The agents are not restricted to using only the initial primitives: reinforcement learning is interleaved with steps that extend the current language with novel higher-level concepts, allowing generalisation and more informative communication via shorter messages. We demonstrate that this approach allows agents to converge more quickly on a small collaborative construction task. Comment: AIC 2022, 8th International Workshop on Artificial Intelligence and Cognition.

    Pure Exploration in Bandits with Linear Constraints

    We address the problem of identifying the optimal policy with a fixed confidence level in a multi-armed bandit setup when the arms are subject to linear constraints. Unlike in the well-studied standard best-arm identification problem, the optimal policy in this case may not be deterministic and could mix between several arms. This changes the geometry of the problem, which we characterize via an information-theoretic lower bound. We introduce two asymptotically optimal algorithms for this setting, one based on the Track-and-Stop method and the other based on a game-theoretic approach. Both algorithms try to track an optimal allocation based on the lower bound and computed by a weighted projection onto the boundary of a normal cone. Finally, we provide empirical results that validate our bounds and visualize how constraints change the hardness of the problem.

    A reinforcement-learning approach to efficient communication

    We present a multi-agent computational approach to partitioning semantic spaces using reinforcement learning (RL). Two agents communicate using a finite linguistic vocabulary in order to convey a concept. This is tested in the color domain, and a natural reinforcement learning mechanism is shown to converge to a scheme that achieves a near-optimal trade-off between simplicity and communication efficiency. Results are presented both on communication efficiency and on analyses of the resulting partitions of the color space. The effect of varying environmental factors such as noise is also studied. These results suggest that RL offers a powerful and flexible computational framework that can contribute to the development of communication schemes for color names that are near-optimal in an information-theoretic sense and may shape color-naming systems across languages. Our approach is not specific to color and can be used to explore cross-language variation in other semantic domains.

    Sequence dependent task extensions for trip scheduling

    A constraint model is described for scheduling train trips on a network of tracks used in both directions, using a headway abstraction. We argue that a generalisation of a straightforward job-shop scheduling formulation using sequence-dependent task extensions can decrease the required resolution of the network representation and hence the problem size. A geometric interpretation of the model's constraints that can be used to visualise schedules is presented. Preliminary ideas on search heuristics are presented, with performance results and a set of examples.

    Analysis of Wild Type and Variant B Cystatin C Interactome in Retinal Pigment Epithelium Cells Reveals Variant B Interacting Mitochondrial Proteins

    Cystatin C, a secreted cysteine protease inhibitor, is abundantly expressed in retinal pigment epithelium (RPE) cells. A mutation in the protein's leader sequence, corresponding to formation of an alternate variant B protein, has been linked with an increased risk for both age-related macular degeneration (AMD) and Alzheimer's disease (AD). Variant B cystatin C displays intracellular mistrafficking with partial mitochondrial association. We hypothesized that variant B cystatin C interacts with mitochondrial proteins and impacts mitochondrial function. We sought to determine how the interactome of the disease-related variant B cystatin C differs from that of the wild-type (WT) form. For this purpose, we expressed cystatin C Halo-tag fusion constructs in RPE cells to pull down proteins interacting with either the WT or variant B form, followed by identification and quantification by mass spectrometry. We identified a total of 28 interacting proteins, of which 8 were exclusively pulled down by variant B cystatin C. These included 18 kDa translocator protein (TSPO) and cytochrome B5 type B, both of which are localized to the mitochondrial outer membrane. Variant B cystatin C expression also affected RPE mitochondrial function, with increased membrane potential and susceptibility to damage-induced ROS production. The findings help us to understand how variant B cystatin C differs functionally from the WT form and provide leads to RPE processes adversely affected by the variant B genotype.