2,263 research outputs found

    Optimality-based Analysis of XCSF Compaction in Discrete Reinforcement Learning

    Learning classifier systems (LCSs) are population-based predictive systems that were originally envisioned as agents to act in reinforcement learning (RL) environments. These systems can suffer from population bloat and so are amenable to compaction techniques that try to strike a balance between population size and performance. A well-studied LCS architecture is XCSF, which in the RL setting acts as a Q-function approximator. We apply XCSF to a deterministic and a stochastic variant of the FrozenLake8x8 environment from OpenAI Gym, and compare its performance, in terms of function approximation error and policy accuracy, to the optimal Q-functions and policies produced by solving the environments via dynamic programming. We then introduce a novel compaction algorithm (Greedy Niche Mass Compaction - GNMC) and study its operation on XCSF's trained populations. Results show that, given a suitable parametrisation, GNMC preserves or even slightly improves function approximation error while yielding a significant reduction in population size. Reasonable preservation of policy accuracy also occurs, and we link this metric to the commonly used steps-to-goal metric in maze-like environments, illustrating how the metrics are complementary rather than competitive.
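    As a rough illustration of the evaluation idea in the abstract above (an optimal Q-function obtained by dynamic programming, then compared against an approximation by error and policy accuracy), the following minimal sketch may help. It is not the authors' code: the five-state corridor, the discount factor, and every function name are hypothetical stand-ins for FrozenLake8x8, and the paper's exact definition of policy accuracy may differ from the one assumed here.

```python
# Hypothetical 1-D corridor standing in for FrozenLake8x8 (assumption):
# states 0..4, goal at state 4, two actions (left, right).
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)
GAMMA = 0.9  # assumed discount factor


def step(state, action):
    """Deterministic transition; reward 1 only when the goal is entered."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL and state != GOAL else 0.0
    return nxt, reward


def q_value_iteration(tol=1e-10):
    """Solve for the optimal Q-function by repeated Bellman backups."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state: Q stays at 0
            for a_idx, a in enumerate(ACTIONS):
                nxt, r = step(s, a)
                target = r + GAMMA * max(q[nxt])
                delta = max(delta, abs(target - q[s][a_idx]))
                q[s][a_idx] = target
        if delta < tol:
            return q


def greedy(q_row):
    """Index of the highest-valued action in one row of a Q-table."""
    return max(range(len(q_row)), key=lambda i: q_row[i])


def policy_accuracy(q_approx, q_opt):
    """Fraction of non-terminal states whose greedy action matches the optimum."""
    states = [s for s in range(N_STATES) if s != GOAL]
    hits = sum(greedy(q_approx[s]) == greedy(q_opt[s]) for s in states)
    return hits / len(states)


def mean_abs_error(q_approx, q_opt):
    """Mean absolute difference between two Q-tables over non-terminal states."""
    errs = [abs(q_approx[s][a] - q_opt[s][a])
            for s in range(N_STATES) if s != GOAL
            for a in range(len(ACTIONS))]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    q_opt = q_value_iteration()
    # A deliberately wrong approximation: the two action values in state 0 are swapped.
    q_bad = [row[:] for row in q_opt]
    q_bad[0] = q_bad[0][::-1]
    print(policy_accuracy(q_bad, q_opt), mean_abs_error(q_bad, q_opt))
```

    In this toy setting the two metrics can move independently: a small value error that flips one greedy action lowers policy accuracy, while a large uniform offset leaves it untouched.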

    Efficient concept formation in large state spaces

    General autonomous agents must be able to operate in previously unseen worlds with large state spaces. To operate successfully in such worlds, the agents must maintain their own models of the environment, based on concept sets that are several orders of magnitude smaller. For adaptive agents, those concept sets cannot be fixed, but must adapt continuously to new situations. This, in turn, requires mechanisms for forming and preserving those concepts that are critical to successful decision-making, while removing others. In this paper we compare four general algorithms for learning and decision-making: (i) standard Q-learning, (ii) deep Q-learning, (iii) single-agent local Q-learning, and (iv) single-agent local Q-learning with improved concept formation rules. In an experiment with a state space larger than 2^32, it was found that a single-agent local Q-learning agent with improved concept formation rules performed substantially better than a similar agent with less sophisticated concept formation rules and slightly better than a deep Q-learning agent.

    Context-aware Approach for Determining the Threshold Price in Name-Your-Own-Price Channels

    A key feature of a context-aware application is its ability to adapt to changes of context. Two widely used approaches are context-action pair mapping, where developers match an action to execute for a particular context change, and adaptive learning, where a context-aware application refines its action over time based on the preceding action’s outcome. Both approaches have limitations that make them unsuitable in situations where a context-aware application has to deal with unknown context changes. In this paper we propose a framework in which adaptation is carried out via concurrent multi-action evaluation of a dynamically created action space. Creating the action space dynamically eliminates the need to rely on developers to create context-action pairs, and the concurrent multi-action evaluation reduces the adaptation time compared with the iterative approach used by adaptive learning techniques. Using our reference implementation of the framework, we show how it could be used to dynamically determine the threshold price in an e-commerce system that uses the name-your-own-price (NYOP) strategy.

    A Cognitive Architecture Based on a Learning Classifier System with Spiking Classifiers

    © 2015, Springer Science+Business Media New York. Learning classifier systems (LCS) are population-based reinforcement learners that were originally designed to model various cognitive phenomena. This paper presents an explicitly cognitive LCS by using spiking neural networks as classifiers, providing each classifier with a measure of temporal dynamism. We employ a constructivist model of growth of both neurons and synaptic connections, which permits a genetic algorithm to automatically evolve sufficiently complex neural structures. The spiking classifiers are coupled with a temporally sensitive reinforcement learning algorithm, which allows the system to perform temporal state decomposition by appropriately rewarding “macro-actions”, created by chaining together multiple atomic actions. The combination of temporal reinforcement learning and neural information processing is shown to outperform benchmark neural classifier systems and to successfully solve a robotic navigation task.

    Creating Creative Technologists: playing with(in) education

    Since the industrial revolution, the organization of knowledge into distinct scientific, technical or creative categories has resulted in educational systems designed to produce and validate particular occupations. The methods by which students are exposed to different kinds of knowledge are critical in creating and reproducing individual, professional or cultural identities. (“I am an Engineer. You are an Artist”). The emergence of more open, creative and socialised technologies generates challenges for discipline-based education. At the same time, the term “Creative Technologies” also suggests a new occupational category (“I am a Creative Technologist”). This chapter presents a case-study of an evolving ‘anti-disciplinary’ project-based degree that challenges traditional degree structures to stimulate new forms of connective, imaginative and explorative learning, and to equip students to respond to a changing world. Learning is conceived as an emergent process; self-managed by students through critique and open peer review. We focus on ‘playfulness’ as a methodology for achieving multi-modal learning across the boundaries of art, design, computer science, engineering, games and entrepreneurship. In this new cultural moment, playfulness also re-frames the institutional identities of teacher and learner in response to new expectations for learning

    In Vitro Pharmacological Characterization of RXFP3 Allosterism: An Example of Probe Dependency

    Recent findings suggest that the relaxin-3 neural network may represent a new ascending arousal pathway able to modulate a range of neural circuits, including those affecting circadian rhythm and sleep/wake states, spatial and emotional memory, motivation and reward, the response to stress, and feeding and metabolism. Therefore, the relaxin-3 receptor (RXFP3) is a potential therapeutic target for the treatment of various CNS diseases. Here we describe a novel selective RXFP3 receptor positive allosteric modulator (PAM), 3-[3,5-Bis(trifluoromethyl)phenyl]-1-(3,4-dichlorobenzyl)-1-[2-(5-methoxy-1H-indol-3-yl)ethyl]urea (135PAM1). Calcium mobilization and cAMP accumulation assays in cell lines expressing the cloned human RXFP3 receptor show that the compound does not directly activate the RXFP3 receptor but increases functional responses to amidated relaxin-3 or R3/I5, a chimera of the INSL5 A chain and the relaxin-3 B chain. 135PAM1 increases calcium mobilization in the presence of relaxin-3NH2 and R3/I5NH2 with pEC50 values of 6.54 (6.46 to 6.64) and 6.07 (5.94 to 6.20), respectively. In the cAMP accumulation assay, 135PAM1 inhibits the CRE response to forskolin with a pIC50 of 6.12 (5.98 to 6.27) in the presence of a probe (10 nM) concentration of relaxin-3NH2. 135PAM1 does not compete for binding with the orthosteric radioligand, [125I] R3I5 (amide), in membranes prepared from cells expressing the cloned human RXFP3 receptor. 135PAM1 is selective for RXFP3 over RXFP4, which also responds to relaxin-3. However, when the free acid (native) form of relaxin-3 or R3/I5 is used, 135PAM1 does not activate RXFP3, indicating that the compound's effect is probe dependent. Thus one can exchange the entire A chain of the probe peptide while retaining PAM activity, but the state of the probe's C-terminus is crucial to the allosteric activity of the PAM. These data demonstrate the existence of an allosteric site for modulation of this GPCR, as well as the subtlety of the changes in probe molecules that can affect allosteric modulation of RXFP3.

    Systematic review and meta-analysis of the diagnostic accuracy of ultrasonography for deep vein thrombosis

    Background: Ultrasound (US) has largely replaced contrast venography as the definitive diagnostic test for deep vein thrombosis (DVT). We aimed to derive a definitive estimate of the diagnostic accuracy of US for clinically suspected DVT and identify study-level factors that might predict accuracy. Methods: We undertook a systematic review, meta-analysis and meta-regression of diagnostic cohort studies that compared US to contrast venography in patients with suspected DVT. We searched Medline, EMBASE, CINAHL, Web of Science, Cochrane Database of Systematic Reviews, Cochrane Controlled Trials Register, Database of Reviews of Effectiveness, the ACP Journal Club, and citation lists (1966 to April 2004). Random effects meta-analysis was used to derive pooled estimates of sensitivity and specificity. Random effects meta-regression was used to identify study-level covariates that predicted diagnostic performance. Results: We identified 100 cohorts comparing US to venography in patients with suspected DVT. Overall sensitivity for proximal DVT (95% confidence interval) was 94.2% (93.2 to 95.0), for distal DVT was 63.5% (59.8 to 67.0), and specificity was 93.8% (93.1 to 94.4). Duplex US had pooled sensitivity of 96.5% (95.1 to 97.6) for proximal DVT, 71.2% (64.6 to 77.2) for distal DVT and specificity of 94.0% (92.8 to 95.1). Triplex US had pooled sensitivity of 96.4% (94.4 to 97.1) for proximal DVT, 75.2% (67.7 to 81.6) for distal DVT and specificity of 94.3% (92.5 to 95.8). Compression US alone had pooled sensitivity of 93.8% (92.0 to 95.3) for proximal DVT, 56.8% (49.0 to 66.4) for distal DVT and specificity of 97.8% (97.0 to 98.4). Sensitivity was higher in more recently published studies and in cohorts with higher prevalence of DVT and more proximal DVT, and was lower in cohorts that reported interpretation by a radiologist. Specificity was higher in cohorts that excluded patients with previous DVT. No studies were identified that compared repeat US to venography in all patients. Repeat US appears to have a positive yield of 1.3%, with 89% of these being confirmed by venography. Conclusion: Combined colour-doppler US techniques have optimal sensitivity, while compression US has optimal specificity for DVT. However, all estimates are subject to substantial unexplained heterogeneity. The role of repeat scanning is very uncertain and based upon limited data.
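    For readers unfamiliar with the accuracy measures reported in the abstract above, the sketch below shows how sensitivity and specificity follow from a single cohort's 2x2 table, with a Wilson score interval for each proportion. It is only an illustration: the counts are invented, the function names are hypothetical, and it does not reproduce the random effects pooling across cohorts that the review describes.

```python
import math


def sensitivity(true_pos, false_neg):
    """Proportion of diseased patients whom the test correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of disease-free patients whom the test correctly clears."""
    return true_neg / (true_neg + false_pos)


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Invented counts for one cohort, with venography as the reference standard.
    tp, fn, tn, fp = 94, 6, 188, 12
    print("sensitivity", sensitivity(tp, fn), wilson_interval(tp, tp + fn))
    print("specificity", specificity(tn, fp), wilson_interval(tn, tn + fp))
```

    A pooled estimate of the kind reported in the review would additionally weight each cohort and model between-study variation, which this single-cohort sketch deliberately leaves out.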

    Methods for specifying the target difference in a randomised controlled trial : the Difference ELicitation in TriAls (DELTA) systematic review

    Peer reviewed; Publisher PD

    A brief history of learning classifier systems: from CS-1 to XCS and its variants

    © 2015, Springer-Verlag Berlin Heidelberg. The direction set by Wilson’s XCS is that modern Learning Classifier Systems can be characterized by their use of rule accuracy as the utility metric for the search algorithm(s) discovering useful rules. Such searching typically takes place within the restricted space of co-active rules for efficiency. This paper gives an overview of the evolution of Learning Classifier Systems up to XCS, and then of some of the subsequent developments of Wilson’s algorithm to different types of learning