68 research outputs found

    Successor features for transfer in reinforcement learning

    Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks. Our focus is on transfer where the reward functions vary across tasks while the environment's dynamics remain the same. The method we propose rests on two key ideas: "successor features," a value function representation that decouples the dynamics of the environment from the rewards, and "generalized policy improvement," a generalization of dynamic programming's policy improvement step that considers a set of policies rather than a single one. Put together, the two ideas lead to an approach that integrates seamlessly within the reinforcement learning framework and allows transfer to take place between tasks without any restriction. The proposed method also provides performance guarantees for the transferred policy even before any learning has taken place. We derive two theorems that set our approach on firm theoretical ground and present experiments that show that it successfully promotes transfer in practice.
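
The two ideas named in this abstract can be illustrated with a minimal sketch. It assumes rewards are linear in some features, r = phi · w, so that the action value of a policy is Q(s, a) = psi(s, a) · w, where psi denotes that policy's successor features. Generalized policy improvement then picks, in the current state, the action with the highest value under any of the stored policies. The names `gpi_action` and `PSI` and all numbers are hypothetical, chosen only for illustration.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def gpi_action(psi: Sequence[Sequence[Vector]], w: Vector) -> int:
    """Generalized policy improvement for a single state.

    psi[i][a] is the (assumed known) successor-feature vector of stored
    policy i for action a in the current state; w is the reward weight
    vector of the new task. Returns argmax_a max_i psi[i][a] . w.
    """
    n_actions = len(psi[0])
    # Best value any stored policy promises for each action.
    best = [max(dot(psi_i[a], w) for psi_i in psi) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: best[a])


# Hypothetical successor features: 2 stored policies, 2 actions, 2 features.
PSI = [
    [[1.0, 0.0], [0.2, 0.1]],  # policy 0
    [[0.0, 0.3], [0.1, 1.0]],  # policy 1
]

if __name__ == "__main__":
    print(gpi_action(PSI, [0.5, 0.5]))  # 1
    print(gpi_action(PSI, [1.0, 0.0]))  # 0
```

Because only `w` changes between tasks in this setting, the same stored `PSI` can be reused for a new reward function without relearning anything about the dynamics.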

    General video game AI: Competition, challenges, and opportunities

    The General Video Game AI framework and competition pose the problem of creating artificial intelligence that can play a wide, and in principle unlimited, range of games. Concretely, it tackles the problem of devising an algorithm that is able to play any game it is given, even if the game is not known a priori. This area of study can be seen as an approximation of General Artificial Intelligence, with very little room for game-dependent heuristics. This short paper summarizes the motivation, infrastructure, results and future plans of General Video Game AI, stressing the findings and first conclusions drawn after two editions of our competition, and outlining our future plans.

    Using humanoid robots to study human behavior

    Our understanding of human behavior advances as our humanoid robotics work progresses, and vice versa. This team's work focuses on trajectory formation and planning, learning from demonstration, oculomotor control, and interactive behaviors. They are programming robotic behavior based on how we humans “program” (or train) each other.

    Quantification of depth of anesthesia by nonlinear time series analysis of brain electrical activity

    We investigate several quantifiers of the electroencephalogram (EEG) signal with respect to their ability to indicate depth of anesthesia. For 17 patients anesthetized with Sevoflurane, three established measures (two spectral and one based on the bispectrum), as well as a phase-space-based nonlinear correlation index, were computed from consecutive EEG epochs. In the absence of an independent way to determine anesthesia depth, the standard was derived from measured blood plasma concentrations of the anesthetic via a pharmacokinetic/pharmacodynamic model for the estimated effective brain concentration of Sevoflurane. In most patients, the highest correlation is observed for the nonlinear correlation index D*. In contrast to spectral measures, D* is found to decrease monotonically with increasing (estimated) depth of anesthesia, even when a "burst-suppression" pattern occurs in the EEG. The findings show the potential for applications of concepts derived from the theory of nonlinear dynamics, even if little can be assumed about the process under investigation. Comment: 7 pages, 5 figures

    C-tests revisited: back and forth with complexity

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-21365-1_28
    We explore the aggregation of tasks by weighting them using a difficulty function that depends on the complexity of the (acceptable) policy for the task (instead of a universal distribution over tasks or an adaptive test). The resulting aggregations and decompositions are (now retrospectively) seen as the natural (and trivial) interactive generalisation of the C-tests.
    This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN 2010-21062-C02-02, PCIN-2013-037 and TIN 2013-45732-C4-1-P, and by Generalitat Valenciana PROMETEOII 2015/013.
    Hernández Orallo, J. (2015). C-tests revisited: back and forth with complexity. In Artificial General Intelligence: 8th International Conference, AGI 2015, Berlin, Germany, July 22-25, 2015, Proceedings. Springer International Publishing. 272-282. https://doi.org/10.1007/978-3-319-21365-1_28

    Neural Network Fusion of Color, Depth and Location for Object Instance Recognition on a Mobile Robot

    The development of mobile robots for domestic assistance requires solving problems that integrate ideas from different fields of research, such as computer vision, robotic manipulation, localization and mapping. Semantic mapping, that is, the enrichment of a map with high-level information like room and object identities, is an example of such a complex robotic task. Solving this task requires taking into account hard software and hardware constraints brought by the context of autonomous mobile robots, where short processing times and low energy consumption are mandatory. We present a lightweight scene segmentation and object instance recognition algorithm using an RGB-D camera and demonstrate it in a semantic mapping experiment. Our method uses a feed-forward neural network to fuse texture, color and depth information. Running at 3 Hz on a single laptop computer, our algorithm achieves a recognition rate of 97% in a controlled environment, and 87% in the adversarial conditions of a real robotic task. Our results demonstrate that state-of-the-art recognition rates on a database do not guarantee performance in a real-world experiment. We also show the benefit in these conditions of fusing several recognition decisions and data from different sources. The database we compiled for the purpose of this study is publicly available.

    Neural network generated parametrizations of deeply virtual Compton form factors

    We have generated a parametrization of the Compton form factor (CFF) H based on data from deeply virtual Compton scattering (DVCS) using neural networks. This approach offers an essentially model-independent fitting procedure, which provides realistic uncertainties. Furthermore, it facilitates propagation of uncertainties from experimental data to CFFs. We assumed dominance of the CFF H and used HERMES data on DVCS off unpolarized protons. We predict the beam charge-spin asymmetry for a proton at the kinematics of the COMPASS II experiment. Comment: 16 pages, 5 figures

    Markov chain Monte Carlo with Gaussian processes for fast parameter estimation and uncertainty quantification in a 1D fluid‐dynamics model of the pulmonary circulation

    The past few decades have witnessed an explosive synergy between physics and the life sciences. In particular, physical modelling in medicine and physiology is a topical research area. The present work focuses on parameter inference and uncertainty quantification in a 1D fluid‐dynamics model for quantitative physiology: the pulmonary blood circulation. The practical challenge is the estimation of the patient‐specific biophysical model parameters, which cannot be measured directly. In principle this can be achieved based on a comparison between measured and predicted data. However, predicting data requires solving a system of partial differential equations (PDEs), which usually have no closed‐form solution, and repeated numerical integrations as part of an adaptive estimation procedure are computationally expensive. In the present article, we demonstrate how fast parameter estimation combined with sound uncertainty quantification can be achieved by a combination of statistical emulation and Markov chain Monte Carlo (MCMC) sampling. We compare a range of state‐of‐the‐art MCMC algorithms and emulation strategies, and assess their performance in terms of their accuracy and computational efficiency. The long‐term goal is to develop a method for reliable disease prognostication in real time, and our work is an important step towards an automatic clinical decision support system.
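
The combination of emulation and MCMC described in this abstract can be sketched on a toy problem. Everything below is an assumption made for illustration and not the article's setup: the "expensive" simulator is a stand-in function `theta ** 2` rather than a PDE solver, the emulator is the posterior mean of a zero-mean Gaussian process with a fixed RBF kernel, the prior is uniform on [0, 3], and the sampler is plain random-walk Metropolis.

```python
import math
import random


def rbf(a, b, length=1.0):
    """Squared-exponential kernel with a fixed, hand-picked length-scale."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_emulator(simulator, design):
    """Run the simulator once per design point; return the GP mean predictor."""
    y = [simulator(t) for t in design]
    K = [[rbf(a, b) + (1e-8 if i == j else 0.0)  # small jitter on the diagonal
          for j, b in enumerate(design)] for i, a in enumerate(design)]
    alpha = solve(K, y)
    return lambda t: sum(w * rbf(t, d) for w, d in zip(alpha, design))


def metropolis(log_post, start, step, n_samples, rng):
    """Random-walk Metropolis sampler for a scalar parameter."""
    theta, lp = start, log_post(start)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            theta, lp = proposal, lp_new
        samples.append(theta)
    return samples


def run_demo(seed=0):
    simulator = lambda theta: theta ** 2       # hypothetical stand-in model
    y_obs, sigma = simulator(1.5), 0.1         # synthetic observation and noise
    emulator = fit_emulator(simulator, [0.5 * k for k in range(7)])

    def log_post(theta):
        if not 0.0 <= theta <= 3.0:            # uniform prior on [0, 3]
            return -math.inf
        return -0.5 * ((emulator(theta) - y_obs) / sigma) ** 2

    chain = metropolis(log_post, 1.0, 0.05, 5000, random.Random(seed))
    kept = chain[1000:]                        # discard burn-in
    return sum(kept) / len(kept)


if __name__ == "__main__":
    print("posterior mean of theta:", run_demo())
```

In this sketch the simulator is called only seven times, to build the emulator; every MCMC step afterwards evaluates the cheap surrogate instead, which is the source of the speed-up the abstract refers to.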