Game theory and Artificial Intelligence in just preservation
We humans can show presumption, arrogance and many dubious traits. By virtue of being land-dwelling, dexterous, relatively intelligent, and having good communication hardware and (good) fortune, we have for recent millennia largely had dominion of our planet. Yet humans often do not treat themselves (let alone other species) particularly well. Treves et al.’s idea of a multispecies justice system — not “prioritizing humans” but “finding practical ways to work within human systems” — invites consideration
How universal can an intelligence test be?
The notion of a universal intelligence test has been recently advocated as a means to assess humans,
non-human animals and machines in an integrated, uniform way. While the main motivation has been the
development of machine intelligence tests, the mere concept of a universal test has many implications
in the way human intelligence tests are understood, and their relation to other tests in comparative
psychology and animal cognition. From this diversity of subjects in the natural and artificial kingdoms,
the very possibility of constructing a universal test is still controversial. In this paper we rephrase the
question of whether universal intelligence tests are possible or not into the question of how universal
intelligence tests can be, in terms of subjects, interfaces and resolutions. We discuss the feasibility
and difficulty of universal tests depending on several levels according to what is taken for granted: the
communication milieu, the resolution, the reward system or the agent itself. We argue that such tests
must be highly adaptive, i.e., that tasks, resolution, rewards and communication have to be adapted
according to how the evaluated agent is reacting and performing. Even so, the most general expression
of a universal test may not be feasible (and, at best, might only be theoretically semi-computable).
Nonetheless, in general, we can analyse the universality in terms of some traits that lead to several levels
of universality and set the quest for universal tests as a progressive rather than absolute goal.
This work was supported by the MEC/MINECO (projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02), the GVA (project PROMETEO/2008/051) and the COST-European Cooperation in the field of Scientific and Technical Research (project IC0801 AT).
Dowe, DL.; Hernández Orallo, J. (2014). How universal can an intelligence test be? Adaptive Behavior. 22(1):51-69. https://doi.org/10.1177/1059712313500502
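The adaptive principle argued for in this abstract can be illustrated with a minimal sketch. It is not the authors' test: the staircase rule, the function names `run_adaptive_test` and `make_threshold_agent`, and the integer difficulty scale are all hypothetical choices made for illustration. The sketch adapts only task difficulty to the agent's responses; adapting the interface, resolution and rewards is left out.

```python
from typing import Callable


def run_adaptive_test(agent: Callable[[int], bool],
                      n_items: int = 50,
                      start_difficulty: int = 1,
                      max_difficulty: int = 20) -> float:
    """Staircase procedure: raise difficulty after a success, lower it after a failure.

    `agent(difficulty)` is a hypothetical callable that returns True when the
    agent solves a task of the given difficulty. The return value is the mean
    difficulty administered over the second half of the session, used here as
    a rough ability estimate (an assumption, not a published scoring rule).
    """
    difficulty = start_difficulty
    administered = []
    for _ in range(n_items):
        administered.append(difficulty)
        solved = agent(difficulty)
        step = 1 if solved else -1
        # Keep the difficulty inside the allowed range [1, max_difficulty].
        difficulty = min(max_difficulty, max(1, difficulty + step))
    tail = administered[len(administered) // 2:]
    return sum(tail) / len(tail)


def make_threshold_agent(ability: int) -> Callable[[int], bool]:
    """Toy deterministic agent: solves every task up to `ability`, none above it."""
    return lambda difficulty: difficulty <= ability
```

With a toy agent of ability 7, `run_adaptive_test(make_threshold_agent(7))` settles into alternating between difficulties 7 and 8, so the estimate falls between those two values. A real adaptive test would also need a stopping rule and a noise model for the agent's responses.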
On potential cognitive abilities in the machine kingdom
The final publication is available at Springer via http://dx.doi.org/10.1007/s11023-012-9299-6
Animals, including humans, are usually judged on what they could become, rather than what they are. Many physical and cognitive abilities in the ‘animal kingdom’ are only acquired (to a given degree) when the subject reaches a certain stage of development, which can be accelerated or spoilt depending on how the environment, training or education is. The term ‘potential ability’ usually refers to how quick and likely the process of attaining the ability is. In principle, things should not be different for the ‘machine kingdom’. While machines can be characterised by a set of cognitive abilities, and measuring them is already a big challenge, known as ‘universal psychometrics’, a more informative, and yet more challenging, goal would be to also determine the potential cognitive abilities of a machine. In this paper we investigate the notion of potential cognitive ability for machines, focussing especially on universality and intelligence. We consider several machine characterisations (non-interactive and interactive) and give definitions for each case, considering permanent and temporal potentials. From these definitions, we analyse the relation between some potential abilities, we bring out the dependency on the environment distribution and we suggest some ideas about how potential abilities can be measured. Finally, we also analyse the potential of environments at different levels and briefly discuss whether machines should be designed to be intelligent or potentially intelligent.
We thank the anonymous reviewers for their comments, which have helped to significantly improve this paper. This work was supported by the MEC-MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, the COST - European Cooperation in the field of Scientific and Technical Research IC0801 AT. Finally, we thank three pioneers ahead of their time(s).
We thank Ray Solomonoff (1926-2009) and Chris Wallace (1933-2004) for all that they taught us, directly and indirectly. And, in his centenary year, we thank Alan Turing (1912-1954), with whom it perhaps all began.
Hernández-Orallo, J.; Dowe, DL. (2013). On potential cognitive abilities in the machine kingdom. Minds and Machines. 23(2):179-210. https://doi.org/10.1007/s11023-012-9299-6
IQ tests are not for machines, yet
Complex, but specific, tasks (such as chess or Jeopardy!) are popularly seen as milestones for artificial intelligence (AI). However, they are not appropriate for evaluating the intelligence of machines or measuring the progress in AI. Aware of this delusion, Detterman has recently raised a challenge prompting AI researchers to evaluate their artefacts against IQ tests. We agree that the philosophy behind (human) IQ tests is a much better approach to machine intelligence evaluation than these specific tasks, and also more practical and informative than the Turing test. However, we first have to recall some work on machine intelligence measurement which has shown that some IQ tests can be passed by relatively simple programs. This suggests that the challenge may not be so demanding and may just work as a sophisticated CAPTCHA, since some types of tests might be easier than others for the current state of AI. Second, we show that an alternative, formal derivation of intelligence tests for machines is possible, grounded in (algorithmic) information theory. In these tests, we have a proper mathematical definition of what is being measured. Third, we re-visit some research done in the past fifteen years for effectively measuring machine intelligence, since some assumptions about the subjects and their distribution no longer hold.
This work was supported by the MEC projects EXPLORAINGENIO TIN 2009-06078-E, CONSOLIDER-INGENIO 26706 and TIN 2010-21062-C02-02, and GVA project PROMETEO/2008/051.
Dowe, DL.; Hernández Orallo, J. (2012). IQ tests are not for machines, yet. Intelligence. 40(2):77-81. doi:10.1016/j.intell.2011.12.001
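The remark that some IQ-test items can be passed by relatively simple programs can be made concrete with a toy example. The sketch below is not the program referred to in the abstract; the function name `next_term` and the two fixed rules it tries (constant difference, constant ratio) are assumptions chosen purely for illustration.

```python
from fractions import Fraction
from typing import Optional, Sequence


def next_term(series: Sequence[int]) -> Optional[Fraction]:
    """Guess the next term of a number series using two fixed rules.

    Tries an arithmetic rule (constant difference) first, then a geometric
    rule (constant ratio). Returns None when neither rule fits the series.
    """
    if len(series) < 3:
        return None  # too short to confirm either rule
    diffs = [b - a for a, b in zip(series, series[1:])]
    if len(set(diffs)) == 1:
        return Fraction(series[-1] + diffs[0])
    if all(x != 0 for x in series):
        # Exact rational arithmetic avoids floating-point ratio comparisons.
        ratios = [Fraction(b, a) for a, b in zip(series, series[1:])]
        if len(set(ratios)) == 1:
            return series[-1] * ratios[0]
    return None
```

A handful of such rules covers many textbook series-completion items, which is the sense in which a test of this kind may discriminate poorly between a narrow program and a generally capable system.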
On more realistic environment distributions for defining, evaluating and developing intelligence
One insightful view of the notion of intelligence is the ability
to perform well in a diverse set of tasks, problems or environments. One of
the key issues is therefore the choice of this set, which can be formalised
as a ‘distribution’. Formalising and properly defining this distribution is
an important challenge to understand what intelligence is and to achieve
artificial general intelligence (AGI). In this paper, we agree with previous
criticisms that a universal distribution using a reference universal Turing
machine (UTM) over tasks, environments, etc., is perhaps a much too general
distribution, since, e.g., the probability of other agents appearing on
the scene or having some social interaction is almost 0 for many reference
UTMs. Instead, we propose the notion of Darwin-Wallace distribution for
environments, which is inspired by biological evolution, artificial life and
evolutionary computation. However, although enlightening about where
and how intelligence should excel, this distribution has so many options
and is uncomputable in so many ways that we certainly need a more practical
alternative. We propose the use of intelligence tests over multi-agent
systems, in such a way that agents with a certified level of intelligence at
a certain degree are used to construct the tests for the next degree. This
constructive methodology can then be used as a more realistic intelligence
test and also as a testbed for developing and evaluating AGI systems.
We thank the anonymous reviewers for their helpful comments. We also thank the funding from the Spanish MEC and MICINN for projects TIN2009-06078-E/TIN, Consolider-Ingenio CSD2007-00022 and TIN2010-21062-C02, for MEC FPU grant AP2006-02323, and Generalitat Valenciana for Prometeo/2008/051.
Hernández Orallo, J.; Dowe, DL.; España Cubillo, S.; Hernández-Lloreda, MV.; Insa Cabrera, J. (2011). On more realistic environment distributions for defining, evaluating and developing intelligence. In Artificial General Intelligence. Springer Verlag (Germany). 6830:82-91. https://doi.org/10.1007/978-3-642-22887-2_9
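For orientation, the universal distribution over environments that this abstract criticises is usually written as below. This is the standard formulation from the universal-intelligence literature (Legg and Hutter), not a formula taken from the abstract itself; the symbols $U$, $\mu$, $E$, $K_U$, $\pi$, $V_\mu^{\pi}$ and $\Upsilon_U$ are assumed notation.

```latex
% U: reference universal Turing machine; E: class of environments;
% K_U(\mu): Kolmogorov complexity of environment \mu relative to U;
% V_\mu^{\pi}: expected total reward of agent \pi in environment \mu.
p_U(\mu) = 2^{-K_U(\mu)},
\qquad
\Upsilon_U(\pi) = \sum_{\mu \in E} p_U(\mu)\, V_\mu^{\pi}
```

Because $p_U$ depends on the choice of $U$ and favours environments with short descriptions, environments that contain other agents can receive negligible weight, which is the concern the abstract raises.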
Universal psychometrics: measuring cognitive abilities in the machine kingdom
We present and develop the notion of ‘universal psychometrics’ as a subject of study, and
eventually a discipline, that focusses on the measurement of cognitive abilities for the machine
kingdom, which comprises any (cognitive) system, individual or collective, either artificial,
biological or hybrid. Universal psychometrics can be built, of course, upon the experience,
techniques and methodologies from (human) psychometrics, comparative cognition and related
areas. Conversely, the perspective and techniques which are being developed in the area
of machine intelligence measurement using (algorithmic) information theory can be of much
broader applicability and implication outside artificial intelligence. This general approach
to universal psychometrics spurs the re-understanding of most (if not all) of the big issues
about the measurement of cognitive abilities, and creates a new foundation for (re)defining
and mathematically formalising the concept of cognitive task, evaluable subject, interface,
task choice, difficulty, agent response curves, etc. We introduce the notion of a universal
cognitive test and discuss whether (and when) it may be necessary for exploring the machine
kingdom. On the issue of intelligence and very general abilities, we also get some results and
connections with the related notions of no-free-lunch theorems and universal priors.
We thank the anonymous reviewers for their comments. This work was supported by the MEC-MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, the COST - European Cooperation in the field of Scientific and Technical Research IC0801 AT.
Hernández Orallo, J.; Dowe, DL.; Hernández Lloreda, MV. (2014). Universal psychometrics: measuring cognitive abilities in the machine kingdom. Cognitive Systems Research. 27:50-74. https://doi.org/10.1016/j.cogsys.2013.06.001
A symbolic look at colour. Reflections on phobias and philias in the Western world (original title: Una mirada simbólica al color. Reflexiones sobre fobias y filias en el mundo occidental)
Colour is an essential component of social codes, laden with meanings that belong to collective memory. This sign, with strong connotative influence and high significance in the image, is the result of the construction of a symbolic language that has its own laws and rules of operation. Colours have been a reference point for the expression of thought through visual discourse, while their immediacy in conveying a message has, at certain times, led them to be regarded as a latent threat. This article analyses aspects of the role played by colour in the field of art and sets out the reasons why this element has been rejected in some periods and exalted in others, in both cases enabling the transmission of multiple messages.
On environment difficulty and discriminating power
The final publication is available at Springer via http://dx.doi.org/10.1007/s10458-014-9257-1
This paper presents a way to estimate the difficulty and discriminating power of
any task instance. We focus on a very general setting for tasks: interactive (possibly multiagent)
environments where an agent acts upon observations and rewards. Instead of analysing
the complexity of the environment, the state space or the actions that are performed by the
agent, we analyse the performance of a population of agent policies against the task, leading
to a distribution that is examined in terms of policy complexity. This distribution is then
sliced by the algorithmic complexity of the policy and analysed through several diagrams
and indicators. The notion of environment response curve is also introduced, by inverting the
performance results into an ability scale. We apply all these concepts, diagrams and indicators
to two illustrative problems: a class of agent-populated elementary cellular automata, showing
how the difficulty and discriminating power may vary for several environments, and a multiagent
system, where agents can become predators or prey, and may need to coordinate.
Finally, we discuss how these tools can be applied to characterise (interactive) tasks and
(multi-agent) environments. These characterisations can then be used to get more insight
about agent performance and to facilitate the development of adaptive tests for the evaluation
of agent abilities.
I thank the reviewers for their comments, especially those aiming at a clearer connection with the field of multi-agent systems and the suggestion of better approximations for the calculation of the response curves. The implementation of the elementary cellular automata used in the environments is based on the library 'CellularAutomaton' by John Hughes for R [58]. I am grateful to Fernando Soler-Toscano for letting me know about their work [65] on the complexity of 2D objects generated by elementary cellular automata. I would also like to thank David L. Dowe for his comments on a previous version of this paper. This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, the COST - European Cooperation in the field of Scientific and Technical Research IC0801 AT, and the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economia y Competitividad in Spain (PCIN-2013-037).
José Hernández-Orallo (2015). On environment difficulty and discriminating power. Autonomous Agents and Multi-Agent Systems. 29(3):402-454. https://doi.org/10.1007/s10458-014-9257-1
Anderson, J., Baltes, J., & Cheng, C. T. (2011). Robotics competitions as benchmarks for AI research. The Knowledge Engineering Review, 26(01), 11–17.
Andre, D., & Russell, S. J. (2002). State abstraction for programmable reinforcement learning agents. In Proceedings of the National Conference on Artificial Intelligence (pp. 119–125). Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999.
Antunes, L., Fortnow, L., van Melkebeek, D., & Vinodchandran, N. V. (2006). Computational depth: Concept and applications. Theoretical Computer Science, 354(3), 391–404.
Foundations of Computation Theory (FCT 2003), 14th Symposium on Fundamentals of Computation Theory 2003.
Arai, K., Kaminka, G. A., Frank, I., & Tanaka-Ishii, K. (2003). Performance competitions as research infrastructure: Large scale comparative studies of multi-agent teams. Autonomous Agents and Multi-Agent Systems, 7(1–2), 121–144.
Ashcraft, M. H., Donley, R. D., Halas, M. A., & Vakali, M. (1992). Chapter 8 working memory, automaticity, and problem difficulty. In Jamie I.D. Campbell (Ed.), The nature and origins of mathematical skills, volume 91 of advances in psychology (pp. 301–329). North-Holland.
Ay, N., Müller, M., & Szkola, A. (2010). Effective complexity and its relation to logical depth. IEEE Transactions on Information Theory, 56(9), 4593–4607.
Barch, D. M., Braver, T. S., Nystrom, L. E., Forman, S. D., Noll, D. C., & Cohen, J. D. (1997). Dissociating working memory from task difficulty in human prefrontal cortex. Neuropsychologia, 35(10), 1373–1380.
Bordini, R. H., Hübner, J. F., & Wooldridge, M. (2007). Programming multi-agent systems in AgentSpeak using Jason. London: Wiley.
Boutilier, C., Reiter, R., Soutchanski, M., Thrun, S. et al. (2000). Decision-theoretic, high-level agent programming in the situation calculus. In Proceedings of the National Conference on Artificial Intelligence (pp. 355–362). Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999.
Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2), 156–172.
Chaitin, G. J. (1977). Algorithmic information theory. IBM Journal of Research and Development, 21, 350–359.
Chedid, F. B. (2010). Sophistication and logical depth revisited. In 2010 IEEE/ACS International Conference on Computer Systems and Applications (AICCSA) (pp. 1–4). IEEE.
Cheeseman, P., Kanefsky, B. & Taylor, W. M. (1991). Where the really hard problems are. In Proceedings of IJCAI-1991 (pp. 331–337).
Dastani, M. (2008). 2APL: A practical agent programming language. Autonomous Agents and Multi-agent Systems, 16(3), 214–248.
Delahaye, J. P. & Zenil, H. (2011). Numerical evaluation of algorithmic complexity for short strings: A glance into the innermost structure of randomness. Applied Mathematics and Computation, 219(1), 63–77.
Dowe, D. L. (2008). Foreword re C. S. Wallace. Computer Journal, 51(5), 523–560. Christopher Stewart WALLACE (1933–2004) memorial special issue.
Dowe, D. L., & Hernández-Orallo, J. (2012). IQ tests are not for machines, yet. Intelligence, 40(2), 77–81.
Du, D. Z., & Ko, K. I. (2011). Theory of computational complexity (Vol. 58). London: Wiley-Interscience.
Elo, A. E. (1978). The rating of chessplayers, past and present (Vol. 3). London: Batsford.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. London: Lawrence Erlbaum.
Fatès, N. & Chevrier, V. (2010). How important are updating schemes in multi-agent systems? An illustration on a multi-turmite model. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1 (pp. 533–540). International Foundation for Autonomous Agents and Multiagent Systems.
Ferber, J. & Müller, J. P. (1996). Influences and reaction: A model of situated multiagent systems. In Proceedings of Second International Conference on Multi-Agent Systems (ICMAS-96) (pp. 72–79).
Ferrando, P. J. (2009). Difficulty, discrimination, and information indices in the linear factor analysis model for continuous item responses. Applied Psychological Measurement, 33(1), 9–24.
Ferrando, P. J. (2012). Assessing the discriminating power of item and test scores in the linear factor-analysis model. Psicológica, 33, 111–139.
Gent, I. P., & Walsh, T. (1994). Easy problems are sometimes hard. Artificial Intelligence, 70(1), 335–345.
Gershenson, C. & Fernandez, N. (2012). Complexity and information: Measuring emergence, self-organization, and homeostasis at multiple scales. Complexity, 18(2), 29–44.
Gruner, S. (2010). Mobile agent systems and cellular automata. Autonomous Agents and Multi-agent Systems, 20(2), 198–233.
Hardman, D. K., & Payne, S. J. (1995). Problem difficulty and response format in syllogistic reasoning. The Quarterly Journal of Experimental Psychology, 48(4), 945–975.
He, J., Reeves, C., Witt, C., & Yao, X. (2007). A note on problem difficulty measures in black-box optimization: Classification, realizations and predictability. Evolutionary Computation, 15(4), 435–443.
Hernández-Orallo, J. (2000). Beyond the Turing test. Journal of Logic Language & Information, 9(4), 447–466.
Hernández-Orallo, J. (2000). On the computational measurement of intelligence factors. In A. Meystel (Ed.), Performance metrics for intelligent systems workshop (pp. 1–8). Gaithersburg, MD: National Institute of Standards and Technology.
Hernández-Orallo, J. (2000). Thesis: Computational measures of information gain and reinforcement in inference processes. AI Communications, 13(1), 49–50.
Hernández-Orallo, J. (2010). A (hopefully) non-biased universal environment class for measuring intelligence of biological and artificial systems. In M. Hutter et al. (Ed.), 3rd International Conference on Artificial General Intelligence (pp. 182–183). Atlantis Press. Extended report at http://users.dsic.upv.es/proy/anynt/unbiased.pdf .
Hernández-Orallo, J., & Dowe, D. L. (2010). Measuring universal intelligence: Towards an anytime intelligence test. Artificial Intelligence, 174(18), 1508–1539.
Hernández-Orallo, J., Dowe, D. L., España-Cubillo, S., Hernández-Lloreda, M. V., & Insa-Cabrera, J. (2011). On more realistic environment distributions for defining, evaluating and developing intelligence. In J. Schmidhuber, K. R. Thórisson, & M. Looks (Eds.), LNAI series on artificial general intelligence 2011 (Vol. 6830, pp. 82–91). Berlin: Springer.
Hernández-Orallo, J., Dowe, D. L., & Hernández-Lloreda, M. V. (2014). Universal psychometrics: Measuring cognitive abilities in the machine kingdom. Cognitive Systems Research, 27, 50–74.
Hernández-Orallo, J., Insa, J., Dowe, D. L. & Hibbard, B. (2012). Turing tests with Turing machines. In A. Voronkov (Ed.), The Alan Turing Centenary Conference, Turing-100, Manchester, 2012, volume 10 of EPiC Series (pp. 140–156).
Hernández-Orallo, J. & Minaya-Collado, N. (1998). A formal definition of intelligence based on an intensional variant of Kolmogorov complexity. In Proceedings of International Symposium of Engineering of Intelligent Systems (EIS’98) (pp. 146–163). ICSC Press.
Hibbard, B. (2009). Bias and no free lunch in formal measures of intelligence. Journal of Artificial General Intelligence, 1(1), 54–61.
Hoos, H. H. (1999). SAT-encodings, search space structure, and local search performance. In 1999 International Joint Conference on Artificial Intelligence (Vol. 16, pp. 296–303).
Insa-Cabrera, J., Benacloch-Ayuso, J. L., & Hernández-Orallo, J. (2012). On measuring social intelligence: Experiments on competition and cooperation. In J. Bach, B. Goertzel, & M. Iklé (Eds.), AGI, volume 7716 of lecture notes in computer science (pp. 126–135). Berlin: Springer.
Insa-Cabrera, J., Dowe, D. L., España-Cubillo, S., Hernández-Lloreda, M. V., & Hernández-Orallo, J. (2011). Comparing humans and AI agents. In J. Schmidhuber, K. R. Thórisson, & M. Looks (Eds.), LNAI series on artificial general intelligence 2011 (Vol. 6830, pp. 122–132). Berlin: Springer.
Knuth, D. E. (1973). Sorting and searching, volume 3 of the art of computer programming. Reading, MA: Addison-Wesley.
Kotovsky, K., & Simon, H. A. (1990). What makes some problems really hard: Explorations in the problem space of difficulty. Cognitive Psychology, 22(2), 143–183.
Legg, S. (2008). Machine super intelligence. PhD thesis, Department of Informatics, University of Lugano, June 2008.
Legg, S., & Hutter, M. 
(2007). Universal intelligence: A definition of machine intelligence. Minds and Machines, 17(4), 391–444.Leonetti, M. & Iocchi, L. (2010). Improving the performance of complex agent plans through reinforcement learning. In Proceedings of the 2010 International Conference on Autonomous Agents and Multiagent Systems (Vol. 1, pp. 723–730). International Foundation for Autonomous Agents and Multiagent Systems.Levin, L. A. (1973). Universal sequential search problems. Problems of Information Transmission, 9(3), 265–266.Levin, L. A. (1986). Average case complete problems. SIAM Journal on Computing, 15, 285.Li, M., & Vitányi, P. (2008). An introduction to Kolmogorov complexity and its applications (3rd ed.). Berlin: Springer.Low, C. K., Chen, T. Y., & Rónnquist, R. (1999). Automated test case generation for bdi agents. Autonomous Agents and Multi-agent Systems, 2(4), 311–332.Madden, M. G., & Howley, T. (2004). Transfer of experience between reinforcement learning environments with progressive difficulty. Artificial Intelligence Review, 21(3), 375–398.Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115(2), 300.Michel, F. (2004). Formalisme, outils et éléments méthodologiques pour la modélisation et la simulation multi-agents. PhD thesis, Université des sciences et techniques du Languedoc, Montpellier.Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81.Orponen, P., Ko, K. I., Schöning, U., & Watanabe, O. (1994). Instance complexity. Journal of the ACM (JACM), 41(1), 96–121.Simon, H. A., & Kotovsky, K. (1963). Human acquisition of concepts for sequential patterns. Psychological Review, 70(6), 534.Team, R., et al. (2013). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.Whiteson, S., Tanner, B., & White, A. (2010). The reinforcement learning competitions. 
The AI Magazine, 31(2), 81–94.Wiering, M., & van Otterlo, M. (Eds.). (2012). Reinforcement learning: State-of-the-art. Berlin: Springer.Wolfram, S. (2002). A new kind of science. Champaign, IL: Wolfram Media.Zatuchna, Z., & Bagnall, A. (2009). Learning mazes with aliasing states: An LCS algorithm with associative perception. Adaptive Behavior, 17(1), 28–57.Zenil, H. (2010). Compression-based investigation of the dynamical properties of cellular automata and other systems. Complex Systems, 19(1), 1–28.Zenil, H. (2011). Une approche expérimentale à la théorie algorithmique de la complexité. PhD thesis, Dissertation in fulfilment of the degree of Doctor in Computer Science, Université de Lille.Zenil, H., Soler-Toscano, F., Delahaye, J. P. & Gauvrit, N. (2012). Two-dimensional kolmogorov complexity and validation of the coding theorem method by compressibility. arXiv, preprint arXiv:1212.6745
A Data-driven Model of Nucleosynthesis with Chemical Tagging in a Lower-dimensional Latent Space
Chemical tagging seeks to identify unique star formation sites from present-day stellar abundances. Previous techniques have treated each abundance dimension as being statistically independent, despite theoretical expectations that many elements can be produced by more than one nucleosynthetic process. In this work, we introduce a data-driven model of nucleosynthesis, where a set of latent factors (e.g., nucleosynthetic yields) contribute to all stars with different scores, and clustering (e.g., chemical tagging) is modeled by a mixture of multivariate Gaussians in a lower-dimensional latent space. We use an exact method to simultaneously estimate the factor scores for each star, the partial assignment of each star to each cluster, and the latent factors common to all stars, even in the presence of missing data entries. We use an information-theoretic Bayesian principle to estimate the number of latent factors and clusters. Using the second Galah data release, we find that six latent factors are preferred to explain N = 2566 stars with 17 chemical abundances. We identify the rapid and slow neutron-capture processes, as well as latent factors consistent with Fe-peak and α-element production, and another where K and Zn dominate. When we consider N ~ 160,000 stars with missing abundances, we find another seven factors, as well as 16 components in latent space. Despite these components showing separation in chemistry, which is explained through different yield contributions, none show significant structure in their positions or motions. We argue that more data and joint priors on cluster membership that are constrained by dynamical models are necessary to realize chemical tagging at a galactic scale. We release accompanying software that scales well with the available data, allowing for the model's parameters to be optimized in seconds given a fixed number of latent factors, components, and ~10^7 abundance measurements.
We
acknowledge support from the Australian Research Council
through Discovery Project DP160100637. J.B.H. is supported
by a Laureate Fellowship from the Australian Research
Council. Parts of this research were supported by the Australian
Research Council (ARC) Centre of Excellence for All Sky
Astrophysics in 3 Dimensions (ASTRO 3D), through project
number CE170100013. S.B. acknowledges funds from the
Alexander von Humboldt Foundation in the framework of the
Sofja Kovalevskaja Award endowed by the Federal Ministry of
Education and Research. S.B. is supported by the Australian
Research Council (grants DP150100250 and DP160103747).
S.L.M. acknowledges the support of the UNSW Scientia
Fellowship program. J.D.S., S.L.M., and D.B.Z. acknowledge
the support of the Australian Research Council through
Discovery Project grant DP180101791. The Galah survey is
based on observations made at the Australian Astronomical
Observatory, under programmes A/2013B/13, A/2014A/25,
A/2015A/19, and A/2017A/18. We acknowledge the traditional owners of the land on which the AAT stands, the
Gamilaraay people, and pay our respects to elders past and
present. This research has made use of NASA’s Astrophysics
Data System.
- …