Comparing humans and AI agents
Comparing humans and machines is one important source of
information about both machine and human strengths and limitations.
Most of these comparisons and competitions are performed on rather
specific tasks, such as calculus, speech recognition, translation or games.
The information conveyed by these experiments is limited, since they
merely show that machines are much better than humans in some domains
and worse in others; indeed, CAPTCHAs exploit precisely this asymmetry. However,
there have only been a few proposals of general intelligence tests in the
last two decades, and, to our knowledge, just a couple of implementations
and evaluations. In this paper, we implement one of the most recent test
proposals, devise an interface for humans and use it to compare the
intelligence of humans and Q-learning, a popular reinforcement learning
algorithm. The results are highly informative in many ways, raising many
questions on the use of a (universal) distribution of environments, on the
role of measuring knowledge acquisition, and other issues, such as speed,
duration of the test, scalability, etc.

We thank the anonymous reviewers for their helpful comments. We also thank José Antonio Martín H. for helping us with several issues about the RL competition, RL-Glue and reinforcement learning in general. We are also grateful to all the subjects who took the test. We also thank the funding from the Spanish MEC and MICINN for projects TIN2009-06078-E/TIN, Consolider-Ingenio CSD2007-00022 and TIN2010-21062-C02, for MEC FPU grant AP2006-02323, and Generalitat Valenciana for Prometeo/2008/051.

Insa-Cabrera, J.; Dowe, D.L.; España-Cubillo, S.; Hernández-Lloreda, M.V.; Hernández-Orallo, J. (2011). Comparing humans and AI agents. In: Artificial General Intelligence. Springer Verlag (Germany). 6830:122-132. https://doi.org/10.1007/978-3-642-22887-2_13
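The Q-learning agent evaluated in this paper can be sketched in its tabular form (Watkins & Dayan, 1992). The environment interface, chain task and hyperparameters below are illustrative assumptions, not the actual setup used in the test.

```python
import random

def q_learning(env_step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning. `env_step(state, action)` must return
    (next_state, reward, done); this interface and all hyperparameters
    are illustrative choices, not the paper's configuration."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = env_step(s, a)
            # core update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

# Toy chain environment: action 1 moves right, reaching state 2 pays 1.
def chain_step(s, a):
    s2 = s + 1 if a == 1 else max(0, s - 1)
    return s2, (1.0 if s2 == 2 else 0.0), s2 == 2

Q = q_learning(chain_step, n_states=3, n_actions=2)
```

After training, the learned table prefers the rewarding action in every non-terminal state of the toy chain.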
Towards the evaluation of cognitive models using anytime intelligence tests
Cognitive models are usually evaluated based on their fit to empirical data. Artificial intelligence (AI) systems, on the other hand, are mainly evaluated based on their performance. Within the field of artificial general intelligence (AGI) research, a new type of performance measure for AGI systems has recently been proposed that tries to cover both humans and artificial systems: Anytime Intelligence Tests (AIT; Hernández-Orallo & Dowe, 2010). This paper explores the viability of the AIT formalism for the evaluation of cognitive models, based on data from the ICCM 2009 “Dynamic Stocks and Flows” modeling challenge.
Human vs. supervised machine learning: Who learns patterns faster?
The capabilities of supervised machine learning (SML), especially compared to human abilities, are being discussed both in scientific research and in the practical use of SML. This study answers the question of how learning performance differs between humans and machines when training data is limited. We designed an experiment in which 44 humans and three different machine learning algorithms identify patterns in labeled training data and then have to label new instances according to the patterns they find. The results show a strong dependency between performance and the underlying pattern of the task. Whereas humans perform relatively similarly across all patterns, machines show large performance differences between the patterns in our experiment. After seeing 20 instances, human performance no longer improves, which we relate to theories of cognitive overload. Machines learn more slowly but can reach the same level, and may even outperform humans, in 2 of the 4 patterns used; however, they need more instances than humans to achieve the same results. Machine performance is comparably lower on the other 2 patterns due to the difficulty of combining input features.
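The experimental design, in which a learner sees a growing number of labeled instances and is scored on how quickly it picks up the underlying pattern, can be sketched as a toy learning curve. The XOR-style target, the 1-nearest-neighbour learner and the instance counts below are illustrative assumptions, not the study's actual patterns or algorithms.

```python
import random

def target_pattern(x):
    # Illustrative target: label is the XOR of the first two features,
    # i.e. it requires combining inputs (the kind of pattern the study
    # found hardest for machines).
    return x[0] ^ x[1]

def predict_1nn(train, x):
    # Deliberately simple stand-in learner: 1-nearest neighbour
    # under Hamming distance.
    _, label = min(train, key=lambda t: sum(a != b for a, b in zip(t[0], x)))
    return label

def accuracy_after(n_train, n_features=4, n_test=200, seed=0):
    # Train on `n_train` random labeled instances, score on a held-out set.
    rng = random.Random(seed)
    draw = lambda: tuple(rng.randint(0, 1) for _ in range(n_features))
    train = [(x, target_pattern(x)) for x in (draw() for _ in range(n_train))]
    test = [draw() for _ in range(n_test)]
    hits = sum(predict_1nn(train, x) == target_pattern(x) for x in test)
    return hits / n_test

# Learning curve: accuracy as a function of how many labeled
# instances the learner has seen.
curve = {n: accuracy_after(n) for n in (5, 20, 80)}
```

Plotting `curve` for several learners against the human results would reproduce the kind of comparison the study reports: machines start lower on combination patterns but improve as instances accumulate.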
How universal can an intelligence test be?
The notion of a universal intelligence test has recently been advocated as a means to assess humans,
non-human animals and machines in an integrated, uniform way. While the main motivation has been the
development of machine intelligence tests, the mere concept of a universal test has many implications
in the way human intelligence tests are understood, and their relation to other tests in comparative
psychology and animal cognition. Given this diversity of subjects across the natural and artificial kingdoms,
the very possibility of constructing a universal test remains controversial. In this paper we rephrase the
question of whether universal intelligence tests are possible or not into the question of how universal
intelligence tests can be, in terms of subjects, interfaces and resolutions. We discuss the feasibility
and difficulty of universal tests at several levels, according to what is taken for granted: the
communication milieu, the resolution, the reward system or the agent itself. We argue that such tests
must be highly adaptive, i.e., that tasks, resolution, rewards and communication have to be adapted
according to how the evaluated agent is reacting and performing. Even so, the most general expression
of a universal test may not be feasible (and, at best, might only be theoretically semi-computable).
Nonetheless, in general, we can analyse the universality in terms of some traits that lead to several levels
of universality and set the quest for universal tests as a progressive rather than absolute goal.

This work was supported by the MEC/MINECO (projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02), the GVA (project PROMETEO/2008/051) and the COST-European Cooperation in the field of Scientific and Technical Research (project IC0801 AT).

Dowe, D.L.; Hernández-Orallo, J. (2014). How universal can an intelligence test be? Adaptive Behavior. 22(1):51-69. https://doi.org/10.1177/1059712313500502
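The adaptivity the abstract calls for, where tasks, rewards and difficulty are adjusted according to how the evaluated agent is performing, can be sketched as a simple loop. The `agent` and `make_task` interfaces, the doubling/halving schedule and the difficulty-weighted score are hypothetical illustrations, not the paper's formalism.

```python
def adaptive_test(agent, make_task, n_items=30, start_difficulty=1.0):
    """Sketch of an adaptive test loop: difficulty scales up after a
    success and backs off after a failure, and harder successes
    contribute more to the score. All interfaces here are hypothetical."""
    difficulty, score = start_difficulty, 0.0
    for _ in range(n_items):
        success = agent(make_task(difficulty))
        if success:
            score += difficulty      # weight reward by difficulty
            difficulty *= 2.0        # probe a harder task next
        else:
            difficulty /= 2.0        # back off toward the agent's level
    return score

# Usage: an agent that can solve tasks up to difficulty 4 scores higher
# than one that solves nothing; the test homes in on each agent's level.
capable = adaptive_test(lambda task: task <= 4, make_task=lambda d: d)
hopeless = adaptive_test(lambda task: False, make_task=lambda d: d)
```

The point of the sketch is that the item sequence differs per agent, which is what distinguishes an adaptive test from a fixed battery.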