On more realistic environment distributions for defining, evaluating and developing intelligence

Author: Dowe David L.
España Cubillo Sergio
Hernández Orallo José
Hernández-Lloreda M. Victoria
Insa Cabrera Javier
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

One insightful view of the notion of intelligence is the ability to perform well in a diverse set of tasks, problems or environments. One of the key issues is therefore the choice of this set, which can be formalised as a 'distribution'. Formalising and properly defining this distribution is an important challenge to understand what intelligence is and to achieve artificial general intelligence (AGI). In this paper, we agree with previous criticisms that a universal distribution using a reference universal Turing machine (UTM) over tasks, environments, etc., is perhaps a much too general distribution, since, e.g., the probability of other agents appearing on the scene or having some social interaction is almost 0 for many reference UTMs. Instead, we propose the notion of Darwin-Wallace distribution for environments, which is inspired by biological evolution, artificial life and evolutionary computation. However, although enlightening about where and how intelligence should excel, this distribution has so many options and is uncomputable in so many ways that we certainly need a more practical alternative. We propose the use of intelligence tests over multi-agent systems, in such a way that agents with a certified level of intelligence at a certain degree are used to construct the tests for the next degree. This constructive methodology can then be used as a more realistic intelligence test and also as a testbed for developing and evaluating AGI systems.

We thank the anonymous reviewers for their helpful comments. We also thank the funding from the Spanish MEC and MICINN for projects TIN2009-06078-E/TIN, Consolider-Ingenio CSD2007-00022 and TIN2010-21062-C02, for MEC FPU grant AP2006-02323, and Generalitat Valenciana for Prometeo/2008/051.

Hernández Orallo, J.; Dowe, DL.; España Cubillo, S.; Hernández-Lloreda, MV.; Insa Cabrera, J. (2011). On more realistic environment distributions for defining, evaluating and developing intelligence. In Artificial General Intelligence. Springer Verlag (Germany). 6830:82-91. https://doi.org/10.1007/978-3-642-22887-2_9

RiuNet

Instrumental Properties of Social Testbeds

Author: J. Insa Cabrera
J. Hernández Orallo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/07/2015

The evaluation of an ability or skill happens in some kind of testbed, and so does the evaluation of social intelligence. Of course, not all testbeds are suitable for this purpose. But how can we be sure of their appropriateness? In this paper we identify the components that should be considered in order to measure social intelligence, and provide some instrumental properties in order to assess the suitability of a testbed.

Insa Cabrera, J.; Hernández Orallo, J. (2015). Instrumental Properties of Social Testbeds. Lecture Notes in Artificial Intelligence. 9205:101-110. doi:10.1007/978-3-319-21365-1_11

Crossref

RiuNet

Evaluating a reinforcement learning algorithm with a general intelligence test

Author: J. Insa Cabrera
D.L. Dowe
J. Hernández Orallo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

In this paper we apply the recent notion of anytime universal intelligence tests to the evaluation of a popular reinforcement learning algorithm, Q-learning. We show that a general approach to intelligence evaluation of AI algorithms is feasible. This top-down (theory-derived) approach is based on a generation of environments under a Solomonoff universal distribution instead of using a pre-defined set of specific tasks, such as mazes, problem repositories, etc. This first application of a general intelligence test to a reinforcement learning algorithm brings us to the issue of task-specific vs. general AI agents. This, in turn, suggests new avenues for AI agent evaluation and AI competitions, and also conveys some further insights about the performance of specific algorithms. © 2011 Springer-Verlag.

We are grateful for the funding from the Spanish MEC and MICINN for projects TIN2009-06078-E/TIN, Consolider-Ingenio CSD2007-00022 and TIN2010-21062-C02, for MEC FPU grant AP2006-02323, and Generalitat Valenciana for Prometeo/2008/051.

Insa Cabrera, J.; Dowe, DL.; Hernández Orallo, J. (2011). Evaluating a reinforcement learning algorithm with a general intelligence test. In Advances in Artificial Intelligence. Springer Verlag (Germany). 7023:1-11. https://doi.org/10.1007/978-3-642-25274-7_1
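
As a reading aid, the idea of weighting environments by a universal distribution can be sketched as below. The notation is assumed here for illustration ($U$ is the reference machine, $K_U$ is Kolmogorov complexity relative to $U$, $E$ is the environment class, and $V^{\pi}_{\mu}$ is the expected reward of agent $\pi$ in environment $\mu$). It follows the universal intelligence measure of Legg and Hutter rather than the exact anytime test applied in the paper.

```latex
p_U(\mu) = 2^{-K_U(\mu)}, \qquad
\Upsilon_U(\pi) = \sum_{\mu \in E} 2^{-K_U(\mu)} \, V^{\pi}_{\mu}
```

Under this weighting, environments with short descriptions dominate the aggregate score.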

Crossref

RiuNet

Assessing the Potential of Classical Q-learning in General Game Playing

Author: Hui Wang
Michael Emmerich
Aske Plaat
Publication date: 14/10/2018

After the recent groundbreaking results of AlphaGo and AlphaZero, we have seen strong interest in deep reinforcement learning and artificial general intelligence (AGI) in game playing. However, deep learning is resource-intensive and the theory is not yet well developed. For small games, simple classical table-based Q-learning might still be the algorithm of choice. General Game Playing (GGP) provides a good testbed for reinforcement learning to research AGI. Q-learning is one of the canonical reinforcement learning methods, and has been used by (Banerjee & Stone, IJCAI 2007) in GGP. In this paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe, Connect Four, Hex; source code: https://github.com/wh1992v/ggp-rl), to allow comparison to Banerjee et al. We find that Q-learning converges to a high win rate in GGP. For the ε-greedy strategy, we propose a first enhancement, the dynamic ε algorithm. In addition, inspired by (Gelly & Silver, ICML 2007) we combine online search (Monte Carlo Search) to enhance offline learning, and propose QM-learning for GGP. Both enhancements improve the performance of classical Q-learning. In this work, GGP allows us to show, if augmented by appropriate enhancements, that classical table-based Q-learning can perform well in small games.

Comment: arXiv admin note: substantial text overlap with arXiv:1802.0594
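
To make the terms in this abstract concrete, here is a minimal sketch of table-based Q-learning with an ε-greedy policy whose ε shrinks over training. The abstract does not specify the dynamic ε algorithm, so the linear decay in `decayed_epsilon` is an assumption, and the five-state chain world in `step` is a hypothetical toy environment rather than a GGP game. None of the function names come from the linked repository.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties between equally valued actions at random.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def decayed_epsilon(episode, n_episodes, start=1.0, end=0.05):
    """Linear decay from start to end (illustrative schedule, an assumption)."""
    frac = episode / max(1, n_episodes - 1)
    return start + (end - start) * frac


def step(state, action, n_states):
    """Hypothetical chain world: action 1 moves right, action 0 moves left.

    Reaching the last state yields reward 1 and ends the episode.
    """
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def train(n_states=5, n_episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for episode in range(n_episodes):
        eps = decayed_epsilon(episode, n_episodes)
        state = 0
        for _ in range(100):  # cap the episode length
            action = epsilon_greedy(q[state], eps, rng)
            nxt, reward, done = step(state, action, n_states)
            # Standard Q-learning target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q_table = train()
# Greedy action in every non-terminal state after training.
greedy_policy = [row.index(max(row)) for row in q_table[:-1]]
print(greedy_policy)
```

Starting with a high ε favours exploration while the table is still empty, and lowering it later lets the agent exploit what it has learned. Other schedules (exponential decay, visit-count based) would fit the same structure.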

arXiv.org e-Print Archive

Crossref

Leiden University Scholarly Publications

C-tests revisited: back and forth with complexity

Author: J. Hernández Orallo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/07/2015

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-21365-1_28

We explore the aggregation of tasks by weighting them using a difficulty function that depends on the complexity of the (acceptable) policy for the task (instead of a universal distribution over tasks or an adaptive test). The resulting aggregations and decompositions are (now retrospectively) seen as the natural (and trivial) interactive generalisation of the C-tests.

This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN 2010-21062-C02-02, PCIN-2013-037 and TIN 2013-45732-C4-1-P, and by Generalitat Valenciana PROMETEOII 2015/013.

Hernández Orallo, J. (2015). C-tests revisited: back and forth with complexity. In Artificial General Intelligence: 8th International Conference, AGI 2015, Berlin, Germany, July 22-25, 2015, Proceedings. Springer International Publishing. 272-282. https://doi.org/10.1007/978-3-319-21365-1_28

Crossref

RiuNet

Comparing humans and AI agents

Author: J. Insa Cabrera
D.L. Dowe
S. España Cubillo
M.V. Hernández-Lloreda
J. Hernández Orallo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Comparing humans and machines is one important source of information about both machine and human strengths and limitations. Most of these comparisons and competitions are performed in rather specific tasks such as calculus, speech recognition, translation, games, etc. The information conveyed by these experiments is limited, since it only shows that machines are much better than humans in some domains and worse in others. In fact, CAPTCHAs exploit this fact. However, there have only been a few proposals of general intelligence tests in the last two decades, and, to our knowledge, just a couple of implementations and evaluations. In this paper, we implement one of the most recent test proposals, devise an interface for humans and use it to compare the intelligence of humans and Q-learning, a popular reinforcement learning algorithm. The results are highly informative in many ways, raising many questions on the use of a (universal) distribution of environments, on the role of measuring knowledge acquisition, and other issues, such as speed, duration of the test, scalability, etc.

We thank the anonymous reviewers for their helpful comments. We also thank José Antonio Martín H. for helping us with several issues about the RL competition, RL-Glue and reinforcement learning in general. We are also grateful to all the subjects who took the test. We also thank the funding from the Spanish MEC and MICINN for projects TIN2009-06078-E/TIN, Consolider-Ingenio CSD2007-00022 and TIN2010-21062-C02, for MEC FPU grant AP2006-02323, and Generalitat Valenciana for Prometeo/2008/051.

Insa Cabrera, J.; Dowe, DL.; España Cubillo, S.; Hernández-Lloreda, MV.; Hernández Orallo, J. (2011). Comparing humans and AI agents. In Artificial General Intelligence. Springer Verlag (Germany). 6830:122-132. https://doi.org/10.1007/978-3-642-22887-2_13

CiteSeerX

Crossref

RiuNet

Evaluation in artificial intelligence: From task-oriented to ability-oriented measurement

Author: José Hernández-Orallo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/08/2016

The final publication is available at Springer via http://dx.doi.org/10.1007/s10462-016-9505-7.

The evaluation of artificial intelligence systems and components is crucial for the progress of the discipline. In this paper we describe and critically assess the different ways AI systems are evaluated, and the role of components and techniques in these systems. We first focus on the traditional task-oriented evaluation approach. We identify three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation. We describe some of the limitations of the many evaluation schemes and competitions in these three categories, and follow the progression of some of these tests. We then focus on a less customary (and challenging) ability-oriented evaluation approach, where a system is characterised by its (cognitive) abilities, rather than by the tasks it is designed to solve. We discuss several possibilities: the adaptation of cognitive tests used for humans and animals, the development of tests derived from algorithmic information theory or more integrated approaches under the perspective of universal psychometrics. We analyse some evaluation tests from AI that are better positioned for an ability-oriented evaluation and discuss how their problems and limitations can possibly be addressed with some of the tools and ideas that appear within the paper. Finally, we enumerate a series of lessons learnt and generic guidelines to be used when an AI evaluation scheme is under consideration.

I thank the organisers of the AEPIA Summer School On Artificial Intelligence, held in September 2014, for giving me the opportunity to give a lecture on 'AI Evaluation'. This paper was born out of and evolved through that lecture. The information about many benchmarks and competitions discussed in this paper has been contrasted with information from and discussions with many people: M. Bedia, A. Cangelosi, C. Dimitrakakis, I. García-Varea, Katja Hofmann, W. Langdon, E. Messina, S. Mueller, M. Siebers and C. Soares. Figure 4 is courtesy of F. Martinez-Plumed. Finally, I thank the anonymous reviewers, whose comments have helped to significantly improve the balance and coverage of the paper. This work has been partially supported by the EU (FEDER) and the Spanish MINECO under Grants TIN 2013-45732-C4-1-P, TIN 2015-69175-C4-1-R and by Generalitat Valenciana PROMETEOII2015/013.

José Hernández-Orallo (2016). Evaluation in artificial intelligence: From task-oriented to ability-oriented measurement. Artificial Intelligence Review. 1-51. https://doi.org/10.1007/s10462-016-9505-7
NIST Special Publication SP, pp 101–104Everitt T, Lattimore T, Hutter M (2014) Free lunch for optimisation under the universal distribution. In: 2014 IEEE Congress on evolutionary computation (CEC), IEEE, pp 167–174Falkenauer E (1998) On method overfitting. J Heuristics 4(3):281–287Feldman J (2003) Simplicity and complexity in human concept learning. Gen Psychol 38(1):9–15Ferrando PJ (2009) Difficulty, discrimination, and information indices in the linear factor analysis model for continuous item responses. Appl Psychol Meas 33(1):9–24Ferrando PJ (2012) Assessing the discriminating power of item and test scores in the linear factor-analysis model. Psicológica 33:111–139Ferri C, Hernández-Orallo J, Modroiu R (2009) An experimental comparison of performance measures for classification. Pattern Recogn Lett 30(1):27–38Ferrucci D, Brown E, Chu-Carroll J, Fan J, Gondek D, Kalyanpur AA, Lally A, Murdock J, Nyberg E, Prager J et al (2010) Building Watson: an overview of the DeepQA project. AI Mag 31(3):59–79Fogel DB (1991) The evolution of intelligent decision making in gaming. Cybern Syst 22(2):223–236Gaschnig J, Klahr P, Pople H, Shortliffe E, Terry A (1983) Evaluation of expert systems: issues and case studies. Build Exp Syst 1:241–278Geissman JR, Schultz RD (1988) Verification & validation. AI Exp 3(2):26–33Genesereth M, Love N, Pell B (2005) General game playing: overview of the AAAI competition. AI Mag 26(2):62Gerónimo D, López AM (2014) Datasets and benchmarking. In: Vision-based pedestrian protection systems for intelligent vehicles. Springer, pp 87–93Goertzel B, Pennachin C (eds) (2007) Artificial general intelligence. Springer, New YorkGoertzel B, Arel I, Scheutz M (2009) Toward a roadmap for human-level artificial general intelligence: embedding HLAI systems in broad, approachable, physical or virtual contexts. Artif Gen Intell Roadmap InitiatGoldreich O, Vadhan S (2007) Special issue on worst-case versus average-case complexity editors’ foreword. 
Comput complex 16(4):325–330Gordon BB (2007) Report on panel discussion on (re-)establishing or increasing collaborative links between artificial intelligence and intelligent systems. In: Messina ER, Madhavan R (eds) Proceedings of the 2007 workshop on performance metrics for intelligent systems, pp 302–303Gulwani S, Hernández-Orallo J, Kitzelmann E, Muggleton SH, Schmid U, Zorn B (2015) Inductive programming meets the real world. Commun ACM 58(11):90–99Hand DJ (2004) Measurement theory and practice. A Hodder Arnold Publication, LondonHernández-Orallo J (2000a) Beyond the Turing test. J Logic Lang Inf 9(4):447–466Hernández-Orallo J (2000b) On the computational measurement of intelligence factors. In: Meystel A (ed) Performance metrics for intelligent systems workshop. National Institute of Standards and Technology, Gaithersburg, pp 1–8Hernández-Orallo J (2000c) Thesis: computational measures of information gain and reinforcement in inference processes. AI Commun 13(1):49–50Hernández-Orallo J (2010) A (hopefully) non-biased universal environment class for measuring intelligence of biological and artificial systems. In: Artificial general intelligence, 3rd International Conference. Atlantis Press, Extended report at http://users.dsic.upv.es/proy/anynt/unbiased.pdf , pp 182–183Hernández-Orallo J (2014) On environment difficulty and discriminating power. Auton Agents Multi-Agent Syst. 29(3):402–454. doi: 10.1007/s10458-014-9257-1Hernández-Orallo J, Dowe DL (2010) Measuring universal intelligence: towards an anytime intelligence test. Artif Intell 174(18):1508–1539Hernández-Orallo J, Dowe DL (2013) On potential cognitive abilities in the machine kingdom. Minds Mach 23:179–210Hernández-Orallo J, Minaya-Collado N (1998) A formal definition of intelligence based on an intensional variant of Kolmogorov complexity. 
In: Proceedings of international symposium of engineering of intelligent systems (EIS’98), ICSC Press, pp 146–163Hernández-Orallo J, Dowe DL, España-Cubillo S, Hernández-Lloreda MV, Insa-Cabrera J (2011) On more realistic environment distributions for defining, evaluating and developing intelligence. In: Schmidhuber J, Thórisson K, Looks M (eds) Artificial general intelligence, LNAI, vol 6830. Springer, New York, pp 82–91Hernández-Orallo J, Flach P, Ferri C (2012a) A unified view of performance metrics: translating threshold choice into expected classification loss. J Mach Learn Res 13(1):2813–2869Hernández-Orallo J, Insa-Cabrera J, Dowe DL, Hibbard B (2012b) Turing Tests with Turing machines. In: Voronkov A (ed) Turing-100, EPiC Series, vol 10, pp 140–156Hernández-Orallo J, Dowe DL, Hernández-Lloreda MV (2014) Universal psychometrics: measuring cognitive abilities in the machine kingdom. Cogn Syst Res 27:50–74Hernández-Orallo J, Martínez-Plumed F, Schmid U, Siebers M, Dowe DL (2016) Computer models solving intelligence test problems: progress and implications. Artif Intell 230:74–107Herrmann E, Call J, Hernández-Lloreda MV, Hare B, Tomasello M (2007) Humans have evolved specialized skills of social cognition: the cultural intelligence hypothesis. Science 317(5843):1360–1366Hibbard B (2009) Bias and no free lunch in formal measures of intelligence. J Artif Gen Intell 1(1):54–61Hingston P (2010) A new design for a Turing Test for bots. In: 2010 IEEE symposium on computational intelligence and games (CIG), IEEE, pp 345–350Hingston P (2012) Believable bots: can computers play like people?. Springer, New YorkHo TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289–300Hutter M (2007) Universal algorithmic intelligence: a mathematical top

\rightarrow

→ down approach. In: Goertzel B, Pennachin C (eds) Artificial general intelligence, cognitive technologies. Springer, Berlin, pp 227–290Igel C, Toussaint M (2005) A no-free-lunch theorem for non-uniform distributions of target functions. J Math Model Algorithms 3(4):313–322Insa-Cabrera J (2016) Towards a universal test of social intelligence. Ph.D. thesis, Departament de Sistemes Informátics i Computació, UPVInsa-Cabrera J, Dowe DL, España-Cubillo S, Hernández-Lloreda MV, Hernández-Orallo J (2011a) Comparing humans and ai agents. In: Schmidhuber J, Thórisson K, Looks M (eds) Artificial general intelligence, LNAI, vol 6830. Springer, New York, pp 122–132Insa-Cabrera J, Dowe DL, Hernández-Orallo J (2011) Evaluating a reinforcement learning algorithm with a general intelligence test. In: Lozano JA, Gamez JM (eds) Current topics in artificial intelligence. CAEPIA 2011, LNAI series 7023. Springer, New YorkInsa-Cabrera J, Benacloch-Ayuso JL, Hernández-Orallo J (2012) On measuring social intelligence: experiments on competition and cooperation. In: Bach J, Goertzel B, Iklé M (eds) AGI, lecture notes in computer science, vol 7716. Springer, New York, pp 126–135Jacoff A, Messina E, Weiss BA, Tadokoro S, Nakagawa Y (2003) Test arenas and performance metrics for urban search and rescue robots. In: Proceedings of 2003 IEEE/RSJ international conference on intelligent robots and systems, 2003 (IROS 2003), IEEE, vol 4, pp 3396–3403Japkowicz N, Shah M (2011) Evaluating learning algorithms. Cambridge University Press, CambridgeJiang J (2008) A literature survey on domain adaptation of statistical classifiers. http://sifaka.cs.uiuc.edu/jiang4/domain_adaptation/surveyJohnson M, Hofmann K, Hutton T, Bignell D (2016) The Malmo platform for artificial intelligence experimentation. In: International joint conference on artificial intelligence (IJCAI)Keith TZ, Reynolds MR (2010) Cattell–Horn–Carroll abilities and cognitive tests: what we’ve learned from 20 years of research. 
Psychol Schools 47(7):635–650Ketter W, Symeonidis A (2012) Competitive benchmarking: lessons learned from the trading agent competition. AI Mag 33(2):103Khreich W, Granger E, Miri A, Sabourin R (2012) A survey of techniques for incremental learning of HMM parameters. Inf Sci 197:105–130Kim JH (2004) Soccer robotics, vol 11. Springer, New YorkKitano H, Asada M, Kuniyoshi Y, Noda I, Osawa E (1997) Robocup: the robot world cup initiative. In: Proceedings of the first international conference on autonomous agents, ACM, pp 340–347Kleiner K (2011) Who are you calling bird-brained? An attempt is being made to devise a universal intelligence test. Economist 398(8723, 5 March 2011):82Knuth DE (1973) Sorting and searching, volume 3 of the art of computer programming. Addison-Wesley, ReadingKoza JR (2010) Human-competitive results produced by genetic programming. Genet Program Evolvable Mach 11(3–4):251–284Krueger J, Osherson D (1980) On the psychology of structural simplicity. In: Jusczyk PW, Klein RM (eds) The nature of thought: essays in honor of D. O. Hebb. Psychology Press, London, pp 187–205Langford J (2005) Clever methods of overfitting. Machine Learning (Theory). http://hunch.netLangley P (1987) Research papers in machine learning. Mach Learn 2(3):195–198Langley P (2011) The changing science of machine learning. Mach Learn 82(3):275–279Langley P (2012) The cognitive systems paradigm. Adv Cogn Syst 1:3–13Lattimore T, Hutter M (2013) No free lunch versus Occam’s razor in supervised learning. Algorithmic Probability and Friends. Springer, Bayesian Prediction and Artificial Intelligence, pp 223–235Leeuwenberg ELJ, Van Der Helm PA (2012) Structural information theory: the simplicity of visual form. Cambridge University Press, CambridgeLegg S, Hutter M (2007a) Tests of machine intelligence. In: Lungarella M, Iida F, Bongard J, Pfeifer R (eds) 50 Years of Artificial Intelligence, Lecture Notes in Computer Science, vol 4850, Springer Berlin Heidelberg, pp 232–242. 
doi: 10.1007/978-3-540-77296-5_22Legg S, Hutter M (2007b) Universal intelligence: a definition of machine intelligence. Minds Mach 17(4):391–444Legg S, Veness J (2013) An approximation of the universal intelligence measure. Algorithmic Probability and Friends. Springer, Bayesian Prediction and Artificial Intelligence, pp 236–249Levesque HJ (2014) On our best behaviour. Artif Intell 212:27–35Levesque HJ, Davis E, Morgenstern L (2012) The winog

Crossref

RiuNet

On potential cognitive abilities in the machine kingdom

Author: Dowe David L.
Hernández-Orallo José
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

The final publication is available at Springer via http://dx.doi.org/10.1007/s11023-012-9299-6 Animals, including humans, are usually judged on what they could become, rather than what they are. Many physical and cognitive abilities in the ‘animal kingdom’ are only acquired (to a given degree) when the subject reaches a certain stage of development, which can be accelerated or spoilt depending on how the environment, training or education is. The term ‘potential ability’ usually refers to how quick and likely the process of attaining the ability is. In principle, things should not be different for the ‘machine kingdom’. While machines can be characterised by a set of cognitive abilities, and measuring them is already a big challenge, known as ‘universal psychometrics’, a more informative, and yet more challenging, goal would be to also determine the potential cognitive abilities of a machine. In this paper we investigate the notion of potential cognitive ability for machines, focussing especially on universality and intelligence. We consider several machine characterisations (non-interactive and interactive) and give definitions for each case, considering permanent and temporal potentials. From these definitions, we analyse the relation between some potential abilities, we bring out the dependency on the environment distribution and we suggest some ideas about how potential abilities can be measured. Finally, we also analyse the potential of environments at different levels and briefly discuss whether machines should be designed to be intelligent or potentially intelligent. We thank the anonymous reviewers for their comments, which have helped to significantly improve this paper. This work was supported by the MEC-MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, the COST - European Cooperation in the field of Scientific and Technical Research IC0801 AT. Finally, we thank three pioneers ahead of their time(s).
We thank Ray Solomonoff (1926-2009) and Chris Wallace (1933-2004) for all that they taught us, directly and indirectly. And, in his centenary year, we thank Alan Turing (1912-1954), with whom it perhaps all began. Hernández-Orallo, J.; Dowe, DL. (2013). On potential cognitive abilities in the machine kingdom. Minds and Machines. 23(2):179-210. https://doi.org/10.1007/s11023-012-9299-6

CiteSeerX

RiuNet

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Author: He Kaiming
Ren Shaoqing
Sun Jian
Zhang Xiangyu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224x224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102x faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition. Comment: This manuscript is the accepted version for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2015. See Changelo
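
The fixed-length property described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), the use of max pooling, and the single-channel list-of-lists feature map are all assumptions chosen for brevity. Each level n splits the map into an n x n grid and keeps one pooled value per bin, so the output length depends only on the levels, not on the input size.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into a fixed-length vector.

    feature_map: non-empty list of equal-length rows of numbers (one channel).
    levels: grid sizes n (an assumed choice); each level adds n * n values.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i; floor for the start and ceil for the end
            # guarantee that every bin covers at least one row.
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two hypothetical feature maps of different sizes.
small = [[r * 5 + c for c in range(5)] for r in range(3)]      # 3 x 5
large = [[r * 13 + c for c in range(13)] for r in range(9)]    # 9 x 13

# Both yield 1 + 4 + 16 = 21 values.
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

In a real network the same pooling would be applied per channel to the last convolutional feature maps, and the concatenated result would feed the fully connected layers; those details are outside what this sketch covers.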

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hong Kong University of Science and Technology Institutional Repository

The Application of Artificial Intelligence to Solve a Physical Puzzle

Author: Cremer Dana
Publication venue: Indiana University South Bend
Publication date: 01/06/2007
Field of study

Thesis (M.A.) -- Indiana University South Bend, 2007. This thesis presents the design, development, and implementation of an intelligent agent capable of solving a physical puzzle. The puzzle is a three-dimensional maze in which a marble must be moved from its starting point to a target cell in the opposite corner. The movement of the marble is strictly the result of movement of the maze itself, the marble's response to gravity, and collisions with the walls of the maze. The physical nature of the puzzle provides an interesting challenge for the intelligent agent attempting to solve it, since it does not have complete control over the effects of its actions, and is not able to predict with certainty what those effects will be. A software framework is developed to integrate the artificial intelligence, physics simulation, and computer graphics required to solve the puzzle. A control scheme is designed to enable the agent to perform the physical moves to be simulated. Several solution algorithms are developed and implemented, incorporating varying levels of knowledge of the maze's geometry and the physics involved. In general, it is shown that increasing the 'intelligence' of the agent significantly improves its performance. This thesis is a unique integration of artificial intelligence, physics simulation, and computer graphics. The result is the graphical animation of the solution to a physical puzzle that could not be solved without each of the three technologies. Indiana University South Bend Department of Computer and Information Sciences and the Department of Mathematical Science

IUScholarWorks (University of Indiana)