Stochastic tasks: difficulty and Levin search
We establish a setting for asynchronous stochastic tasks that
accounts for episodes, rewards and responses and, most especially, the
computational complexity of the algorithm behind an agent solving a
task. This is used to determine the difficulty of a task as the (logarithm
of the) number of computational steps required to acquire an acceptable
policy for the task, which includes the exploration of policies and their
verification. We also analyse instance difficulty, task compositions and
decompositions.

This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN 2010-21062-C02-02, PCIN-2013-037 and TIN 2013-45732-C4-1-P, and by Generalitat Valenciana PROMETEOII 2015/013.

Hernández-Orallo, J. (2015). Stochastic tasks: difficulty and Levin search. In Artificial General Intelligence. Springer International Publishing, 90-100. http://hdl.handle.net/10251/66686
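The difficulty notion above (the logarithm of the number of computational steps spent exploring and verifying policies until an acceptable one is found) can be illustrated with a toy Levin-style search. The sketch below is an assumption-laden illustration, not the paper's construction: "programs" are binary strings run by a toy interpreter, and in phase k every program of length ℓ ≤ k receives a budget of 2^(k−ℓ) steps.

```python
import math

def levin_search(run, accept, max_phase=20):
    """Toy Levin-style search. In phase k, every binary program p with
    len(p) <= k gets a budget of 2**(k - len(p)) steps; the first
    program with an acceptable output is returned, together with the
    total steps spent and its log2 (the difficulty proxy)."""
    total = 0
    for k in range(1, max_phase + 1):
        for length in range(1, k + 1):
            budget = 2 ** (k - length)
            for i in range(2 ** length):
                p = format(i, f'0{length}b')
                total += budget            # charge the full budget spent on p
                out = run(p, budget)
                if out is not None and accept(out):
                    return p, total, math.log2(total)
    return None, total, float('inf')

# A toy interpreter: program p halts after value(p)+1 steps and outputs value(p).
def run(p, budget):
    steps = int(p, 2) + 1
    return int(p, 2) if steps <= budget else None

p, total, difficulty = levin_search(run, lambda out: out == 5)
print(p, round(difficulty, 2))   # finds "101"
```

Under this toy setting, the returned log2 of the accumulated steps plays the role of the task's difficulty.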
C-tests revisited: back and forth with complexity
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-21365-1_28

We explore the aggregation of tasks by weighting them using a difficulty
function that depends on the complexity of the (acceptable) policy for the task (instead
of a universal distribution over tasks or an adaptive test). The resulting aggregations
and decompositions are (now retrospectively) seen as the natural (and trivial) interactive
generalisation of the C-tests.This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN 2010-21062-C02-02, PCIN-2013-037 and TIN 2013-45732-C4-1-P, and by Generalitat Valenciana PROMETEOII 2015/013.Hernández Orallo, J. (2015). C-tests revisited: back and forth with complexity. En Artificial General Intelligence 8th International Conference, AGI 2015, AGI 2015, Berlin, Germany, July 22-25, 2015, Proceedings. Springer International Publishing. 272-282. https://doi.org/10.1007/978-3-319-21365-1_28S272282Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47, 253–279 (2013)Hernández-Orallo, J.: Beyond the Turing Test. J. Logic, Language & Information 9(4), 447–466 (2000)Hernández-Orallo, J.: Computational measures of information gain and reinforcement in inference processes. AI Communications 13(1), 49–50 (2000)Hernández-Orallo, J.: On the computational measurement of intelligence factors. In: Meystel, A. (ed.) Performance metrics for intelligent systems workshop, pp. 1–8. National Institute of Standards and Technology, Gaithersburg (2000)Hernández-Orallo, J.: AI evaluation: past, present and future (2014). arXiv preprint arXiv:1408.6908Hernández-Orallo, J.: On environment difficulty and discriminating power. Autonomous Agents and Multi-Agent Systems, 1–53 (2014). http://dx.doi.org/10.1007/s10458-014-9257-1Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: Towards an anytime intelligence test. Artificial Intelligence 174(18), 1508–1539 (2010)Hernández-Orallo, J., Dowe, D.L., Hernández-Lloreda, M.V.: Universal psychometrics: Measuring cognitive abilities in the machine kingdom. Cognitive Systems Research 27, 50–74 (2014)Hernández-Orallo, J., Minaya-Collado, N.: A formal definition of intelligence based on an intensional variant of Kolmogorov complexity. In: Proc. Intl. 
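Aggregating tasks by a difficulty-dependent weight can be sketched in a few lines. The 2^(−h) weighting below is an assumption in the spirit of C-test-style aggregations (chosen only so the total weight over difficulty levels converges), not the exact difficulty function of the paper:

```python
def aggregate_score(perf_by_difficulty):
    """Aggregate per-difficulty performance into a single score by
    weighting each difficulty level h >= 1 with 2**(-h), so the total
    available weight is bounded (sums to 1 over all h)."""
    return sum(2 ** (-h) * p for h, p in perf_by_difficulty.items())

# an agent that is perfect up to difficulty 2 and fails beyond
print(aggregate_score({1: 1.0, 2: 1.0, 3: 0.0}))
```

Replacing the weight function changes which difficulty band dominates the aggregate, which is exactly the kind of choice the abstract contrasts with a universal distribution over tasks.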
Instrumental Properties of Social Testbeds
The evaluation of an ability or skill happens in some kind of testbed, and social intelligence is no exception. Of course, not all testbeds are suitable for this purpose. But how can we be sure of their appropriateness? In this paper we identify the components that should be considered in order to measure social intelligence, and provide some instrumental properties to assess the suitability of a testbed.

Insa-Cabrera, J.; Hernández-Orallo, J. (2015). Instrumental Properties of Social Testbeds. Lecture Notes in Artificial Intelligence. 9205:101-110. doi:10.1007/978-3-319-21365-1_11
Comparing humans and AI agents
Comparing humans and machines is one important source of
information about both machine and human strengths and limitations.
Most of these comparisons and competitions are performed in rather
specific tasks such as calculus, speech recognition, translation, games,
etc. The information conveyed by these experiments is limited, since it
merely shows that machines are much better than humans in some domains
and worse in others; indeed, CAPTCHAs exploit this asymmetry. However,
there have only been a few proposals of general intelligence tests in the
last two decades, and, to our knowledge, just a couple of implementations
and evaluations. In this paper, we implement one of the most recent test
proposals, devise an interface for humans and use it to compare the
intelligence of humans and Q-learning, a popular reinforcement learning
algorithm. The results are highly informative in many ways, raising many
questions on the use of a (universal) distribution of environments, on the
role of measuring knowledge acquisition, and other issues, such as speed,
duration of the test, scalability, etc.

We thank the anonymous reviewers for their helpful comments. We also thank José Antonio Martín H. for helping us with several issues about the RL competition, RL-Glue and reinforcement learning in general. We are also grateful to all the subjects who took the test. We also thank the funding from the Spanish MEC and MICINN for projects TIN2009-06078-E/TIN, Consolider-Ingenio CSD2007-00022 and TIN2010-21062-C02, for MEC FPU grant AP2006-02323, and Generalitat Valenciana for Prometeo/2008/051.

Insa-Cabrera, J.; Dowe, D.L.; España-Cubillo, S.; Hernández-Lloreda, M.V.; Hernández-Orallo, J. (2011). Comparing humans and AI agents. In Artificial General Intelligence. Springer Verlag (Germany). 6830:122-132. https://doi.org/10.1007/978-3-642-22887-2_13
On environment difficulty and discriminating power
The final publication is available at Springer via http://dx.doi.org/10.1007/s10458-014-9257-1

This paper presents a way to estimate the difficulty and discriminating power of
any task instance. We focus on a very general setting for tasks: interactive (possibly multiagent)
environments where an agent acts upon observations and rewards. Instead of analysing
the complexity of the environment, the state space or the actions that are performed by the
agent, we analyse the performance of a population of agent policies against the task, leading
to a distribution that is examined in terms of policy complexity. This distribution is then
sliced by the algorithmic complexity of the policy and analysed through several diagrams
and indicators. The notion of environment response curve is also introduced, by inverting the
performance results into an ability scale. We apply all these concepts, diagrams and indicators
to two illustrative problems: a class of agent-populated elementary cellular automata, showing
how the difficulty and discriminating power may vary for several environments, and a multiagent
system, where agents can become predators or prey, and may need to coordinate.
Finally, we discuss how these tools can be applied to characterise (interactive) tasks and
(multi-agent) environments. These characterisations can then be used to get more insight
about agent performance and to facilitate the development of adaptive tests for the evaluation
of agent abilities.

I thank the reviewers for their comments, especially those aiming at a clearer connection with the field of multi-agent systems and the suggestion of better approximations for the calculation of the response curves. The implementation of the elementary cellular automata used in the environments is based on the library 'CellularAutomaton' by John Hughes for R [58]. I am grateful to Fernando Soler-Toscano for letting me know about their work [65] on the complexity of 2D objects generated by elementary cellular automata. I would also like to thank David L. Dowe for his comments on a previous version of this paper. This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, the COST - European Cooperation in the field of Scientific and Technical Research IC0801 AT, and the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA) and funded by the Ministerio de Economia y Competitividad in Spain (PCIN-2013-037).

Hernández-Orallo, J. (2015). On environment difficulty and discriminating power. Autonomous Agents and Multi-Agent Systems. 29(3):402-454. https://doi.org/10.1007/s10458-014-9257-1

Anderson, J., Baltes, J., & Cheng, C. T. (2011). Robotics competitions as benchmarks for AI research. The Knowledge Engineering Review, 26(01), 11–17.
Andre, D., & Russell, S. J. (2002). State abstraction for programmable reinforcement learning agents. In Proceedings of the National Conference on Artificial Intelligence (pp. 119–125). AAAI Press / MIT Press.
Antunes, L., Fortnow, L., van Melkebeek, D., & Vinodchandran, N. V. (2006). Computational depth: Concept and applications. Theoretical Computer Science, 354(3), 391–404. Foundations of Computation Theory (FCT 2003), 14th Symposium on Fundamentals of Computation Theory 2003.
Arai, K., Kaminka, G. A., Frank, I., & Tanaka-Ishii, K. (2003). Performance competitions as research infrastructure: Large scale comparative studies of multi-agent teams. Autonomous Agents and Multi-Agent Systems, 7(1–2), 121–144.
Ashcraft, M. H., Donley, R. D., Halas, M. A., & Vakali, M. (1992). Working memory, automaticity, and problem difficulty. In J. I. D. Campbell (Ed.), The nature and origins of mathematical skills, Advances in Psychology, vol. 91 (pp. 301–329). North-Holland.
Ay, N., Müller, M., & Szkola, A. (2010). Effective complexity and its relation to logical depth. IEEE Transactions on Information Theory, 56(9), 4593–4607.
Barch, D. M., Braver, T. S., Nystrom, L. E., Forman, S. D., Noll, D. C., & Cohen, J. D. (1997). Dissociating working memory from task difficulty in human prefrontal cortex. Neuropsychologia, 35(10), 1373–1380.
Bordini, R. H., Hübner, J. F., & Wooldridge, M. (2007). Programming multi-agent systems in AgentSpeak using Jason. London: Wiley.
Boutilier, C., Reiter, R., Soutchanski, M., Thrun, S., et al. (2000). Decision-theoretic, high-level agent programming in the situation calculus. In Proceedings of the National Conference on Artificial Intelligence (pp. 355–362). AAAI Press / MIT Press.
Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2), 156–172.
Chaitin, G. J. (1977). Algorithmic information theory. IBM Journal of Research and Development, 21, 350–359.
Chedid, F. B. (2010). Sophistication and logical depth revisited. In 2010 IEEE/ACS International Conference on Computer Systems and Applications (AICCSA) (pp. 1–4). IEEE.
Cheeseman, P., Kanefsky, B., & Taylor, W. M. (1991). Where the really hard problems are. In Proceedings of IJCAI-1991 (pp. 331–337).
Dastani, M. (2008). 2APL: A practical agent programming language. Autonomous Agents and Multi-Agent Systems, 16(3), 214–248.
Delahaye, J. P., & Zenil, H. (2011). Numerical evaluation of algorithmic complexity for short strings: A glance into the innermost structure of randomness. Applied Mathematics and Computation, 219(1), 63–77.
Dowe, D. L. (2008). Foreword re C. S. Wallace. Computer Journal, 51(5), 523–560. Christopher Stewart Wallace (1933–2004) memorial special issue.
Dowe, D. L., & Hernández-Orallo, J. (2012). IQ tests are not for machines, yet. Intelligence, 40(2), 77–81.
Du, D. Z., & Ko, K. I. (2011). Theory of computational complexity (Vol. 58). London: Wiley-Interscience.
Elo, A. E. (1978). The rating of chessplayers, past and present (Vol. 3). London: Batsford.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. London: Lawrence Erlbaum.
Fatès, N., & Chevrier, V. (2010). How important are updating schemes in multi-agent systems? An illustration on a multi-turmite model. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (Vol. 1, pp. 533–540). IFAAMAS.
Ferber, J., & Müller, J. P. (1996). Influences and reaction: A model of situated multiagent systems. In Proceedings of the Second International Conference on Multi-Agent Systems (ICMAS-96) (pp. 72–79).
Ferrando, P. J. (2009). Difficulty, discrimination, and information indices in the linear factor analysis model for continuous item responses. Applied Psychological Measurement, 33(1), 9–24.
Ferrando, P. J. (2012). Assessing the discriminating power of item and test scores in the linear factor-analysis model. Psicológica, 33, 111–139.
Gent, I. P., & Walsh, T. (1994). Easy problems are sometimes hard. Artificial Intelligence, 70(1), 335–345.
Gershenson, C., & Fernandez, N. (2012). Complexity and information: Measuring emergence, self-organization, and homeostasis at multiple scales. Complexity, 18(2), 29–44.
Gruner, S. (2010). Mobile agent systems and cellular automata. Autonomous Agents and Multi-Agent Systems, 20(2), 198–233.
Hardman, D. K., & Payne, S. J. (1995). Problem difficulty and response format in syllogistic reasoning. The Quarterly Journal of Experimental Psychology, 48(4), 945–975.
He, J., Reeves, C., Witt, C., & Yao, X. (2007). A note on problem difficulty measures in black-box optimization: Classification, realizations and predictability. Evolutionary Computation, 15(4), 435–443.
Hernández-Orallo, J. (2000). Beyond the Turing Test. Journal of Logic, Language & Information, 9(4), 447–466.
Hernández-Orallo, J. (2000). On the computational measurement of intelligence factors. In A. Meystel (Ed.), Performance metrics for intelligent systems workshop (pp. 1–8). Gaithersburg, MD: National Institute of Standards and Technology.
Hernández-Orallo, J. (2000). Thesis: Computational measures of information gain and reinforcement in inference processes. AI Communications, 13(1), 49–50.
Hernández-Orallo, J. (2010). A (hopefully) non-biased universal environment class for measuring intelligence of biological and artificial systems. In M. Hutter et al. (Eds.), 3rd International Conference on Artificial General Intelligence (pp. 182–183). Atlantis Press. Extended report at http://users.dsic.upv.es/proy/anynt/unbiased.pdf
Hernández-Orallo, J., & Dowe, D. L. (2010). Measuring universal intelligence: Towards an anytime intelligence test. Artificial Intelligence, 174(18), 1508–1539.
Hernández-Orallo, J., Dowe, D. L., España-Cubillo, S., Hernández-Lloreda, M. V., & Insa-Cabrera, J. (2011). On more realistic environment distributions for defining, evaluating and developing intelligence. In J. Schmidhuber, K. R. Thórisson, & M. Looks (Eds.), LNAI series on artificial general intelligence 2011 (Vol. 6830, pp. 82–91). Berlin: Springer.
Hernández-Orallo, J., Dowe, D. L., & Hernández-Lloreda, M. V. (2014). Universal psychometrics: Measuring cognitive abilities in the machine kingdom. Cognitive Systems Research, 27, 50–74.
Hernández-Orallo, J., Insa, J., Dowe, D. L., & Hibbard, B. (2012). Turing tests with Turing machines. In A. Voronkov (Ed.), The Alan Turing Centenary Conference, Turing-100, Manchester, 2012, EPiC Series, vol. 10 (pp. 140–156).
Hernández-Orallo, J., & Minaya-Collado, N. (1998). A formal definition of intelligence based on an intensional variant of Kolmogorov complexity. In Proceedings of the International Symposium of Engineering of Intelligent Systems (EIS'98) (pp. 146–163). ICSC Press.
Hibbard, B. (2009). Bias and no free lunch in formal measures of intelligence. Journal of Artificial General Intelligence, 1(1), 54–61.
Hoos, H. H. (1999). SAT-encodings, search space structure, and local search performance. In 1999 International Joint Conference on Artificial Intelligence (Vol. 16, pp. 296–303).
Insa-Cabrera, J., Benacloch-Ayuso, J. L., & Hernández-Orallo, J. (2012). On measuring social intelligence: Experiments on competition and cooperation. In J. Bach, B. Goertzel, & M. Iklé (Eds.), AGI, Lecture Notes in Computer Science, vol. 7716 (pp. 126–135). Berlin: Springer.
Insa-Cabrera, J., Dowe, D. L., España-Cubillo, S., Hernández-Lloreda, M. V., & Hernández-Orallo, J. (2011). Comparing humans and AI agents. In J. Schmidhuber, K. R. Thórisson, & M. Looks (Eds.), LNAI series on artificial general intelligence 2011 (Vol. 6830, pp. 122–132). Berlin: Springer.
Knuth, D. E. (1973). Sorting and searching. The art of computer programming, vol. 3. Reading, MA: Addison-Wesley.
Kotovsky, K., & Simon, H. A. (1990). What makes some problems really hard: Explorations in the problem space of difficulty. Cognitive Psychology, 22(2), 143–183.
Legg, S. (2008). Machine super intelligence. PhD thesis, Department of Informatics, University of Lugano.
Legg, S., & Hutter, M. (2007). Universal intelligence: A definition of machine intelligence. Minds and Machines, 17(4), 391–444.
Leonetti, M., & Iocchi, L. (2010). Improving the performance of complex agent plans through reinforcement learning. In Proceedings of the 2010 International Conference on Autonomous Agents and Multiagent Systems (Vol. 1, pp. 723–730). IFAAMAS.
Levin, L. A. (1973). Universal sequential search problems. Problems of Information Transmission, 9(3), 265–266.
Levin, L. A. (1986). Average case complete problems. SIAM Journal on Computing, 15, 285.
Li, M., & Vitányi, P. (2008). An introduction to Kolmogorov complexity and its applications (3rd ed.). Berlin: Springer.
Low, C. K., Chen, T. Y., & Rónnquist, R. (1999). Automated test case generation for BDI agents. Autonomous Agents and Multi-Agent Systems, 2(4), 311–332.
Madden, M. G., & Howley, T. (2004). Transfer of experience between reinforcement learning environments with progressive difficulty. Artificial Intelligence Review, 21(3), 375–398.
Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115(2), 300.
Michel, F. (2004). Formalisme, outils et éléments méthodologiques pour la modélisation et la simulation multi-agents. PhD thesis, Université des sciences et techniques du Languedoc, Montpellier.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81.
Orponen, P., Ko, K. I., Schöning, U., & Watanabe, O. (1994). Instance complexity. Journal of the ACM, 41(1), 96–121.
Simon, H. A., & Kotovsky, K. (1963). Human acquisition of concepts for sequential patterns. Psychological Review, 70(6), 534.
R Core Team (2013). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Whiteson, S., Tanner, B., & White, A. (2010). The reinforcement learning competitions. The AI Magazine, 31(2), 81–94.
Wiering, M., & van Otterlo, M. (Eds.). (2012). Reinforcement learning: State-of-the-art. Berlin: Springer.
Wolfram, S. (2002). A new kind of science. Champaign, IL: Wolfram Media.
Zatuchna, Z., & Bagnall, A. (2009). Learning mazes with aliasing states: An LCS algorithm with associative perception. Adaptive Behavior, 17(1), 28–57.
Zenil, H. (2010). Compression-based investigation of the dynamical properties of cellular automata and other systems. Complex Systems, 19(1), 1–28.
Zenil, H. (2011). Une approche expérimentale à la théorie algorithmique de la complexité. PhD thesis, Université de Lille.
Zenil, H., Soler-Toscano, F., Delahaye, J. P., & Gauvrit, N. (2012). Two-dimensional Kolmogorov complexity and validation of the coding theorem method by compressibility. arXiv preprint arXiv:1212.6745.
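The core idea of the paper above, analysing the performance of a population of policies against a task and slicing the results by policy complexity, can be sketched in a toy setting. Everything here is an assumption for illustration: the "task" is matching a hidden bit string, policies are random bit strings, and string length stands in for algorithmic complexity.

```python
import random

TARGET = [1, 0, 1, 1]                      # hidden task description (toy)

def toy_task(policy):
    """Reward: fraction of the policy's first bits matching TARGET."""
    hits = sum(1 for a, b in zip(policy, TARGET) if a == b)
    return hits / len(TARGET)

def mean_response(task_reward, policy_len, n_policies=200, seed=0):
    """Mean reward of a population of random bit-string policies of a
    given length (length as a crude stand-in for policy complexity).
    Evaluating this for several lengths mimics slicing the performance
    distribution by complexity."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_policies):
        policy = [rng.randrange(2) for _ in range(policy_len)]
        total += task_reward(policy)
    return total / n_policies

print(mean_response(toy_task, 4))   # close to 0.5 for random policies
```

Plotting such means against policy complexity (or against an ability scale, after inversion) gives a crude analogue of the environment response curves the abstract describes.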
ROC curves in cost space
The final publication is available at Springer via http://dx.doi.org/10.1007/s10994-013-5328-9

ROC curves and cost curves are two popular ways of visualising classifier performance, finding appropriate thresholds according to the operating condition, and deriving useful aggregated measures such as the area under the ROC curve (AUC) or the area under the optimal cost curve. In this paper we present new findings and connections between ROC space and cost space. In particular, we show that ROC curves can be transferred to cost space by means of a very natural threshold choice method, which sets the decision threshold such that the proportion of positive predictions equals the operating condition. We call these new curves rate-driven curves, and we demonstrate that the expected loss as measured by the area under these curves is linearly related to AUC. We show that the rate-driven curves are the genuine equivalent of ROC curves in cost space, establishing a point-point rather than a point-line correspondence. Furthermore, a decomposition of the rate-driven curves is introduced which separates the loss due to the threshold choice method from the ranking loss (Kendall τ distance). We also derive the corresponding curve to the ROC convex hull in cost space; this curve is different from the lower envelope of the cost lines, as the latter assumes only optimal thresholds are chosen.

We would like to thank the anonymous referees for their helpful comments.
This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, the COST - European Cooperation in the field of Scientific and Technical Research IC0801 AT, and the REFRAME project granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Engineering and Physical Sciences Research Council in the UK and the Ministerio de Economia y Competitividad in Spain.

Hernández-Orallo, J.; Flach, P.; Ferri Ramírez, C. (2013). ROC curves in cost space. Machine Learning. 93(1):71-91. https://doi.org/10.1007/s10994-013-5328-9
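The two ingredients the abstract connects, AUC and the rate-driven threshold choice (set the threshold so that the proportion of positive predictions equals the operating condition), can be sketched as follows. Function names are my own; this is a minimal illustration, not the paper's implementation.

```python
def auc(scores, labels):
    """AUC via the rank statistic: the probability that a random
    positive scores above a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rate_driven_threshold(scores, rate):
    """Rate-driven choice: pick the threshold so that the fraction of
    instances predicted positive equals `rate` (ties at the threshold
    may make the realised rate only approximate)."""
    ranked = sorted(scores, reverse=True)
    k = round(rate * len(scores))
    # predict positive for the k highest-scoring instances
    return ranked[k - 1] if k > 0 else float('inf')

scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 1, 0, 0]
print(auc(scores, labels), rate_driven_threshold(scores, 0.5))
```

Sweeping `rate` over [0, 1] and recording the loss at each operating condition traces the rate-driven curve whose area the paper relates linearly to AUC.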
A SIR-based model for contact-based messaging applications supported by permanent infrastructure
In this paper, we focus on the study of coupled systems of ordinary differential equations (ODEs) describing the diffusion of messages between mobile devices. Communications in mobile opportunistic networks take place upon the establishment of ephemeral contacts among mobile nodes using direct communication. SIR (Susceptible, Infected, Recovered) models allow the diffusion of messages to be represented using an epidemiological approach.
The question we analyse in this work is whether adding a fixed infrastructure can improve the diffusion of messages and thus justify the additional costs. We analyse this case from the point of view of dynamical systems, finding and characterising the admissible equilibria of this scenario. We show that centralised diffusion is not efficient when the density of people is sufficiently high.
This result supports the interest in developing opportunistic networks for occasionally crowded places, avoiding the cost of additional infrastructure.

This work was partially supported by Ministerio de Economia y Competitividad, Spain (Grants TEC2014-52690-R, MTM2016-75963-P and BCAM Severo Ochoa excellence accreditation SEV-2013-0323), Generalitat Valenciana, Spain (Grants AICO/2015/108, ACOMP/2015/005, GVA/2018/110), and by the Basque Government through the BERC 2014-2017 program.

Conejero, J.A.; Hernández-Orallo, E.; Manzoni, P.; Murillo-Arcila, M. (2019). A SIR-based model for contact-based messaging applications supported by permanent infrastructure. Discrete and Continuous Dynamical Systems, Series S. 12(4-5):735-746. https://doi.org/10.3934/dcdss.2019048
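A coupled SIR-style system like the one the abstract describes can be integrated numerically with a simple Euler scheme. This is a generic sketch, not the paper's model: the infrastructure is represented, as an assumption, by a constant extra infection rate `infra` acting on susceptible nodes.

```python
def simulate_sir(beta, delta, s0, i0, r0=0.0, infra=0.0, dt=0.01, t_end=50.0):
    """Euler integration of a SIR-style message-diffusion model:
      S' = -beta*S*I - infra*S
      I' =  beta*S*I + infra*S - delta*I
      R' =  delta*I
    beta: pairwise contact rate, delta: recovery (message drop) rate,
    infra: extra delivery rate from fixed infrastructure (assumption)."""
    s, i, r = s0, i0, r0
    t = 0.0
    while t < t_end:
        ds = -beta * s * i - infra * s
        di = beta * s * i + infra * s - delta * i
        dr = delta * i
        s += ds * dt
        i += di * dt
        r += dr * dt
        t += dt
    return s, i, r

print(simulate_sir(beta=0.5, delta=0.1, s0=0.99, i0=0.01))
```

Comparing runs with `infra=0` against `infra>0` at increasing node densities (here, larger `beta`) is the kind of experiment that reveals when the infrastructure stops paying off.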
Evaluating a reinforcement learning algorithm with a general intelligence test
In this paper we apply the recent notion of anytime universal intelligence tests to the evaluation of a popular reinforcement learning algorithm, Q-learning. We show that a general approach to the intelligence evaluation of AI algorithms is feasible. This top-down (theory-derived) approach is based on the generation of environments under a Solomonoff universal distribution, instead of using a pre-defined set of specific tasks such as mazes, problem repositories, etc. This first application of a general intelligence test to a reinforcement learning algorithm raises the issue of task-specific vs. general AI agents. This, in turn, suggests new avenues for AI agent evaluation and AI competitions, and also conveys some further insights about the performance of specific algorithms. © 2011 Springer-Verlag.

We are grateful for the funding from the Spanish MEC and MICINN for projects TIN2009-06078-E/TIN, Consolider-Ingenio CSD2007-00022 and TIN2010-21062-C02, for MEC FPU grant AP2006-02323, and Generalitat Valenciana for Prometeo/2008/051.

Insa-Cabrera, J.; Dowe, D.L.; Hernández-Orallo, J. (2011). Evaluating a reinforcement learning algorithm with a general intelligence test. In Advances in Artificial Intelligence. Springer Verlag (Germany). 7023:1-11. https://doi.org/10.1007/978-3-642-25274-7_1
182–183 (2010)Hernández-Orallo, J.: On evaluating agent performance in a fixed period of time. In: Hutter, M., et al. (eds.) 3rd Intl. Conf. on Artificial General Intelligence, pp. 25–30. Atlantis Press (2010)Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: Towards an anytime intelligence test. Artificial Intelligence 174(18), 1508–1539 (2010)Legg, S., Hutter, M.: A universal measure of intelligence for artificial agents. Intl. Joint Conf. on Artificial Intelligence, IJCAI 19, 1509 (2005)Legg, S., Hutter, M.: Universal intelligence: A definition of machine intelligence. Minds and Machines 17(4), 391–444 (2007)Levin, L.A.: Universal sequential search problems. Problems of Information Transmission 9(3), 265–266 (1973)Li, M., Vitányi, P.: An introduction to Kolmogorov complexity and its applications, 3rd edn. Springer-Verlag New York, Inc. (2008)Sanghi, P., Dowe, D.L.: A computer program capable of passing IQ tests. In: Proc. 4th ICCS International Conference on Cognitive Science (ICCS 2003), Sydney, Australia, pp. 570–575 (2003)Solomonoff, R.J.: A formal theory of inductive inference. Part I. Information and Control 7(1), 1–22 (1964)Strehl, A.L., Li, L., Wiewiora, E., Langford, J., Littman, M.L.: PAC model-free reinforcement learning. In: Proc. of the 23rd Intl. Conf. on Machine Learning, ICML 2006, New York, pp. 881–888 (2006)Sutton, R.S., Barto, A.G.: Reinforcement learning: An introduction. The MIT press (1998)Turing, A.M.: Computing machinery and intelligence. Mind 59, 433–460 (1950)Veness, J., Ng, K.S., Hutter, M., Silver, D.: Reinforcement learning via AIXI approximation. In: Proc. 24th Conf. on Artificial Intelligence (AAAI 2010), pp. 605–611 (2010)Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine learning 8(3), 279–292 (1992)Weyns, D., Parunak, H.V.D., Michel, F., Holvoet, T., Ferber, J.: Environments for multiagent systems state-of-the-art and research challenges. In: Weyns, D., Van Dyke Parunak, H., Michel, F. (eds.) E4MAS 2004. 
LNCS (LNAI), vol. 3374, pp. 1–47. Springer, Heidelberg (2005)Whiteson, S., Tanner, B., White, A.: The Reinforcement Learning Competitions. The AI magazine 31(2), 81–94 (2010)Woergoetter, F., Porr, B.: Reinforcement learning. Scholarpedia 3(3), 1448 (2008)Zatuchna, Z., Bagnall, A.: Learning mazes with aliasing states: An LCS algorithm with associative perception. Adaptive Behavior 17(1), 28–57 (2009)
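For readers unfamiliar with the algorithm evaluated in this paper, the following is a minimal sketch of standard tabular Q-learning with epsilon-greedy exploration. The corridor environment, rewards and parameters are purely illustrative; they are not the Solomonoff-generated environments used in the paper.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)                      # explore
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])  # exploit
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Hypothetical 4-state corridor: action 1 moves right, action 0 moves left;
# reward 1 only on reaching the rightmost (terminal) state.
def corridor(s, a):
    s2 = min(3, s + 1) if a == 1 else max(0, s - 1)
    return s2, float(s2 == 3), s2 == 3

Q = q_learning(n_states=4, n_actions=2, step=corridor)
```

After training, the greedy policy prefers moving right in every non-terminal state, which is the optimal behaviour for this toy task.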
Compression and intelligence: social environments and communication
Compression has been advocated as one of the principles which pervades inductive inference and prediction - and, from there, it has also been recurrent in definitions and tests of intelligence. However, this connection is less explicit in new approaches to intelligence. In this paper, we advocate that the notion of compression can appear again in definitions and tests of intelligence through the concepts of 'mind-reading' and 'communication' in the context of multi-agent systems and social environments. Our main position is that two-part Minimum Message Length (MML) compression is not only more natural and effective for agents with limited resources, but it is also much more appropriate for agents in (co-operative) social environments than one-part compression schemes - particularly those using a posterior-weighted mixture of all available models following Solomonoff's theory of prediction. We think that the realisation of these differences is important to avoid a naive view of 'intelligence as compression' in favour of a better understanding of how, why and where (one-part or two-part, lossless or lossy) compression is needed.We thank the anonymous reviewers for their helpful comments, and we thank Kurt Kleiner for some challenging and ultimately very
helpful questions in the broad area of this work. We also acknowledge the funding from the Spanish MEC and MICINN for projects TIN2009-06078-E/TIN,
Consolider-Ingenio CSD2007-00022 and TIN2010-21062-C02, and Generalitat
Valenciana for Prometeo/2008/051.Dowe, DL.; Hernández Orallo, J.; Das, PK. (2011). Compression and intelligence: social environments and communication. En Artificial General Intelligence. Springer Verlag (Germany). 6830:204-211. https://doi.org/10.1007/978-3-642-22887-2_21S2042116830Chaitin, G.J.: Godel’s theorem and information. International Journal of Theoretical Physics 21(12), 941–954 (1982)Dowe, D.L.: Foreword re C. S. Wallace. Computer Journal 51(5), 523–560 (2008); Christopher Stewart WALLACE (1933-2004) memorial special issueDowe, D.L.: Minimum Message Length and statistically consistent invariant (objective?) Bayesian probabilistic inference - from (medical) “evidence”. Social Epistemology 22(4), 433–460 (2008)Dowe, D.L.: MML, hybrid Bayesian network graphical models, statistical consistency, invariance and uniqueness. In: Bandyopadhyay, P.S., Forster, M.R. (eds.) Handbook of the Philosophy of Science. Philosophy of Statistics, vol. 7, pp. 901–982. Elsevier, Amsterdam (2011)Dowe, D.L., Hajek, A.R.: A computational extension to the Turing Test. Technical Report #97/322, Dept Computer Science, Monash University, Melbourne, Australia, 9 pp (1997)Dowe, D.L., Hajek, A.R.: A non-behavioural, computational extension to the Turing Test. In: Intl. Conf. on Computational Intelligence & multimedia applications (ICCIMA 1998), Gippsland, Australia, pp. 101–106 (February 1998)Hernández-Orallo, J.: Beyond the Turing Test. J. Logic, Language & Information 9(4), 447–466 (2000)Hernández-Orallo, J.: Constructive reinforcement learning. International Journal of Intelligent Systems 15(3), 241–264 (2000)Hernández-Orallo, J.: On the computational measurement of intelligence factors. In: Meystel, A. (ed.) Performance metrics for intelligent systems workshop, pp. 1–8. National Institute of Standards and Technology, Gaithersburg, MD, U.S.A (2000)Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: Towards an anytime intelligence test. 
Artificial Intelligence 174(18), 1508–1539 (2010)Hernández-Orallo, J., Minaya-Collado, N.: A formal definition of intelligence based on an intensional variant of Kolmogorov complexity. In: Proc. Intl Symposium of Engineering of Intelligent Systems (EIS 1998), pp. 146–163. ICSC Press (1998)Legg, S., Hutter, M.: Universal intelligence: A definition of machine intelligence. Minds and Machines 17(4), 391–444 (2007)Lewis, D.K., Shelby-Richardson, J.: Scriven on human unpredictability. Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition 17(5), 69–74 (1966)Oppy, G., Dowe, D.L.: The Turing Test. In: Zalta, E.N. (ed.) Stanford Encyclopedia of Philosophy, Stanford University, Stanford (2011), http://plato.stanford.edu/entries/turing-test/Salomon, D., Motta, G., Bryant, D.C.O.N.: Handbook of data compression. Springer-Verlag New York Inc., Heidelberg (2009)Sanghi, P., Dowe, D.L.: A computer program capable of passing I.Q. tests. In: 4th International Conference on Cognitive Science (and 7th Australasian Society for Cognitive Science Conference), vol. 2, pp. 570–575. Univ. of NSW, Sydney, Australia (July 2003)Sayood, K.: Introduction to data compression. Morgan Kaufmann, San Francisco (2006)Scriven, M.: An essential unpredictability in human behavior. In: Wolman, B.B., Nagel, E. (eds.) Scientific Psychology: Principles and Approaches, pp. 411–425. Basic Books (Perseus Books), New York (1965)Searle, J.R.: Minds, brains and programs. Behavioural and Brain Sciences 3, 417–457 (1980)Solomonoff, R.J.: A formal theory of inductive inference. Part I. Information and control 7(1), 1–22 (1964)Sutton, R.S.: Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in neural information processing systems, 1038–1044 (1996)Sutton, R.S., Barto, A.G.: Reinforcement learning: An introduction. The MIT Press, Cambridge (1998)Turing, A.M.: Computing machinery and intelligence. 
Mind 59, 433–460 (1950)Veness, J., Ng, K.S., Hutter, M., Silver, D.: A Monte Carlo AIXI Approximation. Journal of Artificial Intelligence Research, JAIR 40, 95–142 (2011)Wallace, C.S.: Statistical and Inductive Inference by Minimum Message Length. Springer, Heidelberg (2005)Wallace, C.S., Boulton, D.M.: An information measure for classification. Computer Journal 11(2), 185–194 (1968)Wallace, C.S., Dowe, D.L.: Intrinsic classification by MML - the Snob program. In: Proc. 7th Australian Joint Conf. on Artificial Intelligence, pp. 37–44. World Scientific, Singapore (November 1994)Wallace, C.S., Dowe, D.L.: Minimum message length and Kolmogorov complexity. Computer Journal 42(4), 270–283 (1999); Special issue on Kolmogorov complexityWallace, C.S., Dowe, D.L.: MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing 10, 73–83 (2000)
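The two-part compression this paper advocates first encodes a hypothesis and then the data given that hypothesis, and the hypothesis itself is an explicit, communicable object. The following sketch illustrates the idea under strong simplifying assumptions (binary data and a small uniform grid of candidate Bernoulli models); it is an illustration only, not Wallace's full MML construction.

```python
import math

def two_part_length(data, p, n_models):
    """Two-part code length in bits: the first part names the model among
    n_models equiprobable candidates, the second encodes the binary data
    under that model (Shannon code lengths)."""
    model_bits = math.log2(n_models)
    data_bits = sum(-math.log2(p if x == 1 else 1.0 - p) for x in data)
    return model_bits + data_bits

# 20 coin flips with 15 heads; candidate head-probabilities on a coarse grid.
data = [1] * 15 + [0] * 5
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
best = min(candidates, key=lambda p: two_part_length(data, p, len(candidates)))
# The minimising candidate is the explicit hypothesis a two-part scheme
# returns, unlike a one-part posterior-weighted mixture over all models.
```

Here the grid point nearest the observed frequency (0.7, for 15/20 heads) minimises the total message length, which is why the chosen model doubles as a statement about the data that another agent can be told.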
Analytical evaluation of the performance of contact-based messaging applications
Communications in mobile opportunistic networks take place upon the establishment of ephemeral
contacts among mobile nodes, using direct communication instead of the Internet infrastructure. In this
paper, we analytically model the performance of mobile opportunistic networks for contact-based messaging
applications in city squares or gathering points, a key and challenging topic for the effective design
of novel services. We take into account several social aspects, such as the density of people, the
dynamics of people arriving at and leaving a place, the size of the messages, and the duration of the
contacts. We base our models on Population Processes, an approach commonly used to represent the
dynamics of biological populations. We study their stable equilibrium points and obtain analytical
expressions for their solution.
Our evaluations show that these models can reproduce the dynamics of message-diffusion applications.
We demonstrate that increasing the density of people improves the effectiveness of the diffusion.
The arrival and departure of people have a greater impact when the density of people is low. Finally,
we prove that large message sizes reduce the effectiveness of epidemic diffusion, so novel diffusion
protocols should be considered.
© 2016 Elsevier B.V. All rights reserved.This work was partially supported by Ministerio de Economia y Competitividad, Spain (Grants TEC2014-52690-R & MTM2013-47093-P & SEV-2013-0323), Generalitat Valenciana, Spain (Grants AICO/2015/108 & ACOMP/2015/005) and by the Basque Government through the BERC 2014-2017 program.Hernández Orallo, E.; Murillo Arcila, M.; Tavares De Araujo Cesariny Calafate, CM.; Cano Escribá, JC.; Conejero Casares, JA.; Manzoni, P. (2016). Analytical evaluation of the performance of contact-based messaging applications. Computer Networks. 111:45-54. https://doi.org/10.1016/j.comnet.2016.07.006S455411
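The population-process models in this paper are analysed through their equilibrium points. As a simplified illustration (not the paper's actual equations), the classic mean-field SI epidemic equation captures the qualitative behaviour, including the effect of contact density on diffusion speed:

```python
def si_diffusion(n, beta, i0=1.0, dt=0.01, t_end=50.0):
    """Mean-field SI ('susceptible-infected') dynamics:
    dI/dt = beta * I * (n - I) / n.  I = 0 is an unstable equilibrium and
    I = n (everyone holds the message) is the stable one.  Forward-Euler
    integration; returns the trajectory of infected (message-holding) nodes."""
    traj = [i0]
    for _ in range(int(t_end / dt)):
        i = traj[-1]
        traj.append(i + dt * beta * i * (n - i) / n)
    return traj

# A higher contact rate beta (a proxy for a denser square) spreads the
# message to the whole population sooner.
low = si_diffusion(n=100, beta=0.2)
high = si_diffusion(n=100, beta=0.8)
```

Both trajectories converge to the stable equilibrium where all 100 nodes hold the message, but the high-contact-rate population crosses the halfway point much earlier, in line with the paper's finding that higher density improves diffusion effectiveness.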
- …