Characterizing Workload of Web Applications on Virtualized Servers
With the ever-increasing demand for cloud computing services, planning and
management of cloud resources have become increasingly important issues that
directly affect resource utilization, SLA compliance and customer satisfaction.
Before any management strategy is devised, however, a good understanding of
application workloads in virtualized environments is a prerequisite for any
resource management method. Unfortunately, little work has focused on this
area. A lack of raw data is one reason; another is that practitioners still
rely on traditional models and methods developed for non-virtualized
environments. The study of application workloads in virtualized environments
should account for the features that distinguish them from non-virtualized
environments. In this paper, we analyze the workload demands that reflect
application behavior and the impact of virtualization.
The results are obtained from an experimental cloud testbed running web
applications, specifically the RUBiS benchmark. We profile the workload
dynamics in both virtualized and non-virtualized environments and compare the
findings. The experimental results help us estimate application performance on
different computer architectures, predict SLA compliance or violation from the
projected application workload, and guide decisions about supporting
applications with the right hardware.
Comment: 8 pages, 8 figures. The Fourth Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware, in conjunction with the 19th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-2014), Salt Lake City, Utah, USA, March 1-5, 2014.
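As a rough illustration of the kind of workload profiling this entry describes, the sketch below computes basic statistics (mean, tail percentile, coefficient of variation) for two CPU-utilization traces. It is only a sketch under assumed inputs: the synthetic gamma-distributed traces stand in for measured virtualized and bare-metal data, and none of this reflects the paper's actual tooling.

```python
# Illustrative sketch only: the paper does not publish its profiling scripts.
# The synthetic traces below are stand-ins for measured per-second CPU utilization (%).
import numpy as np

def workload_summary(cpu_util):
    """Basic workload statistics for a CPU-utilization trace (values in %)."""
    cpu_util = np.asarray(cpu_util, dtype=float)
    return {
        "mean": cpu_util.mean(),
        "p95": np.percentile(cpu_util, 95),
        "max": cpu_util.max(),
        # coefficient of variation: a first indicator of burstiness
        "cov": cpu_util.std() / cpu_util.mean(),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    native = rng.gamma(9.0, 5.0, size=3600)        # stand-in for a bare-metal trace
    virtualized = rng.gamma(4.0, 12.0, size=3600)  # stand-in for a virtualized trace
    print("native     :", workload_summary(native))
    print("virtualized:", workload_summary(virtualized))
```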
On the nature and impact of self-similarity in real-time systems
In real-time systems with highly variable task execution times, simplistic task models are insufficient to accurately model and analyze the system. Variability can be tackled using distributions rather than a single value, but the proper characterization depends on the degree of variability. Self-similarity is one of the deepest kinds of variability. It characterizes the fact that a workload is not only highly variable but also bursty on many time scales. This paper identifies the situations in which this source of indeterminism can appear in a real-time system: the combination of variability in task inter-arrival times and execution times. Although self-similarity does not arise in all systems with variable execution times, it is not unusual in applications with real-time requirements such as video processing, networking and gaming.
The paper shows how to properly model and analyze self-similar task sets and how improper modeling can mask deadline misses. The paper derives an analytical expression for the dependence of the deadline miss ratio on the degree of self-similarity and proves its negative impact on real-time system performance through system modeling and simulation. This study of the nature and impact of self-similarity on soft real-time systems can help to reduce its effects, to choose the proper scheduling policies, and to avoid its causes at system design time.
This work was developed under a grant from the European Union (FRESCOR-FP6/2005/IST/5-03402).
Hernández-Orallo, E.; Vila-Carbó, J. A. (2012). On the nature and impact of self-similarity in real-time systems. Real-Time Systems 48(3):294-319. doi:10.1007/s11241-012-9146-0
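The "degree of self-similarity" mentioned in this abstract is usually quantified by the Hurst parameter H. The sketch below estimates H with the aggregated-variance method, one standard estimator; it is an illustration of the concept, not the analysis technique used in the paper, and the i.i.d. test trace is made up.

```python
# Illustrative sketch, not the paper's analysis: the aggregated-variance method is one
# standard way to estimate the Hurst parameter H (H > 0.5 indicates self-similarity).
import numpy as np

def hurst_aggregated_variance(x, block_sizes=(1, 2, 4, 8, 16, 32, 64, 128)):
    """Estimate H from the slope of log Var(X^(m)) versus log m."""
    x = np.asarray(x, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            break
        blocks = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(blocks.var()))
    slope = np.polyfit(log_m, log_var, 1)[0]   # slope = 2H - 2 for a self-similar series
    return 1.0 + slope / 2.0

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    iid_exec_times = rng.exponential(scale=1.0, size=2**14)  # short-range dependent trace
    print("H (i.i.d. trace) ~", round(hurst_aggregated_variance(iid_exec_times), 2))  # ~0.5
```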
Selecting cash management models from a multiobjective perspective
This paper addresses the problem of selecting cash management models under different operating conditions from a multiobjective perspective, considering not only cost but also risk. A number of models have been proposed to optimize corporate cash management policies, so the impact of different operating conditions on model performance becomes an important issue. Here, we provide a range of visual and quantitative tools imported from Receiver Operating Characteristic (ROC) analysis. More precisely, we show the utility of ROC analysis from a triple perspective as a tool for: (1) showing model performance; (2) choosing models; and (3) assessing the impact of operating conditions on model performance. We illustrate the selection of cash management models by means of a numerical example.
Work partially funded by projects Collectiveware TIN2015-66863-C2-1-R (MINECO/FEDER) and 2014 SGR 118.
Salas-Molina, F.; Rodríguez-Aguilar, J. A.; Díaz-García, P. (2018). Selecting cash management models from a multiobjective perspective. Annals of Operations Research 261(1-2):275-288. https://doi.org/10.1007/s10479-017-2634-9
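The selection problem described above ultimately reduces to a cost-risk trade-off between candidate policies. The sketch below only illustrates that trade-off by filtering dominated policies; it does not reproduce the paper's ROC-based tools, and the policy names and numbers are invented.

```python
# Minimal illustration only: the paper's tools come from ROC analysis; this just shows the
# underlying cost-risk comparison by discarding dominated policies. All numbers are made up.
from dataclasses import dataclass

@dataclass
class Policy:
    name: str
    expected_cost: float   # e.g., average daily holding plus transaction cost
    risk: float            # e.g., standard deviation of daily cost

def non_dominated(policies):
    """Keep policies that no other policy beats (or ties) on both cost and risk."""
    keep = []
    for p in policies:
        dominated = any(
            q.expected_cost <= p.expected_cost and q.risk <= p.risk
            and (q.expected_cost < p.expected_cost or q.risk < p.risk)
            for q in policies
        )
        if not dominated:
            keep.append(p)
    return keep

if __name__ == "__main__":
    candidates = [
        Policy("Model A", 120.0, 35.0),
        Policy("Model B", 110.0, 40.0),
        Policy("Model C", 150.0, 20.0),
        Policy("Model D", 160.0, 45.0),   # dominated by A, B and C
    ]
    for p in non_dominated(candidates):
        print(p.name, p.expected_cost, p.risk)
```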
Evaluation in artificial intelligence: From task-oriented to ability-oriented measurement
The final publication is available at Springer via http://dx.doi.org/10.1007/s10462-016-9505-7.
The evaluation of artificial intelligence systems and components is crucial for the
progress of the discipline. In this paper we describe and critically assess the different ways
AI systems are evaluated, and the role of components and techniques in these systems. We
first focus on the traditional task-oriented evaluation approach. We identify three kinds of
evaluation: human discrimination, problem benchmarks and peer confrontation. We describe
some of the limitations of the many evaluation schemes and competitions in these three categories,
and follow the progression of some of these tests. We then focus on a less customary
(and challenging) ability-oriented evaluation approach, where a system is characterised by
its (cognitive) abilities, rather than by the tasks it is designed to solve. We discuss several
possibilities: the adaptation of cognitive tests used for humans and animals, the development
of tests derived from algorithmic information theory or more integrated approaches under
the perspective of universal psychometrics. We analyse some evaluation tests from AI that
are better positioned for an ability-oriented evaluation and discuss how their problems and
limitations can possibly be addressed with some of the tools and ideas that appear within
the paper. Finally, we enumerate a series of lessons learnt and generic guidelines to be used
when an AI evaluation scheme is under consideration.
I thank the organisers of the AEPIA Summer School on Artificial Intelligence, held in September 2014, for giving me the opportunity to give a lecture on 'AI Evaluation'. This paper was born out of and evolved through that lecture. The information about many benchmarks and competitions discussed in this paper has been contrasted with information from and discussions with many people: M. Bedia, A. Cangelosi, C. Dimitrakakis, I. García-Varea, Katja Hofmann, W. Langdon, E. Messina, S. Mueller, M. Siebers and C. Soares. Figure 4 is courtesy of F. Martínez-Plumed. Finally, I thank the anonymous reviewers, whose comments have helped to significantly improve the balance and coverage of the paper. This work has been partially supported by the EU (FEDER) and the Spanish MINECO under Grants TIN 2013-45732-C4-1-P and TIN 2015-69175-C4-1-R, and by Generalitat Valenciana PROMETEOII/2015/013.
José Hernández-Orallo (2016). Evaluation in artificial intelligence: From task-oriented to ability-oriented measurement. Artificial Intelligence Review, 1-51. https://doi.org/10.1007/s10462-016-9505-7
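One of the three task-oriented evaluation kinds named in this abstract is peer confrontation, where systems are scored against each other. The sketch below shows an Elo-style rating update, a conventional way to aggregate such pairwise results; it is only an illustration of that setting, not a method proposed in the paper, and the K-factor, agents and match outcomes are invented.

```python
# Illustrative sketch: a minimal Elo-style rating update, one conventional way to score
# peer-confrontation evaluations. Agents, matches and the K-factor are made up.
def elo_update(rating_a, rating_b, score_a, k=16.0):
    """Return updated ratings after one match; score_a is 1, 0.5 or 0 for A's result."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

if __name__ == "__main__":
    ratings = {"agent_A": 1500.0, "agent_B": 1500.0}
    matches = [("agent_A", "agent_B", 1.0),
               ("agent_A", "agent_B", 0.5),
               ("agent_A", "agent_B", 0.0)]
    for a, b, score in matches:
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], score)
    print(ratings)
```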
Ensemble of a subset of kNN classifiers
Combining multiple classifiers, known as ensemble methods, can give substantial improvement in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of a subset of kNN classifiers, ESkNN, for classification tasks in two steps. Firstly, we choose classifiers based upon their individual performance using the out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We use benchmark data sets with their original and some added non-informative features for the evaluation of our method. The results are compared with usual kNN, bagged kNN, random kNN, the multiple feature subset method, random forest and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines.
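The two-step procedure in this abstract (rank base kNN learners by out-of-sample accuracy, then add them sequentially while the ensemble's collective accuracy does not degrade) can be sketched as below. This is a simplified reading, not the authors' exact algorithm: the random feature subsets, subset size, number of base learners, and reuse of a single validation split for both ranking and collective assessment are assumptions made here for brevity.

```python
# A minimal sketch of the two-step idea described above (not the authors' exact procedure):
# base kNN learners on random feature subsets, ranked by out-of-sample accuracy, then added
# sequentially while majority-vote accuracy on a validation set does not get worse.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Step 1: train base learners on random feature subsets and rank them by out-of-sample accuracy.
base = []
for _ in range(30):
    feats = rng.choice(X.shape[1], size=8, replace=False)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_train[:, feats], y_train)
    acc = accuracy_score(y_val, clf.predict(X_val[:, feats]))
    base.append((acc, feats, clf))
base.sort(key=lambda t: t[0], reverse=True)

# Step 2: add classifiers sequentially, keeping one only if the ensemble's vote does not get worse.
def vote(members, X_):
    preds = np.array([clf.predict(X_[:, feats]) for _, feats, clf in members])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)

ensemble = [base[0]]
best_acc = accuracy_score(y_val, vote(ensemble, X_val))
for member in base[1:]:
    acc = accuracy_score(y_val, vote(ensemble + [member], X_val))
    if acc >= best_acc:
        ensemble.append(member)
        best_acc = acc

print("ensemble size:", len(ensemble))
print("test accuracy:", accuracy_score(y_test, vote(ensemble, X_test)))
```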
Stochastic bounds and histograms for active queues management and networks analysis
We present an extension of a methodology based on the monotonicity of various networking elements and on measurements performed on real networks. Assuming the stationarity of flows, we obtain histograms (distributions) for the arrivals. Unfortunately, these distributions have a large number of values and the numerical analysis is extremely time-consuming. Using stochastic bounds and the monotonicity of the networking elements, we show how we can obtain, in a very efficient manner, guarantees on performance measures. Here, we present two extensions: the merge element, which combines several flows into one, and some Active Queue Management (AQM) mechanisms. This extension allows us to study networks with a feed-forward topology.
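To make the merge element mentioned above concrete: for independent flows described by per-slot arrival histograms, merging amounts to convolving the histograms, which is also where the growth in the number of histogram values comes from. The sketch below is only that elementary operation, under an independence assumption, not the paper's bounding methodology; the example histograms are invented.

```python
# Illustration only (not the paper's bounding algorithm): merging two independent flow
# histograms is a convolution of their per-slot arrival distributions.
import numpy as np

def merge_flows(hist_a, hist_b):
    """hist_x[k] = P(flow sends k packets in a slot); returns the merged flow's histogram."""
    return np.convolve(hist_a, hist_b)

if __name__ == "__main__":
    flow1 = np.array([0.6, 0.3, 0.1])   # 0, 1 or 2 packets per slot
    flow2 = np.array([0.5, 0.4, 0.1])
    merged = merge_flows(flow1, flow2)
    print(merged, merged.sum())         # support 0..4, probabilities sum to 1
```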
Stochastic bounds and histograms for network performance analysis
Exact analysis of queueing networks under real traffic histograms quickly becomes intractable due to state explosion. In this paper, we propose to apply the stochastic comparison method to derive performance measure bounds under histogram-based traffic. We apply an algorithm based on dynamic programming to derive bounding traffic histograms on reduced state spaces. We thereby obtain simpler bounding stochastic processes providing stochastic upper and lower bounds on buffer occupancy histograms (queue length distributions) for finite queue models. We evaluate the proposed method on real traffic traces and compare the results with those obtained by an approximate method. Numerical results illustrate that the proposed method provides more accurate results, with a trade-off between computation time and accuracy. Moreover, the derived performance bounds are very relevant for network dimensioning.
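For intuition about the buffer occupancy histograms these two entries bound, the toy sketch below evolves the queue-length distribution of a finite-buffer slotted queue driven by an arrival histogram with one packet served per slot. It is a naive exact iteration on a small example, not the dynamic-programming bounding algorithm of the paper, and the arrival histogram and buffer size are invented.

```python
# Toy illustration of histogram-based queue analysis (not the authors' bounding algorithm):
# iterate the queue-length distribution of a finite-buffer slotted queue that serves one
# packet per slot, given a per-slot arrival histogram.
import numpy as np

def queue_length_distribution(arrivals, buffer_size, service=1, n_slots=5000):
    """arrivals[k] = P(k packets arrive in a slot); returns P(queue length = q) after n_slots."""
    q = np.zeros(buffer_size + 1)
    q[0] = 1.0
    for _ in range(n_slots):
        new_q = np.zeros_like(q)
        for length, p_len in enumerate(q):
            if p_len == 0.0:
                continue
            for k, p_arr in enumerate(arrivals):
                # serve up to `service` packets, then add arrivals, truncating at the buffer
                next_len = min(max(length - service, 0) + k, buffer_size)
                new_q[next_len] += p_len * p_arr
        q = new_q
    return q

if __name__ == "__main__":
    arrivals = np.array([0.5, 0.3, 0.2])   # 0, 1 or 2 packets per slot (mean load 0.7)
    dist = queue_length_distribution(arrivals, buffer_size=10)
    print(np.round(dist, 4))
```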