
    Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality

    [EN] With the purpose of better analysing the results of artificial intelligence (AI) benchmarks, we present two indicators on the side of the AI problems, difficulty and discrimination, and two indicators on the side of the AI systems, ability and generality. The first three are adapted from psychometric models in item response theory (IRT), whereas generality is defined as a new metric that evaluates whether an agent is consistently good at easy problems and bad at difficult ones. We illustrate how these key indicators give us more insight into the results of two popular benchmarks in AI, the Arcade Learning Environment (Atari 2600 games) and the General Video Game AI competition, and we include some guidelines to estimate and interpret these indicators for other AI benchmarks and competitions.
    This work was supported by the U.S. Air Force Office of Scientific Research under Award FA9550-17-1-0287; in part by the EU (FEDER) and the Spanish MINECO under Grant TIN2015-69175-C4-1-R; and in part by the Generalitat Valenciana PROMETEOII/2015/013. The work of F. Martínez-Plumed was supported by INCIBE (Ayudas para la excelencia de los equipos de investigación avanzada en ciberseguridad), the European Commission, JRC's Centre for Advanced Studies, HUMAINT project (Expert Contract CT-EX2018D335821-101), and UPV PAID-06-18 Ref. SP20180210. The work of J. Hernández-Orallo was supported in part by a Salvador de Madariaga grant (PRX17/00467) from the Spanish MECD, in part by the BEST Grant (BEST/2017/045) from the GVA for research stays at the CFI, and in part by the FLI grant RFP2-152.
    Martínez-Plumed, F.; Hernández-Orallo, J. (2020). Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality. IEEE Transactions on Games. 12(2):121-131. https://doi.org/10.1109/TG.2018.2883773
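    To make the problem-side indicators concrete, the sketch below uses one standard IRT form, the two-parameter logistic (2PL) item characteristic curve, in which difficulty b shifts the curve along the ability axis and discrimination a controls how steep it is. The consistency() function is a hypothetical proxy written only to illustrate the idea of "consistently good at easy problems and bad at difficult ones"; it is an assumption, not the generality metric defined in the paper.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: probability that a system of ability
    `theta` solves an item with discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def consistency(results: list[tuple[float, bool]]) -> float:
    """HYPOTHETICAL generality proxy (not the paper's definition).

    `results` holds (item_difficulty, solved) pairs for one agent. Returns the
    fraction of (easier, harder) item pairs that are ordered consistently,
    i.e. pairs where the agent does NOT fail the easier item while solving
    the harder one. 1.0 means a perfectly step-like agent.
    """
    pairs = 0
    consistent = 0
    for i, (b_i, ok_i) in enumerate(results):
        for b_j, ok_j in results[i + 1:]:
            if b_i == b_j:
                continue  # equal difficulty gives no ordering information
            easy_ok, hard_ok = (ok_i, ok_j) if b_i < b_j else (ok_j, ok_i)
            pairs += 1
            if not (hard_ok and not easy_ok):
                consistent += 1
    return consistent / pairs if pairs else 1.0


# An agent of ability 0 has a 50% chance on an item of difficulty 0.
print(icc_2pl(0.0, a=1.0, b=0.0))  # 0.5
# A step-like agent: solves everything up to difficulty 0, nothing above.
print(consistency([(-1.0, True), (0.0, True), (1.0, False), (2.0, False)]))  # 1.0
```

    Raising a makes the curve sharper around b, which is why highly discriminating items separate systems just below and just above that difficulty.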

    Stochastic Probability Estimation Trees: Newton Trees

    This work presents a new decision tree induction method, Newton Trees (or stochastic probability estimation trees). It is a redefinition of traditional decision trees that combines the advantages of probability estimation trees (PETs) and distance-based trees.
    Martínez Plumed, F. (2010). Árboles de estimación estocástica de probabilidades: Newton Trees. http://hdl.handle.net/10251/14537

    Item response theory in AI: Analysing machine learning classifiers at the instance level

    [EN] AI systems are usually evaluated on a range of problem instances and compared to other AI systems that use different strategies. These instances are rarely independent. Machine learning, and supervised learning in particular, is a very good example of this. Given a machine learning model, its behaviour for a single instance cannot be understood in isolation but rather in relation to the rest of the data distribution or dataset. Dually, the results of one machine learning model for an instance can be analysed in comparison to other models. While this analysis is relative to a population or distribution of models, it can give much more insight than an isolated analysis. Item response theory (IRT) exploits this duality between items and respondents to extract latent variables of the items (such as discrimination or difficulty) and of the respondents (such as ability). IRT can be adapted to the analysis of machine learning experiments (and, by extension, to any other artificial intelligence experiments). In this paper, we show that IRT suits classification tasks perfectly, where instances correspond to items and classifiers correspond to respondents.
We perform a series of experiments with a range of datasets and classification methods to fully understand what IRT parameters such as discrimination, difficulty and guessing mean for classification instances (and how they relate to instance hardness measures), and how the estimated classifier ability can be used to compare classifier performance in a different way through classifier characteristic curves.
    This work has been partially supported by the EU (FEDER) and the Ministerio de Economía y Competitividad (MINECO) in Spain grant TIN2015-69175-C4-1-R, the Air Force Office of Scientific Research under award number FA9550-17-1-0287, and the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences and Technologies ERA-Net (CHIST-ERA) and funded by Ministerio de Economía y Competitividad (MINECO) in Spain (PCIN-2013-037), and by Generalitat Valenciana PROMETEOII/2015/013. Fernando Martínez-Plumed was also supported by INCIBE (INCIBEI-2015-27345) "Ayudas para la excelencia de los equipos de investigación avanzada en ciberseguridad", the European Commission (Joint Research Centre) HUMAINT project (Expert Contract CT-EX2018D335821-101), and Universitat Politècnica de València (PAID-06-18 Ref. SP20180210). Ricardo Prudencio was financially supported by CNPq (Brazilian Agency). José Hernández-Orallo was supported by a Salvador de Madariaga grant (PRX17/00467) from the Spanish MECD for a research stay at the Leverhulme Centre for the Future of Intelligence (CFI), Cambridge, a BEST grant (BEST/2017/045) from the Valencia GVA for another research stay also at the CFI, and an FLI grant RFP2.
    Martínez-Plumed, F.; Prudencio, R.; Martínez-Usó, A.; Hernández-Orallo, J. (2019). Item response theory in AI: Analysing machine learning classifiers at the instance level. Artificial Intelligence. 271:18-42. https://doi.org/10.1016/j.artint.2018.09.004
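    The abstract mentions a guessing parameter and an estimated classifier ability. A minimal sketch, assuming the usual three-parameter logistic (3PL) model and item parameters (a, b, c) that have already been estimated, is shown below; ability is found by a simple grid-search maximum-likelihood estimate. The function names, the grid bounds and the clipping constant are illustrative choices, not the paper's estimation procedure.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL IRT model: probability that a classifier of ability `theta`
    classifies an instance correctly, given the instance's discrimination
    `a`, difficulty `b` and guessing parameter `c` (the lower asymptote)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def estimate_ability(responses: list[int],
                     items: list[tuple[float, float, float]],
                     lo: float = -4.0, hi: float = 4.0,
                     steps: int = 801) -> float:
    """Grid-search maximum-likelihood estimate of one classifier's ability,
    given its 0/1 responses (1 = instance classified correctly) and the
    (a, b, c) parameters of each instance."""
    best_theta, best_ll = lo, -math.inf
    for k in range(steps):
        theta = lo + (hi - lo) * k / (steps - 1)
        ll = 0.0
        for correct, (a, b, c) in zip(responses, items):
            # Clip to avoid log(0) when c == 0 and |theta - b| is large.
            p = min(max(p_3pl(theta, a, b, c), 1e-12), 1.0 - 1e-12)
            ll += math.log(p) if correct else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta
```

    For example, with five instances of discrimination 1, no guessing and difficulties -2 to 2, a classifier that gets only the three easiest right receives an ability estimate between 0 and 1, while one that gets all five right hits the upper grid bound. A grid search is used here only because it is short and dependency-free; gradient-based or Bayesian estimators are the more common choice in practice.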

    Cycling network projects: a decision-making aid approach

    Efficient and clean urban mobility is a key factor in the quality of life and sustainability of towns and cities. Traditionally, cities have focused on cars and other fuel-based vehicles as means of transport. However, several problems are directly linked to massive car use, particularly air pollution and traffic congestion. Several studies estimate that vehicle emissions produce over 90% of air pollution. One way to reduce the use of fuel-based vehicles (and thus the emission of pollutants) is to create efficient, easily accessible and safe bike lane networks which, as many studies show, promote cycling as a major means of transport. In this regard, this paper presents an approach to designing and calculating bike lane networks based on open data about the historical use of an urban bike rental service. Concretely, we model this task as a network design problem (NDP) and study four different optimisation strategies to solve it. We test these methods using data from the city of Valencia (Spain). Our experiments conclude that an optimisation approach based on genetic programming obtains the best performance. The proposed method can easily be used to improve or extend bike lane networks based on historical bike use data in other cities.
    This work has been partially supported by the EU (FEDER) and Spanish MINECO grant TIN2015-69175-C4-1-R, and the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences and Technologies ERA-Net (CHIST-ERA), and funded by MINECO in Spain (PCIN-2013-037), by Generalitat Valenciana PROMETEOII/2015/013, and by the French National Research Agency (ANR).
    Martínez Plumed, F.; Ferri Ramírez, C.; Contreras Ochando, L. (2016). Cycling network projects: a decision-making aid approach. CEUR Workshop Proceedings. http://hdl.handle.net/10251/87734
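    As a rough illustration of the network design problem, the sketch below treats it as choosing which candidate bike-lane segments to build under a budget so as to maximise the demand they serve. It swaps in a plain genetic algorithm over bit strings (one bit per segment), which is simpler than the genetic programming approach the abstract reports; the per-segment costs and demand values, the repair step and all parameter names are hypothetical assumptions, not the paper's formulation.

```python
import random


def evolve_network(costs: list[float], demand: list[float], budget: float,
                   pop_size: int = 30, generations: int = 100,
                   mutation_rate: float = 0.05, seed: int = 0):
    """Toy genetic algorithm for a HYPOTHETICAL budgeted bike-lane selection.

    Segment i has construction cost costs[i] and demand[i] (e.g. historical
    bike trips that would use it). A chromosome is a list of booleans saying
    which segments are built. Assumes budget >= 0 and pop_size >= 2.
    """
    rng = random.Random(seed)
    n = len(costs)

    def repair(genes):
        # Drop randomly chosen segments until the plan fits the budget.
        chosen = [i for i, g in enumerate(genes) if g]
        while sum(costs[i] for i in chosen) > budget:
            chosen.remove(rng.choice(chosen))
        kept = set(chosen)
        return [i in kept for i in range(n)]

    def fitness(genes):
        # Total demand served by the built segments.
        return sum(d for d, g in zip(demand, genes) if g)

    def tournament(pop):
        # Binary tournament selection: keep the fitter of two random plans.
        x, y = rng.sample(pop, 2)
        return x if fitness(x) >= fitness(y) else y

    pop = [repair([rng.random() < 0.5 for _ in range(n)])
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, then repair to restore feasibility.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(repair(child))
        pop = children
        best = max(pop + [best], key=fitness)  # keep the best plan seen so far
    return best, fitness(best)
```

    For instance, evolve_network([4, 3, 2, 5, 1], [10, 7, 6, 9, 2], budget=7) returns a feasible plan and its served demand. On an instance this small exhaustive search would be enough; an evolutionary search only pays off when the number of candidate segments makes enumeration impractical.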

    Research-Oriented Bachelor's and Master's Final Projects as a Success Story

    The completion of academic projects and assignments at the end of Bachelor's and Master's degrees should challenge students to think in new ways, to apply their knowledge to explore and solve problems, and to share that thinking and new experience with their peers. In this sense, giving these projects a research and inquiry character is essential, not only for those who decide to pursue an academic career, but also for those who choose a professional path. In this context, this paper describes the methodology followed by our research team to propose and supervise research-oriented Bachelor's and Master's final projects (TFG/TFMs) collaboratively and in coordination among the members of the group.
The main objective is to increase the excellence and quality of the work and to improve the motivation of our students by involving them in the research projects in which we participate. The data obtained through student surveys and grade analysis show not only that the objectives and competences have been achieved, but also that motivation is high.

    Research community dynamics behind popular AI benchmarks

    [EN] The widespread use of experimental benchmarks in AI research has created competition and collaboration dynamics that are still poorly understood. Here we provide a novel methodology to explore these dynamics and analyse the way different entrants in these challenges, from academia to tech giants, behave and react depending on their own or others' achievements. We analyse 25 popular AI benchmarks from Papers With Code, with around 2,000 result entries overall, linked to their underlying research papers. We identify links between researchers and institutions (that is, communities) beyond the standard co-authorship relations, and we explore a series of hypotheses about their behaviour as well as some aggregated results in terms of activity, performance jumps and efficiency. We characterize the dynamics of research communities at different levels of abstraction, including organization, affiliation, trajectories, results and activity. We find that hybrid, multi-institution and persevering communities are more likely to improve state-of-the-art performance, which becomes a watershed for many community members. Although the results cannot be extrapolated beyond our selection of popular machine learning benchmarks, the methodology can be extended to other areas of artificial intelligence or robotics, and combined with bibliometric studies.
    F.M.-P. acknowledges funding from the AI-Watch project by DG CONNECT and DG JRC of the European Commission. J.H.-O. and S.O.h. were funded by the Future of Life Institute, FLI, under grant RFP2-152. J.H.-O. was supported by the EU (FEDER) and Spanish MINECO under RTI2018-094403-B-C32, Generalitat Valenciana under PROMETEO/2019/098 and European Union's Horizon 2020 grant no. 952215 (TAILOR).
    Martínez-Plumed, F.; Barredo, P.; Ó hÉigeartaigh, S.; Hernández-Orallo, J. (2021). Research community dynamics behind popular AI benchmarks. Nature Machine Intelligence. 3(7):581-589. https://doi.org/10.1038/s42256-021-00339-6
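    One of the aggregated results the abstract mentions is performance jumps, i.e. results that improve the state of the art (SOTA) on a benchmark. A minimal sketch, assuming each result entry is reduced to a year, a community label and a score where higher is better, scans entries in time order and records every new SOTA and the size of its jump; the Entry fields, the example data and the convention that the first result is a SOTA with a jump of 0.0 are hypothetical, not the paper's data model.

```python
from collections import Counter
from typing import NamedTuple


class Entry(NamedTuple):
    year: int        # publication year of the result (hypothetical field)
    community: str   # research community that reported it (hypothetical)
    score: float     # benchmark metric; higher is assumed to be better


def sota_jumps(entries: list[Entry]) -> list[tuple[Entry, float]]:
    """Return each entry that improves the running state of the art,
    together with the size of its jump over the previous SOTA."""
    jumps: list[tuple[Entry, float]] = []
    best = None
    # sorted() is stable, so results from the same year keep input order.
    for e in sorted(entries, key=lambda e: e.year):
        if best is None or e.score > best:
            jumps.append((e, 0.0 if best is None else e.score - best))
            best = e.score
    return jumps


entries = [Entry(2015, "A", 70.0), Entry(2016, "B", 68.0),
           Entry(2016, "C", 75.5), Entry(2017, "A", 80.0)]
jumps = sota_jumps(entries)
# Number of SOTA improvements per community.
print(Counter(e.community for e, _ in jumps))  # Counter({'A': 2, 'C': 1})
```

    Counting SOTA improvements per community in this way is only a first step; relating them to community type (hybrid, multi-institution, persevering) would require the richer author and affiliation links the abstract describes.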