
    Stochastic Prediction of Multi-Agent Interactions from Partial Observations

    We present a method that learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents. Our method is based on a graph-structured variational recurrent neural network (Graph-VRNN), which is trained end-to-end to infer the current state of the (partially observed) world, as well as to forecast future states. We show that our method outperforms various baselines on two sports datasets, one based on real basketball trajectories, and one generated by a soccer game engine. Comment: ICLR 2019 camera ready.

    Effects of Training Data Variation and Temporal Representation in a QSR-Based Action Prediction System

    Understanding behaviour is a crucial skill for Artificial Intelligence systems expected to interact with external agents, whether other AI systems or humans, in scenarios involving co-operation, such as domestic robots capable of helping out with household jobs, or disaster relief robots expected to collaborate and lend assistance to others. It is useful for such systems to be able to quickly learn and re-use models and skills in new situations. Our work centres around a behaviour-learning system utilising Qualitative Spatial Relations (QSRs) to lessen the amount of training data required by the system, and to aid generalisation. In this paper, we provide an analysis of the advantages provided to our system by the use of QSRs. We provide a comparison of a variety of machine learning techniques utilising both quantitative and qualitative representations, and show the effects of varying amounts of training data and temporal representations upon the system. The subject of our work is the game of simulated RoboCup Soccer Keepaway. Our results show that employing QSRs provides clear advantages in scenarios where training data is limited, and provides for better generalisation performance in classifiers. In addition, we show that adopting a qualitative representation of time can provide significant performance gains for QSR systems.

    Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks

    Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision problems. However, these deep models are perceived as "black box" methods considering the lack of understanding of their internal functioning. There has been a significant recent interest in developing explainable deep learning models, and this paper is an effort in this direction. Building on a recently proposed method called Grad-CAM, we propose a generalized method called Grad-CAM++ that can provide better visual explanations of CNN model predictions, in terms of better object localization as well as explaining occurrences of multiple object instances in a single image, when compared to the state of the art. We provide a mathematical derivation for the proposed method, which uses a weighted combination of the positive partial derivatives of the last convolutional layer feature maps with respect to a specific class score as weights to generate a visual explanation for the corresponding class label. Our extensive experiments and evaluations, both subjective and objective, on standard datasets showed that Grad-CAM++ provides promising human-interpretable visual explanations for a given CNN architecture across multiple tasks including classification, image caption generation and 3D action recognition, as well as in new settings such as knowledge distillation. Comment: 17 Pages, 15 Figures, 11 Tables. Accepted in the proceedings of IEEE Winter Conf. on Applications of Computer Vision (WACV2018). Extended version is under review at IEEE Transactions on Pattern Analysis and Machine Intelligence.
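    The weighting step described in this abstract can be illustrated with a minimal sketch. It assumes the feature maps, the gradients of the class score, and the per-pixel coefficients are already available as nested lists; the function name, argument names, and the way the coefficients are supplied are hypothetical and are not taken from the paper, which derives its own coefficients.

```python
def relu(x):
    """Keep positive values, zero out the rest."""
    return x if x > 0.0 else 0.0


def weighted_positive_gradient_map(feature_maps, gradients, alphas):
    """Combine K feature maps (each H x W) into one saliency map.

    feature_maps, gradients, alphas: lists of K maps, each a list of H rows
    of W floats. `alphas` holds per-pixel coefficients; how they are computed
    is left open here (an assumption of this sketch).
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])

    # One weight per feature map: weighted sum of its positive gradients.
    weights = []
    for grad_k, alpha_k in zip(gradients, alphas):
        weights.append(sum(
            alpha_k[i][j] * relu(grad_k[i][j])
            for i in range(height) for j in range(width)
        ))

    # Saliency: positive part of the weighted sum of the feature maps.
    saliency = [
        [relu(sum(w * fmap[i][j] for w, fmap in zip(weights, feature_maps)))
         for j in range(width)]
        for i in range(height)
    ]
    return weights, saliency


# Toy input: two 1 x 2 feature maps with made-up values.
toy_maps = [[[1.0, 2.0]], [[3.0, 0.0]]]
toy_grads = [[[0.5, -1.0]], [[-2.0, -2.0]]]
toy_alphas = [[[0.5, 0.5]], [[0.5, 0.5]]]
toy_weights, toy_saliency = weighted_positive_gradient_map(
    toy_maps, toy_grads, toy_alphas)
```

    In this toy input the second feature map has only negative gradients, so its weight is zero and it does not contribute to the resulting map.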

    Artificial Intelligence and Systems Theory: Applied to Cooperative Robots

    This paper describes an approach to the design of a population of cooperative robots based on concepts borrowed from Systems Theory and Artificial Intelligence. The research has been developed under the SocRob project, carried out by the Intelligent Systems Laboratory at the Institute for Systems and Robotics - Instituto Superior Tecnico (ISR/IST) in Lisbon. The acronym of the project stands both for "Society of Robots" and "Soccer Robots", the case study where we are testing our population of robots. Designing soccer robots is a very challenging problem, where the robots must act not only to shoot a ball towards the goal, but also to detect and avoid static (walls, stopped robots) and dynamic (moving robots) obstacles. Furthermore, they must cooperate to defeat an opposing team. Our past and current research in soccer robotics includes cooperative sensor fusion for world modeling, object recognition and tracking, robot navigation, multi-robot distributed task planning and coordination, including cooperative reinforcement learning in cooperative and adversarial environments, and behavior-based architectures for real-time task execution of cooperating robot teams.

    A framework for the analytical and visual interpretation of complex spatiotemporal dynamics in soccer

    Sports analytics is an emerging field focused on the application of advanced data analysis for assessing the performance of professional athletes and teams. In soccer, the integration of data analysis is in its initial steps, primarily due to the difficulty of making sense of soccer's complex spatiotemporal relationships and effectively translating findings to practitioners. Recently, the availability of spatiotemporal data has given rise to applying statistical approaches to address problems such as estimating passing and scoring probability, or the evaluation of players' mental pressure. However, most of these approaches focus on isolated aspects of the sport, while coaches tend to focus on the broader interplay of all 22 players on the pitch. To address the non-stop flow of questions that coaching staff deal with daily, we identify the need for a flexible analysis framework that allows us to answer these questions quickly, accurately, and in a visually interpretable way while capturing the complex spatial and contextual factors that rule the game. We propose developing such a comprehensive framework through the concept of the expected possession value (EPV). First introduced in basketball, EPV constitutes an instantaneous estimate of the expected points to be scored at the end of a possession. However, aside from a shared high-level goal, our focus on soccer necessitates a drastically different approach to account for the sport's nuances, such as looser notions of possession, the ability of passes to happen at any location, and space-time dependent turnover evaluation. Following this, we propose modeling EPV in soccer by addressing the question, "can we estimate the expectation of a team scoring or conceding the next goal at any time in the game?" From here, we address a series of derived questions: How should the EPV expression be structured so coaches can more easily interpret it? Can we produce calibrated and interpretable estimates for each of its components? Can we develop representative and soccer-specific features with the aid of coaches? Is it possible to learn complex features from raw-level spatiotemporal data? Finally, and most importantly, can we produce compelling practical applications? These questions are successfully addressed in this thesis, where we present a series of contributions for both the machine learning and soccer analytics fields related to the modeling and practical interpretation of complex spatiotemporal dynamics. We propose a decomposed modeling approach where a series of foundational soccer components can be estimated separately and then merged to provide a single EPV estimation, providing flexibility to this integrated model. From a practical standpoint, we leverage several function approximation approaches to exploit complex relationships in spatiotemporal tracking data. An essential contribution of this work is the proposal of SoccerMap, a flexible deep learning architecture capable of producing accurate and visually interpretable probability surfaces in a broad range of problems. Based on a large set of spatial and contextual features developed in this work, we model and provide accurate estimates for each of the EPV components. The flexibility and interpretation capabilities of the proposed model allow us to produce a broad set of practical applications related to on-ball performance, off-ball performance, and match analysis in soccer, and open the door for its future adaptation to other sports. This thesis was developed under an Industrial Ph.D. program, with the support of the Industrial Doctorates Plan (Pla de Doctorat Industrial) of the Department of Research and Universities of the Generalitat de Catalunya, and carried out entirely at Fútbol Club Barcelona, which promoted a close collaboration with professional coaches. As a result, a large share of the ideas developed in this thesis is now part of the club's daily player and team performance analysis pipeline. Postprint (published version).

    Deep Visual Foresight for Planning Robot Motion

    A key challenge in scaling up robot learning to many skills and environments is removing the need for human supervision, so that robots can collect their own data and improve their own performance without being limited by the cost of requesting human feedback. Model-based reinforcement learning holds the promise of enabling an agent to learn to predict the effects of its actions, which could provide flexible predictive models for a wide range of tasks and environments, without detailed human supervision. We develop a method for combining deep action-conditioned video prediction models with model-predictive control that uses entirely unlabeled training data. Our approach does not require a calibrated camera, an instrumented training set-up, or precise sensing and actuation. Our results show that our method enables a real robot to perform nonprehensile manipulation (pushing objects) and can handle novel objects not seen during training. Comment: ICRA 2017. Supplementary video: https://sites.google.com/site/robotforesight

    Qualitative Abstraction and Inherent Uncertainty in Scene Recognition

    The interpretation of scenes, e.g., in videos, is demanding at all levels. At the image processing level it is necessary to apply an "intelligent" segmentation and to determine the objects of interest. For the higher symbolic levels it is a challenging task to perform the transition between quantitative and qualitative data and to determine the relations between objects. Here we assume, as a minimal requirement for the further analysis, that the positions of objects ("agents") in images and videos have already been determined. The interpretation of complex and dynamic scenes with embedded intentional agents is one of the most challenging tasks in current AI and imposes highly heterogeneous requirements. A key problem is the efficient and robust representation of uncertainty. We propose that uncertainty should be distinguished with respect to two different epistemological sources: (1) noisy sensor information and (2) ignorance. In this presentation we propose possible solutions to this class of problems. The use and evaluation of sensory information in the field of robotics shows impressive results, especially in the fields of localization (e.g. MCL) and map building (e.g. SLAM), but its probabilistic nature also imposes serious problems on the successive higher levels of processing. We propose that the use of (a) qualitative abstraction (the classic approach) from quantitative to (at least partially) qualitative representations and (b) coherence-based perception validation based on Dempster-Shafer theory (DST) can help to reduce the problem significantly. The second important class of uncertainty problems that will be addressed is ignorance. Here we focus on reducing missing information by inference. We compare and contrast our experiences with two approaches in an important field of scene interpretation, namely plan and intention recognition. The first is a logical abductive approach; the second, in contrast, is a probabilistic approach based on a Relational Hidden Markov Model (RHMM).