    Learning Fair Division from Bandit Feedback

    This work addresses learning online fair division under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents' values or utilities. Departing from conventional online algorithms, the planner here relies on noisy, estimated values obtained after allocating items. We introduce wrapper algorithms utilizing dual averaging, enabling gradual learning of both the type distribution of arriving items and agents' values through bandit feedback. This approach enables the algorithms to asymptotically achieve optimal Nash social welfare in linear Fisher markets with agents having additive utilities. We establish regret bounds in Nash social welfare and empirically validate the superior performance of our proposed algorithms across synthetic and empirical datasets.
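As a rough illustration of the objective named in this abstract, the sketch below computes Nash social welfare as the geometric mean of agents' additive utilities. It assumes equal agent weights and a small hand-made value matrix; the function names and numbers are hypothetical and are not taken from the paper, and the learning algorithm itself (dual averaging with bandit feedback) is not reproduced here.

```python
import math

def additive_utilities(values, allocation):
    """Utility of each agent under additive valuations.

    values[i][j]     : agent i's value for item j (hypothetical numbers)
    allocation[i][j] : fraction of item j given to agent i
    """
    return [
        sum(v * x for v, x in zip(value_row, alloc_row))
        for value_row, alloc_row in zip(values, allocation)
    ]

def nash_social_welfare(utilities):
    """Geometric mean of utilities (equal weights assumed); 0 if any agent gets nothing."""
    if any(u <= 0 for u in utilities):
        return 0.0
    return math.exp(sum(math.log(u) for u in utilities) / len(utilities))

# Two agents, two items: each agent strongly prefers a different item.
values = [[4.0, 1.0], [1.0, 4.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]   # each agent gets its preferred item
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each agent gets the other item

nsw_matched = nash_social_welfare(additive_utilities(values, matched))
nsw_swapped = nash_social_welfare(additive_utilities(values, swapped))
```

In this toy instance the matched allocation gives a higher Nash social welfare than the swapped one, which is the kind of comparison a regret bound in Nash social welfare quantifies.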

    Model Estimation Within Planning and Learning

    Risk and reward are fundamental concepts in the cooperative control of unmanned systems. In this research, we focus on developing a constructive relationship between cooperative planning and learning algorithms to mitigate the learning risk, while boosting system (planner & learner) asymptotic performance and guaranteeing the safety of agent behavior. Our framework is an instance of the intelligent cooperative control architecture (iCCA) where the learner incrementally improves on the output of a baseline planner through interaction and constrained exploration. We extend previous work by extracting the embedded parameterized transition model from within the cooperative planner and making it adaptable and accessible to all iCCA modules. We empirically demonstrate the advantage of using an adaptive model over a static model and pure learning approaches in an example GridWorld problem and a UAV mission planning scenario with 200 million possibilities. Finally, we discuss two extensions to our approach to handle cases where the true model cannot be captured exactly through the presumed functional form. United States. Air Force Office of Scientific Research (FA9550-09-1-0522); Natural Sciences and Engineering Research Council of Canada; USAF (FA9550-09-1-0522)

    How Simulation can Illuminate Pedagogical and System Design Issues in Dynamic Open Ended Learning Environments

    A Dynamic Open-Ended Learning Environment (DOELE) is a collection of learners and learning objects (LOs) that could be constantly changing. In DOELEs, learners need the support of Advanced Learning Technology (ALT), but most ALT is not designed to run in such environments. An architecture for designing advanced learning technology that is compatible with DOELEs is the ecological approach (EA). This thesis looks at how to test and develop ALT based on the EA, and argues that this process would benefit from the use of simulation. The essential components of an EA-based simulation are: simulated learners, simulated LOs, and their simulated interactions. In this thesis the value of simulation is demonstrated with two experiments. The first experiment focuses on the pedagogical issue of peer impact: how learning is impacted by the performance of peers. By systematically varying the number and type of learners and LOs in a DOELE, the simulation uncovers behaviours that would otherwise go unseen. The second experiment shows how to validate and tune a new instructional planner built on the EA, the Collaborative Filtering based on Learning Sequences planner (CFLS). When the CFLS planner is configured appropriately, simulated learners achieve higher performance measurements than those learners using the baseline planners. Simulation results lead to predictions that ultimately need to be proven in the real world, but even without real world validation such predictions can be useful to researchers to inform the ALT system design process. This thesis work shows that it is not necessary to model all the details of the real world to come to a better understanding of a pedagogical issue such as peer impact. Simulation also allowed for the design of the first known instructional planner to be based on usage data, the CFLS planner. The use of simulation for the design of EA-based systems opens new possibilities for instructional planning without knowledge engineering. Such systems can find niche learning paths that may have never been thought of by a human designer. By exploring pedagogical and ALT system design issues for DOELEs, this thesis shows that simulation is a valuable addition to the toolkit for ALT researchers.

    Planning through Automatic Portfolio Configuration: The PbP Approach

    In the field of domain-independent planning, several powerful planners implementing different techniques have been developed. However, none of these systems outperforms all others in every known benchmark domain. In this work, we propose a multi-planner approach that automatically configures a portfolio of planning techniques for each given domain. The configuration process for a given domain uses a set of training instances to: (i) compute and analyze some alternative sets of macro-actions for each planner in the portfolio, identifying a (possibly empty) useful set, (ii) select a cluster of planners, each one with the identified useful set of macro-actions, that is expected to perform best, and (iii) derive some additional information for configuring the execution scheduling of the selected planners at planning time. The resulting planning system, called PbP (Portfolio-based Planner), has two variants focusing on speed and plan quality. Different versions of PbP entered and won the learning track of the sixth and seventh International Planning Competitions. In this paper, we experimentally analyze PbP considering planning speed and plan quality in depth. We provide a collection of results that help to understand PbP's behavior, and demonstrate the effectiveness of our approach to configuring a portfolio of planners with macro-actions.

    Online Planner Selection with Graph Neural Networks and Adaptive Scheduling

    Automated planning is one of the foundational areas of AI. Since no single planner can work well for all tasks and domains, portfolio-based techniques have become increasingly popular in recent years. In particular, deep learning emerges as a promising methodology for online planner selection. Owing to the recent development of structural graph representations of planning tasks, we propose a graph neural network (GNN) approach to selecting candidate planners. GNNs are advantageous over a straightforward alternative, convolutional neural networks, in that they are invariant to node permutations and that they incorporate node labels for better inference. Additionally, for cost-optimal planning, we propose a two-stage adaptive scheduling method to further improve the likelihood that a given task is solved in time. The scheduler may switch at halftime to a different planner, conditioned on the observed performance of the first one. Experimental results validate the effectiveness of the proposed method against strong baselines, both deep learning and non-deep learning based. The code is available at https://github.com/matenure/GNN_planner. Comment: AAAI 2020. Code is released at https://github.com/matenure/GNN_planner. Data set is released at https://github.com/IBM/IPC-graph-dat
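The halftime switch described in this abstract can be pictured with a toy simulation. This is only a sketch under strong assumptions: planners are modelled by a hypothetical table of solve times, the second-stage choice is a plain callback rather than a learned GNN, and a planner that is kept for the second stage is assumed to retain its progress. None of the names below come from the paper or its released code.

```python
def two_stage_schedule(solve_times, first, pick_second, time_limit):
    """Simulate a two-stage schedule with a possible switch at halftime.

    solve_times : dict mapping planner name -> seconds it would need (hypothetical)
    first       : planner run during the first half of the time limit
    pick_second : callback choosing the second-stage planner, given that
                  the first planner has not finished by halftime
    Returns (planner that solved the task or None, elapsed seconds).
    """
    half = time_limit / 2.0

    # Stage 1: the first planner gets half of the budget.
    if solve_times[first] <= half:
        return first, solve_times[first]

    # Stage 2: decide whether to keep the first planner or switch.
    second = pick_second(first)
    if second == first:
        # Assumption: the first planner continues from where it stopped.
        total = solve_times[first]
    else:
        # A new planner starts from scratch with the remaining half.
        total = half + solve_times[second]

    if total <= time_limit:
        return second, total
    return None, time_limit

# Hypothetical portfolio: planner "A" is slow on this task, "B" is fast.
solve_times = {"A": 700.0, "B": 300.0}
result_switch = two_stage_schedule(solve_times, "A", lambda first: "B", 1000.0)
result_stay = two_stage_schedule(solve_times, "A", lambda first: "A", 1000.0)
```

Here both policies solve the task within the limit, but a switch only helps when the second planner can finish in the remaining half, which is why the choice is conditioned on what was observed in the first stage.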