
    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.

    A POWER INDEX BASED FRAMEWORK FOR FEATURE SELECTION PROBLEMS

    One of the most challenging tasks in the Machine Learning context is feature selection. It consists in selecting the best set of features to use in the training and prediction processes. There are several benefits from pruning the set of actually operational features: the consequent reduction of the computation time, often a better quality of the prediction, and the possibility of using less data to create a good predictor. In its most common form, the problem is called the single-view feature selection problem, to distinguish it from the feature selection task in Multi-view learning. In the latter, each view corresponds to a set of features and one would like to enact feature selection on each view, subject to some global constraints. A related problem in the context of Multi-View Learning is Feature Partitioning: it consists in splitting the set of features of a single large view into two or more views so that it becomes possible to create a good predictor based on each view. In this case, the best features must be distributed between the views, each view should contain synergistic features, while features that interfere disruptively must be placed in different views. In the semi-supervised multi-view task known as Co-training, one also requires that each predictor trained on an individual view is able to teach something to the other views: in classification tasks for instance, one view should learn to classify unlabelled examples based on the guess provided by the other views. There are several ways to address these problems. A set of techniques is inspired by Coalitional Game Theory. This theory defines several useful concepts, among which two are of high practical importance: the concept of power index and the concept of interaction index.
When used in the context of feature selection, they take the following meaning: the power index is a (context-dependent) synthesis measure of the predictive capability of a feature, while the interaction index is a (context-dependent) synthesis measure of the interaction (constructive/disruptive interference) between two features: it can be used to quantify how the collaboration between two features enhances their prediction capabilities. An important point is that the power index of a feature is different from the predicting power of the feature in isolation: it takes into account, by a suitable averaging, the context, i.e. the fact that the feature is acting, together with other features, to train a model. Similarly, the interaction index between two features takes into account the context, by suitably averaging the interaction with all the other features. In this work we address both the single-view and the multi-view problems as follows. The single-view feature selection problem is formalized as the problem of maximization of a pseudo-boolean function, i.e. a real-valued set function (that maps sets of features into a performance metric). Since one has to enact a search over (a considerable portion of) the Boolean lattice (without any special guarantees, except, perhaps, positivity), the problem is in general NP-hard. We address the problem by producing candidate maximum coalitions through the selection of the subset of features characterized by the highest power indices and using the coalition to approximate the actual maximum. Although the exact computation of the power indices is an exponential task, the estimates of the power indices for the purposes of the present problem can be achieved in polynomial time. The multi-view feature selection problem is formalized as the generalization of the above set-up to the case of multi-variable pseudo-boolean functions.
The multi-view splitting problem is formalized instead as the problem of maximization of a real function defined over the partition lattice. This problem is also typically NP-hard. However, candidate solutions can be found by suitably partitioning the top power-index features and keeping in different views the pairs of features that are less interactive or negatively interactive. The sum of the power indices of the participating features can be used to approximate the prediction capability of the view (i.e. it can be used as a proxy for the predicting power). The sum of the feature pair interactivity across views can be used as a proxy for the orthogonality of the views. The capability of a view to pass information (to teach) to other views within a co-training procedure can also benefit from the use of power indices based on a suitable definition of information transfer (a set of features, i.e. a coalition, classifies examples that are subsequently used in the training of a second set of features). As to the feature selection task, we not only demonstrate the use of state-of-the-art power index concepts (e.g. the Shapley Value and the Banzhaf Value) along the lines described above, but we also define new power indices, within the more general class of probabilistic power indices, that contains the Shapley and the Banzhaf Values as special cases. Since the number of features to select is often a predefined parameter of the problem, we also introduce some novel power indices, namely the k-Power Index (and its specializations, the k-Shapley Value and the k-Banzhaf Value): they help select the features in a more efficient way. For the feature partitioning, we use the more general class of probabilistic interaction indices that contains the Shapley and Banzhaf Interaction Indices as members. We also address the problem of evaluating the teaching ability of a view, introducing a suitable teaching capability index.
The last contribution of the present work consists in comparing the Game Theory approach to the classical Greedy Forward Selection approach for feature selection. In the latter, the candidate is obtained by adding one feature at a time to the current maximal coalition, always choosing the feature with the maximal marginal contribution. We show that in typical cases the two methods are complementary, and that when used in conjunction they reduce one another's error in the estimate of the maximum value. Moreover, the approach based on game theory has two advantages: it samples the space of all possible feature subsets, while the greedy algorithm scans a selected subspace and excludes the rest of it entirely, and it is able to assign to each feature a score that describes a context-aware measure of importance in the prediction process.
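The contrast drawn in this abstract between power-index selection and greedy forward selection can be illustrated on a toy problem. The sketch below is not the thesis's implementation: the value function `v`, the tables `SOLO` and `PAIR`, and all function names are hypothetical, and the Shapley values are computed exactly by enumerating subsets (exponential cost), which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy performance metric: per-feature scores plus pairwise interactions.
SOLO = {0: 0.30, 1: 0.25, 2: 0.25, 3: 0.05}
PAIR = {frozenset({1, 2}): 0.30, frozenset({0, 1}): -0.10}  # synergy / interference


def v(subset):
    """Pseudo-boolean set function mapping a feature subset to a score."""
    total = sum(SOLO[i] for i in subset)
    total += sum(bonus for pair, bonus in PAIR.items() if pair <= subset)
    return total


def shapley_values(n, value):
    """Exact Shapley value of each of n features (enumerates all subsets)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def top_k_by_power_index(n, value, k):
    """Candidate coalition: the k features with the highest Shapley values."""
    phi = shapley_values(n, value)
    ranked = sorted(range(n), key=lambda i: phi[i], reverse=True)
    return frozenset(ranked[:k])


def greedy_forward_selection(n, value, k):
    """Add, k times, the feature with the largest marginal contribution."""
    chosen = frozenset()
    for _ in range(k):
        best = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: value(chosen | {i}) - value(chosen),
        )
        chosen = chosen | {best}
    return chosen


by_index = top_k_by_power_index(4, v, 2)
by_greedy = greedy_forward_selection(4, v, 2)
print(sorted(by_index), v(by_index))
print(sorted(by_greedy), v(by_greedy))
```

On this particular toy function the power-index candidate is {1, 2} (score about 0.80), while greedy selection commits to feature 0 first and ends at {0, 2} (score about 0.55). This is a single constructed case, not evidence for the thesis's general claim that the two methods are complementary.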

    Approximating the Shapley Value without Marginal Contributions

    The Shapley value is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, and it has recently been used intensively in explainable artificial intelligence. The meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, comes at the expense of an exact computation growing exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of the Shapley values, most of which revolve around the notion of an agent's marginal contribution. In this paper, we propose SVARM and Stratified SVARM, two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value detached from the notion of marginal contributions. We prove unmatched theoretical guarantees regarding their approximation quality and provide empirical results, including synthetic games as well as common explainability use cases, comparing our algorithms with state-of-the-art methods.
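One way to see what a representation "detached from marginal contributions" can look like: by linearity of expectation, the Shapley value of player i equals E[v(S ∪ {i})] − E[v(S)], where S is a coalition of the other players drawn by first picking its size uniformly and then picking a uniform subset of that size. The two expectations can therefore be estimated from independent samples, with one evaluation of v per sample. The sketch below only illustrates that identity; it is not SVARM or Stratified SVARM, and the function names and the test game are assumptions made for this example.

```python
import random


def sample_coalition(others, rng):
    """Size uniform in 0..len(others), then a uniform subset of that size."""
    size = rng.randint(0, len(others))  # randint is inclusive on both ends
    return frozenset(rng.sample(others, size))


def estimate_shapley(n, value, samples_per_term, seed=0):
    """Estimate each Shapley value as a difference of two sample means.

    The 'plus' mean averages value(S | {i}); the 'minus' mean averages
    value(S) over independently drawn coalitions, so no sample is a
    marginal contribution value(S | {i}) - value(S) of one coalition.
    """
    rng = random.Random(seed)
    estimates = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        plus = sum(
            value(sample_coalition(others, rng) | {i})
            for _ in range(samples_per_term)
        ) / samples_per_term
        minus = sum(
            value(sample_coalition(others, rng))
            for _ in range(samples_per_term)
        ) / samples_per_term
        estimates.append(plus - minus)
    return estimates


# Hypothetical symmetric game v(S) = |S|^2 with 4 players:
# by symmetry and efficiency each exact Shapley value is 16 / 4 = 4.
estimates = estimate_shapley(4, lambda s: len(s) ** 2, samples_per_term=20000)
print([round(e, 2) for e in estimates])
```

Because the two means are estimated independently, this naive version can have higher variance than sampling marginal contributions directly; how the algorithms in the paper allocate and reuse samples is not reproduced here.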

    An Approximate "Law of One Price" in Random Assignment Games

    Assignment games represent a tractable yet versatile model of two-sided markets with transfers. We study the likely properties of the core of randomly generated assignment games. If the joint productivities of every firm and worker are i.i.d. bounded random variables, then with high probability all workers are paid roughly equal wages, and all firms make similar profits. This implies that core allocations vary significantly in balanced markets, but that there is core convergence in even slightly unbalanced markets. For the benchmark case of uniform distribution, we provide a tight bound for the workers' share of the surplus under the firm-optimal core allocation. We present simulation results suggesting that the phenomena analyzed appear even in medium-sized markets. Finally, we briefly discuss the effects of unbounded distributions and the ways in which they may affect wage dispersion.

    Hedonic Coalition Formation for Task Allocation with Heterogeneous Robots

    Tasks in the real world are complex in nature and often require multiple robots to collaborate in order to be accomplished. However, multiple robots with the same set of sensors working together might not be the optimal solution. In many cases a task might require different sensory inputs and outputs. However, allocating a large variety of sensors on each robot is not a cost-effective solution. As such, robots with different attributes must be considered. In this thesis we study the coalition formation problem for task allocation with multiple heterogeneous robots (each equipped with a different set of sensors). The proposed solution is implemented utilizing a Hedonic Coalition Formation strategy, rooted in game theory, coupled with bipartite graph matching. Our proposed algorithm aims to minimize the total cost of the formed coalitions and to maximize the matching between the required and the allocated types of robots to the tasks. Simulation results show that it produces near-optimal solutions (up to 94%) in a negligible amount of time (0.19 ms with 100 robots and 10 tasks).

    Risk-Aware Planning for Sensor Data Collection

    With the emergence of low-cost unmanned air vehicles, civilian and military organizations are quickly identifying new applications for affordable, large-scale collectives to support and augment human efforts via sensor data collection. In order to be viable, these collectives must be resilient to the risk and uncertainty of operating in real-world environments. Previous work in multi-agent planning has avoided planning for the loss of agents in environments with risk. In contrast, this dissertation presents a problem formulation that includes the risk of losing agents and the effect of those losses on the mission being executed, and provides anticipatory planning algorithms that consider risk. We conduct a thorough analysis of the effects of risk on path-based planning, motivating new solution methods. We then use hierarchical clustering to generate risk-aware plans for a variable number of agents, outperforming traditional planning methods. Next, we provide a mechanism for distributed negotiation of stable plans, utilizing coalitional game theory to provide cost allocation methods that we prove to be fair and stable. Centralized planning with redundancy is then explored, planning for parallel task completion to mitigate risk and provide further increased expected value. Finally, we explore the role of cost uncertainty as an additional source of risk, using bi-objective optimization to generate sets of alternative plans. We demonstrate the capability of our algorithms on randomly generated problem instances, showing an improvement over traditional multi-agent planning methods as high as 500% on very large problem instances.