123 research outputs found

    Rational bidding using reinforcement learning: an application in automated resource allocation

    Get PDF
    The application of autonomous agents to the provisioning and usage of computational resources is an attractive research field. Various methods and technologies from artificial intelligence, statistics and economics work together to i) achieve autonomic provisioning and usage of computational resources, ii) devise competitive bidding strategies for widely used market mechanisms and iii) incentivize consumers and providers to use such market-based systems. The contributions of the paper are threefold. First, we present a framework that supports consumers and providers in technical and economic preference elicitation and in the generation of bids. Second, we introduce a consumer-side reinforcement learning bidding strategy that enables rational behavior through the generation and selection of bids. Third, we evaluate and compare this bidding strategy against a truth-telling bidding strategy for two kinds of market mechanisms: one centralized and one decentralized.
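    For illustration, a minimal sketch of a consumer-side reinforcement learning bidder in the spirit described above: bid prices are discretized into levels, a bid is chosen epsilon-greedily, and value estimates are updated from the market outcome. This is not the paper's implementation; all names and parameters (BID_LEVELS, epsilon, alpha, gamma) are illustrative assumptions.

```python
# Hedged sketch, not the paper's implementation: a consumer-side Q-learning
# bidder that picks a bid price from a discretized set and updates its value
# estimates from the observed auction outcome.
import random
from collections import defaultdict

BID_LEVELS = [0.2, 0.4, 0.6, 0.8, 1.0]   # assumed fractions of the consumer's valuation

class QLearningBidder:
    def __init__(self, epsilon=0.1, alpha=0.2, gamma=0.9):
        self.q = defaultdict(float)        # (state, bid_level) -> value estimate
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma

    def select_bid(self, state, valuation):
        # Epsilon-greedy selection over the discretized bid levels.
        if random.random() < self.epsilon:
            level = random.choice(BID_LEVELS)
        else:
            level = max(BID_LEVELS, key=lambda b: self.q[(state, b)])
        return level * valuation, level

    def update(self, state, level, reward, next_state):
        # Standard one-step Q-learning update on the observed market outcome.
        best_next = max(self.q[(next_state, b)] for b in BID_LEVELS)
        target = reward + self.gamma * best_next
        self.q[(state, level)] += self.alpha * (target - self.q[(state, level)])
```

    A natural reward here would be the consumer's surplus (valuation minus price paid) when the bid is accepted and zero otherwise, though the paper's exact reward definition is not reproduced.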

    Q-Strategy: A Bidding Strategy for Market-Based Allocation of Grid Services

    Get PDF
    The application of autonomous agents to the provisioning and usage of computational services is an attractive research field. Various methods and technologies from artificial intelligence, statistics and economics work together to i) achieve autonomic provisioning and usage of Grid services, ii) devise competitive bidding strategies for widely used market mechanisms and iii) incentivize consumers and providers to use such market-based systems. The contributions of the paper are threefold. First, we present a bidding agent framework for implementing artificial bidding agents, supporting consumers and providers in technical and economic preference elicitation as well as automated bid generation for the requesting and provisioning of Grid services. Second, we introduce a novel consumer-side bidding strategy that enables goal-oriented, strategic behavior in the generation and submission of consumer service requests and the selection of provider offers. Third, we evaluate and compare the Q-strategy, implemented within the presented framework, against the Truth-Telling bidding strategy in three mechanisms: a centralized CDA, a decentralized on-line machine scheduling mechanism and a FIFO-scheduling mechanism.
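    As a toy illustration of the kind of comparison described above, the sketch below pits a Truth-Telling baseline (bid exactly the private valuation) against a shaded bid in a single-item, first-price stand-in market. The market, round counts and shading factor are assumptions for illustration only; the paper's CDA and scheduling mechanisms are not reproduced.

```python
# Hedged evaluation-harness sketch: compare truth-telling against a strategic
# (shaded) bid in a toy first-price, single-item market with random rivals.
import random

def truth_telling_bid(valuation):
    return valuation                       # bid exactly the private valuation

def run_toy_market(bid_fn, rounds=10_000, rivals=3):
    utility = 0.0
    for _ in range(rounds):
        valuation = random.uniform(0, 1)
        bid = bid_fn(valuation)
        rival_bids = [random.uniform(0, 1) for _ in range(rivals)]
        if bid > max(rival_bids):          # win the allocation, pay own bid
            utility += valuation - bid
    return utility / rounds

print("truth-telling avg utility:", run_toy_market(truth_telling_bid))
print("shaded bidding avg utility:", run_toy_market(lambda v: 0.75 * v))
```

    In this toy setting truth-telling earns zero surplus whenever it wins, which is exactly the kind of gap that motivates a learned, strategic bidding policy such as the Q-strategy.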

    The transformation of the forest steppe in the lower Danube Plain of south-eastern Europe: 6000 years of vegetation and land use dynamics

    Get PDF
    Forest steppes are dynamic ecosystems, highly susceptible to changes in climate and land use. Here we examine the Holocene history of the European forest steppe ecotone in the Lower Danube Plain to better understand its sensitivity to climate fluctuations and human impact, and the timing of its transition into a cultural forest steppe. We used multi-proxy analyses (pollen, n-alkane, coprophilous fungi, charcoal, and geochemistry) of a 6000-year sequence from Lake Oltina (SE Romania), combined with a REVEALS model of quantitative vegetation cover. We found the greatest tree cover, composed of xerothermic (Carpinus orientalis and Quercus) and temperate (Carpinus betulus, Tilia, Ulmus and Fraxinus) tree taxa, between 6000 and 2500 cal yr BP. Maximum tree cover (~ 50 %) occurred between 4200 and 2500 cal yr BP at a time of wetter climatic conditions. Compared to other European forest steppe areas, the dominance of Carpinus orientalis represents the most distinct feature of the woodland's composition during that time. Forest loss was under way by 2500 cal yr BP (Iron Age), with REVEALS estimates indicating a fall to ~ 20 % tree cover from the mid-Holocene forest maximum linked to clearance for agriculture, while climate conditions remained wet. Biomass burning increased markedly at 2500 cal yr BP, suggesting that fire was regularly used as a management tool until 1000 cal yr BP, when woody vegetation became scarce. A sparse tree cover, with only weak signs of forest recovery, then became a permanent characteristic of the Lower Danube Plain, highlighting recurring anthropogenic pressure. The timing of anthropogenic ecosystem transformation here (2500 cal yr BP) was in between that in central eastern Europe (between 3700 and 3000 cal yr BP) and eastern Europe (after 2000 cal yr BP). Our study is the first quantitative land cover estimate at the forest steppe ecotone in south-eastern Europe spanning 6000 years and provides critical empirical evidence that the present-day forest steppe/woodlands reflect the potential natural vegetation in this region under current climate conditions. This study also highlights the potential of n-alkane indices for vegetation reconstruction, particularly in dry regions where pollen is poorly preserved.

    Scale-free memory model for multiagent reinforcement learning. Mean field approximation and rock-paper-scissors dynamics

    Full text link
    A continuous-time model for multiagent systems governed by reinforcement learning with scale-free memory is developed. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To gain awareness of the action value, the agents accumulate in their memory the rewards obtained from taking a specific action at each moment of time. The contribution of past rewards to the agent's current perception of action value is described by an integral operator with a power-law kernel. Finally, a fractional differential equation governing the system dynamics is obtained. The agents are considered to interact with one another implicitly, via the reward of one agent depending on the choices of the other agents. The pairwise interaction model is adopted to describe this effect. As a specific example of systems with non-transitive interactions, two-agent and three-agent systems of the rock-paper-scissors type are analyzed in detail, including stability analysis and numerical simulation. Scale-free memory is demonstrated to cause complex dynamics of the systems at hand. In particular, it is shown that there can be simultaneously two modes of system instability undergoing subcritical and supercritical bifurcation, with the latter exhibiting anomalous oscillations whose amplitude and period grow with time. Besides, the instability onset via this supercritical mode may be regarded as "altruism self-organization". For the three-agent system the instability dynamics are found to be rather irregular, composed of alternating fragments of oscillations differing in their properties. Comment: 17 pages, 7 figures
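    To make the "integral operator with a power-law kernel" concrete, one standard form it could take is the Riemann-Liouville fractional integral of the reward stream; the exact kernel, normalization and exponent convention used in the paper may differ, so the following is an illustrative sketch only.

```latex
% Illustrative form, not necessarily the paper's normalization: the perceived
% action value Q_i(t) as a power-law-weighted accumulation of past rewards r_i.
\[
  Q_i(t) \;=\; \frac{1}{\Gamma(\alpha)} \int_0^{t} (t-t')^{\,\alpha-1}\, r_i(t')\, dt',
  \qquad 0 < \alpha < 1,
\]
% i.e. the Riemann--Liouville fractional integral of order \alpha of r_i, so
% Q_i obeys a fractional differential equation of order \alpha:
\[
  {}_{0}D_{t}^{\alpha}\, Q_i(t) \;=\; r_i(t).
\]
```

    A kernel of this shape gives all past rewards a slowly decaying, scale-free weight, which is what distinguishes the model from exponential-discounting memory.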

    A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

    Full text link
    Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove, in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies and, in tasks requiring close coordination, the number of failed actions dominates successful actions. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at https://unnat.github.io/cordial-sync. Comment: Accepted to ECCV 2020 (spotlight); Project page: https://unnat.github.io/cordial-sync
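    The limitation that motivates SYNC-policies can be seen in a few lines of probability: two agents sampling independently from marginal distributions cannot represent a joint policy that puts mass only on "agreeing" action pairs. The sketch below illustrates this with made-up numbers; it does not show SYNC-policies or CORDIAL themselves.

```python
# Hedged illustration of why independent marginal sampling is not expressive
# enough for tightly coordinated joint actions (the limitation the paper
# addresses). Action names and probabilities are illustrative only.
import itertools
import numpy as np

actions = ["push-left", "push-right"]
# Desired coordinated joint policy: both agents always pick the same action.
target_joint = {("push-left", "push-left"): 0.5, ("push-right", "push-right"): 0.5,
                ("push-left", "push-right"): 0.0, ("push-right", "push-left"): 0.0}

# The best product-of-marginals approximation uses marginals (0.5, 0.5) per agent ...
marginal = np.array([0.5, 0.5])
product_joint = {(a, b): marginal[i] * marginal[j]
                 for (i, a), (j, b) in itertools.product(enumerate(actions), repeat=2)}

# ... which places 0.25 mass on each mismatched pair, so half of all sampled
# joint actions are uncoordinated failures.
for pair in target_joint:
    print(pair, "target:", target_joint[pair], "product:", product_joint[pair])
```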

    Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams

    Get PDF
    Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods for a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings face the task of having to reason about other agents' actions, which may in turn involve reasoning about others. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level 0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DIDs). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning is that a self-interested agent that is part of a team may not obtain optimal team solutions in cooperative settings. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations.
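    A heavily simplified sketch of the general idea, under stated assumptions: a level 0 model of the other agent is obtained by tabular Q-learning, and the greedy policy extracted from it is handed to the higher-level planner as a fixed prediction of the teammate's behavior. The I-DID machinery itself is not reproduced, and the `env` interface (with `reset`, `step`, `actions`) is a hypothetical placeholder.

```python
# Hedged sketch: learn a level-0 model of the other agent via tabular
# Q-learning, then expose its greedy policy to a level-1 planner.
import random
from collections import defaultdict

def learn_level0_policy(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = defaultdict(float)                 # (state, action) -> value estimate
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = (random.choice(env.actions) if random.random() < epsilon
                 else max(env.actions, key=lambda a_: q[(s, a_)]))
            s2, r, done = env.step(a)      # assumed (next_state, reward, done) interface
            best_next = max(q[(s2, a_)] for a_ in env.actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    # Greedy policy extracted from the learned values.
    return lambda state: max(env.actions, key=lambda a_: q[(state, a_)])

# A level-1 planner would then treat policy(state) as the predicted teammate
# action when evaluating its own candidate plans.
```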

    Building collaboration in multi-agent systems using reinforcement learning

    Get PDF
    © Springer Nature Switzerland AG 2018. This paper presents a proof-of-concept study demonstrating the viability of building collaboration among multiple agents through a standard Q-learning algorithm embedded in particle swarm optimisation. Collaboration among the agents is formulated to be achieved via competition, where the agents are expected to balance their actions so that none drifts away from the team and none intervenes in a fellow neighbour's territory. Particles are equipped with Q-learning for self-training, learning how to act as members of a swarm and how to produce collaborative/collective behaviours. The experimental results support the proposed idea, suggesting that substantive collaboration can be built via the proposed learning algorithm.
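    One plausible shape for such a hybrid, sketched below under explicit assumptions rather than taken from the paper: each particle keeps a small Q-table over discrete behaviours (follow its personal best, follow the global best, keep distance from its nearest neighbour), applies the chosen behaviour through a PSO-style velocity update, and is rewarded for staying with the swarm without crowding a neighbour. The behaviours, constants and reward shape are all illustrative; `gbest` and `nearest` are assumed to be NumPy position vectors supplied by the caller.

```python
# Hedged sketch, not the paper's algorithm: Q-learning over discrete swarm
# behaviours embedded inside a PSO-style position/velocity update.
import random
import numpy as np

BEHAVIOURS = ["to_pbest", "to_gbest", "avoid_neighbour"]

class QParticle:
    def __init__(self, dim=2, epsilon=0.1, alpha=0.2, gamma=0.9):
        self.pos = np.random.uniform(-1, 1, dim)
        self.vel = np.zeros(dim)
        self.pbest = self.pos.copy()
        self.q = {b: 0.0 for b in BEHAVIOURS}      # stateless Q-values, for brevity
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma

    def act(self, gbest, nearest):
        # Epsilon-greedy choice of behaviour, then a PSO-like velocity update
        # toward the target that behaviour implies.
        b = (random.choice(BEHAVIOURS) if random.random() < self.epsilon
             else max(BEHAVIOURS, key=self.q.get))
        target = {"to_pbest": self.pbest, "to_gbest": gbest,
                  "avoid_neighbour": self.pos + (self.pos - nearest)}[b]
        self.vel = 0.7 * self.vel + 0.5 * (target - self.pos)
        self.pos = self.pos + self.vel
        return b

    def learn(self, behaviour, gbest, nearest):
        # Reward cohesion with the swarm while penalizing crowding a neighbour.
        reward = -np.linalg.norm(self.pos - gbest) + min(np.linalg.norm(self.pos - nearest), 1.0)
        best_next = max(self.q.values())
        self.q[behaviour] += self.alpha * (reward + self.gamma * best_next - self.q[behaviour])
```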

    The learning effect of intraoperative video-enhanced surgical procedure training

    Get PDF
    BACKGROUND: The transition from basic skills training in a skills lab to procedure training in the operating theater using the traditional master-apprentice model (MAM) lacks uniformity and efficiency. When the supervising surgeon performs parts of a procedure, training opportunities are lost. To minimize this intervention by the supervisor and maximize the actual operating time for the trainee, we created a new training method called INtraoperative Video-Enhanced Surgical Training (INVEST). METHODS: Ten surgical residents were trained in laparoscopic cholecystectomy either by the MAM or with INVEST. Each trainee performed six cholecystectomies that were objectively evaluated on an Objective Structured Assessment of Technical Skills (OSATS) global rating scale. Absolute and relative improvements during the training curriculum were compared between the groups. A questionnaire evaluated the trainees' opinions of this new training method. RESULTS: Skill improvement on the OSATS global rating scale was significantly greater for trainees in the INVEST curriculum than for those trained by the MAM, with a mean absolute improvement of 32.6 versus 14.0 points and a mean relative improvement of 59.1% versus 34.6% (P = 0.02). CONCLUSION: INVEST significantly enhances technical and procedural skill development during the early learning curve for laparoscopic cholecystectomy. Trainees were positive about the content and the idea of the curriculum.