
    BridgeHand2Vec Bridge Hand Representation

    Contract bridge is a game of incomplete information, which makes it an exciting challenge for artificial intelligence methods. This paper proposes BridgeHand2Vec, an approach that uses a neural network to embed a bridge player's hand (13 cards) into a vector space. The resulting representation reflects the strength of the hand in the game and yields interpretable distances between different hands. The representation is obtained by training a neural network to estimate the number of tricks that a pair of players can take. In the remainder of the paper, we analyze the properties of the resulting vector space and give examples of its application in reinforcement learning and opening-bid classification. Although this was not our main goal, the neural network used for vectorization achieves state-of-the-art (SOTA) results on the DDBP2 problem (estimating the number of tricks for two given hands).
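To make "embedding a hand into a vector space" concrete, the sketch below assumes a simple input encoding: one 0/1 slot per card of the 52-card deck, with cards written as suit + rank (e.g. "SA" for the ace of spades). The trained BridgeHand2Vec network is not reproduced here. `make_embedder` is a hypothetical stand-in (one fixed random linear layer with tanh) that only illustrates the shape of the pipeline: hand, then vector, then distance. The distances it produces carry none of the hand-strength meaning the paper attributes to its learned representation.

```python
import math
import random

SUITS = "SHDC"           # spades, hearts, diamonds, clubs (assumed notation)
RANKS = "AKQJT98765432"  # "T" stands for ten (assumed notation)


def encode_hand(cards):
    """Encode a 13-card hand as a 52-dimensional 0/1 vector, one slot per card."""
    if len(cards) != 13 or len(set(cards)) != 13:
        raise ValueError("a bridge hand has exactly 13 distinct cards")
    vec = [0.0] * 52
    for card in cards:
        suit, rank = card[0], card[1]
        vec[SUITS.index(suit) * 13 + RANKS.index(rank)] = 1.0
    return vec


def make_embedder(dim=8, seed=0):
    """Hypothetical stand-in for a trained network: a fixed random linear layer + tanh."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(52)] for _ in range(dim)]

    def embed(cards):
        x = encode_hand(cards)
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return embed


def distance(u, v):
    """Euclidean distance between two hand embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

In the paper the embedding weights come from training on trick-count estimation. Swapping the random layer for such a trained model would leave `encode_hand` and `distance` unchanged.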

    Opponent Modelling in Multi-Agent Systems

    Reinforcement learning (RL) formalises a problem in which an intelligent agent learns to achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in environments made non-stationary by adaptive opponents. Partial observation, caused by agents' different private observations, introduces high variance during training, which exacerbates data inefficiency. In MARL, training an agent to perform well against one set of opponents often leads to poor performance against another. Non-stationarity, partial observation and an unclear learning objective are three critical problems in MARL that hinder agents' learning, and they share a common cause: the lack of knowledge about the other agents. Therefore, in this thesis, we propose to address these problems with opponent modelling methods. We tailor our solutions by combining opponent modelling with other techniques according to the characteristics of each problem. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, to alleviate non-stationarity in cooperative games. We then study the partial observation problem caused by agents' private observations and design an implicit communication training method named PBL. Lastly, we investigate the non-stationarity and unclear-learning-objective problems in zero-sum games and propose EPSOM, which aims to find safe exploitation strategies against non-stationary opponents. We verify the proposed methods through a range of experiments and show that they achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.
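As a minimal illustration of what "opponent modelling" means in general, the sketch below keeps Laplace-smoothed counts of an opponent's past actions in a zero-sum game (rock-paper-scissors is an assumed example), treats them as an estimate of the opponent's mixed strategy, and best-responds to that estimate. This is a generic frequency-count model in the spirit of fictitious play. It is not ROMMEO, PBL or EPSOM, and all names are hypothetical.

```python
from collections import Counter

# Row player's payoff in rock-paper-scissors (zero-sum): PAYOFF[my_action][opp_action]
ACTIONS = ("rock", "paper", "scissors")
PAYOFF = {
    "rock": {"rock": 0, "paper": -1, "scissors": 1},
    "paper": {"rock": 1, "paper": 0, "scissors": -1},
    "scissors": {"rock": -1, "paper": 1, "scissors": 0},
}


class FrequencyOpponentModel:
    """Estimate the opponent's mixed strategy from observed action counts (Laplace-smoothed)."""

    def __init__(self, actions, prior=1.0):
        self.actions = actions
        self.counts = Counter({a: prior for a in actions})

    def observe(self, action):
        self.counts[action] += 1

    def predict(self):
        total = sum(self.counts.values())
        return {a: self.counts[a] / total for a in self.actions}


def best_response(model):
    """Pick the action with the highest expected payoff against the modelled opponent."""
    probs = model.predict()
    return max(ACTIONS, key=lambda mine: sum(p * PAYOFF[mine][o] for o, p in probs.items()))
```

Because every past observation carries equal weight, this model adapts slowly when the opponent changes strategy. That is exactly the non-stationarity problem the thesis targets with more sophisticated methods.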

    A Deep Choice Model for Hiring Outcome Prediction in Online Labor Markets

    A key challenge for online labor market researchers and practitioners is understanding how employers choose among many job bidders with distinct attributes. This study investigates employer hiring behavior in one of the largest online labor markets by building a data-driven hiring decision prediction model. To overcome the limitations of the traditional discrete choice model (the conditional logit model), we develop a novel deep choice model that simulates hiring behavior using 722,339 job posts. The deep choice model extends the classical conditional logit model by learning a non-linear utility function, applied identically to every bidder within a job post, via a pointwise convolutional neural network. This non-linear mapping can be optimized straightforwardly with a stochastic gradient approach. We test the model on 12 categories of job posts in the dataset. Results show that our deep choice model outperforms the linear-utility conditional logit model in predicting hiring preferences. By analyzing the model with dimensionality reduction and sensitivity analysis, we highlight how non-linear combinations of bidders' features affect employers' hiring decisions.
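The core idea can be sketched as follows, under stated assumptions. A conditional logit model assigns each bidder in a job post a utility and turns the utilities into hiring probabilities with a softmax over that post's bidders. The deep variant replaces the linear utility with a non-linear network whose weights are shared across bidders, which is what a pointwise (1×1) convolution over the bidder axis amounts to. `make_utility_net` is a hypothetical one-hidden-layer MLP with random weights, not the authors' architecture. Training (the stochastic gradient step) is omitted.

```python
import math
import random


def make_utility_net(n_features, n_hidden=16, seed=0):
    """Hypothetical shared MLP u(x): the same weights score every bidder ("pointwise")."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_features)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]

    def utility(x):
        # ReLU hidden layer followed by a linear output: a non-linear utility
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
        return sum(w * h for w, h in zip(w2, hidden))

    return utility


def choice_probabilities(utility, bidders):
    """Conditional-logit softmax over the bidders of a single job post."""
    scores = [utility(x) for x in bidders]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def negative_log_likelihood(utility, bidders, hired_index):
    """Loss for one job post: -log P(the bidder who was actually hired)."""
    return -math.log(choice_probabilities(utility, bidders)[hired_index])
```

With a linear `utility` this reduces to the classical conditional logit. Because the same network scores every bidder, reordering the bidders simply reorders the probabilities.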

    Pgx: Hardware-accelerated Parallel Game Simulators for Reinforcement Learning

    We propose Pgx, a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and just-in-time (JIT) compilation, Pgx can efficiently scale to thousands of parallel executions on accelerators. In our experiments on a DGX-A100 workstation, we found that Pgx simulates RL environments 10-100x faster than existing Python RL libraries. Pgx includes environments commonly used as RL research benchmarks, such as backgammon, chess, shogi, and Go. Additionally, Pgx offers miniature game sets and baseline models to facilitate rapid research cycles. We demonstrate efficient training of the Gumbel AlphaZero algorithm with Pgx environments. Overall, Pgx provides high-performance environment simulators that help researchers accelerate their RL experiments. Pgx is available at https://github.com/sotetsuk/pgx. Comment: 9 pages

    Bidding for B2B or B2G tenders: toward the adoption of pricing models in practice

    This study investigates the lack of adoption of pricing models for tenders in business-to-business (B2B) and business-to-government (B2G) markets. We aim to identify the gaps between research and practice and propose a future research agenda to bridge them. Our study contributes in three ways. First, we outline how our research agenda can influence the adoption of pricing models across specific practitioner roles in tendering. Second, we introduce systematic science mapping (SSM) as a novel methodology for literature reviews. SSM combines a systematic review and science mapping in a multi-stage, mixed-methods research design. We chart the evolution of 1042 research publications from 1956 to 2022 across three thematic areas. Our review of 163 gray literature publications reveals seven schools of thought on tender price modeling and the causes of theory-to-practice gaps. Finally, we introduce a new metric, the mapping factor (MAPF), as a robustness indicator for systematic literature reviews.
    Data availability: The authors confirm that all data generated or analyzed are included in this published article. The data source for science mapping was Elsevier's Scopus database. The search term used to extract the publication data is given in Sect. 3 in Scopus syntax so that other researchers can replicate it. The same section also reports the parameterization of the SciMAT tool for transparency.

    Display Advertising with Real-Time Bidding (RTB) and Behavioural Targeting

    The most significant recent progress in online display advertising is the Real-Time Bidding (RTB) mechanism for buying and selling ads. RTB makes it possible to buy an individual ad impression in real time, while the impression is still being generated by a user's visit. RTB not only scales up the buying process by aggregating large amounts of available inventory across publishers but, most importantly, enables direct targeting of individual users. As such, RTB has fundamentally changed the landscape of digital marketing. Scientifically, the demand for automation, integration and optimisation in RTB also opens new research opportunities in information retrieval, data mining, machine learning and related fields. This monograph gives an overview of the fundamental infrastructure, algorithms, and technical solutions of this new frontier of computational advertising. The topics covered include user response prediction, bid landscape forecasting, bidding algorithms, revenue optimisation, statistical arbitrage, dynamic pricing, and ad fraud detection.
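The per-impression flow described above can be sketched as follows, under stated assumptions. A bidder receives a bid request carrying a predicted click-through rate (the output of a user response prediction model, which is assumed here). It prices the impression with a linear bidding rule, which scales a base bid by the ratio of predicted CTR to average CTR. The exchange then resolves a sealed-bid second-price auction with an optional reserve price. The linear rule and the second-price format are illustrative choices, since exchanges may use other auction formats. `BidRequest`, `linear_bid` and `second_price_auction` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BidRequest:
    impression_id: str
    predicted_ctr: float  # from a (hypothetical) user response prediction model


def linear_bid(request, base_bid, avg_ctr):
    """Linear bidding: scale a base bid by how much likelier this user is to click than average."""
    return base_bid * request.predicted_ctr / avg_ctr


def second_price_auction(bids, reserve_price=0.0):
    """Return (winner, price) for a sealed-bid second-price auction.

    The winner pays the second-highest eligible bid, or the reserve price if it
    is the only eligible bidder; returns (None, 0.0) if no bid meets the reserve.
    """
    eligible = {name: b for name, b in bids.items() if b >= reserve_price}
    if not eligible:
        return None, 0.0
    ranked = sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else reserve_price
    return winner, price
```

For example, a request with a predicted CTR of 0.002, a base bid of 1.0 and an average CTR of 0.001 yields a bid of 2.0. Against competing bids of 1.5 and 0.5, that bid wins and pays 1.5.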
