    BridgeHand2Vec Bridge Hand Representation

    Contract bridge is a game characterized by incomplete information, posing an exciting challenge for artificial intelligence methods. This paper proposes the BridgeHand2Vec approach, which leverages a neural network to embed a bridge player's hand (consisting of 13 cards) into a vector space. The resulting representation reflects the strength of the hand in the game and enables interpretable distances to be determined between different hands. This representation is derived by training a neural network to estimate the number of tricks that a pair of players can take. In the remainder of this paper, we analyze the properties of the resulting vector space and provide examples of its application in reinforcement learning and opening bid classification. Although this was not our main goal, the neural network used for the vectorization achieves state-of-the-art (SOTA) results on the DDBP2 problem (estimating the number of tricks for two given hands).
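    The abstract describes the general shape of such a model: encode each 13-card hand, then predict tricks from the pair of embeddings. Below is a minimal PyTorch sketch of that idea under stated assumptions; the 52-dimensional multi-hot hand encoding, the layer sizes, and the five-denomination trick head are illustrative choices, not the authors' exact architecture.

        # Minimal sketch of a BridgeHand2Vec-style model (assumptions noted above).
        import torch
        import torch.nn as nn

        class HandEmbedder(nn.Module):
            def __init__(self, embed_dim=32):
                super().__init__()
                # A 13-card hand encoded as a 52-dim multi-hot vector over the deck.
                self.encoder = nn.Sequential(
                    nn.Linear(52, 128), nn.ReLU(),
                    nn.Linear(128, embed_dim),      # the hand embedding
                )
                # Trick estimation conditions on both hands of the pair: concatenated
                # embeddings -> expected tricks in each denomination (C, D, H, S, NT).
                self.trick_head = nn.Sequential(
                    nn.Linear(2 * embed_dim, 64), nn.ReLU(),
                    nn.Linear(64, 5),
                )

            def embed(self, hand):
                return self.encoder(hand)

            def forward(self, hand_a, hand_b):
                z = torch.cat([self.embed(hand_a), self.embed(hand_b)], dim=-1)
                return self.trick_head(z)

        model = HandEmbedder()
        hand_a = torch.zeros(1, 52); hand_a[0, :13] = 1.0    # toy hands
        hand_b = torch.zeros(1, 52); hand_b[0, 13:26] = 1.0
        tricks = model(hand_a, hand_b)   # train with MSE against double-dummy trick counts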

    Opponent Modelling in Multi-Agent Systems

    Reinforcement Learning (RL) formalises a problem in which an intelligent agent must learn to achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in non-stationary environments created by adaptive opponents. Partial observability, arising from agents' different private observations, introduces high variance during training, which exacerbates data inefficiency. In MARL, training an agent to perform well against one set of opponents often leads to poor performance against another. Non-stationarity, partial observability, and unclear learning objectives are three critical problems in MARL that hinder agents' learning, and all three share a common cause: a lack of knowledge about the other agents. In this thesis, we therefore propose to address these problems with opponent modelling methods, tailoring our solutions by combining opponent modelling with other techniques according to the characteristics of the problems we face. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution that alleviates non-stationarity in cooperative games. We then study the partial observability problem caused by agents' private observations and design an implicit communication training method named PBL. Lastly, we investigate the non-stationarity and unclear-learning-objective problems in zero-sum games and propose a solution named EPSOM, which aims to find safe exploitation strategies for playing against non-stationary opponents. We verify the proposed methods through varied experiments and show that they achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.
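    The common core of these methods is a learned predictor of the other agent's behaviour. As an illustration of that core only, and not of ROMMEO, PBL, or EPSOM themselves, the following minimal PyTorch sketch fits an opponent model by maximum likelihood on observed opponent actions; all names and dimensions are hypothetical.

        # Minimal opponent-modelling sketch: a learned predictor of the
        # opponent's action distribution, fit by maximum likelihood.
        import torch
        import torch.nn as nn

        class OpponentModel(nn.Module):
            def __init__(self, obs_dim, n_opp_actions):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(obs_dim, 64), nn.ReLU(),
                    nn.Linear(64, n_opp_actions),
                )

            def forward(self, obs):
                return self.net(obs)   # logits over the opponent's actions

        model = OpponentModel(obs_dim=8, n_opp_actions=4)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)

        # Fit on observed (observation, opponent action) pairs.
        obs = torch.randn(32, 8)                   # toy batch
        opp_actions = torch.randint(0, 4, (32,))
        loss = nn.functional.cross_entropy(model(obs), opp_actions)
        opt.zero_grad(); loss.backward(); opt.step()

        # The agent's own policy can then condition on the model's prediction, e.g.
        # policy_input = torch.cat([obs, model(obs).softmax(-1)], dim=-1)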

    A neural network architecture for data editing in the Bank of Italy's business surveys

    This paper presents an application of neural network models to predictive classification for data quality control. Our aim is to identify data affected by measurement error in the Bank of Italy's business surveys. We build an architecture consisting of three feed-forward networks for variables related to employment, sales and investment respectively: the networks are trained on input matrices extracted from the error-free final survey database for the 2003 wave and subjected to stochastic transformations reproducing known error patterns. A binary indicator of unit perturbation is used as the output variable. The networks are trained with the Resilient Propagation learning algorithm. On the training and validation sets, correct predictions occur in about 90 per cent of the records for employment, 94 per cent for sales, and 75 per cent for investment. On independent test sets, the corresponding shares average 92, 80 and 70 per cent. On our data, neural networks perform much better as classifiers than logistic regression, one of the most popular competing methods. They appear to provide a valid means of improving the efficiency of the quality control process and, ultimately, the reliability of survey data.
    Keywords: data quality, data editing, binary classification, neural networks, measurement error
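    As a rough illustration of this setup, the PyTorch sketch below trains a small feed-forward network with the Rprop (Resilient Propagation) optimizer on a binary perturbation indicator. The input width, hidden layer, and synthetic data are assumptions; the paper's exact per-section topologies are not reproduced.

        # Hedged sketch of one of the three networks: a feed-forward binary
        # classifier trained with Resilient Propagation.
        import torch
        import torch.nn as nn

        net = nn.Sequential(
            nn.Linear(20, 16), nn.Tanh(),   # survey variables for one section (e.g. sales)
            nn.Linear(16, 1),               # logit for the "unit perturbed" indicator
        )
        opt = torch.optim.Rprop(net.parameters())
        loss_fn = nn.BCEWithLogitsLoss()

        X = torch.randn(500, 20)                    # toy stand-in for the survey input matrix
        y = torch.randint(0, 2, (500, 1)).float()   # 1 = record carries a simulated error

        for _ in range(100):
            opt.zero_grad()
            loss = loss_fn(net(X), y)   # full-batch updates, as classic Rprop assumes
            loss.backward()
            opt.step()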

    Optimization for Decision Making II

    In the current context of the electronic governance of society, both administrations and citizens demand greater participation by all the actors involved in decision-making processes relating to the governance of society. This book presents the collective works published in the recent Special Issue (SI) entitled “Optimization for Decision Making II”. These works respond to the new challenges raised: the decision-making process can be carried out by applying different methods and tools and by pursuing different objectives. In real-life problems, the formulation of decision-making problems and the application of optimization techniques to support decisions are particularly complex, and a wide range of optimization techniques and methodologies are used to minimize risks, improve quality in making decisions or, in general, to solve problems. In addition, a sensitivity or robustness analysis should be performed to validate and analyze the influence of uncertainty on decision making. This book brings together, in a coherent manner, a collection of inter-/multidisciplinary works applied to the optimization of decision making.

    DecAug: Out-of-Distribution Generalization via Decomposed Feature Representation and Semantic Augmentation

    While deep learning demonstrates a strong ability to handle independent and identically distributed (IID) data, it often struggles with out-of-distribution (OoD) generalization, where the test data come from a distribution different from the training one. Designing a general OoD generalization framework applicable to a wide range of applications is challenging, mainly due to the possible correlation shifts and diversity shifts in the real world. Most previous approaches can only handle one specific kind of distribution shift, such as shift across domains or the extrapolation of correlation. To address this, we propose DecAug, a novel decomposed feature representation and semantic augmentation approach for OoD generalization. DecAug disentangles category-related and context-related features. Category-related features contain causal information about the target object, while context-related features describe the attributes, styles, backgrounds, or scenes that cause distribution shifts between training and test data. The decomposition is achieved by orthogonalizing the two gradients (w.r.t. intermediate features) of the losses for predicting category and context labels. Furthermore, we perform gradient-based augmentation on context-related features to improve the robustness of the learned representations. Experimental results show that DecAug outperforms other state-of-the-art methods on various OoD datasets, making it one of the very few methods that can deal with different types of OoD generalization challenges.
    Comment: Accepted by AAAI 2021
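    The decomposition step can be illustrated with a minimal sketch: take the gradients of the category and context losses with respect to a shared intermediate feature and penalize their alignment. This is a hedged illustration of the idea, not DecAug's exact objective; the shapes and the cosine penalty are assumptions.

        # Sketch of gradient orthogonalization w.r.t. a shared intermediate feature.
        import torch
        import torch.nn.functional as F

        feat = torch.randn(16, 64, requires_grad=True)   # intermediate features
        cat_head = torch.nn.Linear(64, 10)               # category classifier
        ctx_head = torch.nn.Linear(64, 5)                # context classifier
        y_cat = torch.randint(0, 10, (16,))
        y_ctx = torch.randint(0, 5, (16,))

        loss_cat = F.cross_entropy(cat_head(feat), y_cat)
        loss_ctx = F.cross_entropy(ctx_head(feat), y_ctx)

        # Gradients of each loss w.r.t. the shared intermediate feature.
        g_cat, = torch.autograd.grad(loss_cat, feat, create_graph=True)
        g_ctx, = torch.autograd.grad(loss_ctx, feat, create_graph=True)

        # Orthogonality penalty: squared cosine similarity of the flattened gradients.
        cos = F.cosine_similarity(g_cat.flatten(), g_ctx.flatten(), dim=0)
        total_loss = loss_cat + loss_ctx + cos.pow(2)
        total_loss.backward()   # gradients flow into both heads and the backbone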

    Cyclic transfers in school timetabling

    In this paper we propose a neighbourhood structure based on sequential/cyclic moves and a Cyclic Transfer algorithm for the high school timetabling problem. This method enables the execution of complex moves for improving an existing solution, while dealing with the challenge of exploring the neighbourhood efficiently. An improvement graph is used in which certain negative cycles correspond to the neighbours; these cycles are explored using a recursive method. We address the problem of applying large neighbourhood structure methods to problems where the cost function is not exactly the sum of independent cost functions, as it is in the set partitioning problem. For computational experiments we use four real-world datasets for high school timetabling in the Netherlands and England. We present results of the cyclic transfer algorithm with different settings on these datasets. The costs decrease by 8% to 28% when we use cyclic transfers for local optimization, compared to our initial solutions. The quality of the best initial solutions is comparable to that of the solutions found in practice by timetablers.
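    The key primitive here is finding negative-cost cycles in the improvement graph, since such a cycle corresponds to an improving sequence of transfers. The paper explores these cycles with a recursive method; as a standard stand-in for illustration only, the sketch below detects a negative cycle with Bellman-Ford from a virtual source. The adjacency-list encoding is an assumption.

        # Negative-cycle detection via Bellman-Ford (illustrative stand-in for
        # the paper's recursive exploration). graph: {node: [(neighbour, cost)]}.
        def find_negative_cycle(graph):
            nodes = list(graph)
            dist = {v: 0.0 for v in nodes}        # virtual source: distance 0 to all
            pred = {v: None for v in nodes}
            last = None
            for _ in range(len(nodes)):           # |V| rounds; a relaxation in the
                last = None                       # final round proves a negative cycle
                for u in nodes:
                    for v, w in graph[u]:
                        if dist[u] + w < dist[v]:
                            dist[v] = dist[u] + w
                            pred[v] = u
                            last = v
            if last is None:
                return None                       # no improving cyclic transfer
            for _ in range(len(nodes)):           # step back onto the cycle itself
                last = pred[last]
            cycle, v = [last], pred[last]
            while v != last:
                cycle.append(v)
                v = pred[v]
            return cycle[::-1]                    # moves with negative total cost

        # Example: a 3-node cycle with total cost -1 (an improving transfer).
        g = {"a": [("b", 2.0)], "b": [("c", -4.0)], "c": [("a", 1.0)]}
        print(find_negative_cycle(g))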

    Pgx: Hardware-accelerated Parallel Game Simulators for Reinforcement Learning

    We propose Pgx, a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and just-in-time (JIT) compilation, Pgx can efficiently scale to thousands of parallel executions over accelerators. In our experiments on a DGX-A100 workstation, we found that Pgx can simulate RL environments 10-100x faster than existing Python RL libraries. Pgx includes RL environments commonly used as benchmarks in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx offers miniature game sets and baseline models to facilitate rapid research cycles. We demonstrate the efficient training of the Gumbel AlphaZero algorithm with Pgx environments. Overall, Pgx provides high-performance environment simulators for researchers to accelerate their RL experiments. Pgx is available at https://github.com/sotetsuk/pgx.
    Comment: 9 pages
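    The usage pattern described here (following the example in the Pgx repository) wraps an environment's init and step in jax.vmap for batching and jax.jit for compilation; environment IDs and state attribute names should be checked against the installed version. A sketch with uniform-random play standing in for a learned policy:

        # Batched, compiled self-play loop in the Pgx style.
        import jax
        import jax.numpy as jnp
        import pgx

        env = pgx.make("go_9x9")
        init = jax.jit(jax.vmap(env.init))   # batched, compiled reset
        step = jax.jit(jax.vmap(env.step))   # batched, compiled transition

        batch_size = 1024
        key = jax.random.PRNGKey(0)
        key, subkey = jax.random.split(key)
        state = init(jax.random.split(subkey, batch_size))   # 1024 games in lockstep

        while not (state.terminated | state.truncated).all():
            key, subkey = jax.random.split(key)
            # Uniform-random play over legal moves, standing in for a policy.
            # -1e9 (not -inf) keeps sampling well-defined for finished games.
            logits = jnp.where(state.legal_action_mask, 0.0, -1e9)
            action = jax.random.categorical(subkey, logits, axis=-1)
            state = step(state, action)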