255 research outputs found

Strategically-Timed Actions in Stochastic Differential Games

Author: Mguni David H.
Publication venue: UCL (University College London)
Publication date: 28/08/2020

Financial systems are rich in interactions amenable to description by stochastic control theory. Optimal stochastic control theory is an elegant mathematical framework in which a controller profitably alters the dynamics of a stochastic system by exercising costly control inputs. If the system includes more than one agent, the appropriate modelling framework is stochastic differential game theory, a multiplayer generalisation of stochastic control theory. There are numerous environments in which financial agents incur fixed minimal costs when adjusting their investment positions; trading environments with transaction costs and real options pricing are important examples. The presence of fixed minimal adjustment costs produces adjustment stickiness, as agents now enact their investment adjustments over a sequence of discrete points. Despite the fundamental relevance of adjustment stickiness within economic theory, in stochastic differential game theory the set of players’ modifications to the system dynamics is mainly restricted to a continuous class of controls. Under this assumption, players modify their positions through infinitesimally fine adjustments over the problem horizon. This renders such models unsuitable for modelling systems with fixed minimal adjustment costs. To address this, we present a detailed study of strategic interactions with fixed minimal adjustment costs. We perform a comprehensive study of a new stochastic differential game of impulse control and stopping on a jump-diffusion process and conduct a detailed investigation of two-player impulse control stochastic differential games. We establish the existence of a value of the games and show that the value is a unique (viscosity) solution to a double obstacle problem which is characterised in terms of a solution to a non-linear partial differential equation (PDE).
The study is contextualised within two new models of investment that tackle a dynamic duopoly investment problem and an optimal liquidity control and lifetime ruin problem. It is then shown that each optimal investment strategy can be recovered from the equilibrium strategies of the corresponding stochastic differential game. Lastly, we introduce a dynamic principal-agent model with a self-interested agent that faces minimally bounded adjustment costs. For this setting, we show for the first time that the principal can sufficiently distort the agent’s preferences so that the agent finds it optimal to execute policies that maximise the principal’s payoff in the presence of fixed minimal costs.

UCL Discovery

All Language Models Large and Small

Author: Chen Zhixun
Du Yali
Mguni David
Publication date: 05/06/2024

Many leading language models (LMs) use high-intensity computational resources both during training and execution. This poses the challenge of lowering resource costs for deployment and achieving faster execution of decision-making tasks, among others. We introduce a novel plug-and-play LM framework named Language Optimising Network Distribution (LONDI). LONDI learns to selectively employ large LMs only where complex decision-making and reasoning are required while using low-resource LMs (i.e. LMs that require less GPU usage but may not be able to solve the problem alone) everywhere else. LONDI consists of a system of two (off-)policy networks, an LM, a large LM (LLM), and a reinforcement learning module that uses switching controls to quickly learn in which system states to call the LLM. We then introduce a variant of LONDI that maintains budget constraints on LLM calls and hence on its resource usage. Theoretically, we prove that LONDI learns the subset of system states in which the LLM must be activated to solve the task. We then prove that LONDI converges to optimal solutions while also preserving budgetary constraints on LLM calls almost surely, enabling it to solve various tasks while significantly lowering computational costs. We test LONDI's performance in a range of tasks in ScienceWorld and BabyAI-Text and demonstrate that LONDI can solve tasks only solvable by resource-intensive LLMs while reducing GPU usage by up to 30%.

arXiv.org e-Print Archive

On the complexity of computing Markov perfect equilibrium in general-sum stochastic games

Author: Deng X
Li N
Mguni D
Wang J
Yang Y
Publication venue: OXFORD UNIV PRESS
Publication date: 22/11/2022

Similar to the role of Markov decision processes in reinforcement learning, Markov games (also called stochastic games) lay the foundation for the study of multi-agent reinforcement learning and sequential agent interactions. We introduce approximate Markov perfect equilibrium as a solution to the computational problem of finite-state stochastic games repeated in the infinite horizon and prove its PPAD-completeness. This solution concept preserves the Markov perfect property and opens up the possibility for the success of multi-agent reinforcement learning algorithms on static two-player games to be extended to multi-agent dynamic games, expanding the reign of the PPAD-complete class.

UCL Discovery

PubMed Central

Queen Mary Research Online

The influence of clay diagenesis on the petrophysical properties of sandstone reservoirs in the Pletmos Basin Offshore South Africa

Author: Mguni Nothando
Publication venue: University of Western Cape
Publication date: 01/01/2020

Magister Scientiae - MSc
Pletmos Basin is a Mesozoic half graben located in the southern part of South Africa and has undergone numerous tectonic changes which involve alteration of structure and reworking of sediments. Clay diagenesis has become a more prominent factor affecting the quality of the tight shaly sandstone reservoirs in the southern Pletmos Basin. The present study focused on Block 11a as the primary area of interest. The tight sandstone reservoirs encountered in the four wells, viz. Ga-Q1, Ga-Q2, Ga-Z1 and Ga-E2, were studied using four different methods to incorporate and infer the overall diagenetic effect on the reservoirs caused by materials of argillaceous origin. The methods adopted in the present research are formation evaluation using wireline logs and calibration of core data using Interactive Petrophysics software, thin section petrography, X-Ray Diffraction (XRD), and Scanning Electron Microscope (SEM) along with Energy Dispersive X-Ray Spectroscopy (EDS). Core samples were available only for wells Ga-Q1 and Ga-Z1. Four reservoirs of Cretaceous age were identified in each well, and the best reservoirs were associated with facies B and D.

UWC Theses and Dissertations