Global Optimality Guarantees For Policy Gradient Methods
Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by standard dynamic programming techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point. This work identifies structural properties -- shared by several classic control problems -- that ensure the policy gradient objective function has no suboptimal stationary points despite being non-convex. When these conditions are strengthened, the objective satisfies a Polyak-Łojasiewicz (gradient dominance) condition that yields convergence rates. We also provide bounds on the optimality gap of any stationary point when some of these conditions are relaxed.
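In generic form, the gradient dominance condition referenced above can be written as follows; the symbols J, θ, and μ are illustrative notation, not taken from the paper:

```latex
% Gradient dominance (Polyak-Lojasiewicz) for an objective J(\theta)
% maximized over policy parameters \theta, with optimal value J^{*}
% and a problem-dependent constant \mu > 0:
J^{*} - J(\theta) \;\le\; \frac{1}{2\mu}\,\bigl\|\nabla J(\theta)\bigr\|^{2}
\qquad \text{for all } \theta .
% Any stationary point (\nabla J(\theta) = 0) is then globally optimal,
% and gradient methods admit convergence-rate guarantees.
```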
Behavioral analyses of retailers’ ordering decisions
The main objective I pursue in this thesis is to better understand how different factors may, independently and in combination, influence retailers' ordering decisions under different supply chain structures (single agent and multi-agent), different demand uncertainty (deterministic and stochastic), and different interactions among retailers (no interaction, competition, and cooperation). I develop three studies in which I build on different formal management models and then run multiple behavioral experiments to better understand subjects' behavior. The first study analyzes order amplification in a single-supplier, single-retailer supply chain. I use a behavioral experiment to test retailers' orders under different ordering delays and different times to build the supplier's capacity. Results provide (i) a better understanding of the endogenous dynamics leading to retailers' order amplification, and (ii) a description of subjects' biases and deviations from optimal trajectories, even though subjects have full information about the system structure. The second study analyzes how order amplification can also take place when there is fierce retailer competition and limited supplier capacity. I study how different factors (time to build supplier capacity, level of competition among retailers, magnitude of supply shortage, and allocation mechanism) may, independently and in combination, influence retailers' orders in a system with two retailers under supply competition. Results show that (i) the bullwhip effect persists even when subjects have no incentive to deviate, (ii) subjects amplify their orders in an attempt to build an unnecessary safety stock to respond to potential deviations by the other retailer, and (iii) retailers' underperformance varies with the allocation mechanism used by the supplier. In the last study, I analyze retailers' orders in a system where there is uncertainty in the final customer demand.
I experimentally explore the effect of transshipments among retailers in a single-supplier, multi-retailer supply chain. Specifically, I explore retailers' orders under different profit and communication conditions. In addition, I integrate analytical and behavioral models to improve supply chain performance. Results show (i) the persistence of common biases in the newsvendor problem (pull-to-center, demand chasing, loss aversion, psychological disutility), (ii) that communication can improve coordination and may reduce demand-chasing behavior, (iii) that supply chain performance increases with the use of behavioral strategies embedded within a traditional optimization model, and (iv) that dynamic heuristics improve overall coordination, outperforming a simple Nash equilibrium strategy.
Decision Making with Coupled Learning: Applications in Inventory Management and Auctions
Operational decisions can be complicated by the presence of uncertainty. In many cases, there exist means to reduce uncertainty, though these may come at a cost. Decision makers then face the dilemma of acting based on current, incomplete information versus investing in trying to minimize uncertainty. Understanding the impact of this trade-off on decisions and performance is the central topic of this thesis.
When attempting to construct probabilistic models based on data, operational decisions often affect the amount and quality of data that is collected. This introduces an exploration-exploitation trade-off between decisions and information collection. Much of the literature has sought to understand how operational decisions should be modified to incorporate this trade-off. While studying two well-known operational problems, we ask an even more basic question: does the exploration-exploitation trade-off matter in the first place? In the first two parts of this thesis we focus on this question in the context of the newsvendor problem and sequential auctions with incomplete private information.
We first analyze the well-studied stationary multi-period newsvendor problem, in which a retailer sells perishable items and unmet demand is lost and unobserved. This latter limitation, referred to as demand censoring, is what introduces the exploration-exploitation trade-off in this problem. We focus on two questions: i.) what is the value of accounting for the exploration-exploitation trade-off; and, ii.) what is the cost imposed by having access only to sales data as opposed to underlying demand samples? Quite remarkably, we show that, for a broad family of tractable cases, there is essentially no exploration-exploitation trade-off; i.e., there is almost no value of accounting for the impact of decisions on information collection. Moreover, we establish that losses due to demand censoring (as compared to having full access to demand samples) are limited, but these are of higher order than those due to ignoring the exploration-exploitation trade-off. In other words, efforts aimed at improving information collection concerning lost sales are more valuable than analytic or computational efforts to pin down the optimal policy in the presence of censoring.
In the second part of this thesis we examine the problem of an agent bidding on a sequence of repeated auctions for an item. The agent does not fully know his own valuation of the object and he can only collect information if he wins an auction. This coupling introduces the exploration-exploitation trade-off in this problem. We study the value of accounting for information collection on decisions and find that: i.) in general the exploration-exploitation trade-off cannot be ignored (that is, in some cases ignoring exploration can substantially affect rewards), but ii.) for a broad class of instances, ignoring exploration can indeed produce nearly optimal results. We characterize this class through a set of conditions on the problem primitives, and we demonstrate with examples that these are satisfied for common settings found in the literature.
In the third part of this thesis we study the impact of uncertainty in the context of inventory record inaccuracies in inventory management systems. Record inaccuracies, mismatches between physical and recorded inventory, are frequently encountered in practice and can markedly affect revenues. Most of the literature is devoted to analyzing the cost-benefit relationship between investing in means to reduce inaccuracies and accounting for them in operational decisions. We focus on the less explored approach of using available data to reduce the uncertainty in inventory. In practice, collecting point-of-sale (POS) data is substantially simpler than collecting stock information. We propose a model in which inventory is regarded as a virtually unobservable quantity and POS data is used to infer its state over time. Our method also works as an effective estimator of censored demand in the presence of inaccurate records. We test our methodology with extensive numerical experiments based on both simulated and actual retailing data. The results show that it is remarkably effective in inferring unobservable past statistics and predicting future stock status, even in the presence of severe data misspecification.
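As a rough illustration of this kind of inference (not the thesis's actual model: the state space, Poisson demand, and shrinkage probability below are invented for the sketch), a discrete Bayes filter can track a belief over unobserved inventory using censored POS sales alone:

```python
import math
import numpy as np

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def sales_likelihood(sales, inv, lam):
    """P(observed sales | true inventory inv), demand ~ Poisson(lam).
    Sales are censored: sales = min(demand, inventory)."""
    if sales < inv:
        return poisson_pmf(sales, lam)            # demand was exactly `sales`
    if sales == inv:                              # stock-out: demand >= inv
        return 1.0 - sum(poisson_pmf(d, lam) for d in range(inv))
    return 0.0                                    # cannot sell more than stock

def filter_step(belief, sales, lam, shrink_p=0.02):
    """One Bayes-filter update of a belief vector over inventory levels 0..N."""
    n = len(belief) - 1
    # Predict: each unit is lost (shrinkage) independently with prob. shrink_p.
    pred = np.zeros_like(belief)
    for i, p in enumerate(belief):
        for lost in range(i + 1):
            pred[i - lost] += (p * math.comb(i, lost)
                               * shrink_p ** lost * (1 - shrink_p) ** (i - lost))
    # Update: weight by the censored-sales likelihood, then normalize.
    post = np.array([pred[i] * sales_likelihood(sales, i, lam) for i in range(n + 1)])
    post /= post.sum()
    # Deplete: remaining stock = previous stock - sales (no replenishment here).
    new = np.zeros_like(post)
    new[: n + 1 - sales] = post[sales:]
    return new

# Records say 10 units are on hand; track the belief as sales come in.
belief = np.zeros(11)
belief[10] = 1.0
for s in [3, 4, 2]:
    belief = filter_step(belief, s, lam=3.0)
print("posterior mean inventory:", (np.arange(11) * belief).sum())
```

The posterior mean will sit slightly below the record-implied level, since the filter accumulates shrinkage and stock-out evidence that the record alone cannot capture.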
Applications of Machine Learning in Supply Chains
Advances in new technologies have increased the speed of data generation and the capacity of data storage. The availability of huge datasets and massive computational power has led to the emergence of new algorithms in artificial intelligence, and specifically machine learning, with significant research done in fields like computer vision. Although similar amounts of data exist in most components of supply chains, there is not much research on utilizing the power of raw data to improve efficiency in supply chains. In this dissertation, our objective is to propose data-driven, non-parametric machine learning algorithms to solve different supply chain problems in data-rich environments.

Among the wide range of supply chain problems, inventory management has been one of the main challenges in every supply chain. The ability to manage inventories so as to maximize the service level while minimizing holding costs is a goal of many companies. An unbalanced inventory system can easily result in a stopped production line, back-ordered demand, lost sales, and huge extra costs. This dissertation studies three problems and proposes machine learning algorithms to help inventory managers reduce their inventory costs.

In the first problem, we consider the newsvendor problem, in which an inventory manager needs to determine the order quantity of a perishable product to minimize the sum of shortage and holding costs, while some feature information is available for each product. We propose a neural network approach with a specialized loss function to solve this problem. The neural network takes historical data as input and is trained to provide the order quantity. We show that our approach works better than the classical separated estimation-and-optimization approaches as well as other machine-learning-based algorithms, especially when the historical data are noisy and there is little data for each combination of features.
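Training directly on a newsvendor loss of the kind described above can be sketched as follows (a minimal single-layer model in NumPy; the cost parameters and toy data are invented for illustration, and the dissertation's actual network is a deeper one):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unit costs: cp = shortage (underage) cost, ch = holding (overage) cost.
cp, ch = 4.0, 1.0

def newsvendor_loss(q, d):
    """Average newsvendor cost of ordering q when demand turns out to be d."""
    return np.mean(cp * np.maximum(d - q, 0) + ch * np.maximum(q - d, 0))

# Toy data: one feature x; demand is roughly linear in x plus noise.
x = rng.uniform(0, 1, size=(500, 1))
d = 50 * x[:, 0] + rng.normal(0, 5, size=500)

# Single linear "layer" q = w*x + b, trained by subgradient descent
# directly on the newsvendor loss rather than on squared error.
w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    q = w * x[:, 0] + b
    # Subgradient of the loss w.r.t. q: -cp where short, +ch where over.
    g = np.where(d > q, -cp, ch)
    w -= lr * np.mean(g * x[:, 0])
    b -= lr * np.mean(g)

# The minimizer targets the cp/(cp+ch) = 0.8 conditional quantile of demand,
# so learned orders sit above the conditional mean demand.
print("learned order at x=0.5:", w * 0.5 + b)
```

Here the order quantity is read off the model directly, skipping the intermediate step of estimating a demand distribution; that is the "separated estimation and optimization" shortcut the abstract contrasts against.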
Also, to show how this approach can be used with other common inventory policies, we apply it to another standard policy and report the results. The algorithm allows inventory managers to quickly determine an order quantity without estimating the underlying demand distribution.

Now assume the order quantities or safety stock levels have been obtained for a single- or multi-echelon system. Classical inventory optimization models work well in normal conditions, that is, when all underlying assumptions are valid. Once one of the assumptions or the normal working conditions is violated, unplanned stock-outs or excess inventories arise. To address this issue, in the second problem a multi-echelon supply network is considered, and the goal is to determine the nodes that might face a stock-out in the next period. Stock-outs are usually expensive and inventory managers try to avoid them, so stock-out prediction can help avert stock-outs and the corresponding costs. To provide such predictions, we propose a neural network model along with three naive algorithms. We analyze the performance of the proposed algorithms by comparing them with classical forecasting algorithms and a linear regression model over five network topologies. Numerical results show that the neural network model is quite accurate across the hardest to easiest network topologies, obtaining accuracies with an average of 0.950 and a standard deviation of 0.023, while the closest competitor, one of the proposed naive algorithms, obtains an average of 0.926 and a standard deviation of 0.0136. Additionally, we suggest conditions under which each algorithm is the most reliable, and we apply all algorithms to threshold and multi-period predictions. Although stock-out prediction can be very useful, any inventory manager would like a powerful model that optimizes the inventory system and balances holding and shortage costs.
The literature on multi-echelon inventory models is quite rich, though it mostly relies on the assumption of access to a known demand distribution. The demand distribution can be approximated, but even so, in some cases a globally optimal model is not available. In the third problem, we develop a machine learning algorithm to address this issue for multi-period inventory optimization problems in multi-echelon networks. We consider the well-known beer game problem and propose a reinforcement learning algorithm to efficiently learn ordering policies from data.

The beer game is a serial supply chain with four agents, i.e., retailer, wholesaler, distributor, and manufacturer, in which each agent replenishes its stock by ordering beer from its predecessor. The retailer satisfies the demand of external customers, and the manufacturer orders from external suppliers. Each agent must decide its own order quantity to minimize the sum of the holding and shortage costs of the system, while the agents are not allowed to share any information with one another. For this setting, a base-stock policy is optimal if the retailer is the only node with a positive shortage cost and a known demand distribution is available. Outside of this narrow condition, there is no known optimal policy for this game. Also, from a game-theoretic point of view, the beer game can be modeled as a decentralized, partially observable, multi-agent cooperative problem, a class known to be NEXP-complete.

We propose an extension of the deep Q-network for making decisions about order quantities in a single node of the beer game. When the co-players follow a rational policy, it obtains a close-to-optimal solution, and it works much better than a base-stock policy when the other agents play irrationally. Additionally, to reduce the training time of the algorithm, we propose using transfer learning, which reduces the training time by an order of magnitude. This approach can be extended to other inventory optimization and supply chain problems.
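The base-stock policy used as the classical benchmark above orders up to a fixed level S whenever the inventory position falls below it. A minimal sketch for one agent (the level S, lead time, and demand stream are invented for illustration):

```python
def base_stock_order(S, on_hand, on_order, backorders):
    """Order up to the base-stock level S, based on inventory position."""
    position = on_hand + on_order - backorders
    return max(0, S - position)

# One agent replaying a demand stream with a two-period lead time.
S, on_hand, backorders = 12, 12, 0
pipeline = [0, 0]                 # orders placed but not yet arrived
for demand in [4, 7, 3, 9, 5]:
    on_hand += pipeline.pop(0)    # receive the shipment due this period
    served = min(on_hand, demand + backorders)
    backorders = backorders + demand - served
    on_hand -= served
    q = base_stock_order(S, on_hand, sum(pipeline), backorders)
    pipeline.append(q)            # ships after the lead time
    print(f"demand={demand}  order={q}  on_hand={on_hand}  backorders={backorders}")
```

Note that under this policy each period's order exactly echoes that period's demand, which is why a fixed base-stock level performs poorly when co-players behave irrationally and demand seen upstream is distorted.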
Learning an Inventory Control Policy with General Inventory Arrival Dynamics
In this paper we address the problem of learning and backtesting inventory
control policies in the presence of general arrival dynamics -- which we term
a quantity-over-time (QOT) arrivals model. We also allow for order
quantities to be modified as a post-processing step to meet vendor constraints
such as order minimum and batch size constraints -- a common practice in real
supply chains. To the best of our knowledge this is the first work to handle
either arbitrary arrival dynamics or an arbitrary downstream post-processing of
order quantities. Building upon recent work (Madeka et al., 2022) we similarly
formulate the periodic review inventory control problem as an exogenous
decision process, where most of the state is outside the control of the agent.
Madeka et al. (2022) show how to construct a simulator that replays historic
data to solve this class of problems. In our case, we incorporate a deep
generative model for the arrivals process as part of the history replay. By
formulating the problem as an exogenous decision process, we can apply results
from Madeka et al. (2022) to obtain a reduction to supervised learning.
Finally, we show via simulation studies that this approach yields statistically
significant improvements in profitability over production baselines. Using data
from an ongoing real-world A/B test, we show that Gen-QOT generalizes well to
off-policy data.
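The vendor-constraint post-processing described above (order minimums and batch sizes) can be sketched as follows; the function name, constraint values, and the raise-to-minimum convention are illustrative assumptions, not taken from the paper:

```python
import math

def apply_vendor_constraints(q, min_order=10, batch_size=6):
    """Post-process a raw order quantity to satisfy vendor constraints:
    non-positive orders stay at zero; positive orders are raised to the
    order minimum and rounded up to the nearest batch-size multiple."""
    if q <= 0:
        return 0
    q = max(q, min_order)
    return batch_size * math.ceil(q / batch_size)

for raw in [0, 3, 10, 14, 25]:
    print(raw, "->", apply_vendor_constraints(raw))
```

Because such a step sits between the policy's output and what the vendor actually receives, a learned policy evaluated without it can look profitable in backtests yet behave quite differently in production, which is the motivation for modeling it explicitly.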