33 research outputs found
Reinforcement Learning Algorithms and Complexity of Inventory Control, A Review
Driven by its ability to perform sequential decision-making in complex dynamic situations, Reinforcement Learning (RL) has quickly become a promising avenue for solving inventory control (IC) problems. The objective of this paper is to provide a comprehensive overview of the IC problems that have been effectively solved through the application of RL. Our contributions include the first systematic review of this field of interest and application. We also identify potential extensions and formulate four propositions that constitute a theoretical framework which may help develop RL algorithms for complex IC problems. We recommend specific future research directions and novel approaches to solving IC problems.
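The IC problems this review surveys are typically cast as sequential decision problems. As a minimal sketch of that framing (not any specific formulation from the review; the capacity, cost parameters, and Poisson demand below are illustrative assumptions), inventory control maps onto the RL interface of states, actions, and rewards as follows:

```python
import numpy as np

class SimpleInventoryEnv:
    """Minimal sketch of inventory control as a sequential decision problem.

    State: on-hand inventory; action: order quantity; reward: negative
    holding plus shortage costs. All parameters are illustrative, not
    taken from any paper in this list.
    """

    def __init__(self, capacity=100, holding_cost=1.0, shortage_cost=5.0, seed=0):
        self.capacity = capacity
        self.h = holding_cost
        self.p = shortage_cost
        self.rng = np.random.default_rng(seed)
        self.inventory = 0

    def reset(self):
        self.inventory = self.capacity // 2
        return self.inventory

    def step(self, order_qty):
        # Receive the order, then observe stochastic demand.
        self.inventory = min(self.inventory + order_qty, self.capacity)
        demand = self.rng.poisson(10)
        sold = min(self.inventory, demand)
        unmet = demand - sold
        self.inventory -= sold
        # Reward penalises both leftover stock and lost sales.
        reward = -(self.h * self.inventory + self.p * unmet)
        return self.inventory, reward

env = SimpleInventoryEnv()
state = env.reset()
for _ in range(5):
    state, reward = env.step(order_qty=10)
```

An RL agent then learns an ordering policy by interacting with such an environment, rather than assuming a known demand distribution.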
A Simulation Environment and Reinforcement Learning Method for Waste Reduction
In retail (e.g., grocery stores, apparel shops, online retailers), inventory managers have to balance short-term risk (no items to sell) with long-term risk (over-ordering leading to product waste). This balancing task is made especially hard by the lack of information about future customer purchases. In this paper, we study the problem of restocking a grocery store’s inventory with perishable items over time, from a distributional point of view. The objective is to maximize sales while minimizing waste, with uncertainty about the actual consumption by customers. This problem is of high relevance today, given the growing demand for food and the impact of food waste on the environment, the economy, and purchasing power. We frame inventory restocking as a new reinforcement learning task that exhibits stochastic behavior conditioned on the agent’s actions, making the environment partially observable. We make two main contributions. First, we introduce a new reinforcement learning environment, RetaiL, based on real grocery store data and expert knowledge. This environment is highly stochastic, and presents a unique challenge for reinforcement learning practitioners. We show that uncertainty about the future behavior of the environment is not handled well by classical supply chain algorithms, and that distributional approaches are a good way to account for the uncertainty. Second, we introduce GTDQN, a distributional reinforcement learning algorithm that learns a generalized lambda distribution over the reward space. GTDQN provides a strong baseline for our environment. It outperforms other distributional reinforcement learning approaches in this partially observable setting, in both overall reward and generated waste.
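The generalized lambda distribution that GTDQN learns over returns is defined by its quantile function, which makes inverse-transform sampling trivial. A minimal sketch using the FKML parameterization (one common choice; whether the paper uses this exact parameterization is an assumption, and the lambda values below are illustrative):

```python
import numpy as np

def gld_quantile(u, lam1, lam2, lam3, lam4):
    # Generalized lambda distribution, FKML parameterization.
    # lam1: location, lam2: scale, lam3/lam4: tail shapes.
    # lam3 and lam4 are assumed nonzero here (their zero limits
    # become logarithmic terms).
    return lam1 + ((u**lam3 - 1.0) / lam3
                   - ((1.0 - u)**lam4 - 1.0) / lam4) / lam2

# Inverse-transform sampling: push uniform draws through the
# quantile function to sample from the distribution.
rng = np.random.default_rng(0)
u = rng.uniform(size=10000)
samples = gld_quantile(u, lam1=0.0, lam2=1.0, lam3=0.2, lam4=0.2)
```

With four parameters, the family can represent skewed and heavy- or light-tailed return distributions, which is what makes it attractive for capturing uncertainty over rewards.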
Applications of Machine Learning in Supply Chains
Advances in new technologies have increased the speed of data generation and access to larger data storage. The availability of huge datasets and massive computational power has led to the emergence of new algorithms in artificial intelligence, and specifically machine learning, with significant research done in fields like computer vision. Although comparable amounts of data exist in most components of supply chains, there is little research that harnesses the power of raw data to improve efficiency in supply chains. In this dissertation, our objective is to propose data-driven, non-parametric machine learning algorithms to solve different supply chain problems in data-rich environments. Among the wide range of supply chain problems, inventory management has been one of the main challenges in every supply chain. The ability to manage inventories to maximize service level while minimizing holding costs is a goal of many companies. An unbalanced inventory system can easily result in a stopped production line, back-ordered demands, lost sales, and huge extra costs. This dissertation studies three problems and proposes machine learning algorithms to help inventory managers reduce their inventory costs. In the first problem, we consider the newsvendor problem, in which an inventory manager must determine the order quantity of a perishable product to minimize the sum of shortage and holding costs, while feature information is available for each product. We propose a neural network approach with a specialized loss function to solve this problem. The neural network takes historical data and is trained to provide the order quantity. We show that our approach works better than the classical separated estimation-and-optimization approaches as well as other machine-learning-based algorithms, especially when the historical data is noisy and there is little data for each combination of features.
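The specialized loss at the heart of this first problem is the classical newsvendor cost. A minimal sketch (the costs and the normal demand distribution are illustrative assumptions, and a grid search stands in for the neural network) shows that minimizing this loss recovers the critical-fractile quantile of demand:

```python
import numpy as np

def newsvendor_loss(q, demand, shortage_cost=4.0, holding_cost=1.0):
    # Piecewise-linear cost: pay shortage_cost per unit of unmet
    # demand and holding_cost per unit of leftover stock.
    return np.mean(shortage_cost * np.maximum(demand - q, 0.0)
                   + holding_cost * np.maximum(q - demand, 0.0))

rng = np.random.default_rng(0)
demand = rng.normal(100, 15, size=5000)

# Minimizing this loss over q recovers the critical fractile
# cp / (cp + ch) of the demand distribution.
grid = np.arange(50, 150)
losses = [newsvendor_loss(q, demand) for q in grid]
best_q = grid[int(np.argmin(losses))]
fractile = np.quantile(demand, 4.0 / (4.0 + 1.0))
```

Training a network on this loss lets it map product features directly to order quantities, skipping the intermediate demand-estimation step.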
Also, to show how this approach can be used with other common inventory policies, we apply it to another such policy and report the results. This algorithm allows inventory managers to quickly determine an order quantity without obtaining the underlying demand distribution. Now, assume the order quantities or safety stock levels are obtained for a single- or multi-echelon system. Classical inventory optimization models work well in normal conditions, in other words when all underlying assumptions are valid. Once one of the assumptions or the normal working conditions is violated, unplanned stock-outs or excess inventories arise. To address this issue, in the second problem, a multi-echelon supply network is considered, and the goal is to determine the nodes that might face a stock-out in the next period. Stock-outs are usually expensive and inventory managers try to avoid them, so stock-out prediction may help avert stock-outs and the corresponding costs. To provide such predictions, we propose a neural network model as well as three naive algorithms. We analyze the performance of the proposed algorithms by comparing them with classical forecasting algorithms and a linear regression model over five network topologies. Numerical results show that the neural network model is quite accurate, obtaining accuracies across the hardest to easiest network topologies with an average of 0.950 and a standard deviation of 0.023, while the closest competitor, one of the proposed naive algorithms, obtains an average accuracy of 0.926 with a standard deviation of 0.0136. Additionally, we suggest conditions under which each algorithm is most reliable, and we apply all algorithms to threshold and multi-period predictions. Although stock-out prediction can be very useful, any inventory manager would like a powerful model to optimize the inventory system and balance holding and shortage costs.
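Stock-out prediction as described here is a binary classification task scored by accuracy. A minimal sketch with synthetic data (the features, noise level, and naive baseline below are illustrative assumptions, not the dissertation's actual algorithms or data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-node features: inventory on hand and forecast demand.
inventory = rng.uniform(0, 100, size=1000)
forecast_demand = rng.uniform(0, 100, size=1000)

# Ground truth: a stock-out occurs when realised demand (forecast plus
# noise) exceeds the inventory on hand.
realised = forecast_demand + rng.normal(0, 10, size=1000)
stockout = (realised > inventory).astype(int)

# Naive baseline: predict a stock-out whenever the forecast alone
# already exceeds the inventory on hand.
pred = (forecast_demand > inventory).astype(int)
accuracy = np.mean(pred == stockout)
```

A learned model earns its keep only to the extent that it beats such naive baselines, which is why the abstract compares against them explicitly.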
The literature on multi-echelon inventory models is quite rich, though it mostly relies on the assumption of access to a known demand distribution. The demand distribution can be approximated, but even so, in some cases a globally optimal model is not available. In the third problem, we develop a machine learning algorithm to address this issue for multi-period inventory optimization problems in multi-echelon networks. We consider the well-known beer game problem and propose a reinforcement learning algorithm to efficiently learn ordering policies from data. The beer game is a serial supply chain with four agents, i.e., retailer, wholesaler, distributor, and manufacturer, in which each agent replenishes its stock by ordering beer from its predecessor. The retailer satisfies the demand of external customers, and the manufacturer orders from external suppliers. Each agent must decide its own order quantity to minimize the sum of the holding and shortage costs of the system, while the agents are not allowed to share any information with one another. For this setting, a base-stock policy is optimal if the retailer is the only node with a positive shortage cost and a known demand distribution is available. Outside of this narrow condition, no optimal policy is known for this game. Also, from a game-theoretic point of view, the beer game can be modeled as a decentralized multi-agent cooperative problem with partial observability, which is known to be NEXP-complete. We propose an extension of the deep Q-network for making decisions about order quantities at a single node of the beer game. When the co-players follow a rational policy, it obtains a close-to-optimal solution, and it works much better than a base-stock policy when the other agents play irrationally. Additionally, to reduce the training time of the algorithm, we propose using transfer learning, which reduces the training time by an order of magnitude.
This approach can be extended to other inventory optimization and supply chain problems.
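The base-stock policy that serves as the optimal benchmark in the third problem is a simple order-up-to rule on the inventory position. A minimal sketch (the function and parameter names are illustrative, not from the dissertation):

```python
def base_stock_order(base_stock_level, on_hand, on_order, backorders):
    """Order-up-to rule: raise the inventory position to the base-stock level.

    Inventory position = on-hand + pipeline (on-order) - backorders.
    Orders are non-negative: excess stock is never returned.
    """
    inventory_position = on_hand + on_order - backorders
    return max(0, base_stock_level - inventory_position)

# Position is 30, so order 20 to reach the level of 50.
print(base_stock_order(50, on_hand=20, on_order=10, backorders=0))  # -> 20
# Position already exceeds the level, so order nothing.
print(base_stock_order(50, on_hand=60, on_order=10, backorders=0))  # -> 0
```

Against such a fixed rule, the learned deep Q-network policy can adapt when the other agents' behaviour departs from the assumptions that make base-stock optimal.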
A collaborative framework for feasibility analysis in automotive product development with global supply chain
In the competitive world, time to market, new technology and innovation are the measures of the performance of New Product Development (NPD). Companies tend to use a conventional approach to NPD, assigning representatives from their own support functions to review and recommend changes as projects evolve. In recent years this approach has been questioned, since it is costly and time-consuming due to its iterative nature. Researchers argue that the time to market and the cost of NPD can be reduced considerably by involving the support functions of the supply chain to a greater extent, and earlier, in the NPD process. There is thus an industrial requirement for a collaborative framework that facilitates the linkage between Supply Chain Management (SCM) and New Product Development (NPD).
This research project focuses on the early stages of the collaborative product development process in the extended enterprise. The research output includes the functional requirements of a framework and a developed prototype methodology with tools and technologies that are tested through industrial case studies. The research also introduces the development and analysis of the framework, which allows the integration of the flow of product-development-related activities between original equipment manufacturers (OEMs) and suppliers, providing future business benefits. An industrial investigation of an OEM in the automotive industry within the research identified that there are different decision-making points in product development and manufacturing. The proposed methodology and framework use key drivers to predict and quantify their impact on four main criteria, namely feasibility, time, cost and capability, which support or advise key decision making in the OEM's product development and management process.
An Investigation into the Effect that Technology had on the Strategies of J. Sainsbury plc, Tesco plc and Safeway plc: With a Particular Focus on the Period 1980 to 1990
This research is focused on three food multiple retailers: Sainsbury plc, Tesco plc, and Safeway plc. The research is designed to explore the relationship between technology and strategy in these organisations. The currently held view among the researchers and managers of these organisations is that technology has a limited impact on the processes that formulate strategy, and as such may be regarded as having an enabling role. This thesis proposes that while this view may have been correct in the past, it is no longer the case: in the food retailers examined, technology is not following strategy but leading it.
In order to confirm this thesis, the history, technical development and technical structure of the three retailers were investigated. The results of this research were subsequently analysed and the following conclusions were drawn:
a. Technology has a much greater impact on the strategy of multiple food retailers than has previously been thought. Technology defines the boundaries of operational activities and, through controlling a substantial proportion of the information that managers use in the strategy-making process, de facto if not de jure greatly influences the retailers' strategies, and in some cases may actually lead them.
b. The food multiples, in not appreciating the extent to which their fate is tied to the information technology they are using, are failing to educate and train the general management of their organisations technologically.
c. Technological progress is widening the gap between general management and technical management, and in the long run this will cause serious strategic problems unless this gap is closed through positive action.