2,964 research outputs found

    Dynamic Assortment Optimization with Changing Contextual Information

    Full text link
    In this paper, we study the dynamic assortment optimization problem under a finite selling season of length TT. At each time period, the seller offers an arriving customer an assortment of substitutable products under a cardinality constraint, and the customer makes the purchase among offered products according to a discrete choice model. Most existing work associates each product with a real-valued fixed mean utility and assumes a multinomial logit choice (MNL) model. In many practical applications, feature/contexutal information of products is readily available. In this paper, we incorporate the feature information by assuming a linear relationship between the mean utility and the feature. In addition, we allow the feature information of products to change over time so that the underlying choice model can also be non-stationary. To solve the dynamic assortment optimization under this changing contextual MNL model, we need to simultaneously learn the underlying unknown coefficient and makes the decision on the assortment. To this end, we develop an upper confidence bound (UCB) based policy and establish the regret bound on the order of O~(dT)\widetilde O(d\sqrt{T}), where dd is the dimension of the feature and O~\widetilde O suppresses logarithmic dependence. We further established the lower bound Ω(dT/K)\Omega(d\sqrt{T}/K) where KK is the cardinality constraint of an offered assortment, which is usually small. When KK is a constant, our policy is optimal up to logarithmic factors. In the exploitation phase of the UCB algorithm, we need to solve a combinatorial optimization for assortment optimization based on the learned information. We further develop an approximation algorithm and an efficient greedy heuristic. The effectiveness of the proposed policy is further demonstrated by our numerical studies.Comment: 4 pages, 4 figures. Minor revision and polishing of presentatio

    A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

    Full text link
    In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes their response in every round. Consumers purchase products to maximize their utility. We assume that a set of attributes describes the products, and the mean utility of a product is linear in the values of these attributes. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model and consider the decision maker problem of dynamically learning the model parameters while optimizing cumulative revenue over the selling horizon TT. Though this problem has attracted considerable attention in recent times, many existing methods often involve solving an intractable non-convex optimization problem. Their theoretical performance guarantees depend on a problem-dependent parameter which could be prohibitively large. In particular, existing algorithms for this problem have regret bounded by O(κdT)O(\sqrt{\kappa d T}), where κ\kappa is a problem-dependent constant that can have an exponential dependency on the number of attributes. In this paper, we propose an optimistic algorithm and show that the regret is bounded by O(dT+κ)O(\sqrt{dT} + \kappa), significantly improving the performance over existing methods. Further, we propose a convex relaxation of the optimization step, which allows for tractable decision-making while retaining the favourable regret guarantee.Comment: updated version, under revie

    Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing

    Full text link
    Key challenges in running a retail business include how to select products to present to consumers (the assortment problem), and how to price products (the pricing problem) to maximize revenue or profit. Instead of considering these problems in isolation, we propose a joint approach to assortment-pricing based on contextual bandits. Our model is doubly high-dimensional, in that both context vectors and actions are allowed to take values in high-dimensional spaces. In order to circumvent the curse of dimensionality, we propose a simple yet flexible model that captures the interactions between covariates and actions via a (near) low-rank representation matrix. The resulting class of models is reasonably expressive while remaining interpretable through latent factors, and includes various structured linear bandit and pricing models as particular cases. We propose a computationally tractable procedure that combines an exploration/exploitation protocol with an efficient low-rank matrix estimator, and we prove bounds on its regret. Simulation results show that this method has lower regret than state-of-the-art methods applied to various standard bandit and pricing models. Real-world case studies on the assortment-pricing problem, from an industry-leading instant noodles company to an emerging beauty start-up, underscore the gains achievable using our method. In each case, we show at least three-fold gains in revenue or profit by our bandit method, as well as the interpretability of the latent factor models that are learned

    Revenue Maximization and Learning in Products Ranking

    Full text link
    We consider the revenue maximization problem for an online retailer who plans to display a set of products differing in their prices and qualities and rank them in order. The consumers have random attention spans and view the products sequentially before purchasing a ``satisficing'' product or leaving the platform empty-handed when the attention span gets exhausted. Our framework extends the cascade model in two directions: the consumers have random attention spans instead of fixed ones and the firm maximizes revenues instead of clicking probabilities. We show a nested structure of the optimal product ranking as a function of the attention span when the attention span is fixed and design a 1/e1/e-approximation algorithm accordingly for the random attention spans. When the conditional purchase probabilities are not known and may depend on consumer and product features, we devise an online learning algorithm that achieves O~(T)\tilde{\mathcal{O}}(\sqrt{T}) regret relative to the approximation algorithm, despite of the censoring of information: the attention span of a customer who purchases an item is not observable. Numerical experiments demonstrate the outstanding performance of the approximation and online learning algorithms

    Robust Dynamic Assortment Optimization in the Presence of Outlier Customers

    Full text link
    We consider the dynamic assortment optimization problem under the multinomial logit model (MNL) with unknown utility parameters. The main question investigated in this paper is model mis-specification under the ε\varepsilon-contamination model, which is a fundamental model in robust statistics and machine learning. In particular, throughout a selling horizon of length TT, we assume that customers make purchases according to a well specified underlying multinomial logit choice model in a (1−ε1-\varepsilon)-fraction of the time periods, and make arbitrary purchasing decisions instead in the remaining ε\varepsilon-fraction of the time periods. In this model, we develop a new robust online assortment optimization policy via an active elimination strategy. We establish both upper and lower bounds on the regret, and show that our policy is optimal up to logarithmic factor in T when the assortment capacity is constant. Furthermore, we develop a fully adaptive policy that does not require any prior knowledge of the contamination parameter ε\varepsilon. Our simulation study shows that our policy outperforms the existing policies based on upper confidence bounds (UCB) and Thompson sampling.Comment: 27 pages, 1 figur
    • …
    corecore