70 research outputs found

    Allocating Divisible Resources on Arms with Unknown and Random Rewards

    We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_i$ to arm $i$ in a period, then the reward $Y_i$ is $Y_i(A_i)=A_i \mu_i+A_i^b \xi_i$, where $\mu_i$ is the unknown mean and the noise $\xi_i$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.
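
    The reward model in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the Gaussian choice for the sub-Gaussian noise, and the `noise_sd` parameter are all assumptions made for illustration. The second helper uses only algebra on the stated formula: dividing the reward by the allocation gives an estimate of the mean whose noise scale is the allocation raised to the power b - 1.

```python
import random


def sample_reward(allocation, mu, b, rng, noise_sd=1.0):
    """Draw one reward Y = A * mu + A**b * xi for a single arm.

    Assumption: xi is Gaussian with standard deviation noise_sd,
    which is one particular sub-Gaussian noise distribution.
    """
    if not 0.0 <= allocation <= 1.0:
        raise ValueError("allocation must lie in [0, 1]")
    xi = rng.gauss(0.0, noise_sd)
    return allocation * mu + (allocation ** b) * xi


def normalized_noise_sd(allocation, b, noise_sd=1.0):
    """Noise scale of the per-unit estimate Y / A = mu + A**(b - 1) * xi."""
    if not 0.0 < allocation <= 1.0:
        raise ValueError("allocation must lie in (0, 1]")
    return noise_sd * allocation ** (b - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Spread the unit resource evenly over four hypothetical arms.
    means = [0.2, 0.5, 0.7, 0.9]
    share = 1.0 / len(means)
    for order in (0.0, 0.5, 1.0):
        rewards = [sample_reward(share, m, order, rng) for m in means]
        print(order, normalized_noise_sd(share, order), rewards)
```

    Under these assumptions, at b = 1 the per-unit noise scale does not depend on the allocation, while for b < 1 it grows as the allocation shrinks.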

    Competition among Parallel Contests

    We investigate the model of multiple contests held in parallel, where each contestant selects one contest to join and each contest designer decides the prize structure to compete for the participation of contestants. We first analyze the strategic behaviors of contestants and completely characterize the symmetric Bayesian Nash equilibrium. As for the strategies of contest designers, when other designers' strategies are known, we show that computing the best response is NP-hard and propose a fully polynomial time approximation scheme (FPTAS) to output the $\epsilon$-approximate best response. When other designers' strategies are unknown, we provide a worst case analysis on one designer's strategy. We give an upper bound on the utility of any strategy and propose a method to construct a strategy whose utility can guarantee a constant ratio of this upper bound in the worst case. Comment: Accepted by the 18th Conference on Web and Internet Economics (WINE 2022).

    Equilibrium Analysis of Customer Attraction Games

    We introduce a game model called "customer attraction game" to demonstrate the competition among online content providers. In this model, customers exhibit interest in various topics. Each content provider selects one topic and benefits from the attracted customers. We investigate both symmetric and asymmetric settings involving agents and customers. In the symmetric setting, the existence of pure Nash equilibrium (PNE) is guaranteed, but finding a PNE is PLS-complete. To address this, we propose a fully polynomial time approximation scheme to identify an approximate PNE. Moreover, the tight Price of Anarchy (PoA) is established. In the asymmetric setting, we show the nonexistence of PNE in certain instances and establish that determining its existence is NP-hard. Nevertheless, we prove the existence of an approximate PNE. Additionally, when agents select topics sequentially, we demonstrate that finding a subgame-perfect equilibrium is PSPACE-hard. Furthermore, we present the sequential PoA for the two-agent setting.

    Revenue Maximization and Learning in Products Ranking

    We consider the revenue maximization problem for an online retailer who plans to display a set of products differing in their prices and qualities and rank them in order. The consumers have random attention spans and view the products sequentially before purchasing a "satisficing" product or leaving the platform empty-handed when the attention span gets exhausted. Our framework extends the cascade model in two directions: the consumers have random attention spans instead of fixed ones and the firm maximizes revenues instead of clicking probabilities. We show a nested structure of the optimal product ranking as a function of the attention span when the attention span is fixed and design a $1/e$-approximation algorithm accordingly for the random attention spans. When the conditional purchase probabilities are not known and may depend on consumer and product features, we devise an online learning algorithm that achieves $\tilde{\mathcal{O}}(\sqrt{T})$ regret relative to the approximation algorithm, despite the censoring of information: the attention span of a customer who purchases an item is not observable. Numerical experiments demonstrate the outstanding performance of the approximation and online learning algorithms.

    Algorithmic Decision-Making Safeguarded by Human Knowledge

    Commercial AI solutions provide analysts and managers with data-driven business intelligence for a wide range of decisions, such as demand forecasting and pricing. However, human analysts may have their own insights and experiences about the decision-making that are at odds with the algorithmic recommendation. In view of such a conflict, we provide a general analytical framework to study the augmentation of algorithmic decisions with human knowledge: the analyst uses the knowledge to set a guardrail by which the algorithmic decision is clipped if the algorithmic output is out of bounds and seems unreasonable. We study the conditions under which the augmentation is beneficial relative to the raw algorithmic decision. We show that when the algorithmic decision is asymptotically optimal with large data, the non-data-driven human guardrail usually provides no benefit. However, we point out three common pitfalls of the algorithmic decision: (1) lack of domain knowledge, such as the market competition, (2) model misspecification, and (3) data contamination. In these cases, even with sufficient data, the augmentation from human knowledge can still improve the performance of the algorithmic decision.
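
    The guardrail described in this abstract can be sketched in a few lines. This is a hedged illustration, not the paper's framework: it assumes the decision is a single number (for example a price) and that the analyst's guardrail is an interval. The function name and the `lower` and `upper` parameters are hypothetical.

```python
def guarded_decision(algorithmic_decision, lower, upper):
    """Clip a scalar algorithmic decision to the analyst's interval.

    Assumption: the guardrail is the closed interval [lower, upper].
    A decision inside the interval passes through unchanged.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return min(max(algorithmic_decision, lower), upper)


if __name__ == "__main__":
    # Hypothetical guardrail: the analyst believes prices of 8 to 10 are sensible.
    for raw_price in (7.0, 9.5, 12.0):
        print(raw_price, guarded_decision(raw_price, 8.0, 10.0))
```

    In this sketch the guardrail only changes the outcome when the raw decision leaves the interval, which matches the abstract's description of clipping out-of-bound outputs.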

    On the complexity of computing Markov perfect equilibrium in general-sum stochastic games

    Similar to the role of Markov decision processes in reinforcement learning, Markov games (also called stochastic games) lay down the foundation for the study of multi-agent reinforcement learning and sequential agent interactions. We introduce approximate Markov perfect equilibrium as a solution to the computational problem of finite-state stochastic games repeated in the infinite horizon and prove its PPAD-completeness. This solution concept preserves the Markov perfect property and opens up the possibility for the success of multi-agent reinforcement learning algorithms on static two-player games to be extended to multi-agent dynamic games, expanding the reign of the PPAD-complete class.

    Deep Learning is Provably Robust to Symmetric Label Noise

    Deep neural networks (DNNs) are capable of perfectly fitting the training data, including memorizing noisy data. It is commonly believed that memorization hurts generalization. Therefore, many recent works propose mitigation strategies to avoid noisy data or correct memorization. In this work, we step back and ask the question: Can deep learning be robust against massive label noise without any mitigation? We provide an affirmative answer for the case of symmetric label noise: We find that certain DNNs, including under-parameterized and over-parameterized models, can tolerate massive symmetric label noise up to the information-theoretic threshold. By appealing to classical statistical theory and universal consistency of DNNs, we prove that for multiclass classification, $L_1$-consistent DNN classifiers trained under symmetric label noise can achieve Bayes optimality asymptotically if the label noise probability is less than $\frac{K-1}{K}$, where $K \ge 2$ is the number of classes. Our results show that for symmetric label noise, no mitigation is necessary for $L_1$-consistent estimators. We conjecture that for general label noise, mitigation strategies that make use of the noisy data will outperform those that ignore the noisy data.
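
    The threshold in this abstract can be checked numerically with a small sketch. It assumes the common definition of symmetric label noise, in which a label is flipped with probability `rho` and a flipped label is uniform over the other K - 1 classes; the abstract does not spell out its noise model, so this definition and the function names are assumptions. Under it, the noisy class posterior is an affine function of the clean one with slope 1 - rho * K / (K - 1), which is positive exactly when rho is below (K - 1) / K, so the most likely class is unchanged below that level.

```python
def noisy_posterior(clean_posterior, rho):
    """Class posterior after symmetric label noise with flip probability rho.

    Assumption: a flipped label is drawn uniformly from the other K - 1 classes.
    """
    k = len(clean_posterior)
    if k < 2:
        raise ValueError("need at least two classes")
    return [(1.0 - rho) * p + rho * (1.0 - p) / (k - 1) for p in clean_posterior]


def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda j: values[j])


if __name__ == "__main__":
    clean = [0.5, 0.3, 0.2]  # hypothetical clean posterior, K = 3
    for rho in (0.0, 0.6, 0.7):  # the threshold is 2/3 for K = 3
        noisy = noisy_posterior(clean, rho)
        print(rho, noisy, argmax(noisy) == argmax(clean))
```

    This only illustrates why the most likely class survives symmetric noise below the threshold; it says nothing about the consistency argument for DNNs that the abstract refers to.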

    Biliary drainage in malignant biliary obstruction: an umbrella review of randomized controlled trials

    Background: There are still many controversies about biliary drainage in MBO, and we aimed to summarize and evaluate the evidence associated with biliary drainage. Methods: We conducted an umbrella review of SRoMAs based on RCTs. Through July 28, 2022, Embase, PubMed, WOS, and Cochrane Database were searched. Two reviewers independently screened the studies, extracted the data, and appraised the methodological quality of the included studies. GRADE was used to evaluate the quality of the evidence. Results: 36 SRoMAs were identified. After excluding 24 overlapping studies, 12 SRoMAs, including 76 RCTs, and 124 clinical outcomes for biliary drainage in MBO were included. Of the 124 pieces of evidence evaluated, 13 were rated “High” quality, 38 were rated “Moderate”, and the rest were rated “Low” or “Very low”. For patients with MBO, 125I seeds+stent can reduce the risk of stent occlusion, RFA+stent can improve the prognosis; compared with PC, SEMS can increase the risk of tumor ingrowth and reduce the occurrence of sludge formation, and the incidence of tumor ingrowth in C-SEMS/PC-SEMS was significantly lower than that in U-SEMS. There was no difference in the success rate of drainage between EUS-BD and ERCP-BD, but the use of EUS-BD can reduce the incidence of stent dysfunction. For patients with obstructive jaundice, PBD does not affect postoperative mortality compared to direct surgery. The use of MS in patients with periampullary cancer during PBD can reduce the risk of re-intervention and stent occlusion compared to PC. In addition, we included four RCTs that showed that when performing EUS-BD on MBO, hepaticogastrostomy has higher technical success rates than choledochoduodenostomy. Patients who received Bilateral-ENBD had a lower additional drainage rate than those who received Unilateral-ENBD. Conclusions: Our study summarizes a large amount of evidence related to biliary drainage, which helps to reduce the uncertainty in the selection of biliary drainage strategies for MBO patients under different circumstances.