
    Optimal No-regret Learning in Repeated First-price Auctions

    Full text link
    We study online learning in repeated first-price auctions with censored feedback, where a bidder, observing only the winning bid at the end of each auction, learns to bid adaptively in order to maximize her cumulative payoff. To achieve this goal, the bidder faces a challenging dilemma: if she wins the auction (the only way to achieve a positive payoff), then she cannot observe the highest bid of the other bidders, which we assume is drawn i.i.d. from an unknown distribution. This dilemma, despite being reminiscent of the exploration-exploitation trade-off in contextual bandits, cannot be directly addressed by the existing UCB or Thompson sampling algorithms in that literature, mainly because, contrary to the standard bandit setting, nothing about the environment can be learned here when a positive reward is obtained. In this paper, by exploiting the structural properties of first-price auctions, we develop the first learning algorithm that achieves an $O(\sqrt{T}\log^2 T)$ regret bound when the bidder's private values are stochastically generated. We do so by providing an algorithm for a general class of problems, which we call monotone group contextual bandits, for which the same regret bound is established under stochastically generated contexts. Further, by a novel lower-bound argument, we establish an $\Omega(T^{2/3})$ lower bound for the case where the contexts are adversarially generated, thus highlighting the impact of the context-generation mechanism on the fundamental learning limit. Despite this, we further exploit the structure of first-price auctions and develop a learning algorithm that operates sample-efficiently (and computationally efficiently) in the presence of adversarially generated private values. We establish an $O(\sqrt{T}\log^3 T)$ regret bound for this algorithm, hence providing a complete characterization of optimal learning guarantees for this problem.
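    As an illustration of the censored-feedback structure described above (not the paper's algorithm), the sketch below simulates repeated first-price auctions with a naive bidder that estimates the competing-bid distribution only from losing rounds; every name and parameter in it (simulate_auctions, the Beta competing-bid distribution, the exploration rule) is a hypothetical choice for exposition.

```python
# Illustrative sketch only (not the paper's method): repeated first-price
# auctions with censored feedback and a naive empirical-CDF bidder.
import numpy as np

rng = np.random.default_rng(0)

def simulate_auctions(n_rounds=10_000, bid_grid=np.linspace(0, 1, 101)):
    observed_m = []          # competing bids revealed only on losing rounds
    total_payoff = 0.0
    for _ in range(n_rounds):
        v = rng.uniform()                 # stochastic private value
        m = rng.beta(2, 2)                # unknown highest competing bid
        if len(observed_m) < 50 or rng.uniform() < 0.05:
            b = rng.uniform(0, v)         # occasional random exploration bid
        else:
            emp = np.asarray(observed_m)
            # empirical estimate of P(win) = P(m < b); note it is biased,
            # since it uses losing rounds only: exactly the censoring
            # dilemma that the paper's algorithm is designed to handle
            win_prob = (emp[None, :] < bid_grid[:, None]).mean(axis=1)
            b = bid_grid[int(np.argmax((v - bid_grid) * win_prob))]
        if b > m:
            total_payoff += v - b         # win: payoff earned, m stays hidden
        else:
            observed_m.append(m)          # lose: the winning bid m is observed
    return total_payoff

print("cumulative payoff:", simulate_auctions())
```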

    Development of Hypoxia Trapping Enhanced BB2R-Targeted Radiopharmaceutics for Prostate Cancer

    Get PDF
    The Gastrin-Releasing Peptide Receptor (BB2r) has been investigated as a diagnostic and therapeutic target for prostate and other cancers due to its high expression level on neoplastic relative to normal tissues. A variety of BB2r-targeted agents have been developed utilizing the bombesin (BBN) peptide, which has shown nanomolar binding affinity for human BB2r. However, as with most low-molecular-weight, receptor-targeted drugs, a major challenge to clinical translation of BB2r-targeted agents is low retention at the tumor site due to intrinsically high diffusion and efflux rates. Our laboratory seeks to address this deficiency by developing synthetic approaches to selectively increase retention of BB2r-targeted agents in prostate cancer. Hypoxic regions commonly exist in prostate tumors and many other cancers due to a chaotic vascular architecture which impedes delivery of oxygen. In this dissertation, we explore the incorporation of nitroimidazoles, a class of hypoxia-selective prodrugs that irreversibly bind to intracellular nucleophiles in hypoxic tissues, into the BB2r-targeted agent paradigm. We seek to determine whether these agents can increase long-term retention in the tumor and thereby increase the efficacy and clinical potential of BB2r-targeted agents. To that end, we have developed several generations of hypoxia-trapping-enhanced BBN analogs. Our first in vitro investigation of hypoxia-enhanced 111In-labeled BBN conjugates demonstrated significantly improved retention in hypoxic PC-3 human prostate cancer cells. However, it was determined that the proximity of the 2-nitroimidazole relative to the pharmacophore had a detrimental impact on BB2r binding affinity. To address this problem, our next generation of radioconjugates contained an extended linker to eliminate steric inhibition. The new design demonstrated substantially improved binding affinity and a lower clearance rate of the 2-nitroimidazole-containing radioconjugates under hypoxic conditions. In vivo biodistribution studies using a PC-3 xenograft mouse model revealed significant tumor retention enhancement. Further work is needed to clarify the mechanisms of cellular retention and to correlate the tumor hypoxia burden with the retention efficacy.

    Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises

    Full text link
    Recently, several studies have considered the stochastic optimization problem in a heavy-tailed noise regime, i.e., the difference between the stochastic gradient and the true gradient is assumed to have a finite $p$-th moment (say, upper bounded by $\sigma^{p}$ for some $\sigma\geq 0$) with $p\in(1,2]$, which not only generalizes the traditional finite-variance assumption ($p=2$) but also has been observed in practice for several different tasks. Under this challenging assumption, much progress has been made for both convex and nonconvex problems; however, most of it considers only smooth objectives. In contrast, the nonsmooth case has not been fully explored or well understood. This paper aims to fill this crucial gap by providing a comprehensive analysis of stochastic nonsmooth convex optimization with heavy-tailed noises. We revisit a simple clipping-based algorithm which was previously only proved to converge in expectation, and only under an additional strong convexity assumption. Under appropriate choices of parameters, for both convex and strongly convex functions, we not only establish the first high-probability rates but also give refined in-expectation bounds compared with existing works. Remarkably, all of our results are optimal (or nearly optimal up to logarithmic factors) with respect to the time horizon $T$, even when $T$ is unknown in advance. Additionally, we show how to make the algorithm parameter-free with respect to $\sigma$; in other words, the algorithm can still guarantee convergence without any prior knowledge of $\sigma$.
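    The sketch below is a minimal illustration of a clipping-based stochastic subgradient step under heavy-tailed noise, applied to the nonsmooth convex objective f(x) = ||x||_1; the step sizes and clipping levels are placeholder choices, not the parameter settings analyzed in the paper.

```python
# Minimal sketch (assumptions: L1 objective, Student-t gradient noise,
# illustrative step-size and clipping schedules; none taken from the paper).
import numpy as np

rng = np.random.default_rng(1)
d, T = 20, 5_000
x = rng.normal(size=d)

def noisy_subgradient(x):
    g = np.sign(x)                              # subgradient of ||x||_1
    noise = rng.standard_t(df=1.5, size=d)      # infinite variance; finite p-th moment only for p < 1.5
    return g + noise

for t in range(1, T + 1):
    g = noisy_subgradient(x)
    tau = np.sqrt(t)                            # illustrative clipping level
    g = g * min(1.0, tau / (np.linalg.norm(g) + 1e-12))  # clip to norm <= tau
    x -= g / np.sqrt(T)                         # illustrative constant step size

print("final objective:", np.abs(x).sum())
```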

    Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits

    Full text link
    We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits, where a decision maker, under a given maximum-number-of-batch constraint and only able to observe rewards at the end of each batch, can dynamically decide how many individuals to include in the next batch (at the end of the current batch) and what personalized action-selection scheme to adopt within each batch. Such batch constraints are ubiquitous in a variety of practical contexts, including personalized product offerings in marketing and medical treatment selection in clinical trials. We characterize the fundamental learning limit in this problem via a regret lower bound and provide a matching upper bound (up to log factors), thus prescribing an optimal scheme for this problem. To the best of our knowledge, our work provides the first inroad into a theoretical understanding of dynamic batch learning in high-dimensional sparse linear contextual bandits. Notably, even a special case of our result (when no batch constraint is present) yields the first minimax-optimal $\tilde{O}(\sqrt{s_0 T})$ regret bound for standard online learning in high-dimensional linear contextual bandits (for the no-margin case), where $s_0$ is the sparsity parameter (or an upper bound thereof) and $T$ is the learning horizon. This result (both that $\tilde{O}(\sqrt{s_0 T})$ is achievable and that $\Omega(\sqrt{s_0 T})$ is a lower bound) appears to be unknown in the emerging literature of high-dimensional contextual bandits.
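    As a rough illustration of the batch constraint (rewards are only used to refit the model at the end of each batch), here is a simple Lasso-based explore-then-commit loop; it is not the paper's optimal batching or action-selection scheme, and the batch sizes, penalty, and problem dimensions are arbitrary assumptions.

```python
# Illustrative sketch only: batched loop for a sparse linear contextual bandit,
# refitting a Lasso at the end of each batch.  NOT the paper's optimal scheme.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
d, K, s0 = 100, 5, 3
theta = np.zeros(d); theta[:s0] = 1.0            # sparse ground-truth parameter

X_hist, y_hist = [], []
theta_hat = np.zeros(d)
batch_sizes = [200, 800, 4000]                   # fixed up front here; the paper
                                                 # allows choosing batches dynamically
for batch in batch_sizes:
    for _ in range(batch):
        contexts = rng.normal(size=(K, d))       # one context per arm
        if len(y_hist) < 50:
            a = int(rng.integers(K))             # explore early on
        else:
            a = int(np.argmax(contexts @ theta_hat))
        r = contexts[a] @ theta + rng.normal()
        X_hist.append(contexts[a]); y_hist.append(r)
    # rewards are only used to refit the estimate at the end of each batch
    model = Lasso(alpha=0.1).fit(np.array(X_hist), np.array(y_hist))
    theta_hat = model.coef_

print("recovered support:", np.nonzero(np.abs(theta_hat) > 0.1)[0])
```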

    On the convergence of mirror descent beyond stochastic convex programming

    Get PDF
    In this paper, we examine the convergence of mirror descent in a class of stochastic optimization problems that are not necessarily convex (or even quasi-convex), and which we call variationally coherent. Since the standard technique of "ergodic averaging" offers no tangible benefits beyond convex programming, we focus directly on the algorithm's last generated sample (its "last iterate"), and we show that it converges with probability 1 if the underlying problem is coherent. We further consider a localized version of variational coherence which ensures local convergence of stochastic mirror descent (SMD) with high probability. These results contribute to the landscape of non-convex stochastic optimization by showing that (quasi-)convexity is not essential for convergence to a global minimum: rather, variational coherence, a much weaker requirement, suffices. Finally, building on the above, we reveal an interesting insight regarding the convergence speed of SMD: in problems with sharp minima (such as generic linear programs or concave minimization problems), SMD reaches a minimum point in a finite number of steps (a.s.), even in the presence of persistent gradient noise. This result is to be contrasted with existing black-box convergence rate estimates that are only asymptotic.
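    For concreteness, below is a minimal sketch of stochastic mirror descent with the entropic mirror map on the probability simplex, applied to a noisy linear objective (a sharp-minimum instance of the kind mentioned above); the step-size schedule and noise model are illustrative assumptions, not taken from the paper.

```python
# Minimal SMD sketch: entropic mirror map on the simplex, noisy linear objective.
# Tracks the LAST iterate, the quantity whose convergence the paper studies.
import numpy as np

rng = np.random.default_rng(3)
d, T = 10, 20_000
c = rng.uniform(size=d)                          # minimize <c, x> over the simplex
x = np.full(d, 1.0 / d)                          # uniform starting point

for t in range(1, T + 1):
    g = c + rng.normal(scale=0.5, size=d)        # persistent gradient noise
    eta = 1.0 / np.sqrt(t)                       # illustrative step-size schedule
    w = x * np.exp(-eta * g)                     # entropic (exponentiated) update
    x = w / w.sum()                              # project back to the simplex

print("last iterate concentrates on argmin c:", int(np.argmax(x)) == int(np.argmin(c)))
```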