
    Player-optimal Stable Regret for Bandit Learning in Matching Markets

    Matching markets have long been studied because of their wide range of applications, and finding a stable matching is a common equilibrium objective. Since market participants are usually uncertain of their preferences, a rich line of recent work studies the online setting in which participants on one side (players) learn their unknown preferences through repeated interactions with the other side (arms). Most previous work in this line derives theoretical guarantees only for player-pessimal stable regret, which is defined relative to the players' least-preferred stable matching. Under the pessimal stable matching, however, players obtain the lowest reward among all stable matchings. To maximize players' profits, the player-optimal stable matching is the most desirable target. Although \citet{basu21beyond} derive an upper bound for player-optimal stable regret, their bound can be exponentially large when the players' preference gap is small. Whether a polynomial guarantee for this regret exists is a significant open problem. In this work, we provide a new algorithm named explore-then-Gale-Shapley (ETGS) and show that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon and $\Delta$ is the players' minimum preference gap among the first $N+1$ ranked arms. This result significantly improves on previous work, which either has a weaker player-pessimal stable matching objective or applies only to markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound. Comment: SODA 202
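    The abstract's distinction between player-optimal and player-pessimal stable matchings can be illustrated with plain Gale-Shapley deferred acceptance. The sketch below is not the ETGS algorithm itself: it assumes the preference lists are already known, whereas ETGS must first estimate them through exploration. The function name `gale_shapley` and the toy market are hypothetical.

```python
from typing import Dict, List


def gale_shapley(proposer_prefs: Dict[str, List[str]],
                 receiver_prefs: Dict[str, List[str]]) -> Dict[str, str]:
    """Deferred acceptance; returns a proposer -> receiver stable matching.

    The result is optimal for the proposing side among all stable matchings.
    """
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better).
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to try
    held: Dict[str, str] = {}                     # receiver -> proposer currently held
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        if next_choice[p] >= len(proposer_prefs[p]):
            continue  # p has been rejected by everyone on its list
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in held:
            held[r] = p
        elif rank[r][p] < rank[r][held[r]]:
            free.append(held[r])  # r trades up; the old partner becomes free again
            held[r] = p
        else:
            free.append(p)        # r rejects p
    return {p: r for r, p in held.items()}


# Toy market (hypothetical) with two distinct stable matchings.
players = {"p1": ["a1", "a2"], "p2": ["a2", "a1"]}
arms = {"a1": ["p2", "p1"], "a2": ["p1", "p2"]}

player_optimal = gale_shapley(players, arms)   # players propose
arm_optimal = gale_shapley(arms, players)      # arms propose: player-pessimal
```

    In this toy market every player gets their first choice when players propose and their last choice when arms propose, which is the gap the player-optimal regret notion is meant to capture.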

    Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm

    The linear bandit problem has been studied for many years in both stochastic and adversarial settings. Designing an algorithm that performs well in an environment without knowing its loss type has attracted considerable interest. \citet{LeeLWZ021} propose an algorithm that actively detects the loss type and then switches between different algorithms specially designed for specific settings. However, such an approach requires meticulous design to perform well in all environments. Follow-the-regularized-leader (FTRL) is another popular type of algorithm that can adapt to different environments. It has a simple design, and its regret bounds have been shown to be optimal in traditional multi-armed bandit problems, in contrast to the detect-switch type. Designing an FTRL-type algorithm for linear bandits is an important question that has long been open. In this paper, we prove that the FTRL algorithm with a negative entropy regularizer can achieve best-of-three-worlds results for the linear bandit problem. Our regret bounds achieve the same or nearly the same order as the previous detect-switch type algorithm, but with a much simpler algorithmic design. Comment: Accepted in COLT 202
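    As a rough illustration of what FTRL with a negative entropy regularizer computes, the sketch below covers only the simplified finite-armed case over the probability simplex, where the minimizer of $\langle p, L\rangle + \frac{1}{\eta}\sum_i p_i \log p_i$ has the closed form $p_i \propto \exp(-\eta L_i)$. It is not the paper's linear-bandit algorithm; the function name, the fixed learning rate `eta`, and the assumption that cumulative loss estimates are supplied by the caller are all illustrative.

```python
import math
from typing import List


def ftrl_negative_entropy(cum_loss: List[float], eta: float) -> List[float]:
    """Return the FTRL distribution over arms for given cumulative loss estimates.

    Solves argmin_p <p, L> + (1/eta) * sum_i p_i log p_i over the simplex,
    whose solution is the exponential-weights distribution.
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    # Subtract the minimum loss before exponentiating for numerical stability;
    # the shift cancels in the normalisation.
    base = min(cum_loss)
    weights = [math.exp(-eta * (loss - base)) for loss in cum_loss]
    total = sum(weights)
    return [w / total for w in weights]


# With equal losses the distribution is uniform; a larger loss gets less mass.
uniform = ftrl_negative_entropy([0.0, 0.0, 0.0], eta=1.0)
skewed = ftrl_negative_entropy([1.0, 0.0], eta=math.log(2))
```

    With `eta = log 2` and a loss gap of 1, the worse arm receives half the weight of the better one, so `skewed` is approximately `[1/3, 2/3]`.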

    Structure and morphology of X-ray selected AGN hosts at 1<z<3 in CANDELS-COSMOS field

    We analyze the morphologies of the host galaxies of 35 X-ray selected active galactic nuclei (AGNs) at $z\sim2$ in the Cosmic Evolution Survey (COSMOS) field using Hubble Space Telescope/WFC3 imaging taken from the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS). We build a control sample of 350 galaxies in total by selecting, for each AGN host, ten non-active galaxies from the same field with similar stellar mass and redshift. By performing two-dimensional fitting of the surface brightness profile with GALFIT, we find that the distribution of the Sérsic index ($n$) of AGN hosts does not show a statistical difference from that of the control sample. We measure the nonparametric morphological parameters (the asymmetry index A, the Gini coefficient G, the concentration index C and the M20 index) based on point-source-subtracted images. The distributions of all these morphological parameters for AGN hosts are consistent with those of the control sample. We finally investigate the fraction of distorted morphologies in both samples by visual classification. Only $\sim$15% of the AGN hosts have highly distorted morphologies, possibly due to a major merger or interaction. We find no significant difference in the distortion fractions between the AGN host sample and the control sample. We conclude that the morphologies of X-ray selected AGN hosts are similar to those of non-active galaxies and that most AGN activity is not triggered by major mergers. Comment: 5 pages, 3 figures, accepted for publication in The Astrophysical Journal Letter
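    Of the nonparametric parameters listed, the Gini coefficient G is the simplest to state. The sketch below assumes the definition commonly used in galaxy morphology work, $G = \frac{1}{\bar{X}\,n(n-1)} \sum_{i=1}^{n} (2i - n - 1)\,|X_i|$ with pixel fluxes $X_i$ sorted in increasing order; the abstract does not say which definition or pixel selection the authors used, so both are assumptions here, and the function name is hypothetical.

```python
from typing import Iterable


def gini_coefficient(pixel_fluxes: Iterable[float]) -> float:
    """Gini coefficient of a galaxy's pixel flux distribution.

    0 means the flux is spread evenly over all pixels; 1 means a single
    pixel holds all of the flux.
    """
    # Use absolute values and sort ascending (assumed convention).
    x = sorted(abs(v) for v in pixel_fluxes)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(x) / n
    if mean == 0:
        raise ValueError("all pixel fluxes are zero")
    # Rank-weighted sum: low-flux pixels get negative weights, high-flux positive.
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(x, start=1))
    return weighted / (mean * n * (n - 1))


even = gini_coefficient([2.0, 2.0, 2.0])             # evenly spread flux
concentrated = gini_coefficient([0.0, 0.0, 0.0, 1.0])  # all flux in one pixel
```

    The two calls cover the extremes: evenly spread flux gives 0 and a single bright pixel gives 1.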

    Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization

    Learning Markov decision processes (MDPs) in an adversarial environment is a challenging problem. It becomes even more challenging with function approximation, since the underlying structure of the loss function and the transition kernel is especially hard to estimate in a varying environment. In fact, the state-of-the-art results for linear adversarial MDPs achieve a regret of $\tilde{O}(K^{6/7})$ ($K$ denotes the number of episodes), which leaves substantial room for improvement. In this paper, we investigate the problem from a new viewpoint, which reduces a linear MDP to linear optimization by carefully setting the feature maps of the bandit arms in the linear optimization problem. This new technique, under an exploratory assumption, yields an improved bound of $\tilde{O}(K^{4/5})$ for linear adversarial MDPs without access to a transition simulator. The new view could be of independent interest for solving other MDP problems that possess a linear structure.

    Existence of positive solution for a third-order three-point BVP with sign-changing Green's function

    By using the Guo-Krasnoselskii fixed point theorem, we investigate the following third-order three-point boundary value problem $$\left\{ \begin{array}{l} u'''(t)=f(t,u(t)),\ t\in [0,1], \\ u'(0)=u(1)=0,\ u''(\eta)+\alpha u(0)=0, \end{array} \right.$$ where $\alpha \in [0,2)$ and $\eta\in\left[\frac{\sqrt{121+24\alpha}-5}{3(4+\alpha)},1\right)$. The emphasis is mainly that although the corresponding Green's function is sign-changing, the solution obtained is still positive.

    Cost-Benefit Analysis of Phase Balancing Solution for Data-scarce LV Networks by Cluster-Wise Gaussian Process Regression

    Phase imbalance is widespread in the UK's low voltage (LV, 415 V) distribution networks. It leads not only to inefficient use of LV network assets but also to energy losses, which together cost hundreds of millions of British pounds each year in the UK. The cost-benefit analysis of phase balancing solutions remains an unresolved question for the majority of LV networks. The main challenge is data scarcity: these networks only have peak current and total energy consumption data, collected once a year. To perform a cost-benefit analysis of phase balancing for data-scarce LV networks, this paper develops a customized cluster-wise Gaussian process regression (CGPR) approach. The approach estimates the total cost of phase imbalance for any data-scarce LV network by extracting knowledge from a set of representative data-rich LV networks and extrapolating it to the data-scarce network. The imbalance-induced cost is then translated into the benefit of phase balancing, which is compared against the costs of phase balancing solutions, e.g. deploying phase balancers. The developed CGPR approach helps distribution network operators (DNOs) evaluate the costs and benefits of phase balancing solutions for data-scarce networks without the need to invest in additional monitoring devices.