Player-optimal Stable Regret for Bandit Learning in Matching Markets
The problem of matching markets has been studied for a long time in the
literature due to its wide range of applications. Finding a stable matching is
a common equilibrium objective in this problem. Since market participants are
usually uncertain of their preferences, a rich line of recent works study the
online setting where one-side participants (players) learn their unknown
preferences from iterative interactions with the other side (arms). Most
previous works in this line are only able to derive theoretical guarantees for
player-pessimal stable regret, which is defined compared with the players'
least-preferred stable matching. However, under the pessimal stable matching,
players only obtain the least reward among all stable matchings. To maximize
players' profits, player-optimal stable matching would be the most desirable.
Though Basu et al. (2021) derive an upper bound for
player-optimal stable regret, their bound can be exponentially large when
players' preference gap is small. Whether a polynomial guarantee for this
regret exists is a significant but still open problem. In this work, we provide
a new algorithm named explore-then-Gale-Shapley (ETGS) and show that the
optimal stable regret of each player can be upper bounded by
$O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon and
$\Delta$ is the players' minimum preference gap among the first $N+1$-ranked
arms. This result significantly improves on previous works, which either have a weaker
player-pessimal stable matching objective or apply only to markets with special
assumptions. When the preferences of participants satisfy some special
conditions, our regret upper bound also matches the previously derived lower
bound.
Comment: SODA 202
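As a rough illustration of the Gale-Shapley step that the algorithm's name refers to, the sketch below runs player-proposing deferred acceptance on known preference lists, which returns the player-optimal stable matching. It is not the ETGS algorithm itself: the exploration phase, in which players estimate their preferences, is omitted, and the function name and input layout are assumptions made for this example.

```python
def gale_shapley(player_prefs, arm_prefs):
    """Player-proposing deferred acceptance (illustrative sketch).

    player_prefs[p]: arms ordered from most to least preferred by player p.
    arm_prefs[a]:    players ordered from most to least preferred by arm a.
    Returns a dict mapping each matched player to its arm.
    """
    # rank[a][p] is the position of player p in arm a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in arm_prefs.items()}
    next_choice = {p: 0 for p in player_prefs}  # next arm index to propose to
    holder = {}                                 # arm -> player it currently holds
    free = list(player_prefs)
    while free:
        p = free.pop()
        if next_choice[p] >= len(player_prefs[p]):
            continue  # p has been rejected by every arm and stays unmatched
        a = player_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if a not in holder:
            holder[a] = p                       # arm was free: accept
        elif rank[a][p] < rank[a][holder[a]]:
            free.append(holder[a])              # arm trades up, old player is freed
            holder[a] = p
        else:
            free.append(p)                      # arm rejects p
    return {p: a for a, p in holder.items()}


# A two-sided market with two stable matchings; the players propose,
# so the result is the one both players prefer.
players = {"p1": ["a1", "a2"], "p2": ["a2", "a1"]}
arms = {"a1": ["p2", "p1"], "a2": ["p1", "p2"]}
print(gale_shapley(players, arms))  # p1 is matched to a1, p2 to a2
```

In this toy market the arm-proposing variant would instead return the matching with p1 on a2 and p2 on a1, which is the player-pessimal stable matching.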
Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm
The linear bandit problem has been studied for many years in both stochastic
and adversarial settings. Designing an algorithm that performs well without
knowing the loss type in advance has attracted considerable interest.
Lee et al. (2021) propose an algorithm that actively detects the loss type and
then switches between different algorithms specially designed for specific
settings. However, such an approach requires meticulous designs to perform well
in all environments. Follow-the-regularized-leader (FTRL) is another type of
popular algorithm that can adapt to different environments. Compared with the
detect-switch type, this algorithm has a simple design, and its regret bounds
are known to be optimal in traditional multi-armed bandit problems. Designing an
FTRL-type algorithm for linear bandits is an important question that has been
open for a long time. In this paper, we prove that the FTRL algorithm with a
negative entropy regularizer can achieve best-of-three-worlds results for
the linear bandit problem. Our regret bounds achieve the same or nearly the
same order as the previous detect-switch type algorithm but with a much simpler
algorithmic design.
Comment: Accepted in COLT 202
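To make the FTRL update with a negative entropy regularizer concrete, the sketch below solves it in the simplest setting, the probability simplex, where the minimizer of <L, x> + (1/eta) * sum_i x_i log x_i has the closed form x_i proportional to exp(-eta * L_i). This is a simplification assumed for illustration: the linear bandit algorithm in the paper works over a general action set with estimated losses, neither of which is shown here, and the function name and the fixed learning rate `eta` are hypothetical.

```python
import math


def ftrl_negentropy(cum_loss, eta):
    """FTRL step on the probability simplex with negative entropy.

    Returns argmin_x <cum_loss, x> + (1/eta) * sum_i x_i * log(x_i),
    whose closed form is x_i proportional to exp(-eta * cum_loss[i]).
    """
    # Subtracting the minimum keeps exp() from underflowing;
    # it cancels in the normalization and leaves the result unchanged.
    shift = min(cum_loss)
    weights = [math.exp(-eta * (loss - shift)) for loss in cum_loss]
    total = sum(weights)
    return [w / total for w in weights]


# Coordinates with smaller cumulative loss receive more probability mass.
x = ftrl_negentropy([3.0, 1.0, 2.0], eta=0.5)
print(x)
```

A larger `eta` concentrates the distribution on the current best coordinate, while a smaller one keeps it closer to uniform.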
Structure and morphology of X-ray selected AGN hosts at 1<z<3 in CANDELS-COSMOS field
We analyze morphologies of the host galaxies of 35 X-ray selected active
galactic nuclei (AGNs) at 1<z<3 in the Cosmic Evolution Survey (COSMOS)
field using Hubble Space Telescope/WFC3 imaging taken from the Cosmic Assembly
Near-infrared Deep Extragalactic Legacy Survey (CANDELS). We build a control
sample of 350 galaxies in total, by selecting ten non-active galaxies drawn
from the same field with similar stellar mass and redshift for each AGN
host. By performing two-dimensional fitting with GALFIT on the surface
brightness profile, we find that the distribution of the Sérsic index (n) of
AGN hosts does not show a statistical difference from that of the control
sample. We measure the nonparametric morphological parameters (the asymmetry
index A, the Gini coefficient G, the concentration index C and the M20 index)
based on point source subtracted images. All the distributions of these
morphological parameters of AGN hosts are consistent with those of the control
sample. We finally investigate the fraction of distorted morphologies in both
samples by visual classification. Only 15% of the AGN hosts have highly
distorted morphologies, possibly due to a major merger or interaction. We find
there is no significant difference in the distortion fractions between the AGN
host sample and control sample. We conclude that the morphologies of X-ray
selected AGN hosts are similar to those of nonactive galaxies and most AGN
activity is not triggered by major mergers.
Comment: 5 pages, 3 figures, accepted for publication in The Astrophysical
Journal Letters
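As an example of one of the nonparametric parameters mentioned above, the sketch below computes a Gini coefficient G for a list of pixel fluxes. The abstract does not state which definition was used; this assumes the form common in galaxy morphology work (Lotz et al. 2004), G = sum_i (2i - n - 1)|X_i| / (mean|X| * n * (n - 1)) with fluxes sorted in increasing order. The function name is hypothetical, and choosing which pixels belong to the galaxy is not shown.

```python
def gini_coefficient(fluxes):
    """Gini coefficient of a set of pixel fluxes (illustrative sketch).

    Assumes the definition G = sum_i (2i - n - 1) |X_i| / (mean|X| n (n - 1)),
    with |X_i| sorted in increasing order and i running from 1 to n.
    G is 0 when the flux is spread evenly and 1 when one pixel holds it all.
    """
    values = sorted(abs(f) for f in fluxes)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("total flux is zero")
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(values, start=1))
    return weighted / (mean * n * (n - 1))


print(gini_coefficient([1.0, 1.0, 1.0, 1.0]))  # evenly spread flux
print(gini_coefficient([0.0, 0.0, 0.0, 1.0]))  # all flux in one pixel
```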
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
Learning Markov decision processes (MDP) in an adversarial environment has
been a challenging problem. The problem becomes even more challenging with
function approximation, since the underlying structures of the loss function and
transition kernel are especially hard to estimate in a varying environment. In
fact, the state-of-the-art results for linear adversarial MDP achieve a regret
of $\tilde{O}(K^{6/7})$ ($K$ denotes the number of episodes), which leaves
substantial room for improvement. In this paper, we investigate the problem with a
new view, which reduces linear MDP into linear optimization by subtly setting
the feature maps of the bandit arms of linear optimization. This new technique,
under an exploratory assumption, yields an improved bound of
$\tilde{O}(K^{4/5})$ for linear adversarial MDP without access to a transition
simulator. The new view could be of independent interest for solving other MDP
problems that possess a linear structure.
Existence of positive solution for a third-order three-point BVP with sign-changing Green's function
By using the Guo-Krasnoselskii fixed point theorem, we investigate the following third-order three-point boundary value problem
where and . The emphasis is mainly that although the corresponding Green's function is sign-changing, the solution obtained is still positive
Cost-Benefit Analysis of Phase Balancing Solution for Data-scarce LV Networks by Cluster-Wise Gaussian Process Regression
Phase imbalance is widespread in the UK’s low voltage (415V, LV) distribution networks. It not only leads to insufficient use of LV network assets but also causes energy losses, costing hundreds of millions of British pounds each year in the UK. The cost-benefit analysis of phase balancing solutions remains an unresolved question for the majority of LV networks. The main challenge is data scarcity: these networks only have peak current and total energy consumption data, collected once a year. To perform a cost-benefit analysis of phase balancing for data-scarce LV networks, this paper develops a customized cluster-wise Gaussian process regression (CGPR) approach. The approach estimates the total cost of phase imbalance for any data-scarce LV network by extracting knowledge from a set of representative data-rich LV networks and extrapolating that knowledge to the data-scarce network. The imbalance-induced cost is then translated into the benefit of phase balancing, which is compared against the costs of phase balancing solutions, e.g. deploying phase balancers. The developed CGPR approach helps distribution network operators (DNOs) evaluate the cost-benefit of phase balancing solutions for data-scarce networks without investing in additional monitoring devices.