59 research outputs found

    Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping

    Reinforcement learning often needs to deal with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces (often known as the curse of dimensionality). In this work, we address this issue by learning the inherent structure of action-wise similar MDPs to appropriately balance the performance degradation versus sample/computational complexity. In particular, we partition the action spaces into multiple groups based on the similarity in transition distribution and reward function, and build a linear decomposition model to capture the difference between the intra-group transition kernel and the intra-group rewards. Both our theoretical analysis and experiments reveal a surprising and counter-intuitive result: while a more refined grouping strategy can reduce the approximation error caused by treating actions in the same group as identical, it also leads to increased estimation error when the number of samples or the computational resources are limited. This finding highlights the grouping strategy as a new degree of freedom that can be optimized to minimize the overall performance loss. To address this issue, we formulate a general optimization problem for determining the optimal grouping strategy, which strikes a balance between performance loss and sample/computational complexity. We further propose a computationally efficient method for selecting a nearly-optimal grouping strategy, which maintains its computational complexity independent of the size of the action space.

    Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning

    Fairness plays a crucial role in various multi-agent systems (e.g., communication networks and financial markets). Many multi-agent dynamical interactions can be cast as Markov Decision Processes (MDPs). While existing research has focused on studying fairness in known environments, the exploration of fairness in such systems for unknown environments remains open. In this paper, we propose a Reinforcement Learning (RL) approach to achieve fairness in multi-agent finite-horizon episodic MDPs. Instead of maximizing the sum of individual agents' value functions, we introduce a fairness function that ensures equitable rewards across agents. Since the classical Bellman's equation does not hold when the sum of individual value functions is not maximized, we cannot use traditional approaches. Instead, in order to explore, we maintain a confidence bound of the unknown environment and then propose an online convex optimization based approach to obtain a policy constrained to this confidence region. We show that such an approach achieves sub-linear regret in terms of the number of episodes. Additionally, we provide a probably approximately correct (PAC) guarantee based on the obtained regret bound. We also propose an offline RL algorithm and bound the optimality gap with respect to the optimal fair solution. To mitigate computational complexity, we introduce a policy-gradient type method for the fair objective. Simulation experiments also demonstrate the efficacy of our approach.

    Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning

    Meta-learning has arisen as a successful method for improving training performance by training over many similar tasks, especially with deep neural networks (DNNs). However, the theoretical understanding of when and why overparameterized models such as DNNs can generalize well in meta-learning is still limited. As an initial step towards addressing this challenge, this paper studies the generalization performance of overfitted meta-learning under a linear regression model with Gaussian features. In contrast to a few recent studies along the same line, our framework allows the number of model parameters to be arbitrarily larger than the number of features in the ground truth signal, and hence naturally captures the overparameterized regime in practical deep meta-learning. We show that the overfitted min ℓ₂-norm solution of model-agnostic meta-learning (MAML) can be beneficial, which is similar to the recent remarkable findings on the "benign overfitting" and "double descent" phenomena in classical (single-task) linear regression. However, due to the uniqueness of meta-learning such as task-specific gradient descent inner training and the diversity/fluctuation of the ground-truth signals among training tasks, we find new and interesting properties that do not exist in single-task linear regression. We first provide a high-probability upper bound (under reasonable tightness) on the generalization error, where certain terms decrease when the number of features increases. Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large. Under this circumstance, we show that the overfitted min ℓ₂-norm solution can achieve an even lower generalization error than the underparameterized solution.

    Generalization Performance of Transfer Learning: Overparameterized and Underparameterized Regimes

    Transfer learning is a useful technique for achieving improved performance and reducing training costs by leveraging the knowledge gained from source tasks and applying it to target tasks. Assessing the effectiveness of transfer learning relies on understanding the similarity between the ground truth of the source and target tasks. In real-world applications, tasks often exhibit partial similarity, where certain aspects are similar while others are different or irrelevant. To investigate the impact of partial similarity on transfer learning performance, we focus on a linear regression model with two distinct sets of features: a common part shared across tasks and a task-specific part. Our study explores various types of transfer learning, encompassing two options for parameter transfer. By establishing a theoretical characterization of the error of the learned model, we compare these transfer learning options, particularly examining how generalization performance changes with the number of features/parameters in both underparameterized and overparameterized regimes. Furthermore, we provide practical guidelines for determining the number of features in the common and task-specific parts for improved generalization performance. For example, when the total number of features in the source task's learning model is fixed, we show that it is more advantageous to allocate a greater number of redundant features to the task-specific part rather than the common part. Moreover, in specific scenarios, particularly those characterized by high noise levels and small true parameters, sacrificing certain true features in the common part in favor of employing more redundant features in the task-specific part can yield notable benefits.

    The 2019 eruption of recurrent nova V3890 Sgr: Observations by Swift, NICER, and SMARTS

    V3890 Sgr is a recurrent nova that has been seen in outburst three times so far, with the most recent eruption occurring on 2019 August 27 UT. This latest outburst was followed in detail by the Neil Gehrels Swift Observatory, from less than a day after the eruption until the nova entered the Sun observing constraint, with a small number of additional observations after the constraint ended. The X-ray light curve shows initial hard shock emission, followed by an early start of the supersoft source phase around day 8.5, with the soft emission ceasing by day 26. Together with the peak blackbody temperature of the supersoft spectrum being ∼100 eV, these timings suggest the white dwarf mass to be high, ∼1.3 M⊙. The UV photometric light curve decays monotonically, with the decay rate changing a number of times, approximately simultaneously with variations in the X-ray emission. The UV grism spectra show both line and continuum emission, with emission lines of N, C, Mg, and O being notable. These UV spectra are best dereddened using a Small Magellanic Cloud extinction law. Optical spectra from SMARTS show evidence of interaction between the nova ejecta and wind from the donor star, as well as the extended atmosphere of the red giant being flash-ionized by the supersoft X-ray photons. Data from NICER reveal a transient 83 s quasi-periodic oscillation, with a modulation amplitude of 5 per cent, adding to the sample of novae that show such short variabilities during their supersoft phase.

    A remarkable recurrent nova in M 31: The predicted 2014 outburst in X-rays with Swift

    The M 31 nova M31N 2008-12a was recently found to be a recurrent nova (RN) with a recurrence time of about 1 year. This is by far the fastest recurrence time scale of any known RN. Our optical monitoring programme detected the predicted 2014 outburst of M31N 2008-12a in early October. We immediately initiated an X-ray/UV monitoring campaign with Swift to study the multiwavelength evolution of the outburst. We monitored M31N 2008-12a with daily Swift observations for 20 days after discovery, covering the entire supersoft X-ray source (SSS) phase. We detected SSS emission around day six after outburst. The SSS state lasted for approximately two weeks until about day 19. M31N 2008-12a was a bright X-ray source with a high blackbody temperature. The X-ray properties of this outburst were very similar to the 2013 eruption. Combined X-ray spectra show a fast rise and decline of the effective blackbody temperature. The short-term X-ray light curve showed strong, aperiodic variability which decreased significantly after about day 14. Overall, the X-ray properties of M31N 2008-12a are consistent with the average population properties of M 31 novae. The optical and X-ray light curves can be scaled uniformly to show similar time scales as those of the Galactic RNe U Sco or RS Oph. The SSS evolution time scales and effective temperatures are consistent with a high-mass WD. We predict the next outburst of M31N 2008-12a to occur in autumn 2015.

    Two uniquely arranged thyroid hormone response elements in the far upstream 5′ flanking region confer direct thyroid hormone regulation to the murine cholesterol 7α hydroxylase gene

    Cholesterol 7α hydroxylase (CYP7A1) is a key enzyme in cholesterol catabolism to bile acids and its activity is important for maintaining appropriate cholesterol levels. The murine CYP7A1 gene is highly inducible by thyroid hormone in vivo and there is an inverse relationship between thyroid hormone and serum cholesterol. Even though gene expression has been shown to be upregulated, whether the induction was mediated through a direct effect of thyroid hormone on the CYP7A1 promoter has never been established. Using gene targeted mice, we show that either of the two TR isoforms is sufficient to maintain normal hepatic CYP7A1 expression but a loss of both results in a significant decrease in expression. We also identified two new functional thyroid hormone receptor-binding sites in the CYP7A1 5′ flanking sequence located 3 kb upstream from the transcription start site. One site is a DR-0, which is an unusual type of TR response element, and the other consists of only a single recognizable half site that is required for TR/retinoid X receptor (RXR) binding. These two independent TR-binding sites are closely spaced and both are required for full induction of the CYP7A1 promoter by thyroid hormone, although the DR-0 site was more crucial.

    X-Ray Spectroscopy of Stars

    (abridged) Non-degenerate stars of essentially all spectral classes are soft X-ray sources. Low-mass stars on the cooler part of the main sequence and their pre-main sequence predecessors define the dominant stellar population in the galaxy by number. Their X-ray spectra are reminiscent, in the broadest sense, of X-ray spectra from the solar corona. X-ray emission from cool stars is indeed ascribed to magnetically trapped hot gas analogous to the solar coronal plasma. Coronal structure, its thermal stratification and geometric extent can be interpreted based on various spectral diagnostics. New features have been identified in pre-main sequence stars; some of these may be related to accretion shocks on the stellar surface, fluorescence on circumstellar disks due to X-ray irradiation, or shock heating in stellar outflows. Massive, hot stars clearly dominate the interaction with the galactic interstellar medium: they are the main sources of ionizing radiation, mechanical energy and chemical enrichment in galaxies. High-energy emission makes it possible to probe some of the most important processes at work in these stars, and to put constraints on their most peculiar feature: the stellar wind. Here, we review recent advances in our understanding of cool and hot stars through the study of X-ray spectra, in particular high-resolution spectra now available from XMM-Newton and Chandra. We address issues related to coronal structure, flares, the composition of coronal plasma, X-ray production in accretion streams and outflows, X-rays from single OB-type stars, massive binaries, magnetic hot objects and evolved WR stars. Comment: accepted for Astron. Astrophys. Rev., 98 journal pages, 30 figures (partly multiple); some corrections made after proof stage.

    Nova LMC 2009a as observed with XMM-Newton, compared with other novae

    We examine four high-resolution reflection grating spectrometer (RGS) spectra of the February 2009 outburst of the luminous recurrent nova LMC 2009a. They were very complex and rich in intricate absorption and emission features. The continuum was consistent with a dominant component originating in the atmosphere of a shell burning white dwarf (WD) with peak effective temperature between 810 000 K and a million K, and mass in the 1.2-1.4 M⊙ range. A moderate blue shift of the absorption features of a few hundred km s⁻¹ can be explained with a residual nova wind depleting the WD surface at a rate of about 10⁻⁸ M⊙ yr⁻¹. The emission spectrum seems to be due to both photoionization and shock ionization in the ejecta. The supersoft X-ray flux was irregularly variable on time-scales of hours, with decreasing amplitude of the variability. We find that both the period and the amplitude of another, already known 33.3-s modulation varied within time-scales of hours. We compared N LMC 2009a with other Magellanic Clouds novae, including four serendipitously discovered as supersoft X-ray sources (SSS) among 13 observed within 16 yr after the eruption. The newly detected targets were much less luminous than expected: we suggest that they were partially obscured by the accretion disc. Lack of SSS detections in the Magellanic Clouds novae more than 5.5 yr after the eruption constrains the average duration of the nuclear burning phase.