
    A unifying view of optimism in episodic reinforcement learning

    The principle of “optimism in the face of uncertainty” underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. Previously, these two classes of algorithms were thought to be distinct, with model-optimistic algorithms benefiting from a cleaner probabilistic analysis while value-optimistic algorithms are easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms which have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. Besides capturing many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods.
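
For readers unfamiliar with the value-optimistic side of this equivalence, the sketch below shows what a generic value-optimistic dynamic programming step looks like in the tabular episodic setting: backward induction on an empirical model, with an exploration bonus added to each Q-value and the result clipped at the remaining horizon. This is a minimal illustration under stated assumptions, not the algorithm derived in the paper: the bonus form `c / sqrt(n)`, the constant `c`, rewards in [0, 1], and the function and argument names are all hypothetical choices.

```python
import math
from typing import Dict, List, Tuple

StateAction = Tuple[int, int]


def optimistic_value_iteration(
    n_states: int,
    n_actions: int,
    horizon: int,
    counts: Dict[StateAction, int],
    p_hat: Dict[StateAction, List[float]],
    r_hat: Dict[StateAction, float],
    c: float = 1.0,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Backward induction on an empirical model with an additive exploration bonus.

    Rewards are assumed to lie in [0, 1], so the value-to-go at step h is clipped
    at horizon - h. Returns V[h][s] for h = 0..horizon (with V[horizon] = 0) and
    a greedy policy pi[h][s] for h = 0..horizon-1.
    """
    V = [[0.0] * n_states for _ in range(horizon + 1)]
    pi = [[0] * n_states for _ in range(horizon)]
    for h in range(horizon - 1, -1, -1):
        cap = float(horizon - h)  # largest achievable return from step h onward
        for s in range(n_states):
            best_q, best_a = -math.inf, 0
            for a in range(n_actions):
                n = counts.get((s, a), 0)
                if n == 0:
                    q = cap  # never tried: assume the best possible return
                else:
                    bonus = c / math.sqrt(n)  # hypothetical bonus, shrinks with visits
                    next_value = sum(p * V[h + 1][s2] for s2, p in enumerate(p_hat[(s, a)]))
                    q = min(r_hat[(s, a)] + bonus + next_value, cap)
                if q > best_q:
                    best_q, best_a = q, a
            V[h][s] = best_q
            pi[h][s] = best_a
    return V, pi


# Usage: one state, one action, horizon 2, reward 0.5 observed 10,000 times.
V, pi = optimistic_value_iteration(1, 1, 2, {(0, 0): 10_000}, {(0, 0): [1.0]}, {(0, 0): 0.5})
print(V[0][0])  # about 1.02: the true value 1.0 plus a bonus of 0.01 per step
```

With no data at all (empty `counts`), every value equals the remaining horizon, which is the maximally optimistic estimate that drives exploration; as counts grow, the bonus shrinks and the values approach those of the empirical model.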

    Exact algorithms for the 0–1 Time-Bomb Knapsack Problem

    We consider a stochastic version of the 0–1 Knapsack Problem in which, in addition to profit and weight, each item is associated with a probability of exploding and destroying all the contents of the knapsack. The objective is to maximise the expected profit of the selected items. The resulting problem, denoted the 0–1 Time-Bomb Knapsack Problem (01-TB-KP), has applications in logistics and cloud computing scheduling. We introduce a nonlinear mathematical formulation of the problem, study its computational complexity, and propose techniques to derive upper and lower bounds using convex optimisation and integer linear programming. We present three exact approaches based on enumeration, branch and bound, and dynamic programming, and computationally evaluate their performance on a large set of benchmark instances. The computational analysis shows that the proposed methods outperform the direct application of nonlinear solvers to the mathematical model, and provide high-quality solutions in a limited amount of time.
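
Because a single explosion destroys the whole knapsack, the expected profit of a selection S is the total profit of S multiplied by the probability that no item in S explodes. The sketch below evaluates that objective and solves tiny instances by naive exhaustive enumeration. It assumes explosions are independent, and it is not the authors' enumeration algorithm (which the abstract pairs with bounding techniques); the function names and the example instance are hypothetical.

```python
from itertools import combinations
from math import prod
from typing import List, Sequence, Tuple


def expected_profit(selected: Sequence[int], profits: Sequence[float], probs: Sequence[float]) -> float:
    """Expected profit of a selection: the total profit survives only if no selected item explodes.

    Explosions are assumed independent, so P(no explosion) is the product of (1 - probs[j]).
    """
    survive = prod(1.0 - probs[j] for j in selected)
    return survive * sum(profits[j] for j in selected)


def solve_by_enumeration(
    profits: Sequence[float], weights: Sequence[float], probs: Sequence[float], capacity: float
) -> Tuple[float, List[int]]:
    """Check every subset that fits in the knapsack; O(2^n) work, so only for tiny instances."""
    n = len(profits)
    best_value, best_set = 0.0, []  # the empty knapsack earns 0 for sure
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if sum(weights[j] for j in subset) <= capacity:
                value = expected_profit(subset, profits, probs)
                if value > best_value:
                    best_value, best_set = value, list(subset)
    return best_value, best_set


# Usage: item 0 is the most profitable but explodes with probability 0.5.
value, chosen = solve_by_enumeration(profits=[10, 6, 5], weights=[5, 4, 3], probs=[0.5, 0.0, 0.1], capacity=9)
print(chosen, value)  # [1, 2] with expected profit 0.9 * 11 = 9.9
```

In this toy instance the optimal selection leaves out the most profitable item: taking it alongside item 1 fits the capacity but halves the survival probability, giving 0.5 * 16 = 8 < 9.9. This interaction between profit and risk is what makes the objective nonlinear.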

    Bandit problems with fidelity rewards

    The fidelity bandits problem is a variant of the K-armed bandit problem in which the reward of each arm is augmented by a fidelity reward that provides the player with an additional payoff depending on how ‘loyal’ the player has been to that arm in the past. We propose two models for fidelity. In the loyalty-points model the amount of extra reward depends on the number of times the arm has previously been played. In the subscription model the additional reward depends on the current number of consecutive draws of the arm. We consider both stochastic and adversarial problems. Since single-arm strategies are not always optimal in stochastic problems, the notion of regret in the adversarial setting needs careful adjustment. We introduce three possible notions of regret and investigate which can be bounded sublinearly. We study in detail the special cases of increasing, decreasing and coupon (where the player gets an additional reward after every m plays of an arm) fidelity rewards. For the models which do not necessarily enjoy sublinear regret, we provide a worst-case lower bound. For those models which exhibit sublinear regret, we provide algorithms and bound their regret.
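
The difference between the two fidelity models is easiest to see by computing the reward of a fixed play sequence under each. In the sketch below, the loyalty-points model counts all earlier plays of the chosen arm, while the subscription model counts only the immediately preceding consecutive plays. The coupon reward pays a bonus on every m-th counted play. The counting convention (the fidelity function receives the number of earlier plays, excluding the current one), the function names, and the example values are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Sequence


def total_reward(
    plays: Sequence[int],
    base_reward: Callable[[int, int], float],
    fidelity: Callable[[int], float],
    model: str,
) -> float:
    """Base rewards plus fidelity rewards collected along a fixed sequence of arm choices.

    "loyalty":      fidelity(n), n = number of earlier plays of the arm (any time in the past).
    "subscription": fidelity(n), n = number of immediately preceding consecutive plays of the arm.
    """
    if model not in ("loyalty", "subscription"):
        raise ValueError(f"unknown model: {model!r}")
    times_played: Dict[int, int] = {}
    streak = 0
    prev_arm: Optional[int] = None
    total = 0.0
    for t, arm in enumerate(plays):
        streak = streak + 1 if arm == prev_arm else 1  # includes the current play
        earlier = times_played.get(arm, 0)
        n = earlier if model == "loyalty" else streak - 1
        total += base_reward(t, arm) + fidelity(n)
        times_played[arm] = earlier + 1
        prev_arm = arm
    return total


def coupon(m: int, bonus: float) -> Callable[[int], float]:
    """Coupon fidelity: pays `bonus` on every m-th counted play (n earlier plays -> this is play n + 1)."""
    return lambda n: bonus if (n + 1) % m == 0 else 0.0


def no_base(t: int, arm: int) -> float:
    """Zero base reward, so only the fidelity rewards are counted."""
    return 0.0


# Usage: alternating between two arms with a coupon every 2 plays.
plays = [0, 1, 0, 1]
print(total_reward(plays, no_base, coupon(2, 1.0), "loyalty"))       # 2.0: each arm reaches 2 plays
print(total_reward(plays, no_base, coupon(2, 1.0), "subscription"))  # 0.0: no run ever reaches length 2
```

Switching arms forfeits subscription rewards but not loyalty points. This is one way to see why the two models call for different notions of regret.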

    Delayed feedback in kernel bandits

    Black-box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel-based bandit problem (also known as Bayesian optimisation), where a learner aims to optimise a kernelized function through sequential noisy observations. The existing work predominantly assumes that feedback is immediately available, an assumption which fails in many real-world situations, including recommendation systems, clinical trials and hyperparameter tuning. We consider a kernel bandit problem under stochastically delayed feedback, and propose an algorithm with $\tilde{O}(\sqrt{\Gamma_k(T)\,T} + \mathbb{E}[\tau])$ regret, where $T$ is the number of time steps, $\Gamma_k(T)$ is the maximum information gain of the kernel with $T$ observations, and $\tau$ is the delay random variable. This represents a significant improvement over the state-of-the-art regret bound of $\tilde{O}(\Gamma_k(T)\sqrt{T} + \mathbb{E}[\tau]\,\Gamma_k(T))$ reported in (Verma et al., 2022). In particular, for very non-smooth kernels, the information gain grows almost linearly in time, trivializing the existing results. We also validate our theoretical results with simulations.
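
A short calculation shows why near-linear information gain makes the earlier bound vacuous. Assume, purely for illustration, that $\Gamma_k(T) = \Theta(T^{\alpha})$ for some $\alpha \in (0, 1)$. Ignore logarithmic factors and the delay terms, and note that with rewards bounded in $[0, 1]$ (an assumption here) regret can never exceed $T$:

```latex
\sqrt{\Gamma_k(T)\,T} = \Theta\!\left(T^{(1+\alpha)/2}\right) = o(T) \quad \text{for every } \alpha < 1,
\qquad
\Gamma_k(T)\sqrt{T} = \Theta\!\left(T^{\alpha + 1/2}\right), \quad \text{which is } \Omega(T) \text{ once } \alpha \ge 1/2.
```

As $\alpha \to 1$, the new bound therefore stays sublinear, while the earlier one exceeds the trivial bound $T$. Separately, the delay enters the new bound additively as $\mathbb{E}[\tau]$, rather than multiplied by $\Gamma_k(T)$.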

    Trading-off payments and accuracy in online classification with paid stochastic experts

    We investigate online classification with paid stochastic experts. Here, before making their prediction, each expert must be paid. The amount that we pay each expert directly influences the accuracy of their prediction through some unknown Lipschitz “productivity” function. In each round, the learner must decide how much to pay each expert and then make a prediction. The learner incurs a cost equal to a weighted sum of the prediction error and upfront payments for all experts. We introduce an online learning algorithm whose total cost after $T$ rounds exceeds that of a predictor which knows the productivity of all experts in advance by at most $O(K^2 (\ln T)\sqrt{T})$, where $K$ is the number of experts. In order to achieve this result, we combine Lipschitz bandits and online classification with surrogate losses. These tools allow us to improve upon the bound of order $T^{2/3}$ one would obtain in the standard Lipschitz bandit setting. Our algorithm is empirically evaluated on synthetic data.
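
The abstract does not specify the productivity function or the weighting in the cost. The sketch below is therefore a toy version of the per-round trade-off. It assumes a single expert whose prediction the learner follows, a 0–1 error weighted by a hypothetical `error_weight`, and a hypothetical Lipschitz productivity function. It computes the expected cost of a payment and the best fixed payment that an oracle knowing the productivity in advance would choose, which is the kind of comparator the regret is measured against. It is not the paper's learning algorithm.

```python
from typing import Callable, Sequence, Tuple


def expected_round_cost(payment: float, productivity: Callable[[float], float], error_weight: float) -> float:
    """Expected cost of one round in which a single expert is paid `payment` and its prediction is followed.

    productivity(payment) is the probability that the expert predicts correctly, so the
    cost is error_weight * P(mistake) + payment.
    """
    if payment < 0:
        raise ValueError("payment must be non-negative")
    return error_weight * (1.0 - productivity(payment)) + payment


def best_fixed_payment(
    grid: Sequence[float], productivity: Callable[[float], float], error_weight: float
) -> Tuple[float, float]:
    """Oracle baseline: with the productivity known in advance, pick the cheapest payment on the grid."""
    costs = ((c, expected_round_cost(c, productivity, error_weight)) for c in grid)
    return min(costs, key=lambda pair: pair[1])


def productivity(payment: float) -> float:
    """Hypothetical 0.4-Lipschitz productivity: accuracy rises with payment and saturates at 0.9."""
    return min(0.5 + 0.4 * payment, 0.9)


grid = [0.25 * i for i in range(9)]  # candidate payments 0.0, 0.25, ..., 2.0
print(best_fixed_payment(grid, productivity, error_weight=1.0))  # (0.0, 0.5): paying never pays off
print(best_fixed_payment(grid, productivity, error_weight=5.0))  # chooses payment 1.0 (cost about 1.5)
```

When mistakes are cheap relative to payments, the oracle pays nothing. When they are expensive, it pays exactly up to the point where accuracy saturates. The learner has to discover this trade-off online, without knowing `productivity`.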

    Alteration of EGFR Spatiotemporal Dynamics Suppresses Signal Transduction

    The epidermal growth factor receptor (EGFR), which regulates cell growth and survival, is integral to colon tumorigenesis. Lipid rafts play a role in regulating EGFR signaling, and docosahexaenoic acid (DHA) is known to perturb membrane domain organization through changes in lipid rafts. Therefore, we investigated the mechanistic link between EGFR function and DHA. Membrane incorporation of DHA into immortalized colonocytes altered the lateral organization of EGFR. DHA additionally increased EGFR phosphorylation but paradoxically suppressed downstream signaling. Assessment of the EGFR-Ras-ERK1/2 signaling cascade identified Ras GTP binding as the locus of the DHA-induced disruption of signal transduction. DHA also antagonized EGFR signaling capacity by increasing receptor internalization and degradation. DHA suppressed cell proliferation in an EGFR-dependent manner, but cell proliferation could be partially rescued by expression of constitutively active Ras. Feeding chronically inflamed, carcinogen-injected C57BL/6 mice a fish oil-containing diet enriched in DHA recapitulated the effects on the EGFR signaling axis observed in cell culture and additionally suppressed tumor formation. We conclude that DHA-induced alteration in both the lateral and subcellular localization of EGFR culminates in the suppression of EGFR downstream signal transduction, which has implications for the molecular basis of colon cancer prevention by DHA.

    Logging Affects Fledgling Sex Ratios and Baseline Corticosterone in a Forest Songbird

    Silviculture (logging) creates a disturbance to forested environments. The degree to which forests are modified depends on the logging prescription and forest stand characteristics. In this study we compared the effects of two methods of group-selection (“moderate” and “heavy”) silviculture (GSS) and undisturbed reference stands on stress and offspring sex ratios of a forest interior species, the Ovenbird (Seiurus aurocapilla), in Algonquin Provincial Park, Canada. Blood samples were taken from nestlings for corticosterone and molecular sexing. We found that logging creates a disturbance that is stressful for nestling Ovenbirds, as illustrated by elevated baseline corticosterone in cut sites. Ovenbirds nesting in undisturbed reference forest produced fewer male offspring per brood (proportion male = 30%), whereas logging with progressively greater forest disturbance shifted the offspring sex ratio towards males (proportion male: moderate = 50%, heavy = 70%). If Ovenbirds in undisturbed forests usually produce female-biased broods, then the production of males as a result of logging may disrupt population viability. We recommend a broad examination of nestling sex ratios in response to anthropogenic disturbance to determine the generality of our findings.

    The modulating effect of education on semantic interference during healthy aging

    Aging has traditionally been related to impairments in name retrieval. These impairments have usually been explained by a phonological transmission deficit hypothesis or by an inhibitory deficit hypothesis. This decline can, however, be modulated by the educational level of the sample. This study analyzed the possible role of these approaches in explaining both object and face naming impairments during aging. Older adults with low and high educational levels and young adults with a high educational level were asked to repeatedly name objects or famous people using the semantic-blocking paradigm. We compared naming when exemplars were presented in a semantically homogeneous or in a semantically heterogeneous context. Results revealed significantly slower rates of both face and object naming in the homogeneous context (i.e., semantic interference), with a stronger effect for face naming. Interestingly, the group of older adults with a lower educational level showed an increased semantic interference effect during face naming. These findings suggest the joint involvement of the two mechanisms proposed to explain age-related naming difficulties, i.e., the inhibitory deficit and the transmission deficit hypotheses. The stronger vulnerability to semantic interference in the less-educated older adult sample may point to a failure in the inhibitory mechanisms in charge of interference resolution, as proposed by the inhibitory deficit hypothesis. In addition, the fact that this interference effect was mainly restricted to face naming and not to object naming would be consistent with the increased age-related difficulties during proper name retrieval, as suggested by the transmission deficit hypothesis. This research was supported by grants PSI2013-46033-P to A.M., PSI2015-65502-C2-1-P to M.T.B., PCIN-2015-165-C02-01 to D.P., PSI2017-89324-C2-1-P to DP from the Spanish Ministry of Economy and Competitiveness (http://www.mineco.gob.es/).

    LSST: from Science Drivers to Reference Design and Anticipated Data Products

    (Abridged) We describe here the most ambitious survey currently planned in the optical, the Large Synoptic Survey Telescope (LSST). A vast array of science will be enabled by a single wide-deep-fast sky survey, and LSST will have unique survey capability in the faint time domain. The LSST design is driven by four main science themes: probing dark energy and dark matter, taking an inventory of the Solar System, exploring the transient optical sky, and mapping the Milky Way. LSST will be a wide-field ground-based system sited at Cerro Pachón in northern Chile. The telescope will have an 8.4 m (6.5 m effective) primary mirror, a 9.6 deg$^2$ field of view, and a 3.2 Gigapixel camera. The standard observing sequence will consist of pairs of 15-second exposures in a given field, with two such visits in each pointing in a given night. With these repeats, the LSST system is capable of imaging about 10,000 square degrees of sky in a single filter in three nights. The typical 5$\sigma$ point-source depth in a single visit in $r$ will be $\sim 24.5$ (AB). The project is in the construction phase and will begin regular survey operations by 2022. The survey area will be contained within 30,000 deg$^2$ with $\delta < +34.5^\circ$, and will be imaged multiple times in six bands, $ugrizy$, covering the wavelength range 320–1050 nm. About 90% of the observing time will be devoted to a deep-wide-fast survey mode which will uniformly observe an 18,000 deg$^2$ region about 800 times (summed over all six bands) during the anticipated 10 years of operations, and yield a coadded map to $r \sim 27.5$. The remaining 10% of the observing time will be allocated to projects such as a Very Deep and Fast time domain survey. The goal is to make LSST data products, including a relational database of about 32 trillion observations of 40 billion objects, available to the public and scientists around the world.
    Comment: 57 pages, 32 color figures, version with high-resolution figures available from https://www.lsst.org/overvie

    Reggie-1/flotillin-2 promotes secretion of the long-range signalling forms of Wingless and Hedgehog in Drosophila

    The lipid-modified morphogens Wnt and Hedgehog diffuse poorly in isolation yet can spread over long distances in vivo, predicting the existence of two distinct forms of these morphogens. The first is poorly mobile and activates short-range target genes. The second is specifically packaged for efficient spreading to induce long-range targets. The subcellular mechanisms involved in the discriminative secretion of these two forms remain elusive. Wnt and Hedgehog can associate with membrane microdomains, but the function of this association was unknown. Here we show that a major protein component of membrane microdomains, reggie-1/flotillin-2, plays important roles in the secretion and spreading of Wnt and Hedgehog in Drosophila. Reggie-1 loss of function results in reduced spreading of the morphogens, while its overexpression stimulates secretion of Wnt and Hedgehog and expands their diffusion. The resulting changes in the morphogen gradients affect the short- and long-range targets differently. In its action, reggie-1 appears specific for Wnt and Hedgehog. These data suggest that reggie-1 is an important component of the Wnt and Hedgehog secretion pathway dedicated to formation of the mobile pool of these morphogens.