20 research outputs found

    Regret Bounds for Learning State Representations in Reinforcement Learning

    We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent. At least one of these representations is assumed to induce a Markov decision process (MDP), and the performance of the agent is measured in terms of cumulative regret against the optimal policy giving the highest average reward in this MDP representation. We propose an algorithm (UCB-MS) with O(√T) regret in any communicating MDP. The regret bound shows that UCB-MS automatically adapts to the Markov model and improves over the currently known best bound of order O(T^{2/3}).

    Senegal: Presidential elections 2019 - The shining example of democratic transition immersed in muddy power-politics

    Whereas Senegal has long been sold as a showcase of democracy in Africa, including peaceful political alternance, things apparently changed fundamentally with the 2019 Senegalese presidential elections, which brought new configurations. One of the major issues was political transhumance, which has been elevated to the rank of religion in defiance of morality. It threatened political stability and peace. In response, social networks of predominantly young activists, created in 2011 in the aftermath of the Arab Spring, focused on grass-roots advocacy with the electorate on good governance and democracy. They proposed a break with a political system that they consider neo-colonialist. Moreover, Senegal’s justice system is frequently accused of being biased, and the servility of the Constitutional Council, which is first and foremost an electoral court, has often been denounced.

    A monadic framework for relational verification: applied to information security, program equivalence, and optimizations

    Relational properties describe multiple runs of one or more programs. They characterize many useful notions of security, program refinement, and equivalence for programs with diverse computational effects, and they have received much attention in the recent literature. Rather than developing separate tools for special classes of effects and relational properties, we advocate using a general purpose proof assistant as a unifying framework for the relational verification of effectful programs. The essence of our approach is to model effectful computations using monads and to prove relational properties on their monadic representations, making the most of existing support for reasoning about pure programs. We apply this method in F* and evaluate it by encoding a variety of relational program analyses, including information flow control, program equivalence and refinement at higher order, correctness of program optimizations and game-based cryptographic security. By relying on SMT-based automation, unary weakest preconditions, user-defined effects, and monadic reification, we show that, compared to unary properties, verifying relational properties requires little additional effort from the F* programmer.

    The science of EChO

    The science of extra-solar planets is one of the most rapidly changing areas of astrophysics, and since 1995 the number of planets known has increased by almost two orders of magnitude. A combination of ground-based surveys and dedicated space missions has resulted in 560-plus planets being detected, and over 1200 that await confirmation. NASA's Kepler mission has opened up the possibility of discovering Earth-like planets in the habitable zone around some of the 100,000 stars it is surveying during its 3 to 4-year lifetime. ESA's new Gaia mission is expected to discover thousands of new planets around stars within 200 parsecs of the Sun. The key challenge now is moving on from discovery, important though that remains, to characterisation: what are these planets actually like, and why are they as they are? In the past ten years, we have learned how to obtain the first spectra of exoplanets using transit transmission and emission spectroscopy. With the high stability of Spitzer, Hubble, and large ground-based telescopes, the spectra of bright close-in massive planets can be obtained, and species like water vapour, methane, carbon monoxide and carbon dioxide have been detected. With transit science came the first tangible remote sensing of these planetary bodies, and so one can start to extrapolate from what has been learnt from Solar System probes to what one might plan to learn about their faraway siblings. As we learn more about the atmospheres, surfaces and near-surfaces of these remote bodies, we will begin to build up a clearer picture of their construction, history and suitability for life. The Exoplanet Characterisation Observatory, EChO, will be the first dedicated mission to investigate the physics and chemistry of exoplanetary atmospheres. By spectroscopically characterising more bodies in different environments, we will take detailed planetology out of the Solar System and into the Galaxy as a whole.