
    Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

    The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
    Comment: 15 pages, 2 figures, submitted to ALT (Algorithmic Learning Theory)
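
To make the policy in this abstract concrete, here is a minimal sketch of Thompson Sampling for Bernoulli rewards. It is not taken from the paper: the uniform Beta(1, 1) prior, the function name `thompson_sampling`, and the arm means in the usage note are illustrative assumptions.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Simulate Thompson Sampling on a Bernoulli bandit.

    Assumption (not from the source): each arm starts with a uniform
    Beta(1, 1) prior. Returns per-arm pull counts and the pseudo-regret.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate update: a success or a failure count is incremented.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    # Pseudo-regret: expected reward lost relative to always playing the best arm.
    best = max(true_means)
    regret = sum((best - true_means[a]) * pulls[a] for a in range(n_arms))
    return pulls, regret
```

Calling `thompson_sampling([0.2, 0.8], horizon=2000)` returns the pull counts and the pseudo-regret, which can be plotted against the horizon to see how the regret grows.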

    The Innermost Ejecta of Core Collapse Supernovae

    We ensure successful explosions of otherwise non-explosive models either by enhancing the neutrino luminosity, through reduced neutrino scattering cross sections, or by increasing the heating efficiency, through enhanced neutrino absorption cross sections in the heating region. Our investigations show that the resulting electron fraction Ye in the innermost ejecta is close to 0.5, in some areas even exceeding 0.5. We present the effects of the resulting values of Ye on the nucleosynthesis yields of the innermost zones of core collapse supernovae.
    Comment: 4 pages, 2 figures; contribution to Nuclei In The Cosmos VIII, to appear in Nucl. Phys.

    What makes medical students better listeners?

    Diagnosing heart conditions by auscultation is an important clinical skill commonly learnt by medical students. Clinical proficiency for this skill is in decline [1], and new teaching methods are needed. Successful discrimination of heartbeat sounds is believed to benefit mainly from acoustical training [2]. From recent studies of auditory training [3,4] we hypothesized that semantic representations outside the auditory cortex contribute to diagnostic accuracy in cardiac auscultation. To test this hypothesis, we analysed auditory evoked potentials (AEPs) which were recorded from medical students while they diagnosed quadruplets of heartbeat cycles. The comparison of trials with correct (Hits) versus incorrect diagnosis (Misses) revealed a significant difference in brain activity at 280-310 ms after the onset of the second cycle within the left middle frontal gyrus (MFG) and the right prefrontal cortex. This timing and locus suggest that semantic rather than acoustic representations contribute critically to auscultation skills. Thus, teaching auscultation should emphasize the link between the heartbeat sound and its meaning. Beyond cardiac auscultation, this issue is of interest for all fields where subtle but complex perceptual differences identify items in a well-known semantic context

    A two-armed bandit based scheme for accelerated decentralized learning

    The two-armed bandit problem is a classical optimization problem in which a decision maker sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, so one must balance exploiting existing knowledge about the arms against obtaining new information. Bandit problems are particularly fascinating because a large class of real-world problems, including routing, QoS control, game playing, and resource allocation, can be solved in a decentralized manner when modeled as a system of interacting gambling machines. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel scheme for decentralized decision making based on the Goore Game in which each decision maker is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyperparameters of sibling conjugate priors and on random sampling from these posteriors. We further report theoretical results on the variance of the random rewards experienced by each individual decision maker. Based on these theoretical results, each decision maker is able to accelerate its own learning by taking advantage of the increasingly reliable feedback that is obtained as exploration gradually turns into exploitation in bandit-based learning. Extensive experiments demonstrate that the accelerated learning allows us to combine the benefit of conservative learning, which is high accuracy, with the benefit of hurried learning, which is fast convergence. In this manner, our scheme outperforms recently proposed Goore Game solution schemes, in which one has to trade off accuracy against speed. We thus believe that our methodology opens avenues for improved performance in a number of applications of bandit-based decentralized decision making.

    The Neutrino Signal in Stellar Core Collapse and Postbounce Evolution

    General relativistic multi-group and multi-flavor Boltzmann neutrino transport in spherical symmetry adds a new level of detail to the numerical bridge between microscopic nuclear and weak interaction physics and the macroscopic evolution of the astrophysical object. Although no supernova explosions are obtained, we investigate the neutrino luminosities in various phases of the postbounce evolution for a wide range of progenitor stars between 13 and 40 solar masses. The signal probes the dynamics of material layered in and around the protoneutron star and is, within narrow limits, sensitive to improvements in the weak interaction physics. Only changes that dramatically exceed physical limitations allow experiments with exploding models. We discuss the differences in the neutrino signal and find the electron fraction in the innermost ejecta to exceed 0.5 as a consequence of thermal balance and weak equilibrium at the mass cut.
    Comment: 8 pages, 4 figures. Proceedings of the Nuclear Physics in Astrophysics Conference, Debrecen, Hungary, 2002, to appear in Nucl. Phys. A. Color figures added and references updated

    Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters

    The multi-armed bandit problem is a classical optimization problem in which an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, so one must balance exploiting existing knowledge about the arms against obtaining new information. Dynamically changing (non-stationary) bandit problems are particularly challenging because each change of the reward distributions may progressively degrade the performance of any fixed strategy. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel solution scheme for bandit problems with non-stationary normally distributed rewards. The scheme is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyperparameters of sibling Kalman filters and on random sampling from these posteriors. Furthermore, it is able to track the better actions, thus supporting non-stationary bandit problems. Extensive experiments demonstrate that our scheme outperforms recently proposed bandit playing algorithms, not only in non-stationary environments but also in stationary ones. Furthermore, our scheme is robust to inexact parameter settings. We thus believe that our methodology opens avenues for obtaining improved novel solutions.
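
The idea of sampling from sibling Kalman filters can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: it assumes a random-walk model for each arm's mean, and the class `KalmanArm`, the function `run_kalman_bandit`, the noise variances, the prior, and the mid-run swap of the two arm means are all hypothetical choices made for the example.

```python
import math
import random


class KalmanArm:
    """Gaussian belief (mean, variance) about one arm's current expected reward."""

    def __init__(self, prior_mean=0.0, prior_var=100.0):
        self.mean = prior_mean
        self.var = prior_var

    def sample(self, rng):
        # Random draw from the current Gaussian belief.
        return rng.gauss(self.mean, math.sqrt(self.var))

    def drift(self, transition_var):
        # Prediction step: uncertainty grows because the arm may have changed.
        self.var += transition_var

    def observe(self, reward, observation_var):
        # Correction step of a scalar Kalman filter.
        gain = self.var / (self.var + observation_var)
        self.mean += gain * (reward - self.mean)
        self.var *= 1.0 - gain


def run_kalman_bandit(horizon=400, observation_var=1.0, transition_var=0.1, seed=0):
    """Play a two-armed Gaussian bandit whose arm means swap halfway through."""
    rng = random.Random(seed)
    true_means = [1.0, 3.0]  # assumed values for illustration
    arms = [KalmanArm() for _ in true_means]
    choices = []
    for t in range(horizon):
        if t == horizon // 2:
            true_means.reverse()  # abrupt change: the better arm becomes the worse
        for arm in arms:
            arm.drift(transition_var)
        # Sample every belief and play the arm with the largest draw.
        draws = [arm.sample(rng) for arm in arms]
        chosen = draws.index(max(draws))
        reward = rng.gauss(true_means[chosen], math.sqrt(observation_var))
        arms[chosen].observe(reward, observation_var)
        choices.append(chosen)
    return choices
```

Because the variance of an unplayed arm keeps growing, its samples eventually become large enough for it to be tried again, which is what lets a scheme of this kind notice the swap and return to the better arm.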

    Discretized Bayesian pursuit – A new scheme for reinforcement learning

    The success of Learning Automata (LA)-based estimator algorithms over the classical, Linear Reward-Inaction (L_RI)-like schemes can be explained by their ability to pursue the actions with the highest reward probability estimates. Without access to reward probability estimates, it makes sense for schemes like the L_RI to first make large exploring steps, and then to gradually turn exploration into exploitation by making progressively smaller learning steps. However, this behavior becomes counter-intuitive when pursuing actions based on their estimated reward probabilities. Learning should then ideally proceed in progressively larger steps, as the reward probability estimates turn more accurate. This paper introduces a new estimator algorithm, the Discretized Bayesian Pursuit Algorithm (DBPA), that achieves this. The DBPA is implemented by linearly discretizing the action probability space of the Bayesian Pursuit Algorithm (BPA) [1]. The key innovation is that the linear discrete updating rules mitigate the counter-intuitive behavior of the corresponding linear continuous updating rules, by augmenting them with the reward probability estimates. Extensive experimental results show the superiority of DBPA over previous estimator algorithms. Indeed, the DBPA is probably the fastest reported LA to date.

    Differential iron requirements for osteoblast and adipocyte differentiation

    Bone marrow mesenchymal progenitor cells are precursors for various cell types including osteoblasts, adipocytes, and chondrocytes. The external environment and signals act to direct the pathway of differentiation. Importantly, situations such as aging and chronic kidney disease display alterations in the balance of osteoblast and adipocyte differentiation, adversely affecting bone integrity. Iron deficiency, which can often occur during aging and chronic kidney disease, is associated with reduced bone density. The purpose of this study was to assess the effects of iron deficiency on the capacity of progenitor cell differentiation pathways. Mouse and human progenitor cells, differentiated under standard osteoblast and adipocyte protocols in the presence of the iron chelator deferoxamine (DFO), were used. Under osteogenic conditions, 5 μM DFO significantly impaired expression of critical osteoblast genes, including osteocalcin, type 1 collagen, and dentin matrix protein 1. This led to a reduction in alkaline phosphatase activity and impaired mineralization. Despite prolonged exposure to chronic iron deficiency, cells retained viability as well as normal hypoxic responses, with significant increases in transferrin receptor and protein accumulation of hypoxia inducible factor 1α. Similar concentrations of DFO were used when cells were maintained in adipogenic conditions. In contrast to osteoblast differentiation, DFO modestly suppressed adipocyte gene expression of peroxisome proliferator-activated receptor gamma, lipoprotein lipase, and adiponectin at earlier time points, with normalization at later stages. Lipid accumulation was also similar in all conditions. These data suggest the critical importance of iron in osteoblast differentiation, and that, as long as the external stimuli are present, iron deficiency does not impede adipogenesis. © 2021 The Authors. JBMR Plus published by Wiley Periodicals LLC on behalf of American Society for Bone and Mineral Research.
    Daniel F. Edwards III, Christopher J. Miller, Arelis Quintana-Martinez, Christian S. Wright, Matthew Prideaux, Gerald J. Atkins, William R. Thompson, and Erica L. Clinkenbear

    Vortex dynamics and states of artificially layered superconducting films with correlated defects

    Linear resistances and $IV$-characteristics have been measured over a wide range in the parameter space of the mixed phase of multilayered a-TaGe/Ge films. Three films with varying interlayer coupling and correlated defects oriented at an angle $\approx 25^\circ$ from the film normal were investigated. Experimental data were analyzed within vortex glass models and a second order phase transition from a resistive vortex liquid to a pinned glass phase. Various vortex phases including changes from three to two dimensional behavior depending on anisotropy have been identified. Careful analysis of $IV$-characteristics in the glass phases revealed a distinctive $T$- and $H$-dependence of the glass exponent $\mu$. The vortex dynamics in the Bose-glass phase does not follow the predicted behavior for excitations of vortex kinks or loops.
    Comment: 16 pages, 10 figures, 3 tables

    Nucleosynthesis in Neutrino-Driven Supernovae

    Core collapse supernovae are the leading actors in the story of the cosmic origin of the chemical elements. Existing models, which generally assume spherical symmetry and parameterize the explosion, have been able to broadly replicate the observed elemental pattern. However, inclusion of neutrino interactions produces noticeable improvement in the composition of the ejecta when compared to observations. Neutrino interactions may also provide a supernova source for light p-process nuclei.
    Comment: 7 pages, 2 figures, in proceedings of Astronomy with Radioactivities V, Clemson University, September 5-9, 2005, to appear in New Astronomy Reviews