    Mirrors with Regular Hexagonal Segments

    The point-spread function and emissivity are calculated for a mirror made from regular hexagonal segments of just a few different sizes. A mirror of this type has many similar segments, which is an advantage for manufacturing, and for an ~f/1 mirror with ≥1000 segments and ≥4 sizes of regular hexagons, the increase in intersegment gap area is negligible. This result raises the possibility of making a mirror from very large numbers of identical small segments that are warped to the required figure.

    A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

    Reward engineering is an important aspect of reinforcement learning. Whether or not the user's intentions can be correctly encapsulated in the reward function can significantly impact the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior. This operation can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. A TL formula can be translated into a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy that satisfies the TL specification. A set of simulated experiments is conducted to evaluate the proposed approach.
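
The idea of scoring a trajectory against a temporal logic formula can be illustrated with a minimal sketch. It assumes the common min/max quantitative semantics (positive score means satisfied), which the abstract does not spell out; the predicates `reach_goal` and `stay_safe`, the scalar states, and the helper names are hypothetical and not taken from the paper.

```python
from typing import Callable, Sequence

# A predicate maps a state to a signed margin: positive means it holds.
Predicate = Callable[[float], float]


def eventually(pred: Predicate, traj: Sequence[float]) -> float:
    """Score of 'pred holds at some step': the best margin along the trajectory."""
    return max(pred(s) for s in traj)


def always(pred: Predicate, traj: Sequence[float]) -> float:
    """Score of 'pred holds at every step': the worst margin along the trajectory."""
    return min(pred(s) for s in traj)


def conjunction(score_a: float, score_b: float) -> float:
    """Score of 'a and b': limited by the weaker of the two."""
    return min(score_a, score_b)


# Hypothetical 1-D trajectory and predicates, for illustration only.
traj = [0.0, 0.4, 0.9, 1.2]
reach_goal: Predicate = lambda s: s - 1.0  # holds when s > 1.0
stay_safe: Predicate = lambda s: 2.0 - s   # holds when s < 2.0

# "Eventually reach the goal, and always stay safe."
robustness = conjunction(eventually(reach_goal, traj), always(stay_safe, traj))
print(robustness > 0)  # True: this trajectory satisfies the specification
```

A policy search method could, under these assumptions, use such a score as the return to maximize, since larger positive values indicate the specification is satisfied by a wider margin.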

    Deep reinforcement learning from human preferences

    For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.
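
One way to turn pairwise preferences over trajectory segments into a training signal is sketched below. It assumes a Bradley-Terry style model on summed predicted rewards with a cross-entropy loss; the abstract does not state the model used, and the function names and label convention are hypothetical.

```python
import math
from typing import Sequence


def preference_probability(rewards_a: Sequence[float],
                           rewards_b: Sequence[float]) -> float:
    """P(segment A is preferred over B), assuming a Bradley-Terry model
    on the summed predicted rewards of each segment."""
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable logistic function.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a: Sequence[float],
                    rewards_b: Sequence[float],
                    label: float) -> float:
    """Cross-entropy between the predicted preference and a human label.
    Assumed convention: 1.0 if A was preferred, 0.0 if B, 0.5 if equal."""
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


# Hypothetical per-step reward predictions for two short segments.
segment_a = [0.5, 0.7, 0.9]
segment_b = [0.1, 0.2, 0.1]
print(preference_probability(segment_a, segment_b))
print(preference_loss(segment_a, segment_b, label=1.0))
```

Minimizing this loss over many labeled pairs would push the reward predictor to agree with the human comparisons, and the learned reward could then stand in for the missing reward function during RL training.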

    Implicit Gender Bias in the Classroom: Memories from K-12 Education

    Implicit biases affect everyone in society, including within the K-12 education system. This study investigated what memories of implicit gender bias preservice teachers (PSTs) recalled from their K-12 education. These memories may be connected to the PSTs’ embedded implicit biases and indicate the long-term impact of teachers’ biases on students. A total of 141 undergraduate PSTs from two universities were surveyed regarding gender expectations and recognition of LGBTQ+ people. Results indicated an inconsistency between espoused beliefs and practices within the classrooms. Because schools often reflect society’s norms and perpetuate them through implicit bias, understanding what biases are currently accepted and reinforced in schools allows teacher education programs to unpack these specific biases with their preservice teachers to promote greater equality and ultimately reduce sexism in our society.

    Searching for collective behavior in a network of real neurons

    Maximum entropy models are the least structured probability distributions that exactly reproduce a chosen set of statistics measured in an interacting network. Here we use this principle to construct probabilistic models which describe the correlated spiking activity of populations of up to 120 neurons in the salamander retina as it responds to natural movies. Already in groups as small as 10 neurons, interactions between spikes can no longer be regarded as small perturbations in an otherwise independent system; for 40 or more neurons, pairwise interactions need to be supplemented by a global interaction that controls the distribution of synchrony in the population. Here we show that such "K-pairwise" models--being systematic extensions of the previously used pairwise Ising models--provide an excellent account of the data. We explore the properties of the neural vocabulary by: 1) estimating its entropy, which constrains the population's capacity to represent visual information; 2) classifying activity patterns into a small set of metastable collective modes; 3) showing that the neural codeword ensembles are extremely inhomogeneous; 4) demonstrating that the state of individual neurons is highly predictable from the rest of the population, allowing the capacity for error correction. Comment: 24 pages, 19 figures
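
The structure of a pairwise model with an added global synchrony term can be sketched for a toy population. This is an illustration under stated assumptions, not the authors' fitting procedure: binary states in {0, 1}, the sign convention P(σ) ∝ exp(−E(σ)), the parameter names `h`, `J`, `V`, and brute-force normalization (feasible only for a handful of neurons) are all assumed here.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple

State = Tuple[int, ...]


def k_pairwise_energy(sigma: State,
                      h: Sequence[float],
                      J: Sequence[Sequence[float]],
                      V: Sequence[float]) -> float:
    """Energy with single-neuron biases h, pairwise couplings J (i < j),
    and a potential V[K] on the number K of simultaneously active neurons."""
    n = len(sigma)
    k = sum(sigma)  # population synchrony: how many neurons spike together
    field = sum(h[i] * sigma[i] for i in range(n))
    pair = sum(J[i][j] * sigma[i] * sigma[j]
               for i in range(n) for j in range(i + 1, n))
    return -(field + pair + V[k])


def distribution(h: Sequence[float],
                 J: Sequence[Sequence[float]],
                 V: Sequence[float]) -> Dict[State, float]:
    """Exact probabilities by enumerating all 2**n binary patterns."""
    states = list(itertools.product((0, 1), repeat=len(h)))
    weights = [math.exp(-k_pairwise_energy(s, h, J, V)) for s in states]
    z = sum(weights)  # partition function
    return {s: w / z for s, w in zip(states, weights)}


# Hypothetical parameters for three neurons; V rewards full synchrony (K = 3).
h = [0.0, 0.0, 0.0]
J = [[0.0] * 3 for _ in range(3)]
V = [0.0, 0.0, 0.0, 2.0]
probs = distribution(h, J, V)
print(max(probs, key=probs.get))  # (1, 1, 1)
```

Setting `V` to all zeros recovers a plain pairwise model in this sketch, which makes the role of the global term easy to isolate.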