Mirrors with Regular Hexagonal Segments

Author: Amodei Dario
Padin Stephen
Publication venue: Optical Society of America
Publication date: 01/09/2003

The point-spread function and emissivity are calculated for a mirror made from regular hexagonal segments of just a few different sizes. A mirror of this type has many similar segments, which is an advantage for manufacturing, and, for an ~f/1 mirror with ≥1000 segments and ≥4 sizes of regular hexagons, the increase in intersegment gap area is negligible. This result raises the possibility of making a mirror from very large numbers of identical small segments that are warped to the required figure.

Caltech Authors

A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

Author: amodei
chebotar
levine
mnih
montgomery
silver
stulp
Publication date: 01/01/2017

Reward engineering is an important aspect of reinforcement learning. Whether or not the user's intentions can be correctly encapsulated in the reward function can significantly impact the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior. This operation can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. A TL formula can be translated into a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy that satisfies the TL specification. A set of simulated experiments is conducted to evaluate the proposed approach.
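The idea of turning a TL formula into a real-valued satisfaction score can be illustrated with a minimal sketch. It assumes the common quantitative ("robustness") semantics for two simple formulas over a one-dimensional trajectory, "always x > c" and "eventually x > c"; the function names and the example trajectory are hypothetical and are not taken from the paper, whose exact definitions may differ.

```python
from typing import Sequence


def robustness_always_greater(signal: Sequence[float], threshold: float) -> float:
    """Score for 'always (x > threshold)': the worst-case margin over the trajectory."""
    return min(x - threshold for x in signal)


def robustness_eventually_greater(signal: Sequence[float], threshold: float) -> float:
    """Score for 'eventually (x > threshold)': the best-case margin over the trajectory."""
    return max(x - threshold for x in signal)


# Hypothetical trajectory of a single state variable.
trajectory = [0.2, 0.5, 0.9, 0.4]

# Positive score: the formula is satisfied; negative score: it is violated.
always_score = robustness_always_greater(trajectory, 0.1)
eventually_score = robustness_eventually_greater(trajectory, 1.0)
```

Under these assumed semantics, a score of this kind could stand in for a hand-crafted reward when comparing trajectories, since larger values indicate a larger margin of satisfaction.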

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

Searching for collective behavior in a network of real neurons

Author: Amodei Dario
Berry II Michael J
Bialek William
Marre Olivier
Schneidman Elad
Tkačik Gašper
Publication venue: Public Library of Science (PLoS)
Publication date: 13/06/2013

Maximum entropy models are the least structured probability distributions that exactly reproduce a chosen set of statistics measured in an interacting network. Here we use this principle to construct probabilistic models which describe the correlated spiking activity of populations of up to 120 neurons in the salamander retina as it responds to natural movies. Already in groups as small as 10 neurons, interactions between spikes can no longer be regarded as small perturbations in an otherwise independent system; for 40 or more neurons pairwise interactions need to be supplemented by a global interaction that controls the distribution of synchrony in the population. Here we show that such "K-pairwise" models, being systematic extensions of the previously used pairwise Ising models, provide an excellent account of the data. We explore the properties of the neural vocabulary by: 1) estimating its entropy, which constrains the population's capacity to represent visual information; 2) classifying activity patterns into a small set of metastable collective modes; 3) showing that the neural codeword ensembles are extremely inhomogeneous; 4) demonstrating that the state of individual neurons is highly predictable from the rest of the population, allowing the capacity for error correction. Comment: 24 pages, 19 figures
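As a rough illustration of the model family named in the abstract, the sketch below builds a pairwise Ising-style distribution over binary spike words and adds a global term that depends only on the number of simultaneously active neurons, K. The sign convention (probability proportional to exp(-energy)), the parameter names `h`, `J`, and `V`, and the brute-force normalization are assumptions made for this example; exhaustive enumeration is only feasible for a handful of neurons, not the populations studied in the paper.

```python
import itertools
import math


def k_pairwise_energy(state, h, J, V):
    """Energy of one binary word: single-neuron, pairwise, and global K-dependent terms."""
    n = len(state)
    k = sum(state)  # number of neurons spiking in this word
    field = sum(h[i] * state[i] for i in range(n))
    pair = sum(J[i][j] * state[i] * state[j]
               for i in range(n) for j in range(i + 1, n))
    return field + pair + V[k]


def k_pairwise_distribution(h, J, V):
    """Normalize exp(-energy) over all 2**n binary words (small n only)."""
    n = len(h)
    states = list(itertools.product((0, 1), repeat=n))
    weights = [math.exp(-k_pairwise_energy(s, h, J, V)) for s in states]
    z = sum(weights)  # partition function
    return {s: w / z for s, w in zip(states, weights)}


# Hypothetical parameters for three neurons; V has one entry per K = 0..3.
h = [0.0, 0.0, 0.0]
J = [[0.0] * 3 for _ in range(3)]
V = [0.0, 1.0, 2.0, 3.0]  # penalizes words with many simultaneous spikes
dist = k_pairwise_distribution(h, J, V)
```

With all of `h`, `J`, and `V` set to zero the distribution is uniform; the `V` term alone is enough to reshape the distribution of synchrony without changing any pairwise coupling.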

Princeton University Open Access Repository

ISTA Research Explorer (Institute of Science and Technology Austria)

Directory of Open Access Journals

Hal-Diderot

The Francis Crick Institute

arXiv.org e-Print Archive

Crossref

HAL-Inserm

PubMed Central

IST Austria: PubRep (Institute of Science and Technology)

HAL: Hyper Article en Ligne

Deep reinforcement learning from human preferences

Author: Amodei Dario
Brown Tom B.
Christiano Paul
Legg Shane
Leike Jan
Martic Miljan
Publication date: 13/07/2017

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.
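One way to make "preferences between pairs of trajectory segments" concrete is a Bradley-Terry style comparison, sketched below. The abstract does not spell out the model, so treat this as an assumption: each segment is summarized by the sum of per-step rewards predicted by some learned estimator, and the probability that a human prefers segment A is a logistic function of the difference of those sums. The function names are hypothetical.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred, given predicted per-step rewards."""
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable logistic function of the summed-reward difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, human_prefers_a):
    """Cross-entropy between the predicted preference and one human label."""
    p_a = preference_probability(rewards_a, rewards_b)
    p = p_a if human_prefers_a else 1.0 - p_a
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)


# Hypothetical predicted rewards for two short segments.
segment_a = [1.0, 2.0]
segment_b = [3.0, 0.0]
p_a = preference_probability(segment_a, segment_b)  # equal sums, so no preference
```

Minimizing a loss of this form over many labeled pairs would fit the reward estimator to the human comparisons, which is what lets a small amount of feedback shape the agent's objective.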

arXiv.org e-Print Archive

The SAO and Kelvin waves in the EuroGRIPS GCMS and the UK Met. Office analyses

Author: A. A. Scaife
D. M. Li
M. Amodei
P. Simon
S. Pawson
U. Langematz
W. Lahoz
Publication venue: European Geosciences Union
Publication date: 01/01/2001

We compare the tropical oscillations and planetary scale Kelvin waves in four troposphere-stratosphere climate models and the assimilated dataset produced by the United Kingdom Meteorological Office (UKMO). The comparison has been made in the GRIPS framework "GCM-Reality Intercomparison Project for SPARC", where SPARC is Stratospheric Processes and their Role in Climate, a project of the World Climate Research Program. The four models evaluated are European members of GRIPS: the UKMO Unified Model (UM), the model of the Free University in Berlin (FUB–GCM), the ARPEGE-climat model of the French National Centre for Meteorological Research (CNRM), and the Extended UGAMP GCM (EUGCM) of the Centre for Global Atmospheric Modelling (CGAM). The integrations were performed with different, but annually periodic external conditions (e.g., sea-surface temperature, sea ice, and incoming solar radiation). The structure of the tropical winds and the strengths of the Kelvin waves are examined. In the analyses where the SAO (Semi-Annual Oscillation) and the QBO (Quasi-Biennial Oscillation) are reasonably well captured, the amplitude of these analysed Kelvin waves is close to that observed in independent data from UARS (Upper Atmosphere Research Satellite). In agreement with observations, the Kelvin waves generated in the models propagate into the middle atmosphere as wave packets, consistent with a convective forcing origin. In three of the models, slow Kelvin waves propagate too high and their amplitudes are overestimated in the upper stratosphere and in the mesosphere; the exception is the UM, which has weaker waves. None of the modelled waves are sufficient to force realistic eastward phases of the QBO or SAO. Although the SAO is represented by all models, only two of them are able to generate westerlies between 10 hPa and 50 hPa. The importance of the role played in the SAO by unresolved gravity waves is emphasized.
Although it exhibits some unrealistic features, the EUGCM, which includes a parametrization of gravity waves with a non-zero phase speed, is able to simulate clear easterly to westerly transitions as well as westerlies with downward propagation. Thermal damping is also important for the westerly forcing in the stratosphere.

Crossref

Directory of Open Access Journals

HAL-INSU

HAL-IRD

HAL: Hyper Article en Ligne