2,784 research outputs found

    Deep reinforcement learning from human preferences

    For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.
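    A minimal sketch of the preference-learning idea summarized above, assuming a Bradley-Terry-style model: a small reward network scores each trajectory segment, and the probability that the human prefers one segment over the other is a softmax of the summed rewards. The names, shapes, and hyperparameters below are illustrative placeholders, not the authors' implementation.

    ```python
    # Hypothetical sketch of preference-based reward learning (Bradley-Terry model).
    # Not the paper's code: network size, segment length, and data are illustrative.
    import torch
    import torch.nn as nn

    class RewardNet(nn.Module):
        """Maps one observation-action vector to a scalar reward estimate."""
        def __init__(self, obs_act_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, segment: torch.Tensor) -> torch.Tensor:
            # segment: (T, obs_act_dim); return the reward summed over the segment.
            return self.net(segment).sum()

    def preference_loss(reward_net, seg_a, seg_b, pref_a: float) -> torch.Tensor:
        """Cross-entropy under a Bradley-Terry model.
        pref_a is the human label: 1.0 if A is preferred, 0.0 if B, 0.5 if indifferent."""
        r_a, r_b = reward_net(seg_a), reward_net(seg_b)
        p_a = torch.softmax(torch.stack([r_a, r_b]), dim=0)[0]
        target = torch.tensor(pref_a)
        return -(target * torch.log(p_a + 1e-8) + (1 - target) * torch.log(1 - p_a + 1e-8))

    # Toy usage: two random 30-step segments and a simulated human preference for A.
    torch.manual_seed(0)
    net = RewardNet(obs_act_dim=8)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    seg_a, seg_b = torch.randn(30, 8), torch.randn(30, 8)
    loss = preference_loss(net, seg_a, seg_b, pref_a=1.0)
    loss.backward()
    opt.step()
    ```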

    Three ways to compute multiport inertance

    The immediate impulse response of a confined incompressible fluid is characterized by inertance. For a vessel with one inlet and one outlet, this is a single quantity; for multiple ports the generalization is a singular reciprocal inertance matrix, which acts on the port impulses to give the corresponding inflows. The coefficients are defined by the boundary fluxes of potential flows. Green's identity converts these to domain integrals of kinetic energy. If the system is discretized with finite elements, a third method is proposed which requires only the stiffness matrix and the solution vectors, with no numerical differentiation.
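    A hedged sketch of the finite-element route mentioned last in the abstract: if K is the assembled stiffness matrix of the Laplace operator and the columns of U are the nodal potential solutions, one per port, then the kinetic-energy domain integral reduces to the quadratic form U^T K U, so the whole reciprocal inertance matrix follows from matrix products alone. The function name, the density scaling, and the toy data are assumptions for illustration, not taken from the paper.

    ```python
    # Hypothetical sketch: reciprocal inertance matrix from an FE stiffness matrix
    # and the per-port potential solution vectors, with no numerical differentiation.
    import numpy as np

    def reciprocal_inertance(K: np.ndarray, U: np.ndarray, rho: float = 1000.0) -> np.ndarray:
        """K: (n, n) FE stiffness matrix of the Laplace operator.
        U: (n, p) matrix whose columns are the nodal potentials, one per port.
        Returns the (p, p) reciprocal inertance matrix as a kinetic-energy quadratic
        form, scaled here by 1/rho (the exact density convention is an assumption)."""
        return (U.T @ K @ U) / rho

    # Toy usage with a random symmetric positive semi-definite stand-in for K.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 50))
    K = A @ A.T                       # stand-in for a real assembled stiffness matrix
    U = rng.standard_normal((50, 3))  # stand-in for three port solution vectors
    M = reciprocal_inertance(K, U)
    print(M.shape, np.allclose(M, M.T))  # (3, 3) True -> symmetric, as expected
    ```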

    Seeing with sound? Exploring different characteristics of a visual-to-auditory sensory substitution device

    Sensory substitution devices convert live visual images into auditory signals, for example with a web camera (to record the images), a computer (to perform the conversion) and headphones (to listen to the sounds). In a series of three experiments, the performance of one such device (‘The vOICe’) was assessed under various conditions with blindfolded sighted participants. The main task involved identifying and locating objects placed on a table while holding a webcam (like a flashlight) or wearing it on the head (like a miner’s light). Identifying objects on a table was easier with a hand-held device, but locating the objects was easier with a head-mounted device. Brightness converted into loudness was less effective than the reverse contrast (dark being loud), suggesting that performance under these conditions (natural indoor lighting, novice users) is related more to the properties of the auditory signal (i.e. the amount of noise in it) than to the cross-modal association between loudness and brightness. Individual differences in musical memory (detecting pitch changes in two sequences of notes) were related to the time taken to identify or recognise objects, but individual differences in self-reported vividness of visual imagery did not reliably predict performance across the experiments. In general, the results suggest that the auditory characteristics of the device may be more important for initial learning than visual associations.
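    As a rough illustration of the kind of visual-to-auditory mapping such devices use (this is not the exact vOICe encoding), the sketch below scans a grayscale image column by column, maps pixel row to tone frequency and pixel brightness to amplitude; flipping the brightness-to-loudness mapping gives the reversed ‘dark is loud’ contrast discussed above. All parameters and the toy image are illustrative assumptions.

    ```python
    # Hypothetical image-to-sound mapping in the spirit of the device above:
    # each image column becomes a short time slice, pixel row sets tone frequency,
    # pixel brightness sets amplitude. Frequencies and scan time are illustrative.
    import numpy as np

    def image_to_sound(image: np.ndarray, sample_rate: int = 22050,
                       scan_seconds: float = 1.0, f_lo: float = 500.0, f_hi: float = 5000.0,
                       dark_is_loud: bool = False) -> np.ndarray:
        """image: (rows, cols) grayscale array in [0, 1]. Returns a mono waveform."""
        rows, cols = image.shape
        if dark_is_loud:                      # the reversed contrast discussed above
            image = 1.0 - image
        # One tone frequency per row; rows nearer the top get higher pitch.
        freqs = np.geomspace(f_lo, f_hi, rows)[::-1]
        samples_per_col = int(sample_rate * scan_seconds / cols)
        t = np.arange(samples_per_col) / sample_rate
        out = []
        for c in range(cols):                 # left-to-right scan of the image
            tones = np.sin(2 * np.pi * freqs[:, None] * t[None, :])
            slice_ = (image[:, c][:, None] * tones).sum(axis=0)
            out.append(slice_)
        wave = np.concatenate(out)
        peak = np.abs(wave).max()
        return wave / peak if peak > 0 else wave

    # Toy usage: a bright diagonal line on a dark background produces a rising sweep.
    img = np.zeros((64, 64))
    np.fill_diagonal(img[::-1], 1.0)          # diagonal from bottom-left to top-right
    waveform = image_to_sound(img)
    print(waveform.shape)
    ```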