Black-Box Data-efficient Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning (RL) in
robotics are based on uncertain dynamical models: after each episode, they
first learn a dynamical model of the robot, then they use an optimization
algorithm to find a policy that maximizes the expected return given the model
and its uncertainties. It is often believed that this optimization can be
tractable only if analytical, gradient-based algorithms are used; however,
these algorithms require using specific families of reward functions and
policies, which greatly limits the flexibility of the overall approach. In this
paper, we introduce a novel model-based RL algorithm, called Black-DROPS
(Black-box Data-efficient RObot Policy Search) that: (1) does not impose any
constraint on the reward function or the policy (they are treated as
black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for
data-efficient RL in robotics, and (3) is as fast as (or faster than) analytical
approaches when several cores are available. The key idea is to replace the
gradient-based optimization algorithm with a parallel, black-box algorithm that
takes into account the model uncertainties. We demonstrate the performance of
our new algorithm on two standard control benchmark problems (in simulation)
and a low-cost robotic manipulator (with a real robot).
Comment: Accepted at the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS) 2017; Code at
http://github.com/resibots/blackdrops; Video at http://youtu.be/kTEyYiIFGP
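The key idea stated in the abstract (searching for a policy on an uncertain learned model with a black-box optimizer instead of analytical gradients) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the one-dimensional `model_step`, the sparse `reward`, the clipped linear `policy`, and the simple elitist random search are hypothetical stand-ins, not the Black-DROPS implementation or its optimizer. The sketch evaluates candidates sequentially; since each evaluation is independent, this is the step that could be spread over several cores.

```python
import random


def model_step(x, u, rng):
    """Hypothetical learned model: mean prediction plus sampled model uncertainty."""
    mean = 0.9 * x + 0.5 * u
    std = 0.05  # assumed predictive standard deviation
    return rng.gauss(mean, std)


def reward(x):
    """Black-box reward: a non-differentiable step function is fine here."""
    return 1.0 if abs(x - 1.0) < 0.2 else 0.0


def policy(theta, x):
    """Black-box policy: a clipped linear controller (illustrative choice)."""
    u = theta[0] * x + theta[1]
    return max(-2.0, min(2.0, u))


def noisy_return(theta, rng, horizon=20):
    """One rollout on the model, sampling the model uncertainty at every step."""
    x, total = 0.0, 0.0
    for _ in range(horizon):
        x = model_step(x, policy(theta, x), rng)
        total += reward(x)
    return total


def expected_return(theta, rng, rollouts=10):
    """Monte Carlo estimate of the expected return under the uncertain model."""
    return sum(noisy_return(theta, rng) for _ in range(rollouts)) / rollouts


def black_box_search(seed=0, generations=30, population=16, sigma=0.5):
    """Elitist random search: only return values are used, never gradients."""
    rng = random.Random(seed)
    best = [0.0, 0.0]
    best_score = expected_return(best, rng)
    for _ in range(generations):
        candidates = [[b + rng.gauss(0.0, sigma) for b in best]
                      for _ in range(population)]
        # Each candidate evaluation is independent of the others.
        scores = [expected_return(c, rng) for c in candidates]
        i = max(range(population), key=scores.__getitem__)
        if scores[i] >= best_score:
            best, best_score = candidates[i], scores[i]
    return best, best_score


if __name__ == "__main__":
    theta, score = black_box_search()
    print("policy parameters:", theta, "estimated return:", score)
```

Because the optimizer only calls `expected_return`, the reward and the policy can be swapped for any other Python callables without changing the search loop. Note that the reported score is itself a noisy estimate.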
A Dynamic Embedding Model of the Media Landscape
Information about world events is disseminated through a wide variety of news
channels, each with specific considerations in the choice of their reporting.
Although the multiplicity of these outlets should ensure a variety of
viewpoints, recent reports suggest that the rising concentration of media
ownership may void this assumption. This observation motivates the study of the
impact of ownership on the global media landscape and its influence on the
coverage the actual viewer receives. To this end, the selection of reported
events has been shown to be informative about the high-level structure of the
news ecosystem. However, existing methods only provide a static view into an
inherently dynamic system, providing underperforming statistical models and
hindering our understanding of the media landscape as a whole.
In this work, we present a dynamic embedding method that learns to capture
the decision process of individual news sources in their selection of reported
events while also enabling the systematic detection of large-scale
transformations in the media landscape over prolonged periods of time. In an
experiment covering over 580M real-world event mentions, we show our approach
to outperform static embedding methods in predictive terms. We demonstrate the
potential of the method for news monitoring applications and investigative
journalism by shedding light on important changes in programming induced by
mergers and acquisitions, policy changes, or network-wide content diffusion.
These findings offer evidence of strong content convergence trends inside large
broadcasting groups, influencing the news ecosystem in a time of increasing
media ownership concentration.
An evolutionary complex systems decision-support tool for the management of operations
Purpose - The purpose of this paper is to add both to the development of complex systems thinking in the subject area of operations and production management and to the limited number of applications of computational models and simulations from the science of complex systems. The latter potentially offer helpful decision-support tools for operations and production managers.
Design/methodology/approach - A mechanical engineering firm was used as a case study where a combined qualitative and quantitative methodological approach was employed to extract the required data from four senior managers. Company performance measures as well as firm technologies, practices and policies, and their relation and interaction with one another, were elicited. The data were subjected to an evolutionary complex systems (ECS) model resulting in a series of simulations.
Findings - The findings highlighted the effects of the diversity in management decision making on the firm's evolutionary trajectory. The CEO appeared to have the most balanced view of the firm, closely followed by the marketing and research and development managers. The manufacturing manager's responses led to the most extreme evolutionary trajectory where the integrity of the entire firm came into question particularly when considering how employees were utilised.
Research limitations/implications - By drawing directly from the opinions and views of managers, rather than from logical "if-then" rules and averaged mathematical representations of agents that characterise agent-based and other self-organisational models, this work builds on previous applications by capturing a micro-level description of diversity that has been problematical both in theory and application.
Practical implications - This approach can be used as a decision-support tool for operations and other managers providing a forum with which to explore: the strengths, weaknesses and consequences of different decision-making capacities within the firm; the introduction of new manufacturing technologies, practices and policies; and the different evolutionary trajectories that a firm can take.
Originality/value - With the inclusion of "micro-diversity", ECS modelling moves beyond the self-organisational models that populate the literature but has not as yet produced a great many practical simulation results. This work is a step in that direction.
Fast Damage Recovery in Robotics with the T-Resilience Algorithm
Damage recovery is critical for autonomous robots that need to operate for a
long time without assistance. Most current methods are complex and costly
because they require anticipating each potential damage in order to have a
contingency plan ready. As an alternative, we introduce the T-resilience
algorithm, a new algorithm that allows robots to quickly and autonomously
discover compensatory behaviors in unanticipated situations. This algorithm
equips the robot with a self-model and discovers new behaviors by learning to
avoid those that perform differently in the self-model and in reality. Our
algorithm thus does not identify the damaged parts but it implicitly searches
for efficient behaviors that do not use them. We evaluate the T-Resilience
algorithm on a hexapod robot that needs to adapt to leg removal, broken legs
and motor failures; we compare it to stochastic local search, policy gradient
and the self-modeling algorithm proposed by Bongard et al. The behavior of the
robot is assessed on-board thanks to an RGB-D sensor and a SLAM algorithm. Using
only 25 tests on the robot and an overall running time of 20 minutes,
T-Resilience consistently leads to substantially better results than the other
approaches.
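The principle described above (prefer behaviors that score well in the self-model while avoiding those expected to perform differently in reality) can be shown with a toy sketch. This is not the T-Resilience algorithm itself, whose details the abstract does not give. The one-dimensional behavior space, the functions `self_model_perf` and `real_perf`, the distance-weighted nearest-neighbour estimate in `predicted_gap`, and the `penalty` and `scale` values are all hypothetical choices made for illustration.

```python
import math


def self_model_perf(b):
    """Hypothetical self-model of the intact robot: larger b looks better."""
    return b


def real_perf(b):
    """Hypothetical damaged robot: behaviors with b > 0.6 rely on a broken part."""
    return b if b <= 0.6 else 0.1


def predicted_gap(b, tested, scale=0.2):
    """Estimate |self-model - reality| for b from the nearest tested behavior.

    The observed gap of the nearest tested behavior is discounted with distance,
    so behaviors close to a badly transferring one are penalized the most.
    """
    if not tested:
        return 0.0
    nearest_b, nearest_gap = min(tested, key=lambda t: abs(t[0] - b))
    return nearest_gap * math.exp(-abs(nearest_b - b) / scale)


def adapt(candidates, n_tests=5, penalty=2.0):
    """Run a few tests on the 'real' robot and return the best (behavior, performance)."""
    tested = []   # (behavior, observed gap between self-model and reality)
    results = []  # (behavior, measured real performance)
    for _ in range(n_tests):
        untested = [b for b in candidates if all(b != t[0] for t in tested)]
        # Trade off predicted performance against the predicted gap.
        b = max(untested,
                key=lambda c: self_model_perf(c) - penalty * predicted_gap(c, tested))
        r = real_perf(b)
        tested.append((b, abs(self_model_perf(b) - r)))
        results.append((b, r))
    return max(results, key=lambda br: br[1])


if __name__ == "__main__":
    behaviors = [i / 10 for i in range(11)]
    print(adapt(behaviors))
```

In this toy setting the search never identifies which part is broken. It only learns that behaviors near the failed ones transfer poorly and steers away from them.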