
    Learning with Options that Terminate Off-Policy

    A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optimal policy exactly, shorter options offer more flexibility and can yield a better solution. Thus, the termination condition puts learning efficiency at odds with solution quality. We propose to resolve this dilemma by decoupling the behavior and target terminations, just as is done with policies in off-policy learning. To this end, we give a new algorithm, Q(\beta), that learns the solution with respect to any termination condition, regardless of how the options actually terminate. We derive Q(\beta) by casting learning with options into a common framework with well-studied multi-step off-policy learning. We validate our algorithm empirically, and show that it holds up to its motivating claims. Comment: AAAI 201

    Distribution and Abundance of Molluscs in a Fresh Water Environment

    The molluscs of a northwestern Minnesota lake were sampled using transects, a sampling frame, and SCUBA. The species sampled were: Amnicola limosa, Valvata tricarinata, Gyraulus parvus, Physa cf. P. gyrina, Helisoma anceps, H. campanulata, Promenetus exacuous, Ferrissia parallela, Anodonta marginata, Lampsilis siliquoidea, and Sphaerium cf. S. striatum. The unionid clams, the adult Helisoma spp. and the Physa adults were associated with the absence of aquatic vegetation. Distinct associations of snail species were found with each plant association: G. parvus and V. tricarinata with the deep water Nitella opaca association, A. limosa and V. tricarinata with the mid-depth mixed macrophyte association, and H. anceps and Physa adults with the shallow water, rocky bottom macrophyte association. Two estimates of snail abundance were made for each depth. Neither proved very satisfactory, but they did indicate that snail abundance was related to the type and abundance of aquatic plants.

    Post-conflict reconciliation and development in Nicaragua: The role of cooperatives and collective action

    This paper examines how cooperatives affected and were affected by the profound political, economic and social transitions that have occurred in Nicaragua in recent decades. It pays particular attention to the shift from the post-revolutionary Sandinista regime of the 1980s to the "neoliberal" regime of the 1990s and early 2000s. In the early 1990s, a peace accord ended years of civil war and the Sandinista government was voted out of office by a coalition of Centrist and Right-wing parties. This meant that policies supporting state and cooperative forms of production were replaced by those favouring privatization, the rolling back of the state and the freeing up of market forces. Cooperatives and the agrarian reform process initiated by the Sandinista government were heavily impacted by this process, often in contradictory ways. Land redistribution to landless peasant farmers and cooperative organizations continued as part of the process of peace-building prior to the elections. Demobilized military and other security personnel were given land after the elections. Workers in state-owned farms and agro-industrial enterprises also acquired assets when part of the state sector was converted to worker-owned and managed enterprises. But the neoliberal era ushered in a process of decollectivization and dispossession and heavily constrained access to credit and support services for cooperatives and small-scale farmers. Agricultural workers and producers were not passive bystanders in this process. Their responses conformed to a Polanyian-type "double movement" where societal forces mobilize in myriad ways to protect against the negative social effects of economic liberalization and the dominance of market forces. The pro-market strand of the double movement centred not only on economic liberalization but also an agrarian counter-reform centred on decollectivization and returning lands to former owners. 
    The societal reaction or "protective" strand of the double movement consisted of diverse forms of contestation, collective action and social innovation. Divided into three parts, this paper first outlines the rapid rise of the cooperative sector and its strengths and weaknesses during the post-revolutionary period from 1979 to the electoral defeat of the Sandinistas in 1990. Part 2 examines the uneven trajectory of agrarian reform and cooperative development during the neoliberal 1990s, consisting of counter reform and ongoing redistribution to the landless. Part 3 examines four manifestations of the "double movement" by agricultural workers and producers. They include (i) the proliferation of civil and armed resistance in the early 1990s; (ii) the structuring of a cooperative movement; (iii) efforts to empower small coffee producers via the fair trade movement and the "quality revolution"; and (iv) the drive to reactivate the smallholdings of poor rural women and organize them in pre-cooperative groups. A concluding section distils the main findings for addressing the challenge of post-conflict reconciliation and development, and refers briefly to the implications for the cooperative movement of the return to power of the Sandinista National Liberation Front in 2007. The main policy lesson for governments engaged in processes of peace-building and "post-conflict" reconstruction would seem to be: ignore the issue of inclusive agrarian development at your peril! If a disabling policy environment exists, and if demands for land and employment on the part of subaltern groups are not met, various forms of resistance will ensue, with the possibility of renewed violent conflict and the inability to govern effectively. And when a political party seemingly supportive of the cooperative sector regains the reins of power, renewed support may come at the cost of dependency and loss of autonomy of the cooperative movement.

    OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

    Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories arising from a diverse set of underlying reward functions rather than a single one. Thus, in inverse reinforcement learning, it is useful to consider such a decomposition. The options framework in reinforcement learning is specifically designed to decompose policies in a similar light. We therefore extend the options framework and propose a method to simultaneously recover reward options in addition to policy options. We leverage adversarial methods to learn joint reward-policy options using only observed expert states. We show that this approach works well in both simple and complex continuous control tasks and shows significant performance increases in one-shot transfer learning. Comment: Accepted to the Thirty-Second AAAI Conference On Artificial Intelligence (AAAI), 201

    Density mapping with weak lensing and phase information

    The available probes of the large scale structure in the Universe have distinct properties: galaxies are a high resolution but biased tracer of mass, while weak lensing avoids such biases but, due to low signal-to-noise ratio, has poor resolution. We investigate reconstructing the projected density field using the complementarity of weak lensing and galaxy positions. We propose a maximum-probability reconstruction of the 2D lensing convergence with a likelihood term for shear data and a prior on the Fourier phases constructed from the galaxy positions. By considering only the phases of the galaxy field, we evade the unknown value of the bias and allow it to be calibrated by lensing on a mode-by-mode basis. By applying this method to a realistic simulated galaxy shear catalogue, we find that a weak prior on phases provides a good quality reconstruction down to scales beyond l=1000, far into the noise domain of the lensing signal alone. Comment: 11 pages, 9 figures, published in MNRA

    A Test of Risk Vulnerability in the Wider Population

    Panel data from the German SOEP are used to test for risk vulnerability (RV) in the wider population. Two different survey responses are analysed: the response to the question about willingness-to-take risk in general and the chosen investment in a hypothetical lottery. A convenient indicator of background risk is the VDAX index, an established measure of volatility in the German stock market. This is used as an explanatory variable in conjunction with HDAX, the stock market index, which proxies wealth. The impacts of these measures on risk attitude are identifiable by exploiting the time dimension of the panel and matching survey months with corresponding observations from these time-varying factors. Both of the survey responses allow us to test for decreasing absolute risk aversion (DARA); in one case, we find strong evidence of DARA, while in the other, we do not. Both survey responses also allow us to test for RV, and in both cases we find strong evidence. In the case of the hypothetical lottery response, we are also able to estimate a “coefficient of risk vulnerability” (CRV). This is defined as the absolute amount by which absolute risk aversion rises in response to a doubling of background risk. We estimate CRV to be between 1.03 and 1.27.