
    Optimizing the depth and the direction of prospective planning using information values

    Evaluating the future consequences of actions can be achieved by simulating a mental search tree into the future. Expanding deep trees, however, is computationally taxing. Therefore, machines and humans use a plan-until-habit scheme that simulates the environment up to a limited depth and then exploits habitual values as proxies for consequences that may arise further in the future. Two outstanding questions in this scheme are: in which directions should the search tree be expanded, and when should the expansion stop? Here we propose a principled solution to these questions based on a speed/accuracy tradeoff: deeper expansion in the appropriate directions leads to more accurate planning, but at the cost of slower decision-making. Our simulation results show how this algorithm expands the search tree effectively and efficiently in a grid-world environment. We further show that our algorithm can explain several behavioral patterns in animals and humans, namely the effect of time pressure on the depth of planning, the effect of reward magnitudes on the direction of planning, and the gradual shift from goal-directed to habitual behavior over the course of training. The algorithm also provides several predictions that are testable in animal and human experiments.
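
    The abstract does not give the algorithm's details, so the sketch below only illustrates the depth-limiting part of the general plan-until-habit idea it builds on: expand a search tree to a limited depth, substitute cached habitual values at the leaves, and deepen only while the resulting change in the value estimate still exceeds a per-level time cost. The grid world, the stopping rule, the constants, and all function names are illustrative assumptions, not the authors' implementation.

    # Minimal sketch (assumed, not the authors' algorithm) of a depth-limited
    # "plan-until-habit" lookahead in a small deterministic grid world.
    import random

    GAMMA = 0.9        # discount factor
    TIME_COST = 0.05   # value-unit cost charged for one extra level of planning depth
    ACTIONS = ["up", "down", "left", "right"]

    def step(state, action):
        """Deterministic 5x5 grid-world model: returns (next_state, reward)."""
        x, y = state
        if action == "up":
            y = min(y + 1, 4)
        elif action == "down":
            y = max(y - 1, 0)
        elif action == "left":
            x = max(x - 1, 0)
        elif action == "right":
            x = min(x + 1, 4)
        reward = 1.0 if (x, y) == (4, 4) else 0.0   # single rewarding goal state
        return (x, y), reward

    def habitual_value(state, habit_q):
        """Cached (habitual) state value used as a proxy for unexpanded subtrees."""
        return max(habit_q.get((state, a), 0.0) for a in ACTIONS)

    def plan_value(state, depth, habit_q):
        """Depth-limited tree expansion; leaves fall back on habitual values."""
        if depth == 0:
            return habitual_value(state, habit_q)
        best = float("-inf")
        for a in ACTIONS:
            nxt, r = step(state, a)
            best = max(best, r + GAMMA * plan_value(nxt, depth - 1, habit_q))
        return best

    def choose_depth(state, habit_q, max_depth=6):
        """Speed/accuracy tradeoff: deepen only while the change in the value
        estimate still exceeds the time cost of one more level of planning."""
        prev = habitual_value(state, habit_q)
        for d in range(1, max_depth + 1):
            cur = plan_value(state, d, habit_q)
            if abs(cur - prev) <= TIME_COST:
                return d - 1            # further expansion is not worth the time
            prev = cur
        return max_depth

    if __name__ == "__main__":
        random.seed(0)
        habit_q = {((x, y), a): random.uniform(0.0, 0.2)
                   for x in range(5) for y in range(5) for a in ACTIONS}
        start = (0, 0)
        depth = choose_depth(start, habit_q)
        print("chosen planning depth:", depth)
        print("planned value at start:", round(plan_value(start, depth, habit_q), 3))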

    Drugs, police inefficiencies and gangsterism in violently impoverished communities like Overcome

    This research establishes an understanding of the relationship between gangsterism, the drug commodity and inefficiencies in the state’s policing institution, as well as the consequences of this relationship, in the context of the Overcome squatter area in Cape Town. Overcome is representative of other violently impoverished Cape Town communities with its high rate of unemployment, low quality of education, domestic abuse, stagnant housing crisis, lack of access to intellectual and material resources or opportunities for personal growth, gangsterism, inefficient policing, substance dependency, and violence. This research demonstrates that the current relationship between the gangs, drugs and the police fosters an unpredictable, violent environment, leaving residents in a constant state of vulnerability. The argument is developed around three key historical junctures in the development of organized crime in South Africa, starting with the growth of the mining industry in the Witwatersrand after 1886, followed by forced removals and prohibition-like policies in Cape Town circa 1970, and finally the upheaval created by the transition away from apartheid in 1994. Research for this paper was both quantitative and qualitative in nature, and included expert interviews on the subjects of police criminality, narcotic sales, and gangsterism. Newspaper articles, crime statistics, books, census figures, and a host of journals were also utilized. Upon reviewing a host of police inefficiencies and criminal collusions, the research concludes that public criminals related to the state, such as police, and private criminals, such as gangsters, work together in a multitude of ways in a bid to acquire wealth, most notably through an illicit drug market today dominated by ‘tik’. It is shown that this violent narcotics market binds police and gangsters together at the expense of creating a state of insecurity for those living in poor drug markets.

    Value Driven Representation for Human-in-the-Loop Reinforcement Learning

    Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer who creates, monitors and modifies the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on the algorithmic foundations of helping the system designer choose the set of sensors or features that define the observation space used by the reinforcement learning agent. We present an algorithm, value driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent so that it is sufficient to capture a (near) optimal policy. To do so we introduce a new method to optimistically estimate the value of a policy using offline simulated Monte Carlo rollouts. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
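
    The abstract does not specify VDR in detail, so the following is only a sketch of the general recipe it describes: iteratively augment the feature set that defines the observation space, scoring each candidate set by an optimistic offline Monte Carlo estimate of policy value. All names, the optimism bonus, the minimum-gain threshold, and the toy simulator and policy learner are assumptions for illustration.

    # Illustrative sketch (assumed, not the paper's VDR implementation): greedily
    # grow the feature set that defines the agent's observation space, scoring each
    # candidate set by an optimistic offline Monte Carlo estimate of policy value.
    import random
    from typing import List

    MIN_GAIN = 0.5   # assumed threshold: a new sensor must buy a non-trivial gain

    def mc_policy_value(policy, simulate, n_rollouts=50, horizon=30, optimism_bonus=0.1):
        """Optimistic offline evaluation: mean return over simulated rollouts plus
        a small bonus that shrinks as more rollouts are collected."""
        returns = [simulate(policy, horizon) for _ in range(n_rollouts)]
        return sum(returns) / len(returns) + optimism_bonus / (n_rollouts ** 0.5)

    def value_driven_representation(all_features, learn_policy, simulate):
        """Greedy forward selection of observation features, scored by the
        optimistic value of the policy learned under each candidate feature set."""
        chosen: List[str] = []
        best_value = mc_policy_value(learn_policy(chosen), simulate)
        while len(chosen) < len(all_features):
            best_f, best_gain = None, MIN_GAIN
            for f in all_features:
                if f in chosen:
                    continue
                v = mc_policy_value(learn_policy(chosen + [f]), simulate)
                if v - best_value > best_gain:
                    best_f, best_gain = f, v - best_value
            if best_f is None:
                break                   # no remaining feature is worth adding
            chosen.append(best_f)
            best_value += best_gain
        return chosen

    if __name__ == "__main__":
        random.seed(0)
        FEATURES = ["quiz_score", "time_on_page", "hint_count", "mouse_speed"]
        RELEVANT = {"quiz_score", "hint_count"}      # toy ground truth

        def learn_policy(features):
            # Toy "learned policy", represented only by its quality: the fraction
            # of relevant features it can observe.
            return len(RELEVANT.intersection(features)) / len(RELEVANT)

        def simulate(policy_quality, horizon):
            # Toy simulator: noisy per-step reward that grows with policy quality.
            return sum(policy_quality + random.gauss(0, 0.05) for _ in range(horizon))

        print("selected features:",
              value_driven_representation(FEATURES, learn_policy, simulate))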