1,073 research outputs found

Reinforcement Learning: A Survey

Author: Kaelbling L. P.
Littman M. L.
Moore A. W.
Publication date: 01/01/1996

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.

Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

ATTac-2000: An Adaptive Autonomous Bidding Agent

Author: Kearns M.
Littman M. L.
Singh S.
Stone P.
Publication venue: AI Access Foundation
Publication date: 03/06/2011

The First Trading Agent Competition (TAC) was held from June 22nd to July 8th, 2000. TAC was designed to create a benchmark problem in the complex domain of e-marketplaces and to motivate researchers to apply unique approaches to a common task. This article describes ATTac-2000, the first-place finisher in TAC. ATTac-2000 uses a principled bidding strategy that includes several elements of adaptivity. In addition to the success at the competition, isolated empirical results are presented indicating the robustness and effectiveness of ATTac-2000's adaptive strategy.

arXiv.org e-Print Archive

Crossref

Create a translational medicine knowledge repository - Research downsizing, mergers and increased outsourcing have reduced the depth of in-house translational medicine expertise and institutional memory at many pharmaceutical and biotech companies: how will they avoid relearning old lessons?

Author: B Gertz
BH Littman
Bruce H Littman
C Arnst
Francesco M Marincola
J-P Garnier
Publication venue: BioMed Central
Publication date: 01/01/2011

Pharmaceutical industry consolidation and overall research downsizing threaten the ability of companies to benefit from their previous investments in translational research, as key leaders with the most knowledge of the successful use of biomarkers and translational pharmacology models are laid off or accept their severance packages. Two recently published books may help to preserve this type of knowledge, but much of this information is not in the public domain. Here we propose the creation of a translational medicine knowledge repository where companies can submit their translational research data and access similar data from other companies in a precompetitive environment. This searchable repository would become an invaluable resource for translational scientists and drug developers that could speed and reduce the cost of new drug development.

Crossref

Springer - Publisher Connector

PubMed Central

Heritable Gene Regulation in the CD4:CD8 T Cell Lineage Choice

Author: Charles P. Ng
Dan R. Littman
Priya D. A. Issuree
Publication venue: Frontiers Media SA
Publication date: 01/01/2017

Frontiers - Publisher Connector

Virus-host interactions: new insights from the small RNA world

Author: Browne Edward P
Chong Mark
Li Junjie
Littman Dan R
Publication venue: BioMed Central
Publication date: 01/01/2005

RNA silencing has a known role in the antiviral responses of plants and insects. Recent evidence, including the finding that the Tat protein of human immunodeficiency virus (HIV) can suppress the host's RNA-silencing pathway and may thus counteract host antiviral RNAs, suggests that RNA-silencing pathways could also have key roles in mammalian virus-host interactions.

CiteSeerX

PubMed Central

Carolina Digital Repository

University of Melbourne Institutional Repository

Hall-Effect for Neutral Atoms

Author: A. A. Belov
A. M. Dykhne
A. P. Kasantsev
D. Delande
F. J. Dyson
G. R. Welch
M. G. Littman
V. L. Pokrovsky
Publication venue: Pleiades Publishing Ltd
Publication date: 04/04/1996

It is shown that polarizable neutral systems can drift in crossed magnetic and electric fields. The drift velocity is perpendicular to both fields but, in contrast to the drift velocity of a charged particle, it exists only if the fields vary in space or in time. We develop an adiabatic theory of this phenomenon and analyze the conditions for its experimental observation. The most suitable objects for observing this effect are Rydberg atoms. The effect can be applied to the separation of excited atoms.

Comment: RevTex, 4 pages; to be published in Pis'ma v ZhET

arXiv.org e-Print Archive

Crossref

Power systems for future missions

Author: Frye P. E.
Gill S. P.
Littman Franklin D.
Meisl C. J.

A comprehensive scenario of future missions was developed, and the applicability of different power technologies to these missions was assessed. Detailed technology development roadmaps for selected power technologies were generated. A simple methodology to evaluate the economic benefits of current and future power system technologies by comparing Life Cycle Costs of potential missions was developed. The methodology was demonstrated by comparing Life Cycle Costs for different implementation strategies of DIPS/CBC technology to a selected set of missions.

NASA Technical Reports Server

The RNAseIII enzyme Drosha is critical in T cells for preventing lethal inflammatory disease

Author: Chong Mark M.W.
Littman Dan R.
Rasmussen Jeffrey P.
Rudensky Alexander Y.
Publication venue: The Rockefeller University Press
Publication date: 01/09/2008

MicroRNAs (miRNAs) are implicated in the differentiation and function of many cell types. We provide genetic and in vivo evidence that the two RNaseIII enzymes, Drosha and Dicer, do indeed function in the same pathway. These have previously been shown to mediate the stepwise maturation of miRNAs (Lee, Y., C. Ahn, J. Han, H. Choi, J. Kim, J. Yim, J. Lee, P. Provost, O. Radmark, S. Kim, and V.N. Kim. 2003. Nature. 425:415–419), and genetic ablation of either within the T cell compartment, or specifically within Foxp3+ regulatory T (T reg) cells, results in identical phenotypes. We found that miRNA biogenesis is indispensable for the function of T reg cells. Specific deletion of either Drosha or Dicer phenocopies mice lacking a functional Foxp3 gene or Foxp3+ cells, whereas deletion throughout the T cell compartment also results in spontaneous inflammatory disease, but later in life. Thus, miRNA-dependent regulation is critical for preventing spontaneous inflammation and autoimmunity.

Crossref

PubMed Central

University of Melbourne Institutional Repository

False-Name Manipulation in Weighted Voting Games is Hard for Probabilistic Polynomial Time

Author: D. Felsenthal
E. Elkind
H. Aziz
H. Hunt
J. Banzhaf III
J. Gill
K. Prasad
K. Wagner
L. Penrose
L. Shapley
L. Valiant
M. Littman
M. Mundhenk
P. Dubey
P. Faliszewski
S. Toda
Y. Bachrach
Publication date: 07/03/2013

False-name manipulation refers to the question of whether a player in a weighted voting game can increase her power by splitting into several players and distributing her weight among these false identities. Analogously to this splitting problem, the beneficial merging problem asks whether a coalition of players can increase their power in a weighted voting game by merging their weights. Aziz et al. [ABEP11] analyze the problem of whether merging or splitting players in weighted voting games is beneficial in terms of the Shapley-Shubik and the normalized Banzhaf index, and so do Rey and Rothe [RR10] for the probabilistic Banzhaf index. All these results provide merely NP-hardness lower bounds for these problems, leaving the question about their exact complexity open. For the Shapley-Shubik and the probabilistic Banzhaf index, we raise these lower bounds to hardness for PP, "probabilistic polynomial time", and provide matching upper bounds for beneficial merging and, whenever the number of false identities is fixed, also for beneficial splitting, thus resolving previous conjectures in the affirmative. It follows from our results that beneficial merging and splitting for these two power indices cannot be solved in NP, unless the polynomial hierarchy collapses, which is considered highly unlikely.

arXiv.org e-Print Archive

Crossref

Learning Mazes with Aliasing States: An LCS Algorithm with Associative Perception

Author: Anthony Bagnall
Bull L.
Butz M.V.
Cassandra A.R.
Gerard P.
Hoffman J.
Holland J.H.
Holmes M.
Hurst J.
Lanzi P.L.
Littman M.L.
McCallum A.R.
Miyazaki K.
Métivier M.
Nevison C.
O'Hara T.
Pavlov I.P.
Pear J.
Russell S.
Skinner B.F.
Studley M.
Sutton R.S.
Thorndike E.L.
Zatuchna Z.V.
Zhanna V. Zatuchna
Publication venue: SAGE Publications
Publication date: 09/03/2009

Learning classifier systems (LCSs) belong to a class of algorithms based on the principle of self-organization and have frequently been applied to the task of solving mazes, an important type of reinforcement learning (RL) problem. Maze problems represent a simplified virtual model of real environments that can be used for developing core algorithms of many real-world applications related to the problem of navigation. However, the best achievements of LCSs in maze problems are still mostly limited to non-aliasing environments, while LCS complexity seems to obstruct a proper analysis of the reasons for failure. We construct a new LCS agent that has a simpler and more transparent performance mechanism, but that can still solve mazes better than existing algorithms. We use the structure of a predictive LCS model, strip out the evolutionary mechanism, simplify the reinforcement learning procedure and equip the agent with the ability of associative perception, adopted from psychology. To improve our understanding of the nature and structure of maze environments, we analyze mazes used in research for the last two decades, introduce a set of maze complexity characteristics, and develop a set of new maze environments. We then run our new LCS with associative perception through the old and new aliasing mazes, which represent partially observable Markov decision problems (POMDPs), and demonstrate that it performs at least as well as, and in some cases better than, other published systems.

Crossref

University of East Anglia digital repository