A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD
This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD, and residual-gradient TD. It is asserted that they all consist in minimizing a gradient function and differ by the form of this function and their means of minimizing it. Two new schemes are introduced in that framework: Full-gradient TD, which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. These three algorithms form a new intermediate family with the interesting property of making much better use of the samples than TD while keeping a gradient descent scheme, which is useful for complexity issues and optimistic policy iteration.
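To make the sample-efficiency contrast in this abstract concrete, here is a minimal sketch comparing standard TD(0) with standard LSTD under linear function approximation. It does not implement Full-gradient TD or EGD TD; the 3-state chain, rewards, step size, one-hot feature map, and all function names are assumptions chosen only for illustration.

```python
# Minimal sketch: TD(0) versus LSTD for policy evaluation with linear features.
# The chain, rewards, step size, and feature map below are illustrative assumptions.

GAMMA = 0.9
# Transitions (s, r, s_next) of a deterministic cycle 0 -> 1 -> 2 -> 0.
SAMPLES = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 0)]

def phi(s, n=3):
    """One-hot feature vector (the tabular case of linear approximation)."""
    return [1.0 if i == s else 0.0 for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def td0(samples, sweeps, alpha=0.1, n=3):
    """Stochastic-gradient-style TD(0): one small update per sample."""
    theta = [0.0] * n
    for _ in range(sweeps):
        for s, r, s2 in samples:
            f, f2 = phi(s, n), phi(s2, n)
            delta = r + GAMMA * dot(f2, theta) - dot(f, theta)  # TD error
            theta = [t + alpha * delta * x for t, x in zip(theta, f)]
    return theta

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            k = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= k * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lstd(samples, n=3):
    """LSTD: accumulate A = sum phi (phi - gamma phi')^T and b = sum phi r, then solve."""
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for s, r, s2 in samples:
        f, f2 = phi(s, n), phi(s2, n)
        for i in range(n):
            b[i] += f[i] * r
            for j in range(n):
                A[i][j] += f[i] * (f[j] - GAMMA * f2[j])
    return solve(A, b)

theta_lstd = lstd(SAMPLES)                   # one pass over the three samples
theta_td_few = td0(SAMPLES, sweeps=1)        # one pass: still far from the solution
theta_td_many = td0(SAMPLES, sweeps=2000)    # many passes: approaches the LSTD solution
```

In this toy setting LSTD recovers the value function from a single pass over the samples, whereas TD(0) needs many passes over the same data, which is the kind of gap an intermediate family of algorithms would aim to narrow.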
Sparse Temporal Difference Learning using LASSO
We consider the problem of on-line value function estimation in reinforcement learning. We concentrate on the function approximator to use. To try to break the curse of dimensionality, we focus on nonparametric function approximators. We propose to fit the use of kernels into the temporal difference algorithms by using regression via the LASSO. We introduce the equi-gradient descent algorithm (EGD), which is a direct adaptation of the one recently introduced in the LARS algorithm family for solving the LASSO. We advocate our choice of the EGD as a judicious algorithm for these tasks. We present the EGD algorithm in detail as well as some experimental results. We emphasize the qualities of the EGD for reinforcement learning.
The Equi-Correlation Network: a New Kernelized-LARS with Automatic Kernel Parameters Tuning
Machine learning relies heavily on the ability to learn or approximate real functions. The state variables of an agent (perceptions, internal states, etc.) are often represented as real numbers; based on them, the agent has to predict something or act in some way. In this view, the outcome is a nonlinear function of the inputs. It is thus a very common task to fit a nonlinear function to observations, that is, to solve a regression problem. Among other approaches, the LARS is very appealing for its nice theoretical properties and its practical efficiency in computing the whole regularization path of a supervised learning problem, along with the sparsity of its solutions. In this paper, we consider the kernelized version of the LARS. In this setting, kernel functions generally have some parameters that have to be tuned. We propose a new algorithm, the Equi-Correlation Network (ECON), whose originality is that it automatically tunes kernel hyper-parameters while computing the regularization path; this opens the way to working with infinitely many kernel functions, from which the most interesting are selected. Interestingly, our algorithm is still computationally efficient and provides state-of-the-art results on standard benchmarks, while lessening the hand-tuning burden.
Equi-Gradient Temporal Difference Learning
ECON: a Kernel Basis Pursuit Algorithm with Automatic Feature Parameter Tuning, and its Application to Photometric Solids Approximation
This paper introduces a new algorithm, namely the Equi-Correlation Network (ECON), to perform supervised classification and regression. ECON is a kernelized LARS-like algorithm, by which we mean that ECON uses an ℓ1 regularization to produce sparse estimators, that ECON efficiently rides the regularization path to obtain the estimator associated with any value of the regularization constant, and that ECON represents the data by way of features induced by a feature function. The originality of ECON is that it automatically tunes the parameters of the features while riding the regularization path. So, ECON has the unique ability to produce optimally tuned features for each value of the regularization constant. We illustrate the remarkable experimental performance of ECON on standard benchmark datasets; we also present a novel application of machine learning in the field of computer graphics, namely the approximation of photometric solids.
The Iso-regularization Descent Algorithm for the LASSO
Following the introduction by Tibshirani of the LASSO technique for feature selection in regression, two algorithms were proposed by Osborne et al. for solving the associated problem. One is a homotopy method that gained popularity as the LASSO modification of the LARS algorithm. The other is a finite-step descent method that follows a path on the constraint polytope, and seems to have been largely ignored. One of the reasons may be that it solves the constrained formulation of the LASSO, as opposed to the more practical regularized formulation. We give here an adaptation of this algorithm that solves the regularized problem, has a simpler formulation, and outperforms state-of-the-art algorithms in terms of speed.
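For reference, the two LASSO formulations contrasted in this abstract are commonly written as follows. The notation is an assumption, not taken from the paper: $X$ is the design matrix, $y$ the response vector, $\beta$ the coefficient vector, $t \ge 0$ the constraint level, and $\lambda \ge 0$ the regularization constant.

```latex
% Constrained formulation (the feasible set is an l1 ball, i.e. a polytope)
\min_{\beta} \; \tfrac{1}{2} \lVert y - X\beta \rVert_2^2
\quad \text{subject to} \quad \lVert \beta \rVert_1 \le t

% Regularized (penalized) formulation
\min_{\beta} \; \tfrac{1}{2} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
```

The two are equivalent in the sense that each $\lambda$ corresponds to some $t$ and vice versa, but the mapping between them depends on the data, which is one reason the regularized form is often more convenient in practice.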
Hybridizing Constraint Programming and Monte-Carlo Tree Search: Application to the Job Shop problem
Constraint Programming (CP) solvers classically explore the solution space using tree-search-based heuristics. Monte-Carlo Tree Search (MCTS), a tree-search-based method aimed at sequential decision making under uncertainty, simultaneously estimates the reward associated with the sub-trees and gradually biases the exploration toward the most promising regions. This paper examines the tight combination of MCTS and CP on the job shop problem (JSP). The contribution is twofold. Firstly, a reward function compliant with the CP setting is proposed. Secondly, a biased MCTS node-selection rule based on this reward is proposed, which is suitable in a multiple-restarts context. Its integration within the Gecode constraint solver is shown to compete with JSP-specific CP approaches on difficult JSP instances.
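As an illustration of how MCTS balances reward estimates against exploration, the following minimal sketch shows the standard UCB1 node-selection rule. This is the textbook rule, not the biased selection rule proposed in the paper; the function name, the `(visits, reward_sum)` representation, and the exploration constant are assumptions.

```python
import math

def ucb1_select(children, c=math.sqrt(2)):
    """Pick the index of the child to explore next.

    children: list of (visits, reward_sum) pairs, one per child node
              (a hypothetical representation chosen for this sketch).
    c: exploration constant (assumed value; tuned in practice).
    """
    total = sum(visits for visits, _ in children)
    best, best_score = None, -math.inf
    for i, (visits, reward_sum) in enumerate(children):
        if visits == 0:
            return i  # try unvisited children first
        mean = reward_sum / visits                      # exploitation term
        bonus = c * math.sqrt(math.log(total) / visits)  # exploration term
        if mean + bonus > best_score:
            best, best_score = i, mean + bonus
    return best
```

With equal visit counts the rule prefers the child with the higher mean reward; a rarely visited child receives a large exploration bonus and can be selected even when its mean is lower.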
Confidence in uncertainty: Error cost and commitment in early speech hypotheses
© 2018 Loth et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Interactions with artificial agents often lack immediacy because agents respond more slowly than their users expect. Automatic speech recognisers introduce this delay by analysing a user’s utterance only after it has been completed. Early, uncertain hypotheses of incremental speech recognisers can enable artificial agents to respond in a more timely manner. However, these hypotheses may change significantly with each update. Therefore, an already initiated action may turn into an error and incur an error cost. We investigated whether humans would use uncertain hypotheses for planning ahead and/or initiating their response. We designed a Ghost-in-the-Machine study in a bar scenario. A human participant controlled a bartending robot and perceived the scene only through its recognisers. The results showed that participants used uncertain hypotheses for selecting the best matching action. This is comparable to computing the utility of dialogue moves. Participants evaluated the available evidence and the error cost of their actions prior to initiating them. If the error cost was low, the participants initiated their response with only suggestive evidence. Otherwise, they waited for additional, more confident hypotheses if they still had time to do so. If there was time pressure but only little evidence, participants grounded their understanding with echo questions. These findings contribute to a psychologically plausible policy for human-robot interaction that enables artificial agents to respond in a more timely and socially appropriate manner under uncertainty.
Iconografia tropical: motivos locais na arte colonial brasileira [Tropical iconography: local motifs in Brazilian colonial art]
This paper studies the visual representation of tropical nature in the sacred art of the Brazilian colonial period, between the 16th and 18th centuries, when the visual arts in the country developed in the context of the Baroque introduced by Catholic missionaries. It was in the decoration of churches that some of the first artistic representations of elements of local nature appeared, notably tropical fruits, producing new combinations alongside the traditional European phytomorphic ornamentation of acanthus leaves and vines. After surveying occurrences of these local themes in the decoration of churches in the northeast and southeast regions of the country, this work draws upon texts written by travellers and missionaries during the period to examine the Christian interpretations of tropical nature that allowed these motifs to be used as part of the Catholic strategy of preaching and conversion, by means of the moral and religious allegorization of the nature of the New World.