206 research outputs found

    Optimal control as a graphical model inference problem

    We reformulate a class of non-linear stochastic optimal control problems introduced by Todorov (2007) as a Kullback-Leibler (KL) minimization problem. As a result, the optimal control computation reduces to an inference computation, and approximate inference methods can be applied to efficiently compute approximate optimal controls. We show how this KL control theory contains the path integral control method as a special case. We provide an example of a block stacking task and a multi-agent cooperative game where we demonstrate how approximate inference can be successfully applied to instances that are too complex for exact computation. We discuss the relation of the KL control approach to other inference approaches to control. Comment: 26 pages, 12 figures; Machine Learning Journal (2012).
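As a rough illustration of why a KL formulation turns control into inference, the generic argument can be written out as follows. The symbols are assumptions made for this sketch and are not taken from the abstract: $\tau$ is a trajectory, $q$ the uncontrolled dynamics, $p$ the controlled dynamics, and $R$ a state cost accumulated along the trajectory.

```latex
\begin{aligned}
C(p) &= \mathrm{KL}(p \,\|\, q) + \langle R \rangle_p
      = \sum_{\tau} p(\tau)\left[\log\frac{p(\tau)}{q(\tau)} + R(\tau)\right], \\
p^{*}(\tau) &= \frac{q(\tau)\, e^{-R(\tau)}}{Z},
      \qquad Z = \sum_{\tau} q(\tau)\, e^{-R(\tau)}, \\
C(p) &= \mathrm{KL}(p \,\|\, p^{*}) - \log Z
      \quad\Longrightarrow\quad C(p^{*}) = -\log Z .
\end{aligned}
```

Under these assumptions the optimal controlled distribution is a posterior-like reweighting of $q$, so computing its marginals is an inference problem, and the optimal cost is the negative log of the normalizer.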

    Model-based contextual policy search for data-efficient generalization of robot skills

    In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to learning such upper-level policies is to use policy search. However, the majority of current contextual policy search approaches are model-free and require a high number of interactions with the robot and its environment. Model-based approaches are known to significantly reduce the number of robot experiments; however, current model-based techniques cannot be applied straightforwardly to the problem of learning contextual upper-level policies. They rely on specific parametrizations of the policy and the reward function, which are often unrealistic in the contextual policy search formulation. In this paper, we propose a novel model-based contextual policy search algorithm that is able to generalize lower-level controllers and is data-efficient. Our approach is based on learned probabilistic forward models and information-theoretic policy search. Unlike current algorithms, our method does not require any assumption on the parametrization of the policy or the reward function. We show on complex simulated robotic tasks and in a real robot experiment that the proposed learning framework speeds up the learning process by up to two orders of magnitude in comparison to existing methods, while learning high-quality policies.

    Learning modular policies for robotics

    A promising idea for scaling robot learning to more complex tasks is to use elemental behaviors as building blocks to compose more complex behavior. Ideally, such building blocks are used in combination with a learning algorithm that is able to learn to select, adapt, sequence and co-activate the building blocks. While there has been a lot of work on approaches that support one of these requirements, no learning algorithm exists that unifies all these properties in one framework. In this paper we present our work on a unified approach for learning such a modular control architecture. We introduce new policy search algorithms that are based on information-theoretic principles and are able to learn to select, adapt and sequence the building blocks. Furthermore, we developed a new representation for the individual building blocks that supports co-activation and principled ways of adapting the movement. Finally, we summarize our experiments for learning modular control architectures in simulation and with real robots.

    Systematic review and meta-analysis of the diagnostic accuracy of ultrasonography for deep vein thrombosis

    Background: Ultrasound (US) has largely replaced contrast venography as the definitive diagnostic test for deep vein thrombosis (DVT). We aimed to derive a definitive estimate of the diagnostic accuracy of US for clinically suspected DVT and identify study-level factors that might predict accuracy. Methods: We undertook a systematic review, meta-analysis and meta-regression of diagnostic cohort studies that compared US to contrast venography in patients with suspected DVT. We searched Medline, EMBASE, CINAHL, Web of Science, Cochrane Database of Systematic Reviews, Cochrane Controlled Trials Register, Database of Reviews of Effectiveness, the ACP Journal Club, and citation lists (1966 to April 2004). Random effects meta-analysis was used to derive pooled estimates of sensitivity and specificity. Random effects meta-regression was used to identify study-level covariates that predicted diagnostic performance. Results: We identified 100 cohorts comparing US to venography in patients with suspected DVT. Overall sensitivity for proximal DVT (95% confidence interval) was 94.2% (93.2 to 95.0), for distal DVT was 63.5% (59.8 to 67.0), and specificity was 93.8% (93.1 to 94.4). Duplex US had pooled sensitivity of 96.5% (95.1 to 97.6) for proximal DVT, 71.2% (64.6 to 77.2) for distal DVT and specificity of 94.0% (92.8 to 95.1). Triplex US had pooled sensitivity of 96.4% (94.4 to 97.1) for proximal DVT, 75.2% (67.7 to 81.6) for distal DVT and specificity of 94.3% (92.5 to 95.8). Compression US alone had pooled sensitivity of 93.8% (92.0 to 95.3) for proximal DVT, 56.8% (49.0 to 66.4) for distal DVT and specificity of 97.8% (97.0 to 98.4). Sensitivity was higher in more recently published studies and in cohorts with higher prevalence of DVT and more proximal DVT, and was lower in cohorts that reported interpretation by a radiologist. Specificity was higher in cohorts that excluded patients with previous DVT. No studies were identified that compared repeat US to venography in all patients. Repeat US appears to have a positive yield of 1.3%, with 89% of these being confirmed by venography. Conclusion: Combined colour-Doppler US techniques have optimal sensitivity, while compression US has optimal specificity for DVT. However, all estimates are subject to substantial unexplained heterogeneity. The role of repeat scanning is very uncertain and based upon limited data.
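To make the idea of a random-effects pooled sensitivity concrete, here is a minimal sketch. The abstract does not say which estimator the review used; this example assumes the DerSimonian-Laird method on logit-transformed sensitivities, and the function name `pool_sensitivity` and all study counts in the usage note are hypothetical, not data from the review.

```python
import math


def expit(x):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_sensitivity(studies):
    """Pool sensitivities with a DerSimonian-Laird random-effects model.

    studies: list of (true_positives, false_negatives) pairs.
    Assumes every count is non-zero (no continuity correction is applied).
    Returns (pooled, ci_low, ci_high, tau2) on the probability scale.
    """
    # Logit sensitivity and its approximate within-study variance.
    y = [math.log(tp / fn) for tp, fn in studies]
    v = [1.0 / tp + 1.0 / fn for tp, fn in studies]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sw - sum(wi ** 2 for wi in w) / sw
    k = len(studies)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    return expit(y_re), expit(y_re - 1.96 * se), expit(y_re + 1.96 * se), tau2
```

For example, `pool_sensitivity([(45, 5), (80, 20), (190, 10)])` pools three invented cohorts with sensitivities of 0.90, 0.80 and 0.95. A positive `tau2` signals between-study heterogeneity of the kind the abstract warns about.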

    Overcoming Ostrea edulis seed production limitations to meet ecosystem restoration demands in the UN decade on restoration

    The European flat oyster, Ostrea edulis, is a habitat-forming bivalve which was historically widespread throughout Europe. Following its decline due to overfishing, pollution, sedimentation, invasive species, and disease, O. edulis and its beds are now listed as a threatened and/or declining species and habitat by OSPAR. Increasing recognition of the plight of the oyster, alongside rapidly developing restoration techniques and growing interest in marine restoration, has resulted in a recent and rapid growth in habitat restoration efforts. O. edulis seed supply is currently a major bottleneck in scaling up habitat restoration efforts in Europe. O. edulis has been cultured for centuries; however, research into its culture declined following the introduction of the Pacific oyster, Crassostrea gigas, to Europe in the early 1970s. Recent efforts to renew both hatchery and pond production of O. edulis seed for habitat restoration purposes are hampered by restoration project timelines and funding typically being short, or by projects not planning appropriately for the timescales required for investment, research and development, and delivery of oyster seed by commercial producers. Furthermore, funding for restoration is intermittent, making long-term commitments between producers and restoration practitioners difficult. Long-term, strategic investment in research and production is needed to overcome these bottlenecks and meet current ambitious restoration targets across Europe.

    Recent translational research: Oncogene discovery by insertional mutagenesis gets a new boost

    Knowledge of the genes and genetic pathways involved in oncogenesis is essential if we are to identify novel targets for cancer therapy. Insertional mutagenesis in mouse models is among the most efficient tools to detect novel cancer genes. Retrovirus-mediated insertional mutagenesis received a tremendous boost from the availability of the mouse genome sequence and new PCR methods. Application of such advances was limited to lymphomagenesis, but they are now also being applied to mammary tumourigenesis. Novel transposons that allow insertional mutagenesis studies to be conducted in tumors of any mouse tissue may give cancer gene discovery a further boost.

    GPs' opinions of public and industrial information regarding drugs: a cross-sectional study

    Background: General practitioners (GPs) in Sweden issue more than 50% of all prescriptions. Scientific knowledge on the opinions of GPs regarding drug information has been sparse. Such knowledge could be valuable when designing evidence-based drug information for GPs. GPs' opinions on public- and industry-provided drug information are presented in this article. Methods: In a cross-sectional study, a questionnaire was answered by 368 GPs at 97 primary health care centres (PHCCs). The centres were invited to participate by eight out of 29 drug and therapeutic committees (DTCs). A multilevel model was used to analyse associations between opinions of GPs regarding drug information and whether the GPs worked in the public sector or in a private enterprise, their age, sex, and work experience. PHCC and geographical area were included as random effects. Results: About 85% of the GPs perceived that they received too much information from the industry, that the quality of public information was high and useful, and that the main task of public authorities was to increase the GPs' knowledge of drugs. Female GPs valued information from public authorities to a much greater extent than male GPs. Of the GPs, 93% considered that the main task of the industry was to promote sales. Differences in GPs' opinions between PHCCs were generally more visible than differences between areas. Conclusions: Some kind of incentive could be considered for PHCCs that actively reduce drug promotion from the industry. That female GPs valued information from public authorities to a much greater extent than male GPs should be taken into consideration when designing evidence-based drug information from public authorities to make implementation easier.

    Thermodynamics as a theory of decision-making with information processing costs

    Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we propose an information-theoretic formalization of bounded rational decision-making where decision-makers trade off expected utility and information processing costs. Such bounded rational decision-makers can be thought of as thermodynamic machines that undergo physical state changes when they compute. Their behavior is governed by a free energy functional that trades off changes in internal energy (as a proxy for utility) and entropic changes representing computational costs induced by changing states. As a result, the bounded rational decision-making problem can be rephrased in terms of well-known concepts from statistical physics. In the limit when computational costs are ignored, the maximum expected utility principle is recovered. We discuss the relation to satisficing decision-making procedures as well as links to existing theoretical frameworks and human decision-making experiments that describe deviations from expected utility theory. Since most of the mathematical machinery can be borrowed from statistical physics, the main contribution is to axiomatically derive and interpret the thermodynamic free energy as a model of bounded rational decision-making. Comment: 26 pages, 5 figures (under revision since February 2012).
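The trade-off described in this abstract can be sketched numerically. The abstract does not give its functional explicitly, so this example assumes one common form: the free energy is expected utility minus a KL information cost scaled by 1/beta, measured relative to a prior policy. The names `bounded_rational_policy`, `free_energy`, and `beta` are chosen here for illustration only.

```python
import math


def bounded_rational_policy(utilities, prior, beta):
    """Return p(x) proportional to prior(x) * exp(beta * U(x)).

    beta > 0 is an assumed inverse-temperature parameter: a large beta
    means cheap computation, a small beta means expensive computation.
    """
    m = max(utilities)  # subtract the maximum for numerical stability
    weights = [p0 * math.exp(beta * (u - m)) for u, p0 in zip(utilities, prior)]
    z = sum(weights)
    return [w / z for w in weights]


def free_energy(policy, utilities, prior, beta):
    """Expected utility minus (1/beta) * KL(policy || prior)."""
    expected_utility = sum(p * u for p, u in zip(policy, utilities))
    kl = sum(p * math.log(p / p0) for p, p0 in zip(policy, prior) if p > 0)
    return expected_utility - kl / beta


utilities = [1.0, 2.0, 3.0]
prior = [1 / 3, 1 / 3, 1 / 3]

# Costly computation: the policy barely moves away from the prior.
cautious = bounded_rational_policy(utilities, prior, beta=1e-6)
# Nearly free computation: the policy concentrates on the best option,
# which recovers expected-utility maximization in the limit.
greedy = bounded_rational_policy(utilities, prior, beta=50.0)
```

Under this assumed form, the policy returned by `bounded_rational_policy` maximizes `free_energy` for the given `beta`, so no other policy, including the prior itself, scores higher.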