11,258 research outputs found

    Learning policies for Markov decision processes from data

    We consider the problem of learning a policy for a Markov decision process consistent with data captured on the state-action pairs followed by the policy. We assume that the policy belongs to a class of parameterized policies which are defined using features associated with the state-action pairs. The features are known a priori; however, only an unknown subset of them may be relevant. The policy parameters that correspond to an observed target policy are recovered using $\ell_1$-regularized logistic regression that best fits the observed state-action samples. We establish bounds on the difference between the average reward of the estimated and the original policy (regret) in terms of the generalization error and the ergodic coefficient of the underlying Markov chain. To that end, we combine sample complexity theory and sensitivity analysis of the stationary distribution of Markov chains. Our analysis suggests that to achieve regret within order $O(\sqrt{\epsilon})$, it suffices to use training sample size on the order of $\Omega(\log n \cdot \mathrm{poly}(1/\epsilon))$, where $n$ is the number of the features. We demonstrate the effectiveness of our method on a synthetic robot navigation example.
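
The parameter-recovery step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a two-action softmax policy, so that fitting the policy reduces to binary logistic regression on feature vectors, and it uses a plain proximal-gradient (ISTA) solver. The names `fit_l1_logistic`, `make_samples`, the penalty `lam`, the step size, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|; this is what produces sparse weights.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, step=0.5, iters=300):
    # Minimise mean logistic loss + lam * ||theta||_1 by proximal gradient (ISTA).
    n_samples, n_features = len(X), len(X[0])
    theta = [0.0] * n_features
    for _ in range(iters):
        grad = [0.0] * n_features
        for x, label in zip(X, y):
            err = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - label
            for j in range(n_features):
                grad[j] += err * x[j] / n_samples
        theta = [soft_threshold(t - step * g, step * lam)
                 for t, g in zip(theta, grad)]
    return theta


def make_samples(theta_true, n_samples, rng):
    # Hypothetical data: each row is a state-action feature vector, and the
    # label is the action (0 or 1) drawn from the target softmax policy.
    X, y = [], []
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in theta_true]
        p = sigmoid(sum(t * xi for t, xi in zip(theta_true, x)))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


rng = random.Random(0)
theta_true = [3.0, -2.0, 0.0, 0.0, 0.0, 0.0]  # only two features are relevant
X, y = make_samples(theta_true, 600, rng)
theta_hat = fit_l1_logistic(X, y)
print([round(t, 3) for t in theta_hat])
```

In this toy setting the penalty tends to push the weights of the irrelevant features to zero or close to it while keeping the signs of the relevant ones, which is the role the abstract assigns to the regularizer when only an unknown subset of features matters.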

    A Game of Attribute Decomposition for Software Architecture Design

    Attribute-driven software architecture design aims to provide decision support by taking into account the quality attributes of software. A central question in this process is: What architecture design best fulfills the desirable software requirements? To answer this question, a system designer needs to make tradeoffs among several potentially conflicting quality attributes. Such decisions are normally ad hoc and rely heavily on experience. We propose a mathematical approach to tackle this problem. Game theory naturally provides the basic language: Players represent requirements, and strategies involve setting up coalitions among the players. In this way we propose a novel model, called the decomposition game, for attribute-driven design. We present its solution concept based on the notions of cohesion and expansion-freedom and prove that a solution always exists. We then investigate the computational complexity of obtaining a solution. The game model and the algorithms may serve as a general framework for providing useful guidance for software architecture design. We present our results through running examples and a case study on a real-life software project. Comment: 23 pages, 5 figures, a shorter version to appear at 12th International Colloquium on Theoretical Aspects of Computing (ICTAC 2015)

    An ecological analysis of secondary school students' drug use in Hong Kong: A case-control study

    Background: Youth drug use is a significant at-risk youth behaviour and remains one of the top priorities for mental health services, researchers and policy planners. The ecological characteristics of secondary school students’ behaviour in Hong Kong are understudied. Aim: To examine individual, familial, social and environmental correlates of drug use among secondary students in Hong Kong. Method: Data were extracted from a school survey of 3078 students. Of these, 86 students reported having used drugs in the past 6 months. A total of 86 age- and gender-matched controls with no drug-use behaviour in the past 6 months were randomly selected from the remaining students. Multiple logistic analysis was used to examine differential correlates between those who had and had not used substances in the past 6 months. Result: Positive school experience, a positive perspective on school, and parental support are protective factors against drug use. Lower self-esteem, lower self-efficacy against using drugs and a more permissive attitude towards drugs were associated with drug use. Students who were low in self-esteem and rather impulsive tended to use drugs. Conclusion: To prevent students from using drugs, efforts are needed at the individual, family, school and community levels.

    Treatment time for non-surgical endodontic therapy with or without a magnifying loupe


    Mm-wave high gain cavity-backed aperture-coupled patch antenna array

    © 2013 IEEE. A wideband and high gain cavity-backed 4 × 4 patch antenna array is proposed in this paper. Each patch antenna element of the array is enclosed by a rectangular cavity and differentially fed by the slot underneath. By optimizing the geometry of the radiating patch and the cavity, a very uniform E-field distribution at the antenna aperture is achieved, leading to high array aperture efficiency and thus high gain. Taking advantage of the higher-order substrate integrated cavity excitation, the elements of the array are efficiently fed with the same amplitude and phase through a simplified feeding mechanism instead of the conventional bulky and lossy power-splitter-based feeding network. Measured results show the antenna bandwidth is from 56 to 63.1 GHz (16.1%) with the peak gain reaching 21.4 dBi. The radiation patterns of the array are very stable over the entire frequency band and the cross-polarizations are as low as -30 dB. These characteristics demonstrate that the proposed array can be a good candidate for future 60-GHz communication system applications.

    Quasiparticle States around a Nonmagnetic Impurity in D-Density-Wave State of High-$T_c$ Cuprates

    Recently Chakravarty et al. proposed an ordered $d$-density wave (DDW) state as an explanation of the pseudogap phase in underdoped high-temperature cuprates. We study the competition between the DDW and superconducting ordering based on an effective mean-field Hamiltonian. We are mainly concerned with the effect of the DDW ordering on the electronic state around a single nonmagnetic impurity. We find that a single subgap resonance peak appears in the local density of states around the impurity. In the unitary limit, the position of this resonance peak is always located at $E_r=-\mu$ with respect to the Fermi energy. This result is dramatically different from the case of the pure superconducting state, for which the impurity resonant energy is approximately pinned at the Fermi level. This can be used to probe the existence of the DDW ordering in cuprates. Comment: 4 pages, 4 figures