Learning policies for Markov decision processes from data
We consider the problem of learning a policy for a Markov decision process consistent with data captured on the state-action pairs followed by the policy. We assume that the policy belongs to a class of parameterized policies which are defined using features associated with the state-action pairs. The features are known a priori; however, only an unknown subset of them may be relevant. The policy parameters that correspond to an observed target policy are recovered using ℓ1-regularized logistic regression that best fits the observed state-action samples. We establish bounds on the difference between the average reward of the estimated and the original policy (regret) in terms of the generalization error and the ergodic coefficient of the underlying Markov chain. To that end, we combine sample complexity theory and sensitivity analysis of the stationary distribution of Markov chains. Our analysis suggests that to achieve regret within order O(√ε), it suffices to use a training sample size on the order of Ω(log n · poly(1/ε)), where n is the number of features. We demonstrate the effectiveness of our method on a synthetic robot navigation example.
A Game of Attribute Decomposition for Software Architecture Design
Attribute-driven software architecture design aims to provide decision
support by taking into account the quality attributes of software. A central
question in this process is: What architecture design best fulfills the
desirable software requirements? To answer this question, a system designer
needs to make tradeoffs among several potentially conflicting quality
attributes. Such decisions are normally ad hoc and rely heavily on experience.
We propose a mathematical approach to tackle this problem. Game theory
naturally provides the basic language: Players represent requirements, and
strategies involve setting up coalitions among the players. In this way we
propose a novel model, called the decomposition game, for attribute-driven design.
We present its solution concept based on the notion of cohesion and
expansion-freedom and prove that a solution always exists. We then investigate
the computational complexity of obtaining a solution. The game model and the
algorithms may serve as a general framework for providing useful guidance for
software architecture design. We present our results through running examples
and a case study on a real-life software project.
Comment: 23 pages, 5 figures; a shorter version to appear at the 12th
International Colloquium on Theoretical Aspects of Computing (ICTAC 2015)
An ecological analysis of secondary school students' drug use in Hong Kong: A case-control study
Background: Drug use is a significant at-risk behaviour among youth and remains one of the top priorities for mental health services, researchers and policy planners. The ecological characteristics of secondary school students' behaviour in Hong Kong are understudied.
Aim: To examine individual, familial, social and environmental correlates of drug use among secondary students in Hong Kong.
Method: Data were extracted from a school survey of 3078 students, of whom 86 reported having used drugs in the past 6 months. A total of 86 age- and gender-matched controls with no drug use in the past 6 months were randomly selected from the remaining students. Multiple logistic regression analysis was used to examine differential correlates between those who had and had not used substances in the past 6 months.
Result: Positive school experience and perspective towards school, and parental support, were protective factors against drug use. Lower self-esteem, lower self-efficacy against using drugs and a more permissive attitude towards drugs were associated with drug use. Students who were low in self-esteem and rather impulsive tended to use drugs.
Conclusion: To prevent drug use among students, efforts at the individual, family, school and community levels are needed.
Treatment time for non-surgical endodontic therapy with or without a magnifying loupe
Mm-wave high gain cavity-backed aperture-coupled patch antenna array
© 2013 IEEE. A wideband, high-gain, cavity-backed 4 × 4 patch antenna array is proposed in this paper. Each patch antenna element of the array is enclosed by a rectangular cavity and differentially fed by the slot underneath. By optimizing the geometry of the radiating patch and the cavity, a very uniform E-field distribution at the antenna aperture is achieved, leading to high array aperture efficiency and thus high gain. Taking advantage of the higher-order substrate integrated cavity excitation, the elements of the array are efficiently fed with the same amplitude and phase by a simplified feeding mechanism instead of the conventional bulky and lossy power-splitter-based feeding network. Measured results show that the antenna bandwidth is from 56 to 63.1 GHz (16.1%), with the peak gain reaching 21.4 dBi. The radiation patterns of the array are very stable over the entire frequency band, and the cross-polarization levels are as low as -30 dB. These characteristics demonstrate that the proposed array is a good candidate for future 60-GHz communication system applications.
Quasiparticle States around a Nonmagnetic Impurity in D-Density-Wave State of High-$T_c$ Cuprates
Recently Chakravarty et al. proposed an ordered $d$-density wave (DDW)
state as an explanation of the pseudogap phase in underdoped high-temperature
cuprates. We study the competition between the DDW and superconducting ordering
based on an effective mean-field Hamiltonian. We are mainly concerned with the
effect of the DDW ordering on the electronic state around a single nonmagnetic
impurity. We find that a single subgap resonance peak appears in the local
density of states around the impurity. In the unitary limit, the position of
this resonance peak is always located at with respect to the Fermi
energy. This result is dramatically different from the case of the pure
superconducting state for which the impurity resonant energy is approximately
pinned at the Fermi level. This can be used to probe the existence of the DDW
ordering in cuprates.
Comment: 4 pages, 4 figures
Structural Diversity of Class 1 integrons and their associated gene cassettes in Klebsiella pneumoniae isolates from a hospital in China