Learning policies for Markov decision processes from data
We consider the problem of learning a policy for a Markov decision process consistent with data captured on the state-action pairs followed by the policy. We assume that the policy belongs to a class of parameterized policies which are defined using features associated with the state-action pairs. The features are known a priori; however, only an unknown subset of them may be relevant. The policy parameters that correspond to an observed target policy are recovered using ℓ1-regularized logistic regression that best fits the observed state-action samples. We establish bounds on the difference between the average reward of the estimated and the original policy (regret) in terms of the generalization error and the ergodic coefficient of the underlying Markov chain. To that end, we combine sample complexity theory and sensitivity analysis of the stationary distribution of Markov chains. Our analysis suggests that to achieve regret within order O(√ε), it suffices to use a training sample size on the order of Ω(log n · poly(1/ε)), where n is the number of features. We demonstrate the effectiveness of our method on a synthetic robot navigation example.
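As a rough illustration of the recovery step described in this abstract, the sketch below fits ℓ1-regularized logistic regression by proximal gradient descent (soft-thresholding). It is not the authors' implementation: the two-action setting, the synthetic data, and all names and values (fit_l1_logistic, make_demo_data, lam, step, iters) are assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink v toward zero by t.
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_l1_logistic(X, y, lam=0.05, step=0.5, iters=300):
    # Minimize mean logistic loss + lam * ||theta||_1 with proximal gradient steps.
    # X[i] is the feature vector of an observed state, y[i] in {0, 1} the action taken
    # (an assumed two-action simplification of the policy class).
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for x, label in zip(X, y):
            err = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - label
            for j in range(n):
                grad[j] += err * x[j] / m
        theta = [soft_threshold(t - step * g, step * lam)
                 for t, g in zip(theta, grad)]
    return theta

def make_demo_data(m=400, n=8, seed=0):
    # Hypothetical target policy: only the first two of n features are relevant.
    rng = random.Random(seed)
    true_theta = [3.0, -3.0] + [0.0] * (n - 2)
    X, y = [], []
    for _ in range(m):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        p = sigmoid(sum(t * xi for t, xi in zip(true_theta, x)))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y

X, y = make_demo_data()
theta_hat = fit_l1_logistic(X, y)
```

The ℓ1 penalty shrinks the weights of irrelevant features toward zero, which matches the abstract's assumption that only an unknown subset of the features matters.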
Operando STM study of the interaction of imidazolium-based ionic liquid with graphite
Understanding interactions at the interfaces of carbon with ionic liquids (ILs) is crucial for the diagnostics and performance improvement of electrochemical devices containing carbon as active materials or conductive additives in electrodes and ILs as solvents or additives in electrolytes. The interfacial interactions of three typical imidazolium-based ILs, 1-alkyl-3-methylimidazolium bis(trifluoromethanesulfonyl)imide (AMImTFSI) ILs having ethyl (C2), butyl (C4) and octyl (C8) chains in their cations, with highly oriented pyrolytic graphite (HOPG) were studied in situ by electrochemical scanning tunneling microscopy (EC-STM). Etching of the HOPG surface, exfoliation of graphite/graphene flakes, and cation intercalation were observed at the HOPG/C2MImTFSI interface. Etching also takes place in C4MImTFSI at −1.5 V vs Pt, but only at step edges and at a much slower rate, whereas C8MIm+ cations adsorb strongly on the HOPG surface under similar conditions with no observable etching or intercalation. The EC-STM observations can be explained by the increase in van der Waals interaction between the cations and the graphite surface with increasing alkyl chain length.
Chinese social media reaction to the MERS-CoV and avian influenza A(H7N9) outbreaks
BACKGROUND: As internet and social media use have skyrocketed, epidemiologists have begun to use online data such as Google query data and Twitter trends to track the activity levels of influenza and other infectious diseases. In China, Weibo is an extremely popular microblogging site that is equivalent to Twitter. Capitalizing on the wealth of public opinion data contained in posts on Weibo, this study used Weibo as a measure of the Chinese people's reactions to two different outbreaks: the 2012 Middle East Respiratory Syndrome Coronavirus (MERS-CoV) outbreak, and the 2013 outbreak of human infection of avian influenza A(H7N9) in China. METHODS: Keyword searches were performed in Weibo data collected by The University of Hong Kong's Weiboscope project. Baseline values were determined for each keyword, as were reaction values per million posts in the days after outbreak information was released to the public. RESULTS: The results show that the Chinese people reacted significantly to both outbreaks online, with a social media reaction two orders of magnitude stronger to the H7N9 influenza outbreak that happened in China than to the MERS-CoV outbreak that was far away from China. CONCLUSIONS: These results demonstrate that social media could be a useful measure of public awareness and reaction to disease outbreak information released by health authorities.
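The per-million normalization mentioned in METHODS can be sketched as follows. This is only one plausible reading of the abstract, not the study's actual procedure: the function names, the use of a simple mean over days, and the daily counts are all made up for illustration.

```python
def per_million(keyword_posts, total_posts):
    # Keyword mentions per one million collected posts on a given day.
    if total_posts <= 0:
        raise ValueError("total_posts must be positive")
    return keyword_posts * 1_000_000 / total_posts

def mean(values):
    # Arithmetic mean of a non-empty list.
    return sum(values) / len(values)

def baseline_and_reaction(days_before, days_after):
    # Each day is a (keyword_posts, total_posts) pair.
    # Baseline: mean per-million rate before the announcement (assumed definition).
    # Reaction: mean per-million rate in the days after it (assumed definition).
    baseline = mean([per_million(k, t) for k, t in days_before])
    reaction = mean([per_million(k, t) for k, t in days_after])
    return baseline, reaction

# Invented counts, for illustration only.
days_before = [(2, 1_000_000), (4, 2_000_000)]
days_after = [(300, 1_000_000), (500, 2_000_000)]
baseline, reaction = baseline_and_reaction(days_before, days_after)
```

Normalizing by the total number of collected posts keeps days with different collection volumes comparable, so a rise in the rate reflects a change in attention rather than a change in sample size.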
Controlling light-with-light without nonlinearity
According to Huygens' superposition principle, light beams traveling in a
linear medium will pass through one another without mutual disturbance. Indeed,
it is widely held that controlling light signals with light requires intense
laser fields to facilitate beam interactions in nonlinear media, where the
superposition principle can be broken. We demonstrate here that two coherent
beams of light of arbitrarily low intensity can interact on a metamaterial
layer of nanoscale thickness in such a way that one beam modulates the
intensity of the other. We show that the interference of beams can eliminate
the plasmonic Joule losses of light energy in the metamaterial or, in contrast,
can lead to almost total absorption of light. Applications of this phenomenon
may lie in ultrafast all-optical pulse-recovery devices, coherence filters and
THz-bandwidth light-by-light modulators.
The Search for Other Planets and Life
This Les Houches School offers students a wide-ranging view of the field of exoplanets and the search for life beyond the solar system. Observational and theoretical opportunities abound in a new field of astronomy that will be growing for decades to come. I give a brief introduction to, and overview of, the many detailed talks that will be presented in this volume.
Dopamine Responsiveness in the Nucl. Accumbens Shell and Parameters of the Heroin-Influenced Conditioned Place Preference in Rats
Previous evidence demonstrated that drug-induced extracellular dopamine (DA) concentrations
in the nucl. accumbens shell (AcbSh) might underlie different vulnerabilities to heroin
addiction in inbred mouse strains. We investigated a potential role of the responsiveness of the
DA system in the AcbSh with respect to the vulnerability to heroin-influenced conditioned
place preference (CPP) in Sprague–Dawley rats. Animals were randomly assigned to the heroin
and saline (control) groups. Heroin-group rats were then re-classified into two groups according
to the degree of heroin-induced CPP: high-preference (HP) and low-preference (LP) rats. The levels
of extracellular DA and dihydroxyphenylacetic acid (DOPAC) were estimated dynamically
by in vivo microdialysis. Compared with the saline group, extracellular DA and DOPAC
concentrations in the heroin-treated groups were significantly higher 30 min after the last
injection, but the DA level decreased sharply in these groups on days 1 and 3 and became
lower than that of the saline group. Compared with LP-group rats, HP rats displayed a higher
heroin-induced increase in the DA concentration 30 min after the last heroin injection and
higher DOPAC levels and DOPAC/DA ratios 14 days after such injection. These results suggest
that differences in the DA system responsiveness in the AcbSh may determine individual
differences in vulnerability to heroin addiction.
Nanocomposites of polymer and inorganic nanoparticles for optical and magnetic applications
This article provides an up-to-date review on nanocomposites composed of inorganic nanoparticles and a polymer matrix for optical and magnetic applications. Optical or magnetic characteristics can change when particle sizes decrease to very small dimensions, an effect that is of major interest in the area of nanocomposite materials. Incorporating inorganic nanoparticles into the polymer matrix can provide high-performance novel materials that find applications in many industrial fields. In this respect, frequently considered features are optical properties such as light absorption (UV and color), the extent of light scattering and, in the case of metal particles, photoluminescence, dichroism, and so on, and magnetic properties such as superparamagnetism, electromagnetic wave absorption, and electromagnetic interference shielding. A general introduction, definition, and historical development of polymer–inorganic nanocomposites, as well as a comprehensive review of synthetic techniques for polymer–inorganic nanocomposites, will be given. Future possibilities for the development of nanocomposites for optical and magnetic applications are also introduced. It is expected that the use of new functional inorganic nano-fillers will lead to new polymer–inorganic nanocomposites with unique combinations of material properties. By careful selection of synthetic techniques and by understanding and exploiting the unique physics of such polymeric nanocomposites, novel functional polymer–inorganic nanocomposites can be designed and fabricated for new applications such as optoelectronics and magneto-optics.