On time, frequency, and polar motion Quarterly reports, 1 Jan. - 30 Jun. 1969
Sudden changes in earth rotational acceleration and polar secular motion
Clipped-Objective Policy Gradients for Pessimistic Policy Optimization
To facilitate efficient learning, policy gradient approaches to deep
reinforcement learning (RL) are typically paired with variance reduction
measures and strategies for making large but safe policy changes based on a
batch of experiences. Natural policy gradient methods, including Trust Region
Policy Optimization (TRPO), seek to produce monotonic improvement through
bounded changes in policy outputs. Proximal Policy Optimization (PPO) is a
commonly used, first-order algorithm that instead uses loss clipping to take
multiple safe optimization steps per batch of data, replacing the bound on the
single step of TRPO with regularization on multiple steps. In this work, we
find that the performance of PPO, when applied to continuous action spaces, may
be consistently improved through a simple change in objective. Instead of the
importance sampling objective of PPO, we recommend a basic policy
gradient, clipped in an equivalent fashion. While both objectives produce
biased gradient estimates with respect to the RL objective, they also both
display significantly reduced variance compared to the unbiased off-policy
policy gradient. Additionally, we show that (1) the clipped-objective policy
gradient (COPG) objective is on average "pessimistic" compared to the PPO
objective, and (2) this pessimism promotes enhanced exploration. As a result, we
empirically observe that COPG produces improved learning compared to PPO in
single-task, constrained, and multi-task learning, without adding significant
computational cost or complexity. Compared to TRPO, the COPG approach is seen
to offer comparable or superior performance, while retaining the simplicity of
a first-order method. Comment: 12 pages, 8 figures
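The difference between the two objectives can be made concrete with a short NumPy sketch. PPO's clipped surrogate is the standard one; the COPG function encodes an assumed reading of "a basic policy gradient, clipped in an equivalent fashion", namely clipping log pi_new to log pi_old + log(1 ± eps) so that it stops passing gradient exactly where PPO's ratio clipping does. The function names and the default eps = 0.2 are hypothetical. Inside the clipping band, each sample's PPO gradient is weighted by r·A, whereas the COPG gradient is weighted by A alone.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, adv, eps=0.2):
    """PPO clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    ratio = np.exp(logp_new - logp_old)  # importance ratio r = pi_new / pi_old
    return np.mean(np.minimum(ratio * adv,
                              np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv))

def copg_objective(logp_new, logp_old, adv, eps=0.2):
    """Assumed COPG form (hypothetical sketch): a plain log-prob policy
    gradient, with log pi_new clipped to the band matching PPO's ratio clip."""
    lo = logp_old + np.log(1.0 - eps)  # log((1 - eps) * pi_old)
    hi = logp_old + np.log(1.0 + eps)  # log((1 + eps) * pi_old)
    return np.mean(np.minimum(logp_new * adv,
                              np.clip(logp_new, lo, hi) * adv))

def per_sample_grads(logp_new, logp_old, adv, eps=0.2):
    """d(objective)/d(logp_new) per sample, ignoring the 1/batch factor."""
    ratio = np.exp(logp_new - logp_old)
    # Both objectives stop passing gradient once the ratio has left the band
    # in the direction the advantage rewards.
    frozen = ((adv > 0) & (ratio > 1.0 + eps)) | ((adv < 0) & (ratio < 1.0 - eps))
    ppo = np.where(frozen, 0.0, ratio * adv)  # d(r*A)/d(log pi) = r*A
    copg = np.where(frozen, 0.0, adv)         # d(log pi * A)/d(log pi) = A
    return ppo, copg
```

In practice both objectives would be written in an autodiff framework so the gradient comes for free. The hand-written gradients are included only to show the r·A versus A weighting.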
SDT: A Database Schema Design and Translation Tool Reference Manual Draft 4.1
X-ray vs. Optical Variations in the Seyfert 1 Nucleus NGC 3516: A Puzzling Disconnectedness
We present optical broadband (B and R) observations of the Seyfert 1 nucleus
NGC 3516, obtained at Wise Observatory from March 1997 to March 2002,
contemporaneously with X-ray 2-10 keV measurements with RXTE. With these data
we increase the temporal baseline of this dataset to 5 years, more than triple
the coverage we have previously presented for this object. Analysis of the
new data does not confirm the 100-day lag of X-ray behind optical variations,
tentatively reported in our previous work. Indeed, excluding the first year's
data, which drive the previous result, there is no significant correlation at
any lag between the X-ray and optical bands. We also find no correlation at any
lag between optical flux and various X-ray hardness ratios. We conclude that
the close relation observed between the bands during the first year of our
program was either a fluke, or perhaps the result of the exceptionally bright
state of NGC 3516 in 1997, to which it has yet to return. Reviewing the results
of published joint X-ray and UV/optical Seyfert monitoring programs, we
speculate that there are at least two components or mechanisms contributing to
the X-ray continuum emission up to 10 keV: a soft component that is correlated
with UV/optical variations on timescales >1 day, and whose presence can be
detected when the source is observed at low enough energies (about 1 keV), is
unabsorbed, or is in a sufficiently bright phase; and a hard component whose
variations are uncorrelated with the UV/optical. Comment: 9 pages, AJ, in press
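A basic cross-correlation function can illustrate the test for "correlation at any lag". This is a simplified sketch that assumes two evenly sampled light curves of equal length. The monitoring data described above are unevenly sampled, a situation usually handled with interpolated or discrete CCF methods, so this is not the procedure used in the paper. The function name ccf and its sign convention (a positive lag means the second series lags the first) are assumptions.

```python
import numpy as np

def ccf(x, y, max_lag):
    """Pearson r between x[t] and y[t + k] for k in [-max_lag, max_lag].

    Positive k means y lags behind x. Assumes even sampling and
    max_lag < len(x).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            a, b = x[:n - k], y[k:]   # pair x[t] with y[t + k]
        else:
            a, b = x[-k:], y[:n + k]  # pair x[t] with y[t - |k|]
        r[i] = np.corrcoef(a, b)[0, 1]
    return lags, r
```

A peak in r alone does not establish a significant lag. Red-noise light curves readily produce spurious peaks, so significance is normally judged against simulated light curves.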
Multiscaled Cross-Correlation Dynamics in Financial Time-Series
The cross correlation matrix between equities comprises multiple interactions
between traders with varying strategies and time horizons. In this paper, we
use the Maximum Overlap Discrete Wavelet Transform to calculate correlation
matrices over different timescales and then explore the eigenvalue spectrum
over sliding time windows. The dynamics of the eigenvalue spectrum at different
times and scales provide insight into the interactions between the numerous
constituents involved.
Eigenvalue dynamics are examined for both medium and high-frequency equity
returns, with the associated correlation structure shown to be dependent on
both time and scale. Additionally, the Epps effect is established using this
multivariate method and analyzed at longer scales than previously studied. A
partition of the eigenvalue time-series demonstrates, at very short scales, the
emergence of negative returns when the largest eigenvalue is greatest. Finally,
a portfolio optimization shows the importance of timescale information in the
context of risk management.
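The scale-by-scale eigenvalue analysis can be sketched with a Haar MODWT computed by the pyramid algorithm. Several choices here are assumptions: the Haar filter, circular boundary handling inside each window, and the names haar_modwt and scale_eigenvalues. The paper may use a different wavelet filter and boundary treatment.

```python
import numpy as np

def haar_modwt(x, levels):
    """Haar MODWT (pyramid algorithm, circular boundaries).

    Returns (details, smooth): details[j-1] holds the level-j wavelet
    coefficients and smooth holds the level-`levels` scaling coefficients.
    """
    v = np.asarray(x, dtype=float)
    details = []
    for j in range(1, levels + 1):
        shifted = np.roll(v, 2 ** (j - 1))  # v[t - 2^(j-1) mod N]
        details.append(0.5 * (v - shifted))  # Haar wavelet filter (1/2, -1/2)
        v = 0.5 * (v + shifted)              # Haar scaling filter (1/2, 1/2)
    return details, v

def scale_eigenvalues(returns, level, window, step):
    """For each sliding window over a (T, N) return matrix, take the
    level-`level` MODWT coefficients of every asset, form the N x N
    correlation matrix, and return its eigenvalues (ascending)."""
    T, N = returns.shape
    spectra = []
    for start in range(0, T - window + 1, step):
        block = returns[start:start + window]
        coeffs = np.column_stack(
            [haar_modwt(block[:, i], level)[0][level - 1] for i in range(N)])
        corr = np.corrcoef(coeffs, rowvar=False)
        spectra.append(np.linalg.eigvalsh(corr))
    return np.array(spectra)
```

Tracking the last column (the largest eigenvalue) across windows, separately for each level, gives a time- and scale-dependent spectrum of the kind discussed above. Circular boundaries contaminate coefficients near the window edges, so in practice one would use longer windows or discard the boundary coefficients.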
Structural Levels of Mental Illness Stigma and Discrimination
Most of the models that currently describe processes related to mental illness stigma are based on individual-level psychological paradigms. In this article, using a sociological paradigm, we apply the concepts of structural discrimination to broaden our understanding of stigmatizing processes directed at people with mental illness. Structural, or institutional, discrimination includes the policies of private and governmental institutions that intentionally restrict the opportunities of people with mental illness. It also includes major institutions' policies that are not intended to discriminate but whose consequences nevertheless hinder the options of people with mental illness. After more fully defining intentional and unintentional forms of structural discrimination, we provide current examples of each. Then we discuss the implications of structural models for advancing our understanding of mental illness stigma, including the methodological challenges posed by this paradigm.
Study of stability and control moment gyro wobble damping of flexible, spinning space stations
An executive summary and an analysis of the results are discussed. A user's guide for the digital computer program that simulates the flexible, spinning space station is presented. Control analysis activities and derivation of dynamic equations of motion and the modal analysis are also cited.
A Suzaku, NuSTAR, and XMM-Newton view on variable absorption and relativistic reflection in NGC 4151
We disentangle X-ray disk reflection from complex line-of-sight absorption in
the nearby Seyfert NGC 4151, using a suite of Suzaku, NuSTAR, and XMM-Newton
observations. Building on earlier published work, we pursue a physically
motivated model using the latest angle-resolved version of the lamp-post
geometry reflection model relxillCp_lp together with a Comptonization
continuum. We use the long-look simultaneous Suzaku/NuSTAR observation to
develop a baseline model wherein we model reflected emission as a combination
of lamp-post components at the heights of 1.2 and 15.0 gravitational radii. We
argue for a vertically extended corona as opposed to two compact and distinct
primary sources. We find two neutral absorbers (one full-covering and one
partial-covering), an ionized absorber (), and a highly-ionized
ultra-fast outflow, which have all been reported previously. All analyzed
spectra are well described by this baseline model. The bulk of the spectral
variability between 1 keV and 6 keV can be accounted for by changes in the
column density of both neutral absorbers, which appear to be degenerate and
inversely correlated with the variable hard continuum component flux. We track
variability in absorption on both short (2 d) and long (1 yr) timescales;
the observed evolution is either consistent with changes in the absorber
structure (clumpy absorber at distances ranging from the broad line region
(BLR) to the inner torus or a dusty radiatively driven wind) or a geometrically
stable neutral absorber that becomes increasingly ionized at a rising flux
level. The soft X-rays below 1 keV are dominated by photoionized emission from
extended gas that may act as a warm mirror for the nuclear radiation. Comment: 21 pages, 19 figures, 8 tables, accepted for publication by A&A
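The generic multiplicative structure of one full-covering and one partial-covering neutral absorber can be sketched as T(E) = exp(-N1 σ(E)) [f exp(-N2 σ(E)) + (1 - f)]. This is a toy model under loud assumptions. The cross-section is a hypothetical single power law with placeholder normalization that ignores absorption edges, so it is not a real photoelectric table. The sketch also omits everything else in the model described above (the ionized absorber, the ultra-fast outflow, relxillCp_lp reflection, and the Comptonized continuum). The function names are hypothetical.

```python
import numpy as np

def toy_cross_section(energy_kev, sigma_1kev=2.0e-22, slope=-8.0 / 3.0):
    """Hypothetical cross-section per H atom (cm^2): a single power law in
    energy with placeholder values and no edges. Not a real table."""
    return sigma_1kev * np.asarray(energy_kev, dtype=float) ** slope

def transmission(energy_kev, nh_full, nh_partial, cover_frac):
    """Full-covering absorber times a partial-covering one:
    T(E) = exp(-N1 s) * [f exp(-N2 s) + (1 - f)], column densities in cm^-2."""
    s = toy_cross_section(energy_kev)
    full = np.exp(-nh_full * s)
    partial = cover_frac * np.exp(-nh_partial * s) + (1.0 - cover_frac)
    return full * partial
```

Raising nh_partial in this toy suppresses the 1-6 keV band much more strongly than the highest energies. This mirrors, in qualitative terms only, why column-density changes can account for spectral variability in that band.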
Managing Risk of Bidding in Display Advertising
In this paper, we deal with the uncertainty of bidding for display
advertising. Similar to the financial market trading, real-time bidding (RTB)
based display advertising employs an auction mechanism to automate the
impression level media buying; and running a campaign is no different than an
investment of acquiring new customers in return for obtaining additional
converted sales. Thus, bidding optimally on each ad impression to drive profit
and return on investment becomes essential. However, the large
randomness of user behavior and the cost uncertainty caused by auction
competition may introduce significant risk into campaign performance
estimation. We explicitly model the uncertainty of user
click-through rate estimation and auction competition to capture the risk. We
borrow an idea from finance and derive the value at risk for each ad display
opportunity. Our formulation results in two risk-aware bidding strategies that
penalize risky ad impressions and focus more on the ones with higher expected
return and lower risk. The empirical study on real-world data demonstrates the
effectiveness of our proposed risk-aware bidding strategies: yielding profit
gains of 15.4% in offline experiments and up to 17.5% in an online A/B test on
a commercial RTB platform over the widely applied bidding strategies.
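A toy bid rule can illustrate the value-at-risk idea of penalizing risky impressions. The following assumptions are hypothetical and do not reproduce either of the paper's two strategies: the CTR estimate for an impression is Gaussian, with mean and standard deviation supplied by some upstream model, and the bid is the alpha-quantile of the impression's value (value per click times CTR). The rule also ignores auction competition, which is the second source of uncertainty the abstract models.

```python
from statistics import NormalDist

def risk_aware_bid(ctr_mean, ctr_std, value_per_click, alpha=0.05):
    """Hypothetical rule: bid the lower alpha-quantile of the impression's
    value, assuming CTR ~ Normal(ctr_mean, ctr_std), floored at zero."""
    z = NormalDist().inv_cdf(1.0 - alpha)  # about 1.645 for alpha = 0.05
    return max(0.0, value_per_click * (ctr_mean - z * ctr_std))
```

Two impressions with the same expected CTR receive different bids. The one with the more uncertain estimate is bid lower, which is the qualitative behavior the abstract describes.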