Taming outliers in pulsar-timing datasets with hierarchical likelihoods and Hamiltonian sampling
Pulsar-timing datasets have been analyzed with great success using
probabilistic treatments based on Gaussian distributions, with applications
ranging from studies of neutron-star structure to tests of general relativity
and searches for nanohertz gravitational waves. As in other applications of
Gaussian distributions, outliers in timing measurements pose a significant
challenge to statistical inference, since they can bias the estimation of
timing and noise parameters, and affect reported parameter uncertainties. We
describe and demonstrate a practical end-to-end approach to perform Bayesian
inference of timing and noise parameters robustly in the presence of outliers,
and to identify these probabilistically. The method is fully consistent (i.e.,
outlier-ness probabilities vary in tune with the posterior distributions of the
timing and noise parameters), and it relies on the efficient sampling of the
hierarchical form of the pulsar-timing likelihood. Such sampling has recently
become possible with a "no-U-turn" Hamiltonian sampler coupled to a highly
customized reparametrization of the likelihood; this code is described
elsewhere, but it is already available online. We recommend our method as a
standard step in the preparation of pulsar-timing-array datasets: even if
statistical inference is not affected, follow-up studies of outlier candidates
can reveal unseen problems in radio observations and timing measurements;
furthermore, confidence in the results of gravitational-wave searches will only
benefit from stringent statistical evidence that datasets are clean and
outlier-free. Comment: 6 pages, 2 figures, RevTeX 4.
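The probabilistic identification of outliers can be illustrated with a simple two-component mixture likelihood. This is a minimal sketch, not the authors' hierarchical no-U-turn sampler: it assumes, hypothetically, that inlier timing residuals are zero-mean Gaussian with their quoted uncertainties, that outliers are spread uniformly over a window of width `outlier_width` (for instance one pulse period), and that the outlier fraction is fixed instead of sampled. In a fully consistent treatment these per-point probabilities would be evaluated for every posterior sample of the timing and noise parameters and then averaged.

```python
import math
from typing import List


def gaussian_pdf(x: float, sigma: float) -> float:
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_terms(r: float, sigma: float, outlier_fraction: float, outlier_width: float):
    """Return the (inlier, outlier) contributions to the likelihood of one residual.

    Assumes |r| <= outlier_width / 2, so the uniform outlier density is constant.
    """
    inlier = (1.0 - outlier_fraction) * gaussian_pdf(r, sigma)
    outlier = outlier_fraction / outlier_width
    return inlier, outlier


def outlier_probabilities(residuals, sigmas, outlier_fraction, outlier_width) -> List[float]:
    """Posterior probability that each residual belongs to the outlier component."""
    probs = []
    for r, s in zip(residuals, sigmas):
        inlier, outlier = mixture_terms(r, s, outlier_fraction, outlier_width)
        probs.append(outlier / (inlier + outlier))
    return probs


def mixture_log_likelihood(residuals, sigmas, outlier_fraction, outlier_width) -> float:
    """Log-likelihood of all residuals with the per-point outlier flags marginalized out."""
    total = 0.0
    for r, s in zip(residuals, sigmas):
        inlier, outlier = mixture_terms(r, s, outlier_fraction, outlier_width)
        total += math.log(inlier + outlier)
    return total


# Hypothetical residuals and uncertainties (same time unit throughout).
residuals = [0.1, -0.2, 5.0]
sigmas = [0.3, 0.3, 0.3]
print(outlier_probabilities(residuals, sigmas, outlier_fraction=0.01, outlier_width=100.0))
```

Only the third residual, which sits many standard deviations from zero, receives an outlier probability close to one; marginalizing the flags in `mixture_log_likelihood` is what keeps such points from dragging the timing fit.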
Measuring Loss Potential of Hedge Fund Strategies
We measure the loss potential of Hedge Funds by combining three market risk measures: VaR, Draw-Down and Time Under-The-Water. Calculations are carried out under three different assumptions about Hedge Fund returns: i) Normality and time-independence, ii) Non-normality and time-independence and iii) Non-normality and time-dependence. For Hedge Funds, our results clearly show that market risk may be substantially underestimated by models that assume Normality or that, while allowing for Non-Normality, neglect to model time-dependence. Moreover, VaR is an incomplete measure of market risk whenever the Normality assumption does not hold. In this case, VaR results must be compared with Draw-Down and Time Under-The-Water measures in order to accurately assess the loss potential of Hedge Funds. Keywords: Hedge Fund, Value-at-Risk, risk, performance, drawdown, under-the-water, normal returns, non-normal returns, time-dependence, ARMA, Monte Carlo, skewness, kurtosis, mixture of Gaussian distributions, survival probability, styles, investment strategies
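The three risk measures can be made concrete on a series of periodic returns. The sketch below is illustrative only and assumes simple (not log) returns; it compares an empirical VaR with a VaR that assumes Normality, and computes the maximum drawdown and the longest time under water (consecutive periods spent below the previous wealth peak). The time-dependent models the keywords point to (ARMA, Monte Carlo) are not reproduced, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def historical_var(returns, level=0.95):
    """Empirical VaR: the `level`-quantile of the loss distribution (losses are positive)."""
    losses = sorted(-r for r in returns)
    index = max(0, math.ceil(level * len(losses)) - 1)
    return losses[index]


def gaussian_var(returns, level=0.95):
    """VaR under the Normality assumption, using only the sample mean and standard deviation."""
    mu, sigma = fmean(returns), stdev(returns)
    return -(mu + sigma * NormalDist().inv_cdf(1.0 - level))


def drawdown_stats(returns):
    """Return (maximum drawdown as a fraction of the peak, longest time under water in periods)."""
    wealth = peak = 1.0
    max_drawdown = 0.0
    underwater = longest = 0
    for r in returns:
        wealth *= 1.0 + r
        if wealth >= peak:
            peak = wealth          # new high-water mark: back above water
            underwater = 0
        else:
            underwater += 1        # still below the previous peak
            longest = max(longest, underwater)
        max_drawdown = max(max_drawdown, 1.0 - wealth / peak)
    return max_drawdown, longest
```

For example, `drawdown_stats([0.1, -0.5, 0.2, 0.5, 0.1])` reports a 50% maximum drawdown and four periods under water, even though the last three returns are positive; VaR alone, which looks at one period at a time, says nothing about that duration.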
Application of the Gaussian mixture model in pulsar astronomy -- pulsar classification and candidates ranking for Fermi 2FGL catalog
Machine learning, i.e. the use of algorithms to extract empirical knowledge from data, can be
used to classify data, which is one of the most common tasks in observational
astronomy. In this paper, we focus on Bayesian data classification algorithms
using the Gaussian mixture model and show two applications in pulsar astronomy.
After reviewing the Gaussian mixture model and the related
Expectation-Maximization algorithm, we present a data classification method
using the Neyman-Pearson test. To demonstrate the method, we apply the
algorithm to two classification problems. First, it is applied to the
well-known period-period derivative diagram, where we find that the pulsar
distribution can be modeled with six Gaussian clusters, with two clusters for
millisecond pulsars (recycled pulsars) and the rest for normal pulsars. From
this distribution, we derive an empirical definition for millisecond pulsars as
. The two
millisecond pulsar clusters may have different evolutionary origins, since the
companion stars to these pulsars in the two clusters show different chemical
composition. Four clusters are found for normal pulsars. Possible implications
for these clusters are also discussed. Our second example is to calculate the
likelihood of unidentified Fermi point sources being pulsars and rank
them accordingly. In the ranked point source list, the top 5% of sources contain
50% of known pulsars, the top 50% contain 99% of known pulsars, and no known active
galaxy (the other major population) appears in the top 6%. Such a ranked list
can be used to guide future follow-up observations aimed at finding pulsars among
unidentified Fermi point sources. Comment: 9 pages, 4 figures, accepted by MNRA
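The two steps described above, fitting a Gaussian mixture with Expectation-Maximization and ranking sources by how much more likely they are under one class than another, can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' code: it works on a single hypothetical feature instead of the two-dimensional period-period derivative plane or the Fermi source parameters, initializes the means at data quantiles, and ranks candidates by the log-likelihood ratio between a "pulsar" mixture and an "other" mixture. A Neyman-Pearson test would threshold that same ratio at a value chosen to fix the false-alarm rate.

```python
import numpy as np


def component_log_densities(x, weights, means, variances):
    """log(w_j * N(x_i | mean_j, var_j)) for every point i and component j."""
    x = np.asarray(x, dtype=float)[:, None]
    return (np.log(weights) - 0.5 * np.log(2.0 * np.pi * variances)
            - 0.5 * (x - means) ** 2 / variances)


def logsumexp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def fit_gmm_1d(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture with Expectation-Maximization."""
    x = np.asarray(x, dtype=float)
    weights = np.full(k, 1.0 / k)
    means = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread initial means over the data
    variances = np.full(k, x.var())
    for _ in range(n_iter):
        # E-step: responsibility of component j for point i.
        log_comp = component_log_densities(x, weights, means, variances)
        resp = np.exp(log_comp - logsumexp_rows(log_comp)[:, None])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = np.maximum((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk, 1e-9)
    return weights, means, variances


def gmm_log_density(x, params):
    """Log of the mixture density at each point."""
    return logsumexp_rows(component_log_densities(x, *params))


def rank_candidates(features, pulsar_params, other_params):
    """Order candidates by log-likelihood ratio, most pulsar-like first."""
    llr = gmm_log_density(features, pulsar_params) - gmm_log_density(features, other_params)
    return np.argsort(-llr), llr
```

Working with log-densities and a log-sum-exp keeps the responsibilities finite for points far from every component, which matters when unidentified sources lie in sparsely populated regions of feature space.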
Latent Self-Exciting Point Process Model for Spatial-Temporal Networks
We propose a latent self-exciting point process model that describes
geographically distributed interactions between pairs of entities. In contrast
to most existing approaches that assume fully observable interactions, here we
consider a scenario where certain interaction events lack information about
participants. Instead, this information needs to be inferred from the available
observations. We develop an efficient approximate algorithm based on
variational expectation-maximization to infer unknown participants in an event
given the location and the time of the event. We validate the model on
synthetic as well as real-world data, and obtain very promising results on the
identity-inference task. We also use our model to predict the timing and
participants of future events, and demonstrate that it compares favorably with
baseline approaches. Comment: 20 pages, 6 figures (v3); 11 pages, 6 figures (v2); previous version
appeared in the 9th Bayesian Modeling Applications Workshop, UAI'1
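The self-exciting mechanism behind this kind of model can be sketched with a univariate Hawkes process with an exponential kernel: each event temporarily raises the rate of future events. This sketch deliberately omits what makes the model above latent and spatial (no locations, no participant pairs, no variational EM); `mu`, `alpha` and `beta` are hypothetical parameter names for the background rate, the expected number of events triggered per event, and the decay rate. It shows the conditional intensity, the exact log-likelihood that inference would maximize, and simulation by Ogata's thinning method.

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + sum over t_i < t of alpha * beta * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in events if ti < t)


def log_likelihood(events, mu, alpha, beta, horizon):
    """Exact log-likelihood of sorted event times observed on [0, horizon]."""
    log_term = sum(math.log(intensity(ti, events, mu, alpha, beta)) for ti in events)
    # Integral of the intensity over [0, horizon] (the compensator).
    compensator = mu * horizon + alpha * sum(1.0 - math.exp(-beta * (horizon - ti)) for ti in events)
    return log_term - compensator


def simulate(mu, alpha, beta, horizon, seed=0):
    """Ogata thinning: propose from a constant upper bound, accept with probability lambda/bound."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Every stored event is <= t and the intensity only decays until the next
        # event, so its current value (including all jumps so far) is an upper bound.
        bound = mu + sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(bound)
        if t >= horizon:
            return events
        if rng.random() * bound <= intensity(t, events, mu, alpha, beta):
            events.append(t)
```

The process is stationary only when `alpha < 1`, since each event then triggers fewer than one follow-up event on average.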
Modeling Interdependent and Periodic Real-World Action Sequences
Mobile health applications, including those that track activities such as
exercise, sleep, and diet, are becoming widely used. Accurately predicting
human actions is essential for targeted recommendations that could improve our
health and for personalization of these applications. However, making such
predictions is extremely difficult due to the complexities of human behavior,
which consists of a large number of potential actions that vary over time,
depend on each other, and are periodic. Previous work has not jointly modeled
these dynamics and has largely focused on item consumption patterns instead of
broader types of behaviors such as eating, commuting or exercising. In this
work, we develop a novel statistical model for Time-varying, Interdependent,
and Periodic Action Sequences. Our approach is based on personalized,
multivariate temporal point processes that model time-varying action
propensities through a mixture of Gaussian intensities. Our model captures
short-term and long-term periodic interdependencies between actions through
Hawkes process-based self-excitations. We evaluate our approach on two activity
logging datasets comprising 12 million actions taken by 20 thousand users over
17 months. We demonstrate that our approach allows us to make successful
predictions of future user actions and their timing. Specifically, our model
improves predictions of actions, and their timing, over existing methods across
multiple datasets by up to 156%, and up to 37%, respectively. Performance
improvements are particularly large for relatively rare and periodic actions
such as walking and biking, improving over baselines by up to 256%. This
demonstrates that explicit modeling of dependencies and periodicities in
real-world behavior enables successful predictions of future actions, with
implications for modeling human behavior, app personalization, and targeting of
health interventions. Comment: Accepted at WWW 201
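The abstract combines two ingredients: time-varying action propensities expressed as a mixture of Gaussian intensities, and Hawkes-style self-excitation between actions. A minimal sketch of such a conditional intensity follows. It is not the paper's model; it assumes, hypothetically, a 24-hour cycle, Gaussian bumps wrapped onto that cycle for the base rate, an exponential kernel for short-term interactions, and a Gaussian kernel centered one day after an action for long-term periodic ones. All names and parameters are illustrative, and personalization and parameter fitting are left out.

```python
import math

DAY = 24.0  # hours; the time-of-day cycle is a modeling assumption


def gaussian(x, center, width):
    """Normal density with mean `center` and standard deviation `width`."""
    z = (x - center) / width
    return math.exp(-0.5 * z * z) / (width * math.sqrt(2.0 * math.pi))


def periodic_base_rate(t, weights, centers, widths):
    """Time-varying propensity: a Gaussian mixture over time of day, wrapped onto 24 h."""
    h = t % DAY
    return sum(w * gaussian(h + shift, c, s)
               for w, c, s in zip(weights, centers, widths)
               for shift in (-DAY, 0.0, DAY))


def action_intensity(action, t, history, base, short_excite, daily_excite,
                     decay=1.0, daily_width=1.0):
    """Intensity of `action` at time t (hours) given past (time, action) pairs.

    base[a] = (weights, centers, widths); short_excite[b][a] scales an exponential
    short-term boost of a after b; daily_excite[b][a] scales a boost about one day later.
    """
    rate = periodic_base_rate(t, *base[action])
    for ti, b in history:
        if ti >= t:
            continue
        lag = t - ti
        rate += short_excite.get(b, {}).get(action, 0.0) * decay * math.exp(-decay * lag)
        rate += daily_excite.get(b, {}).get(action, 0.0) * gaussian(lag, DAY, daily_width)
    return rate
```

Wrapping each bump with shifts of one day keeps the base rate continuous across midnight, so an activity centered near 00:30 is as likely at 23:30 as at 01:30.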