Taming outliers in pulsar-timing datasets with hierarchical likelihoods and Hamiltonian sampling

Author: Vallisneri Michele
van Haasteren Rutger
Publication venue: 'Oxford University Press (OUP)'
Publication date: 07/09/2016

Pulsar-timing datasets have been analyzed with great success using probabilistic treatments based on Gaussian distributions, with applications ranging from studies of neutron-star structure to tests of general relativity and searches for nanosecond gravitational waves. As for other applications of Gaussian distributions, outliers in timing measurements pose a significant challenge to statistical inference, since they can bias the estimation of timing and noise parameters, and affect reported parameter uncertainties. We describe and demonstrate a practical end-to-end approach to perform Bayesian inference of timing and noise parameters robustly in the presence of outliers, and to identify these probabilistically. The method is fully consistent (i.e., outlier-ness probabilities vary in tune with the posterior distributions of the timing and noise parameters), and it relies on the efficient sampling of the hierarchical form of the pulsar-timing likelihood. Such sampling has recently become possible with a "no-U-turn" Hamiltonian sampler coupled to a highly customized reparametrization of the likelihood; this code is described elsewhere, but it is already available online. We recommend our method as a standard step in the preparation of pulsar-timing-array datasets: even if statistical inference is not affected, follow-up studies of outlier candidates can reveal unseen problems in radio observations and timing measurements; furthermore, confidence in the results of gravitational-wave searches will only benefit from stringent statistical evidence that datasets are clean and outlier-free.

Comment: 6 pages, 2 figures, RevTeX 4.
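A minimal sketch of probabilistic outlier identification, assuming a two-component mixture in which each timing residual is either Gaussian noise or drawn from a flat density of width `outlier_width`. The function names, the flat outlier density, and the fixed parameter values are illustrative assumptions. The abstract does not give the likelihood, and the method it describes lets these probabilities vary with the posterior of the timing and noise parameters instead of fixing them.

```python
import math


def gaussian_pdf(x, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def outlier_probability(residual, sigma, outlier_fraction, outlier_width):
    """Probability that one residual is an outlier under the assumed mixture.

    Inliers: Gaussian with standard deviation sigma.
    Outliers: flat density 1 / outlier_width (hypothetical choice).
    outlier_fraction is the prior probability of being an outlier.
    """
    p_in = (1.0 - outlier_fraction) * gaussian_pdf(residual, sigma)
    p_out = outlier_fraction / outlier_width
    return p_out / (p_in + p_out)


if __name__ == "__main__":
    # A residual at 0 sigma versus one at 10 sigma (arbitrary units).
    for r in (0.0, 10.0):
        print(r, outlier_probability(r, sigma=1.0, outlier_fraction=0.01,
                                     outlier_width=100.0))
```

In a full Bayesian treatment this quantity would be averaged over posterior samples of `sigma`, `outlier_fraction`, and the timing model, which is what makes the outlier probabilities consistent with the rest of the inference.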

arXiv.org e-Print Archive

Caltech Authors

Measuring Loss Potential of Hedge Fund Strategies

Author: Achim Peijan
Marcos Mailoc López de Prado

We measure the loss potential of Hedge Funds by combining three market risk measures: VaR, Draw-Down and Time Under-The-Water. Calculations are carried out under three different frameworks for Hedge Fund returns: i) Normality and time-independence, ii) Non-normality and time-independence and iii) Non-normality and time-dependence. In the case of Hedge Funds, our results clearly show that market risk may be substantially underestimated by models that assume Normality or that, even when allowing for Non-Normality, neglect to model time-dependence. Moreover, VaR is an incomplete measure of market risk whenever the Normality assumption does not hold. In this case, VaR results must be compared with Draw-Down and Time Under-The-Water measures in order to accurately assess the loss potential of Hedge Funds.

Keywords: Hedge Fund, Value-at-Risk, risk, performance, drawdown, under-the-water, normal returns, non-normal returns, time-dependence, ARMA, Monte Carlo, skewness, kurtosis, mixture of Gaussian distributions, survival probability, styles, investment strategies
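The three risk measures named in this abstract can be illustrated with simple empirical estimators computed from a series of periodic returns. This is only a sketch under stated assumptions: the paper evaluates the measures under parametric return models (normal, non-normal, time-dependent), whereas the hypothetical functions below use the historical sample directly, and their names and the example returns are invented for illustration.

```python
def historical_var(returns, level=0.95):
    """Empirical Value-at-Risk, reported as a positive loss.

    Takes the return at the (1 - level) empirical quantile of the sample.
    """
    ordered = sorted(returns)
    index = int((1.0 - level) * len(ordered))
    return -ordered[index]


def max_drawdown(returns):
    """Largest peak-to-trough loss of cumulative wealth, as a fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst


def max_time_under_water(returns):
    """Longest run of consecutive periods spent below the previous peak."""
    wealth, peak, current, longest = 1.0, 1.0, 0, 0
    for r in returns:
        wealth *= 1.0 + r
        if wealth >= peak:
            peak, current = wealth, 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


if __name__ == "__main__":
    sample = [0.10, -0.20, 0.05, 0.30, -0.10]  # made-up periodic returns
    print(historical_var(sample), max_drawdown(sample),
          max_time_under_water(sample))
```

Comparing the three numbers side by side mirrors the abstract's point that VaR alone can miss losses that accumulate over several periods.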

Research Papers in Economics

Application of the Gaussian mixture model in pulsar astronomy -- pulsar classification and candidates ranking for Fermi 2FGL catalog

Author: Abdo
Ackermann
Bhattacharya
D. J. Champion
Eatough
Espinoza
Espinoza
Fasano
K. J. Lee
Kassam
L. Guillemot
Lin
Lyne
M. Kramer
Manchester
Nolan
Peacock
Press
Shklovskii
Tauris
Theodoridis
Weltevrede
Y. L. Yue
Publication venue: 'Wiley'
Publication date: 28/05/2012

Machine learning, algorithms to extract empirical knowledge from data, can be used to classify data, which is one of the most common tasks in observational astronomy. In this paper, we focus on Bayesian data classification algorithms using the Gaussian mixture model and show two applications in pulsar astronomy. After reviewing the Gaussian mixture model and the related Expectation-Maximization algorithm, we present a data classification method using the Neyman-Pearson test. To demonstrate the method, we apply the algorithm to two classification problems. Firstly, it is applied to the well known period-period derivative diagram, where we find that the pulsar distribution can be modeled with six Gaussian clusters, with two clusters for millisecond pulsars (recycled pulsars) and the rest for normal pulsars. From this distribution, we derive an empirical definition for millisecond pulsars as

$$\frac{\dot{P}}{10^{-17}} \leq 3.23 \left(\frac{P}{100\,\mathrm{ms}}\right)^{-2.34}.$$

The two millisecond pulsar clusters may have different evolutionary origins, since the companion stars to these pulsars in the two clusters show different chemical composition. Four clusters are found for normal pulsars. Possible implications for these clusters are also discussed. Our second example is to calculate the likelihood of unidentified Fermi point sources being pulsars and rank them accordingly. In the ranked point source list, the top 5% of sources contain 50% of the known pulsars, the top 50% contain 99% of the known pulsars, and no known active galaxy (the other major population) appears in the top 6%. Such a ranked list can be used to help future follow-up observations for finding pulsars in unidentified Fermi point sources.

Comment: 9 pages, 4 figures, accepted by MNRA
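The empirical millisecond-pulsar definition quoted in this abstract can be evaluated directly. The sketch below assumes the period is given in seconds and the period derivative is dimensionless (s/s); the function name and the two example pulsars are hypothetical.

```python
def is_millisecond_pulsar(period_s, period_derivative):
    """Apply the abstract's cut: Pdot / 1e-17 <= 3.23 * (P / 100 ms) ** -2.34.

    period_s          -- spin period P in seconds
    period_derivative -- spin-down rate Pdot in s/s
    """
    threshold = 3.23 * (period_s / 0.1) ** -2.34  # 100 ms = 0.1 s
    return period_derivative / 1e-17 <= threshold


if __name__ == "__main__":
    # Made-up example values: a fast, slowly braking pulsar and a slow one.
    print(is_millisecond_pulsar(0.005, 1e-20))
    print(is_millisecond_pulsar(1.0, 1e-15))
```

Because the threshold falls steeply with period, only short-period pulsars with small spin-down rates satisfy the inequality.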

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository

Latent Self-Exciting Point Process Model for Spatial-Temporal Networks

Author: Brantingham P. Jeffrey
Cho Yoon-Sik
Galstyan Aram
Tita George
Publication venue: 'American Institute of Mathematical Sciences (AIMS)'
Publication date: 01/01/2014

We propose a latent self-exciting point process model that describes geographically distributed interactions between pairs of entities. In contrast to most existing approaches that assume fully observable interactions, here we consider a scenario where certain interaction events lack information about participants. Instead, this information needs to be inferred from the available observations. We develop an efficient approximate algorithm based on variational expectation-maximization to infer unknown participants in an event given the location and the time of the event. We validate the model on synthetic as well as real-world data, and obtain very promising results on the identity-inference task. We also use our model to predict the timing and participants of future events, and demonstrate that it compares favorably with baseline approaches.

Comment: 20 pages, 6 figures (v3); 11 pages, 6 figures (v2); previous version appeared in the 9th Bayesian Modeling Applications Workshop, UAI'1

arXiv.org e-Print Archive

eScholarship - University of California

Modeling Interdependent and Periodic Real-World Action Sequences

Author: Althoff Tim
Kurashima Takeshi
Leskovec Jure
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

Mobile health applications, including those that track activities such as exercise, sleep, and diet, are becoming widely used. Accurately predicting human actions is essential for targeted recommendations that could improve our health and for personalization of these applications. However, making such predictions is extremely difficult due to the complexities of human behavior, which consists of a large number of potential actions that vary over time, depend on each other, and are periodic. Previous work has not jointly modeled these dynamics and has largely focused on item consumption patterns instead of broader types of behaviors such as eating, commuting or exercising. In this work, we develop a novel statistical model for Time-varying, Interdependent, and Periodic Action Sequences. Our approach is based on personalized, multivariate temporal point processes that model time-varying action propensities through a mixture of Gaussian intensities. Our model captures short-term and long-term periodic interdependencies between actions through Hawkes process-based self-excitations. We evaluate our approach on two activity logging datasets comprising 12 million actions taken by 20 thousand users over 17 months. We demonstrate that our approach allows us to make successful predictions of future user actions and their timing. Specifically, our model improves predictions of actions, and their timing, over existing methods across multiple datasets by up to 156%, and up to 37%, respectively. Performance improvements are particularly large for relatively rare and periodic actions such as walking and biking, improving over baselines by up to 256%. This demonstrates that explicit modeling of dependencies and periodicities in real-world behavior enables successful predictions of future actions, with implications for modeling human behavior, app personalization, and targeting of health interventions.

Comment: Accepted at WWW 201
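The combination of Gaussian intensities and Hawkes-style self-excitation mentioned in this abstract can be sketched for a single action type. Everything specific here is an assumption made for illustration: the exponential excitation kernel, the parameter names `alpha` and `beta`, and the example values are not taken from the paper, whose model is multivariate and personalized.

```python
import math


def gaussian_bump(t, center, width, weight):
    """One Gaussian-shaped component of the time-varying base intensity."""
    return weight * math.exp(-0.5 * ((t - center) / width) ** 2)


def intensity(t, history, bumps, alpha, beta):
    """Conditional intensity at time t for one action type.

    bumps   -- list of (center, width, weight) tuples for the base intensity
    history -- times of past events; each adds alpha * exp(-beta * (t - s))
    """
    base = sum(gaussian_bump(t, c, w, a) for c, w, a in bumps)
    excitation = sum(alpha * math.exp(-beta * (t - s)) for s in history if s < t)
    return base + excitation


if __name__ == "__main__":
    morning = [(8.0, 1.0, 2.0)]  # hypothetical propensity peak at hour 8
    print(intensity(8.0, [], morning, alpha=0.5, beta=1.0))
    print(intensity(8.0, [7.0], morning, alpha=0.5, beta=1.0))
```

A past event raises the intensity for a while and the effect decays at rate `beta`, which is the self-exciting behavior that lets such models capture short-term dependence between actions.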

arXiv.org e-Print Archive

Crossref