
    Interpretable statistics for complex modelling: quantile and topological learning

    As the complexity of our data has increased exponentially over the last decades, so has our need for interpretable features. This thesis revolves around two paradigms for approaching this quest for insight. In the first part we focus on parametric models, where the problem of interpretability can be seen as one of “parametrization selection”. We introduce a quantile-centric parametrization and show the advantages of our proposal in the context of regression, where it allows us to bridge the gap between classical generalized linear (mixed) models and increasingly popular quantile methods. The second part of the thesis, concerned with topological learning, tackles the problem from a non-parametric perspective. As topology can be thought of as a way of characterizing data in terms of their connectivity structure, it allows us to represent complex and possibly high-dimensional data through a few features, such as the number of connected components, loops and voids. We illustrate how the emerging branch of statistics devoted to recovering topological structures in the data, Topological Data Analysis, can be exploited for both exploratory and inferential purposes, with a special emphasis on kernels that preserve the topological information in the data. Finally, we show with an application how these two approaches can borrow strength from one another in the identification and description of brain activity through fMRI data from the ABIDE project.
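The simplest of the topological features mentioned in the abstract above, the number of connected components, can be illustrated with a small sketch. This is not the thesis's method: it assumes a fixed-radius neighbourhood graph (two points are linked when their Euclidean distance is at most `eps`) and counts components with a union-find structure. The function name `count_components` and the toy point cloud are hypothetical.

```python
import math


def count_components(points, eps):
    """Count connected components of the graph linking points within distance eps."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j  # merge the two components

    return len({find(i) for i in range(len(points))})


# Toy point cloud (hypothetical): two well-separated groups.
cloud = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.5, 5.0)]

if __name__ == "__main__":
    for eps in (0.1, 0.6, 10.0):
        print(eps, count_components(cloud, eps))
```

Tracking how this count changes as `eps` grows is the idea behind zero-dimensional persistent homology; loops and voids need higher-dimensional machinery that this sketch does not cover.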

    Empirical Study of Intraday Option Price Changes using extended Count Regression Models

    In this paper we model absolute price changes of an option on the XETRA DAX index based on quote-by-quote data from the EUREX exchange. In contrast to other authors, we focus on a parameter-driven model for this purpose and use a Poisson Generalized Linear Model (GLM) with a latent AR(1) process in the mean, which accounts for autocorrelation and overdispersion in the data. Parameter estimation is carried out by Markov Chain Monte Carlo methods using the WinBUGS software. In a Bayesian context, we prove the superiority of this modelling approach compared to an ordinary Poisson-GLM and to a complex Poisson-GLM with heterogeneous variance structure (but without taking into account any autocorrelations) by using the deviance information criterion (DIC) as proposed by Spiegelhalter et al. (2002). We include a broad range of explanatory variables in our regression modelling, for which we also consider interaction effects. According to our modelling results, the price development of the underlying, the intrinsic value of the option at the time of the trade, the number of new quotations between two price changes, the time between two price changes and the bid-ask spread have significant effects on the size of the price changes, whereas the remaining time to maturity of the option does not. By giving possible interpretations of our modelling results we also provide an empirical contribution to the understanding of the microstructure of option markets.
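To see why a latent AR(1) process in the log-mean of a Poisson model produces both overdispersion and autocorrelation, the following minimal simulation may help. It is a sketch under assumed parameter values (`beta0`, `phi`, `sigma` are hypothetical and not taken from the paper), it uses no covariates, and it does not reproduce the paper's MCMC estimation.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(n, beta0, phi, sigma, seed=0):
    """Draw y_t ~ Poisson(exp(beta0 + u_t)) with u_t = phi * u_{t-1} + eps_t."""
    rng = random.Random(seed)
    # Start the latent process from its stationary distribution.
    u = rng.gauss(0.0, sigma / math.sqrt(1.0 - phi ** 2))
    counts = []
    for _ in range(n):
        counts.append(sample_poisson(math.exp(beta0 + u), rng))
        u = phi * u + rng.gauss(0.0, sigma)
    return counts


def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 for a plain Poisson)."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean


def lag1_autocorr(counts):
    """Lag-1 sample autocorrelation of the count series."""
    mean = sum(counts) / len(counts)
    num = sum((a - mean) * (b - mean) for a, b in zip(counts, counts[1:]))
    den = sum((c - mean) ** 2 for c in counts)
    return num / den


if __name__ == "__main__":
    latent = simulate_counts(5000, math.log(2.0), phi=0.8, sigma=0.5)
    plain = simulate_counts(5000, math.log(2.0), phi=0.8, sigma=0.0)
    print("latent AR(1):", dispersion_index(latent), lag1_autocorr(latent))
    print("plain Poisson:", dispersion_index(plain), lag1_autocorr(plain))
```

Setting `sigma=0.0` switches the latent process off, so the same function also generates the ordinary Poisson baseline for comparison.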

    Detecting abrupt changes in the spectra of high-energy astrophysical sources

    Variable-intensity astronomical sources are the result of complex and often extreme physical processes. Abrupt changes in source intensity are typically accompanied by equally sudden spectral shifts, that is, sudden changes in the wavelength distribution of the emission. This article develops a method for modeling photon counts collected from observation of such sources. We embed change points into a marked Poisson process, where photon wavelengths are regarded as marks and both the Poisson intensity parameter and the distribution of the marks are allowed to change. To the best of our knowledge, this is the first effort to embed change points into a marked Poisson process. Between the change points, the spectrum is modeled nonparametrically using a mixture of a smooth radial basis expansion and a number of local deviations from the smooth term representing spectral emission lines. Because the model is over-parameterized, we employ an $\ell_1$ penalty. The tuning parameter in the penalty and the number of change points are determined via the minimum description length principle. Our method is validated via a series of simulation studies and its practical utility is illustrated in the analysis of the ultra-fast rotating yellow giant star known as FK Com.
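The sketch below illustrates only the most basic ingredient of the abstract above: locating one abrupt change in a Poisson intensity. It is a deliberately simplified stand-in, not the article's method. It assumes binned photon counts, exactly one change point, and a piecewise-constant rate, and it picks the split that maximizes the profile log-likelihood. The marks (wavelengths), the $\ell_1$ penalty and the minimum description length criterion are all omitted, and every name and rate value here is hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_binned_counts(n_bins, change, rate_before, rate_after, seed=0):
    """Counts per time bin; bins with index >= change use rate_after."""
    rng = random.Random(seed)
    return [
        sample_poisson(rate_before if t < change else rate_after, rng)
        for t in range(n_bins)
    ]


def _segment_loglik(total, n_bins):
    """Poisson log-likelihood of a segment at its MLE rate, up to constants."""
    return total * math.log(total / n_bins) if total > 0 else 0.0


def estimate_change_point(counts):
    """Return the split k (bins 0..k-1 versus k..n-1) with the highest likelihood."""
    n, total = len(counts), sum(counts)
    best_k, best_ll, left = None, -math.inf, 0
    for k in range(1, n):
        left += counts[k - 1]
        ll = _segment_loglik(left, k) + _segment_loglik(total - left, n - k)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k


if __name__ == "__main__":
    counts = simulate_binned_counts(200, change=120, rate_before=2.0, rate_after=8.0)
    print("estimated change point:", estimate_change_point(counts))
```

Extending this to several change points would require a model-selection rule for their number, which is the role the abstract assigns to the minimum description length principle.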

    The Overlooked Potential of Generalized Linear Models in Astronomy - I: Binomial Regression

    Revealing hidden patterns in astronomical data is often the path to fundamental scientific breakthroughs; meanwhile the complexity of scientific inquiry increases as more subtle relationships are sought. Contemporary data analysis problems often elude the capabilities of classical statistical techniques, suggesting the use of cutting edge statistical methods. In this light, astronomers have overlooked a whole family of statistical techniques for exploratory data analysis and robust regression, the so-called Generalized Linear Models (GLMs). In this paper -- the first in a series aimed at illustrating the power of these methods in astronomical applications -- we elucidate the potential of a particular class of GLMs for handling binary/binomial data, the so-called logit and probit regression techniques, from both a maximum likelihood and a Bayesian perspective. As a case in point, we present the use of these GLMs to explore the conditions of star formation activity and metal enrichment in primordial minihaloes from cosmological hydro-simulations including detailed chemistry, gas physics, and stellar feedback. We predict that for a dark mini-halo with metallicity $\approx 1.3 \times 10^{-4} Z_{\odot}$, an increase of $1.2 \times 10^{-2}$ in the gas molecular fraction increases the probability of star formation occurrence by a factor of 75%. Finally, we highlight the use of receiver operating characteristic curves as a diagnostic for binary classifiers, and ultimately we use these to demonstrate the competitive predictive performance of GLMs against the popular technique of artificial neural networks. Comment: 20 pages, 10 figures, 3 tables, accepted for publication in Astronomy and Computing
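As a rough illustration of the logit regression described in the abstract above, the sketch below fits a one-predictor logistic model by Newton-Raphson on synthetic data and then reports how a small increase in the predictor changes the predicted probability. Everything here is an assumption made for illustration: the data are simulated, the coefficients are arbitrary, and none of the numbers relate to the paper's simulations or results.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_binary(n, b0, b1, seed=1):
    """Draw x uniformly on [-2, 2] and y ~ Bernoulli(sigmoid(b0 + b1 * x))."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(b0 + b1 * x) else 0 for x in xs]
    return xs, ys


def fit_logit(xs, ys, iters=25):
    """Maximum likelihood fit of (intercept, slope) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p              # score with respect to the intercept
            g1 += (y - p) * x        # score with respect to the slope
            h00 += w                 # entries of the information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def relative_change(b0, b1, x, dx):
    """Relative change in predicted probability when x increases by dx."""
    return sigmoid(b0 + b1 * (x + dx)) / sigmoid(b0 + b1 * x) - 1.0


if __name__ == "__main__":
    xs, ys = simulate_binary(2000, b0=-1.0, b1=2.0)
    b0_hat, b1_hat = fit_logit(xs, ys)
    print("estimates:", b0_hat, b1_hat)
    print("relative change at x=0 for dx=0.1:", relative_change(b0_hat, b1_hat, 0.0, 0.1))
```

Because the logit link is nonlinear, the relative change depends on the starting value `x`, which is why statements such as the one in the abstract are tied to a specific reference point.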