Distinguishing cause from effect using observational data: methods and benchmarks
The discovery of causal relationships from purely observational data is a
fundamental problem in science. The most elementary form of such a causal
discovery problem is to decide whether X causes Y or, alternatively, Y causes
X, given joint observations of two variables X, Y. An example is to decide
whether altitude causes temperature, or vice versa, given only joint
measurements of both variables. Even under the simplifying assumptions of no
confounding, no feedback loops, and no selection bias, such bivariate causal
discovery problems are challenging. Nevertheless, several approaches for
addressing those problems have been proposed in recent years. We review two
families of such methods: Additive Noise Methods (ANM) and Information
Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs
that consists of data for 100 different cause-effect pairs selected from 37
datasets from various domains (e.g., meteorology, biology, medicine,
engineering, and economics) and motivate our decisions regarding the "ground
truth" causal directions of all pairs. We evaluate the performance of several
bivariate causal discovery methods on these real-world benchmark data and in
addition on artificially simulated data. Our empirical results on real-world
data indicate that certain methods are indeed able to distinguish cause from
effect using only purely observational data, although more benchmark data would
be needed to obtain statistically significant conclusions. One of the best
performing methods overall is the additive-noise method originally proposed by
Hoyer et al. (2009), which obtains an accuracy of 63 ± 10% and an AUC of
0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of
this work we prove the consistency of that method.
Comment: 101 pages, second revision submitted to Journal of Machine Learning Research
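The additive-noise idea can be sketched in a few lines. The following is a toy illustration of ours, not the benchmarked implementation: the polynomial regression and the squared-residual correlation stand in for the nonparametric regression and HSIC independence test used in practice. Under the true causal direction the regression residuals are (approximately) independent of the putative cause; under the reverse direction they retain structure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-2, 2, n)
y = x**3 + rng.normal(0.0, 1.0, n)   # true model: X causes Y with additive noise

def dependence_score(cause, effect, deg=4):
    # Regress effect on cause and take residuals; under a correct
    # additive-noise model the residuals are independent of the cause.
    coef = np.polyfit(cause, effect, deg)
    resid = effect - np.polyval(coef, cause)
    # Crude dependence proxy: correlation between squared residuals and
    # the squared regressor (a heteroscedasticity check). Real ANM tests
    # use a kernel independence measure such as HSIC instead.
    return abs(np.corrcoef(resid**2, cause**2)[0, 1])

score_xy = dependence_score(x, y)  # small: residuals look independent of x
score_yx = dependence_score(y, x)  # larger: the reverse fit leaves structure
direction = "X->Y" if score_xy < score_yx else "Y->X"
```

The inferred direction is the one whose residuals look more independent of the regressor; on this simulated pair the sketch recovers X->Y.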
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
Comment: 61 pages
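A minimal illustration of why directional data need their own methods (an illustration of ours, not an example from the review): the arithmetic mean of two bearings that straddle north points the wrong way, while the standard directional estimate, the angle of the mean resultant vector, does not.

```python
import numpy as np

def circular_mean_deg(angles_deg):
    # Map each angle to a unit vector, average the vectors, and return
    # the angle of the resultant -- the usual directional mean.
    rad = np.deg2rad(angles_deg)
    return np.rad2deg(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())) % 360.0

bearings = [350.0, 10.0]            # two directions just either side of north
naive = float(np.mean(bearings))    # 180.0, i.e. due south -- clearly wrong
circ = circular_mean_deg(bearings)  # 0 (mod 360), i.e. north
```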
Stochastic modelling and statistical inference for electricity prices, wind energy production and wind speed
Although wind energy helps us slow down the increase of global temperatures,
its weather-dependence and unpredictability make it risky to invest in. In this
thesis we apply statistical and mathematical tools to enable energy providers
to accurately plan such investments.
In the first part we want to understand the impact of wind energy on electricity
prices. We extend an existing multifactor model of electricity spot prices by
including stochastic volatility as well as the information about wind energy
production. Empirical studies indicate that these additions improve the
model fit. We also model wind-related variables directly, using Brownian
semistationary processes with generalised hyperbolic marginals. Finally, we
introduce a joint model of prices and wind energy production suitable for
quantifying the risk faced by energy distributors.
The second goal is to produce accurate short-term wind speed forecasts based
on historical data instead of computationally expensive physical models. We
achieve this by splitting the wind speed into two horizontal components
and modelling them with Brownian semistationary processes with a novel
triple-scale kernel. We develop efficient estimation and forecasting procedures.
Empirical studies show that such modelling choices result in good forecasting
performance.
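The split into horizontal components mentioned above can be sketched as follows. This is our own minimal illustration of the standard decomposition, not code from the thesis, and it assumes direction is measured in degrees clockwise from north toward which the wind blows.

```python
import numpy as np

# Toy series of wind speed (m/s) and direction (degrees clockwise from
# north, toward which the wind blows -- a convention assumed here).
speed = np.array([5.0, 12.0, 8.0])
direction = np.array([30.0, 200.0, 315.0])

rad = np.deg2rad(direction)
u = speed * np.sin(rad)   # east-west component
v = speed * np.cos(rad)   # north-south component

# Each component can be modelled separately (e.g. by a Brownian
# semistationary process) and recombined into a speed forecast:
recovered_speed = np.hypot(u, v)
```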
Fractional Calculus and the Future of Science
Newton foresaw the limitations of geometry's description of planetary behavior and developed fluxions (differentials) as the new language for celestial mechanics and as the way to implement his laws of mechanics. Two hundred years later Mandelbrot introduced the notion of fractals into the scientific lexicon of geometry, dynamics, and statistics and in so doing suggested ways to see beyond the limitations of Newton's laws. Mandelbrot's mathematical essays suggest how fractals may lead to the understanding of turbulence, viscoelasticity, and ultimately to the end of the dominance of Newton's macroscopic world view. Fractional Calculus and the Future of Science examines the nexus of these two game-changing contributions to our scientific understanding of the world. It addresses how non-integer differential equations replace Newton's laws to describe the many guises of complexity, most of which lie beyond Newton's experience, and many of which had even eluded Mandelbrot's powerful intuition. The book's authors look behind the mathematics and examine what must be true about a phenomenon's behavior to justify the replacement of an integer-order with a noninteger-order (fractional) derivative. This window into the future of specific science disciplines through the fractional calculus lens suggests how what is seen entails a difference in scientific thinking and understanding.
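To make "noninteger-order derivative" concrete, here is a small numerical sketch of our own (not taken from the book) of the Grünwald-Letnikov construction, which generalises the backward difference quotient to fractional orders. For f(t) = t it reproduces the known half-derivative 2*sqrt(t/pi), and at order 1 it collapses to the ordinary slope.

```python
import numpy as np

def gl_derivative(f, t, alpha, h=1e-3):
    # Grunwald-Letnikov approximation of the order-alpha derivative:
    # h**(-alpha) * sum_k w_k * f(t - k*h), where w_k are the
    # alternating-sign generalised binomial coefficients.
    n = int(t / h)                      # reach back toward the origin
    w = np.empty(n + 1)
    w[0] = 1.0
    for k in range(1, n + 1):           # recurrence for (-1)^k * C(alpha, k)
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    samples = f(t - h * np.arange(n + 1))
    return (w * samples).sum() / h**alpha

f = lambda t: t
d_half = gl_derivative(f, 1.0, 0.5)     # half-derivative of t at t = 1
exact = 2.0 / np.sqrt(np.pi)            # known closed form, about 1.1284
d_one = gl_derivative(f, 1.0, 1.0)      # order 1 recovers the ordinary slope 1
```

At alpha = 1 all weights beyond the first two vanish and the sum is exactly the backward difference quotient, which is one way to see that the fractional definition contains the classical one.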
Proceedings of the 35th International Workshop on Statistical Modelling : July 20- 24, 2020 Bilbao, Basque Country, Spain
466 p. The International Workshop on Statistical Modelling (IWSM) is a reference workshop for promoting statistical modelling and applications of Statistics among researchers, academics, and industrialists in a broad sense. Unfortunately, the global COVID-19 pandemic did not allow the 35th edition of the IWSM to be held in Bilbao in July 2020. Despite the situation, and following the spirit of the Workshop and the Statistical Modelling Society, we are delighted to bring you the proceedings book of extended abstracts.
Bayesian Spatio-Temporal Modeling for Forecasting, Trend Assessment and Spatial Trend Filtering
This work develops Bayesian spatio-temporal modeling techniques specifically aimed at studying several aspects of our motivating applications, including vector-borne disease incidence and air pollution levels. A key attribute of the proposed techniques is that they are scalable to extremely large data sets consisting of spatio-temporally oriented observations. The scalability of our modeling strategies is accomplished in two primary ways. First, through the introduction of carefully constructed latent random variables we are able to develop Markov chain Monte Carlo (MCMC) sampling algorithms that consist primarily of Gibbs steps. This leads to the fast and easy updating of the model parameters from common distributions. Second, for the spatio-temporal aspects of the models, a novel sampling strategy for Gaussian Markov random fields (GMRFs) that can be easily implemented (in parallel) within MCMC sampling algorithms is used. The performance of the proposed modeling strategies is demonstrated through extensive numerical studies, and the strategies are further used to analyze vector-borne disease data measured on canines throughout the conterminous United States and PM 2.5 levels measured at weather stations throughout the Eastern United States.
In particular, we begin by developing a Poisson regression model that can be used to forecast the incidence of vector-borne disease throughout a large geographic area. The proposed model accounts for spatio-temporal dependence through a vector autoregression and is fit through a Metropolis-Hastings based Markov chain Monte Carlo (MCMC) sampling algorithm. The model is used to forecast the prevalence of Lyme disease (Chapter 2) and Anaplasmosis (Chapter 3) in canines throughout the United States. As a part of these studies we also evaluate the significance of various climatic and socio-economic drivers of disease. We then present (Chapter 4) the development of the 'chromatic sampler' for GMRFs. The chromatic sampler is an MCMC sampling technique that exploits the Markov property of GMRFs to sample large groups of parameters in parallel. A greedy algorithm for finding such groups of parameters is presented. The methodology is found to be superior, in terms of computational effort, to both full block and single-site updating. For assessing spatio-temporal trends, we develop (Chapter 5) a binomial regression model with spatially varying coefficients. This model uses Gaussian predictive processes to estimate spatially varying coefficients and a conditional autoregressive structure embedded in a vector autoregression to account for spatio-temporal dependence in the data. The methodology is capable of estimating both widespread regional and small-scale local trends. A data augmentation strategy is used to develop a Gibbs based MCMC sampling routine. The approach is made computationally feasible by adopting the chromatic sampler for GMRFs to sample the spatio-temporal random effects. The model is applied to a dataset consisting of 16 million test results for antibodies to Borrelia burgdorferi and used to identify several areas of the United States experiencing increasing Lyme disease risk.
For nonparametric functional estimation, we develop (Chapter 6) a Bayesian multidimensional trend filter (BMTF). The BMTF is a flexible nonparametric estimator that extends traditional one-dimensional trend filtering methods to multiple dimensions. The methodology is computationally scalable to a large support space, and the expense of fitting the model is nearly independent of the number of observations. The methodology involves discretizing the support space and estimating a multidimensional step function over the discretized support. Two adaptive methods of discretization, which allow the data to determine the resolution of the resulting function, are presented. The BMTF is then used (Chapter 7) to allow for spatially varying coefficients within a quantile regression model. A data augmentation strategy is introduced which facilitates the development of a Gibbs based MCMC sampling routine. This methodology is developed to study various meteorological drivers of high levels of PM 2.5, a particularly hazardous form of air pollution consisting of particles less than 2.5 micrometers in diameter.
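The chromatic idea can be illustrated with a toy example of ours (not the dissertation's greedy algorithm): colour the neighbourhood graph of a lattice GMRF so that no two neighbours share a colour. By the Markov property, all sites of one colour are conditionally independent given the rest and can be Gibbs-updated in parallel.

```python
def greedy_coloring(adj):
    # Assign each node the smallest colour not already used by one of
    # its coloured neighbours (the classic greedy heuristic).
    colors = {}
    for node in adj:
        used = {colors[nb] for nb in adj[node] if nb in colors}
        c = 0
        while c in used:
            c += 1
        colors[node] = c
    return colors

# Neighbourhood graph of a 4x4 lattice GMRF (first-order neighbours).
n = 4
adj = {(i, j): [] for i in range(n) for j in range(n)}
for i in range(n):
    for j in range(n):
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= i + di < n and 0 <= j + dj < n:
                adj[(i, j)].append((i + di, j + dj))

colors = greedy_coloring(adj)
# No edge joins two sites of the same colour, so each colour class can
# be updated simultaneously inside an MCMC sweep.
n_colors = len(set(colors.values()))
```

On this first-order lattice the greedy pass finds the familiar two-colour checkerboard, so a full Gibbs sweep reduces to two parallel updates.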
Seventh International Workshop on Simulation, 21-25 May, 2013, Department of Statistical Sciences, Unit of Rimini, University of Bologna, Italy. Book of Abstracts
Engineering Education and Research Using MATLAB
MATLAB is a software package used primarily in the field of engineering for signal processing, numerical data analysis, modeling, programming, simulation, and computer graphics visualization. In the last few years it has become widely accepted as an efficient tool, and its use has therefore increased significantly in scientific communities and academic institutions. This book consists of 20 chapters presenting research works using MATLAB tools. Chapters include techniques for programming and developing Graphical User Interfaces (GUIs), dynamic systems, electric machines, signal and image processing, power electronics, mixed signal circuits, genetic programming, digital watermarking, control systems, time-series regression modeling, and artificial neural networks.