Large-Scale Kernel Methods for Independence Testing
Representations of probability measures in reproducing kernel Hilbert spaces
provide a flexible framework for fully nonparametric hypothesis tests of
independence, which can capture any type of departure from independence,
including nonlinear associations and multivariate interactions. However, these
approaches come with an at least quadratic computational cost in the number of
observations, which can be prohibitive in many applications. Arguably, it is
exactly in such large-scale datasets that capturing any type of dependence is
of interest, so striking a favourable tradeoff between computational efficiency
and test performance for kernel independence tests would have a direct impact
on their applicability in practice. In this contribution, we provide an
extensive study of the use of large-scale kernel approximations in the context
of independence testing, contrasting block-based, Nyström and random Fourier
feature approaches. Through a variety of synthetic data experiments, we
demonstrate that our novel large-scale methods give performance comparable
to that of existing methods whilst using significantly less computation time and
memory.
Comment: 29 pages, 6 figures
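The random-Fourier-feature approach contrasted in this abstract can be sketched in a few lines. The snippet below is a generic illustration of an RFF-approximated, HSIC-style dependence statistic (function names, the Gaussian kernel choice, and all parameters are ours), not the authors' exact test:

```python
import numpy as np

def rff(x, n_features, sigma, rng):
    """Random Fourier features approximating a Gaussian kernel (Rahimi & Recht)."""
    w = rng.normal(0.0, 1.0 / sigma, size=(x.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)

def rff_cov_statistic(x, y, n_features=50, sigma=1.0, seed=0):
    """Squared Frobenius norm of the empirical cross-covariance between the
    two feature maps: an O(n) approximation of an HSIC-type dependence
    statistic, versus the quadratic cost of the exact kernel version."""
    rng = np.random.default_rng(seed)
    zx = rff(x, n_features, sigma, rng)
    zy = rff(y, n_features, sigma, rng)
    zx -= zx.mean(axis=0)          # centre the feature maps
    zy -= zy.mean(axis=0)
    c = zx.T @ zy / x.shape[0]     # cross-covariance in feature space
    return float(np.sum(c ** 2))
```

Under independence the statistic concentrates near zero, so dependent pairs produce visibly larger values; a permutation scheme on one variable would turn this into a test with a calibrated threshold.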
Resampling-based confidence regions and multiple tests for a correlated random vector
We derive non-asymptotic confidence regions for the mean of a random vector
whose coordinates have an unknown dependence structure. The random vector is
supposed to be either Gaussian or to have a symmetric bounded distribution, and
we observe i.i.d. copies of it. The confidence regions are built using a
data-dependent threshold based on a weighted bootstrap procedure. We consider
two approaches, the first based on a concentration approach and the second on a
direct bootstrapped quantile approach. The first allows us to handle a very
large class of resampling weights, while our results for the second are
restricted to Rademacher weights. However, the second method seems more
accurate in practice. Our results are motivated by multiple testing problems,
and we show on simulations that our procedures are better than the Bonferroni
procedure (union bound) as soon as the observed vector has sufficiently
correlated coordinates.
Comment: submitted to COL
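The Rademacher-weighted bootstrap quantile behind the second approach can be sketched roughly as follows. This is a simplified illustration of a sign-flip bootstrap threshold for the sup-norm of the empirical mean (names and details are ours, not the paper's exact construction):

```python
import numpy as np

def rademacher_quantile_threshold(samples, alpha=0.05, n_boot=1000, seed=0):
    """Data-dependent sup-norm threshold for the mean of a random vector
    with unknown coordinate dependence, via a Rademacher-weighted
    (sign-flip) bootstrap on the centred observations."""
    rng = np.random.default_rng(seed)
    n, k = samples.shape
    centered = samples - samples.mean(axis=0)
    sups = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.choice([-1.0, 1.0], size=n)       # Rademacher weights
        sups[b] = np.abs(eps @ centered / n).max()  # sup-norm of reweighted mean
    return float(np.quantile(sups, 1.0 - alpha))
```

Because the bootstrap sees the empirical dependence structure, strongly correlated coordinates do not inflate the threshold the way a Bonferroni (union bound) correction over k coordinates would.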
Large-Scale Multiple Testing of Composite Null Hypotheses Under Heteroskedasticity
Heteroskedasticity poses several methodological challenges in designing valid
and powerful procedures for simultaneous testing of composite null hypotheses.
In particular, the conventional practice of standardizing or re-scaling
heteroskedastic test statistics in this setting may severely affect the power
of the underlying multiple testing procedure. Additionally, when the
inferential parameter of interest is correlated with the variance of the test
statistic, methods that ignore this dependence may fail to control the type I
error at the desired level. We propose a new Heteroskedasticity Adjusted
Multiple Testing (HAMT) procedure that avoids data reduction by
standardization, and directly incorporates the side information from the
variances into the testing procedure. Our approach relies on an improved
nonparametric empirical Bayes deconvolution estimator that offers a practical
strategy for capturing the dependence between the inferential parameter of
interest and the variance of the test statistic. We develop theory to show that
HAMT is asymptotically valid and optimal for FDR control. Simulation results
demonstrate that HAMT outperforms existing procedures with substantial power
gain across many settings at the same FDR level. The method is illustrated on
an application involving the detection of engaged users on a mobile game app.
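For context, the classical Benjamini-Hochberg step-up procedure, the standard FDR-controlling baseline that heteroskedasticity-adjusted methods such as HAMT are compared against, fits in a few lines. This is the generic baseline, not HAMT itself, which replaces p-value ranking with variance-aware empirical Bayes quantities:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.1):
    """Classical Benjamini-Hochberg step-up procedure: sort p-values,
    find the largest rank i with p_(i) <= alpha * i / m, and reject
    all hypotheses with the i smallest p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject
```

Applied to standardized statistics, this procedure discards the variance side information that the abstract argues is essential when the parameter of interest is correlated with the variance.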
Specification Testing in Hawkes Models
We propose various specification tests for Hawkes models based on the Lagrange Multiplier (LM) principle. Hawkes models can be used to model the occurrence of extreme events in financial markets. Our specific testing focus is on extending a univariate model to a multivariate model, that is, we examine whether there is a conditional dependence between extreme events in markets. Simulations show that the test has good size and power, in particular for sample sizes that are typically encountered in practice. Applying the specification test for dependence
to US stocks, bonds and exchange rate data, we find strong evidence for cross-excitation within segments as well as between segments. Therefore, we recommend that univariate Hawkes models be extended to account for the cross-triggering phenomenon.
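A univariate Hawkes process of the kind being extended here has conditional intensity λ(t) = μ + Σ_{t_i&lt;t} α·exp(−β(t − t_i)). A minimal simulation by Ogata's thinning algorithm, with illustrative parameter values chosen by us (α/β &lt; 1 for stationarity):

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate a univariate Hawkes process with exponential kernel by
    Ogata's thinning: propose the next event at the current intensity
    (an upper bound, since intensity decays between events), then accept
    with probability lambda(t_candidate) / bound."""
    rng = np.random.default_rng(seed)
    t, events = 0.0, []
    while t < horizon:
        lam_bar = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)        # candidate event time
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() <= lam_t / lam_bar:       # thinning step
            events.append(t)
    return events
```

Self-excitation lifts the long-run event rate from the baseline μ to μ/(1 − α/β), which is the clustering of extreme events the specification tests above probe; the multivariate extension adds cross-excitation terms from other markets' events.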
Representing time-dependent freezing behaviour in immersion mode ice nucleation
In order to understand the impact of ice formation in clouds, a quantitative understanding of ice nucleation is required, along with an accurate and efficient representation for use in cloud resolving models. Ice nucleation by atmospherically relevant particle types is complicated by interparticle variability in nucleating ability, as well as a stochastic, time-dependent nature inherent to nucleation. Here we present a new and computationally efficient Framework for Reconciling Observable Stochastic Time-dependence (FROST) in immersion mode ice nucleation. This framework is underpinned by the finding that the temperature dependence of the nucleation-rate coefficient controls the residence-time and cooling-rate dependence of freezing. It is shown that this framework can be used to reconcile experimental data obtained on different timescales with different experimental systems, and it also provides a simple way of representing the complexities of ice nucleation in cloud resolving models. The routine testing and reporting of time-dependent behaviour in future experimental studies is recommended, along with the practice of presenting normalised data sets following the methods outlined here.
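The controlling role of the nucleation-rate coefficient's temperature dependence can be illustrated numerically: when J(T) grows as temperature falls, a slower cooling rate gives droplets more residence time at each temperature and hence a higher frozen fraction at the same temperature. The exponential form and all parameter values below are our illustrative choices, not FROST's parameterisation:

```python
import numpy as np

def frozen_fraction(temps, cooling_rate, j0, c, area):
    """Frozen fraction of a monodisperse droplet population cooled at a
    constant rate, for an illustrative nucleation-rate coefficient
    J(T) = j0 * exp(-c * T) (T in deg C, so J rises as T drops).
    Survival follows P_unfrozen = exp(-area * integral of J dt)."""
    dt = np.abs(np.diff(temps)) / cooling_rate     # residence time per T step
    j_mid = j0 * np.exp(-c * 0.5 * (temps[:-1] + temps[1:]))
    cum = np.concatenate([[0.0], np.cumsum(area * j_mid * dt)])
    return 1.0 - np.exp(-cum)
```

Halving the cooling rate doubles the residence time in every temperature interval and therefore doubles the accumulated nucleation probability, which is exactly the time dependence the framework argues should be reported.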
Empowering customer engagement by informative billing: a European approach
Programmes aimed at improving end-use energy efficiency are a keystone in the market strategies of leading distribution system operators (DSOs) and energy retail companies, and their application is growing, soon expected to become mainstream practice. Informative services based on electricity meter data collected for billing are powerful tools for energy savings at scale; they increase customer engagement with energy suppliers and enable the deployment of demand-response programmes that help optimise distribution grid operation. These
services are completely in line with Europe’s 2020 strategy for overall energy performance improvement (cf. directives 2006/32/EC, 2009/72/EC, 2012/27/EU).
The Intelligent Energy Europe project EMPOWERING involves 4 European utilities and an international team of university researchers, social scientists and energy experts to develop and provide insight-based services and tools for 344,000 residential customers in Austria, France, Italy and Spain. The project adopts a systematic, iterative approach to service development that accounts for the utilities', customers' and legal requirements, and incorporates feedback from testing into the design process.
The technological solution provided by the leading partner CIMNE is a scalable, open-source Big Data analytics system coupled with the DSOs' information systems, delivering a range of value-adding services to the customer, such as:
- comparison with similar households
- indications of performance improvements over time
- consumption-weather dependence
- detailed consumption visualisation and breakdown
- personalised energy saving tips
- alerts (high consumption, high bill, extreme temperature, etc.)
The paper presents the development approach, describes the ICT system architecture and analyses the legal and regulatory context for providing these kinds of services in the European Community. The limitations on third-party data access, customer consent and data privacy are discussed, and it is explained how these have been overcome through the implementation of the “privacy by design” principle.
Method for Enabling Causal Inference in Relational Domains
The analysis of data from complex systems is quickly becoming a fundamental aspect of modern business, government, and science. The field of causal learning is concerned with developing a set of statistical methods that allow practitioners to make inferences about unseen interventions. This field has seen significant advances in recent years. However, the vast majority of this work assumes that data instances are independent, whereas many systems are best described in terms of interconnected instances, i.e., relational systems. This discrepancy prevents causal inference techniques from being reliably applied in many real-world settings. In this thesis, I will present three contributions to the field of causal inference that seek to enable the analysis of relational systems. First, I will present theory for consistently testing statistical dependence in relational domains. I then show how the significance of this test can be measured in practice using a novel bootstrap method for structured domains. Second, I show that statistical dependence in relational domains is inherently asymmetric, implying a simple test of causal direction from observational data. This test requires no assumptions on either the marginal distributions of variables or the functional form of dependence. Third, I describe relational causal adjustment, a procedure to identify the effects of arbitrary interventions from observational relational data via an extension of Pearl's backdoor criterion. A series of evaluations on synthetic domains shows that the estimates obtained by relational causal adjustment are close to those obtained from explicit experimentation.
Hypothesis Testing and Model Estimation with Dependent Observations in Heterogeneous Sensor Networks
Advances in microelectronics, communication and signal processing have enabled the development of inexpensive sensors that can be networked to collect vital information from their environment to be used in decision-making and inference. The sensors transmit their data to a central processor which integrates the information from the sensors using a so-called fusion algorithm. Many applications of sensor networks (SNs) involve hypothesis testing or the detection of a phenomenon. Many approaches to data fusion for hypothesis testing assume that, given each hypothesis, the sensors' measurements are conditionally independent. However, since the sensors are densely deployed in practice, their fields of view overlap and consequently their measurements are dependent. Moreover, a sensor's measurement samples may be correlated over time. Another assumption often used in data fusion algorithms is that the underlying statistical model of sensors' observations is completely known. However, in practice these statistics may not be available prior to deployment and may change over the lifetime of the network due to hardware changes, aging, and environmental conditions. In this dissertation, we consider the problem of data fusion in heterogeneous SNs (SNs in which the sensors are not identical) collecting dependent data. We develop the expectation maximization algorithm for hypothesis testing and model estimation. Copula distributions are used to model the correlation in the data. Moreover, it is assumed that the distribution of the sensors' measurements is not completely known. We consider both parametric and non-parametric model estimation. The proposed approach is developed for both batch and online processing. In batch processing, fusion can only be performed after a block of data samples is received from each sensor, while in online processing, fusion is performed upon arrival of each data sample.
Online processing is of great interest since, for many applications, the long delay required for the accumulation of data in batch processing is not acceptable. To evaluate the proposed algorithms, both simulation data and real-world datasets are used. Detection performances of the proposed algorithms are compared with well-known supervised and unsupervised learning methods as well as with similar EM-based methods, which either partially or entirely ignore the dependence in the data.
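Copula modelling, as used here for the sensors' dependence, factors a joint distribution into its marginals plus a dependence function. A minimal sketch for the bivariate Gaussian copula evaluated on standard-normal scores (a generic illustration of the technique, not the dissertation's estimator):

```python
import numpy as np

def gaussian_copula_logdensity(z1, z2, rho):
    """Log-density of a bivariate Gaussian copula with correlation rho,
    evaluated on standard-normal scores z = Phi^{-1}(F(x)) obtained by
    passing each sensor's raw measurement through its own marginal CDF.
    Separating marginals from dependence in this way is what makes
    copulas convenient for heterogeneous (non-identical) sensors."""
    det = 1.0 - rho ** 2
    return (-0.5 * np.log(det)
            - (rho ** 2 * (z1 ** 2 + z2 ** 2) - 2.0 * rho * z1 * z2)
            / (2.0 * det))
```

In an EM-style fusion scheme, a dependence parameter such as rho would be re-estimated in each M-step alongside the (parametric or non-parametric) marginal models.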