An overview of the goodness-of-fit test problem for copulas
We review the main "omnibus procedures" for goodness-of-fit testing for
copulas: tests based on the empirical copula process, on probability integral
transformations, on Kendall's dependence function, etc., and some corresponding
dimension-reduction techniques. The problems of finding asymptotically
distribution-free test statistics and of calculating reliable p-values are
discussed. Some particular cases, such as convenient tests for time-dependent
copulas and for Archimedean or extreme-value copulas, are dealt with.
Finally, the practical performance of the proposed approaches is briefly
summarized.
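As a rough illustration of a test built on the empirical copula, the Python sketch below computes a Cramér-von Mises-type distance between the empirical copula of a bivariate sample and a hypothesised copula. The function names, the choice of the independence copula as the null, and the rank-based pseudo-observations are assumptions made for this example and are not taken from the review. Reliable p-values would typically need a bootstrap step, which is omitted here.

```python
import random


def pseudo_observations(values):
    """Map a sample to scaled ranks in (0, 1): rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def empirical_copula(u, v, a, b):
    """Fraction of pseudo-observations with u_i <= a and v_i <= b."""
    return sum(1 for ui, vi in zip(u, v) if ui <= a and vi <= b) / len(u)


def independence_copula(a, b):
    """Hypothesised copula for this sketch (an assumption): C(a, b) = a * b."""
    return a * b


def cvm_statistic(x, y, copula=independence_copula):
    """Sum of squared gaps between empirical and hypothesised copula,
    evaluated at the pseudo-observations (quadratic time in n)."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    return sum(
        (empirical_copula(u, v, a, b) - copula(a, b)) ** 2
        for a, b in zip(u, v)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.random() for _ in range(200)]
    y = [rng.random() for _ in range(200)]
    print("independent pair:", cvm_statistic(x, y))
    print("identical pair:  ", cvm_statistic(x, x))
```

A large value relative to the independent case suggests that the hypothesised copula does not describe the dependence in the sample.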
Building and using semiparametric tolerance regions for parametric multinomial models
We introduce a semiparametric ``tubular neighborhood'' of a parametric model
in the multinomial setting. It consists of all multinomial distributions lying
in a distance-based neighborhood of the parametric model of interest. Fitting
such a tubular model allows one to use a parametric model while treating it as
an approximation to the true distribution. In this paper, the Kullback--Leibler
distance is used to build the tubular region. Based on this idea one can define
the distance between the true multinomial distribution and the parametric model
to be the index of fit. The paper develops a likelihood ratio test procedure
for testing the magnitude of the index. A semiparametric bootstrap method is
implemented to better approximate the distribution of the LRT statistic. The
approximation permits more accurate construction of a lower confidence limit
for the model fitting index.
Comment: Published at http://dx.doi.org/10.1214/08-AOS603 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
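To make the idea of a distance-based index of fit concrete, here is a minimal Python sketch that computes the smallest Kullback-Leibler divergence from a multinomial distribution to a parametric family. The binomial family, the direction of the divergence (true distribution first), and the names `binomial_pmf`, `kl_divergence` and `index_of_fit` are all assumptions chosen for illustration. The likelihood ratio test and the semiparametric bootstrap described in the abstract are not reproduced.

```python
import math


def binomial_pmf(m, theta):
    """Cell probabilities of a Binomial(m, theta) model on {0, ..., m}."""
    return [
        math.comb(m, k) * theta ** k * (1 - theta) ** (m - k)
        for k in range(m + 1)
    ]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); cells with p_k = 0 add 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def index_of_fit(p):
    """Minimise KL(p || Binomial(m, theta)) over theta.

    For this family the minimiser is available in closed form:
    theta = (mean of p) / m, the same value the MLE would take.
    """
    m = len(p) - 1
    theta = sum(k * pk for k, pk in enumerate(p)) / m
    return theta, kl_divergence(p, binomial_pmf(m, theta))


if __name__ == "__main__":
    # A U-shaped distribution on {0, ..., 4} that no binomial fits well.
    theta, index = index_of_fit([0.4, 0.1, 0.0, 0.1, 0.4])
    print(f"closest theta = {theta:.3f}, index of fit = {index:.3f}")
```

An index near zero means the parametric model lies close to the distribution; a tubular neighborhood of radius c would contain every distribution whose index is at most c.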
Tailor-made tests for goodness of fit to semiparametric hypotheses
We introduce a new framework for constructing tests of general semiparametric
hypotheses which have nontrivial power on the $n^{-1/2}$ scale in every
direction, and can be tailored to put substantial power on alternatives of
importance. The approach is based on combining test statistics based on
stochastic processes of score statistics with bootstrap critical values.
Comment: Published at http://dx.doi.org/10.1214/009053606000000137 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A simple and general test for white noise
This article considers testing whether a time series is uncorrelated when it may exhibit some other form of dependence. In contrast to currently employed tests, which require arbitrary user-chosen numbers to compute the associated test statistics, we consider a test statistic that is very simple to use: it requires no user-chosen number, and its asymptotic null distribution is standard under general weak dependence conditions, so asymptotic critical values are readily available. We consider testing that the raw data are white noise, and also applying the test to the residuals of an ARMA model. Finally, we study the finite-sample performance of the test.
Analyzing Network Traffic for Malicious Hacker Activity
Since the Internet came into being in the 1970s, it has grown by more than 100% every year. Solutions for detecting network intrusion, on the other hand, have been far outpaced. The economic impact of malicious attacks in lost revenue to a single e-commerce company can vary from 66 thousand up to 53 million US dollars. At the same time, there is no effective mathematical model widely available to distinguish anomalous network behaviours such as port scanning, system exploring, and virus and worm propagation from normal traffic.
PDS, proposed by Random Knowledge Inc., detects and localizes traffic patterns consistent with attacks hidden within large amounts of legitimate traffic. With the network's packet traffic stream as its input, PDS relies on high-fidelity models of normal traffic, against which it can critically judge the legitimacy of any substream of packet traffic. Because PDS relies on an accurate baseline model of normal network traffic, in this workshop we concentrate on modelling normal network traffic with a Poisson process.
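One simple way to probe whether a packet stream is compatible with a homogeneous Poisson baseline is to bin arrivals into fixed windows and compare the variance of the counts with their mean, since the two are equal for a Poisson process. The Python sketch below does this on simulated arrivals. It is not the PDS method: the helper names, the window length and the simulated traffic are assumptions made for illustration only.

```python
import random


def window_counts(arrival_times, window, horizon):
    """Count arrivals in consecutive windows of equal length on [0, horizon)."""
    counts = [0] * int(horizon / window)
    for t in arrival_times:
        k = int(t / window)
        if 0 <= k < len(counts):
            counts[k] += 1
    return counts


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson counts."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


def simulate_poisson_arrivals(rate, horizon, rng):
    """Homogeneous Poisson process built from exponential inter-arrival gaps."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return times
        times.append(t)


if __name__ == "__main__":
    rng = random.Random(0)
    normal = simulate_poisson_arrivals(rate=5.0, horizon=1000.0, rng=rng)
    # Hypothetical bursty traffic: every packet arrives as a burst of five.
    bursty = [t for t in normal for _ in range(5)]
    print("Poisson-like:", dispersion_index(window_counts(normal, 1.0, 1000.0)))
    print("bursty:      ", dispersion_index(window_counts(bursty, 1.0, 1000.0)))
```

A dispersion index well above 1 indicates burstiness that a homogeneous Poisson baseline does not capture.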
Model Assessment Tools for a Model False World
A standard goal of model evaluation and selection is to find a model that
approximates the truth well while at the same time is as parsimonious as
possible. In this paper we emphasize the point of view that the models under
consideration are almost always false, if viewed realistically, and so we
should analyze model adequacy from that point of view. We investigate this
issue in large samples by looking at a model credibility index, which is
designed to serve as a one-number summary measure of model adequacy. We define
the index to be the maximum sample size at which samples from the model and
those from the true data generating mechanism are nearly indistinguishable. We
use standard notions from hypothesis testing to make this definition precise.
We use data subsampling to estimate the index. We show that the definition
leads us to some new ways of viewing models as flawed but useful. The concept
is an extension of the work of Davies [Statist. Neerlandica 49 (1995)
185--245].
Comment: Published at http://dx.doi.org/10.1214/09-STS302 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
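The subsampling idea behind a credibility index can be sketched as follows: draw subsamples of increasing size from the data, test each against the model, and report the largest size at which the test still fails to reject in most subsamples. In this Python sketch the Kolmogorov-Smirnov test, its asymptotic 5% critical value 1.358 / sqrt(n), the 50% rejection threshold and the grid of sizes are all assumptions made for illustration; the paper's exact definition may differ.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the candidate model, here a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between a sample and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def rejection_rate(data, cdf, n, reps, rng):
    """Share of size-n subsamples (without replacement) rejected at about 5%."""
    crit = 1.358 / math.sqrt(n)  # asymptotic critical value, an approximation
    hits = sum(ks_statistic(rng.sample(data, n), cdf) > crit for _ in range(reps))
    return hits / reps


def credibility_index(data, cdf, sizes, reps=200, seed=0):
    """Largest size in the grid reached before the rejection rate exceeds 0.5."""
    rng = random.Random(seed)
    index = 0
    for n in sorted(sizes):
        if rejection_rate(data, cdf, n, reps, rng) > 0.5:
            break
        index = n
    return index


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.3, 1.0) for _ in range(5000)]  # truth is slightly off
    print(credibility_index(data, normal_cdf, [10, 20, 50, 100, 200, 500]))
```

A model that is false but close to the truth earns a large index, because many observations are needed before the discrepancy becomes detectable.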
A Linear-Time Kernel Goodness-of-Fit Test
We propose a novel adaptive test of goodness-of-fit, with computational cost
linear in the number of samples. We learn the test features that best indicate
the differences between observed samples and a reference model, by minimizing
the false negative rate. These features are constructed via Stein's method,
meaning that it is not necessary to compute the normalising constant of the
model. We analyse the asymptotic Bahadur efficiency of the new test, and prove
that under a mean-shift alternative, our test always has greater relative
efficiency than a previous linear-time kernel test, regardless of the choice of
parameters for that test. In experiments, the performance of our method exceeds
that of the earlier linear-time test, and matches or exceeds the power of a
quadratic-time kernel test. In high dimensions and where model structure may be
exploited, our goodness of fit test performs far better than a quadratic-time
two-sample test based on the Maximum Mean Discrepancy, with samples drawn from
the model.
Comment: Accepted to NIPS 201
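The following Python sketch illustrates how Stein-type features can give a goodness-of-fit statistic in time linear in the sample size without the model's normalising constant, since only the score function d/dx log p(x) is needed. It is a simplified one-dimensional illustration under stated assumptions: a standard normal model, a Gaussian kernel, and fixed test locations. The adaptive step in the abstract, which learns the features, is not implemented, and the function names are hypothetical.

```python
import math
import random


def stein_feature(x, v, bandwidth=1.0):
    """Stein feature score(x) * k(x, v) + d/dx k(x, v) for a Gaussian kernel.

    Assumes a standard normal model, whose score is -x. The feature has
    mean zero when x is drawn from the model.
    """
    k = math.exp(-((x - v) ** 2) / (2.0 * bandwidth ** 2))
    score = -x
    dk_dx = -(x - v) / bandwidth ** 2 * k
    return score * k + dk_dx


def stein_statistic(sample, locations, bandwidth=1.0):
    """Unbiased estimate of the mean squared Stein feature over locations.

    Uses sum_{i != j} <t_i, t_j> = ||sum_i t_i||^2 - sum_i ||t_i||^2,
    so the cost is linear in the number of samples.
    """
    n, num_loc = len(sample), len(locations)
    sums = [0.0] * num_loc
    sum_sq_norms = 0.0
    for x in sample:
        tau = [
            stein_feature(x, v, bandwidth) / math.sqrt(num_loc)
            for v in locations
        ]
        for j, t in enumerate(tau):
            sums[j] += t
        sum_sq_norms += sum(t * t for t in tau)
    return (sum(s * s for s in sums) - sum_sq_norms) / (n * (n - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    locations = [-1.0, 0.0, 1.0]  # fixed here; an assumption of this sketch
    from_model = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    shifted = [rng.gauss(2.0, 1.0) for _ in range(2000)]
    print("sample from model:", stein_statistic(from_model, locations))
    print("mean-shifted:     ", stein_statistic(shifted, locations))
```

Turning the statistic into a test would additionally require a rejection threshold from its null distribution, which this sketch leaves out.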
Markov models for fMRI correlation structure: is brain functional connectivity small world, or decomposable into networks?
Correlations in the signal observed via functional Magnetic Resonance Imaging
(fMRI) are expected to reveal the interactions in the underlying neural
populations through the hemodynamic response. In particular, they highlight
distributed sets of mutually correlated regions that correspond to brain
networks related to different cognitive functions. Yet graph-theoretical
studies of neural connections give a different picture: that of a highly
integrated system with small-world properties: local clustering but with short
pathways across the complete structure. We examine the conditional independence
properties of the fMRI signal, i.e. its Markov structure, to find realistic
assumptions on the connectivity structure that are required to explain the
observed functional connectivity. In particular we seek a decomposition of the
Markov structure into segregated functional networks using decomposable graphs:
a set of strongly-connected and partially overlapping cliques. We introduce a
new method to efficiently extract such cliques on a large, strongly-connected
graph. We compare methods learning different graph structures from functional
connectivity by testing the goodness of fit of the model they learn on new
data. We find that summarizing the structure as strongly-connected networks can
give a good description only for very large and overlapping networks. These
results highlight that Markov models are good tools to identify the structure
of brain connectivity from fMRI signals, but for this purpose they must reflect
the small-world properties of the underlying neural systems.