967 research outputs found

    Asymptotically distribution-free goodness-of-fit testing for tail copulas

    Let $(X_1,Y_1),\ldots,(X_n,Y_n)$ be an i.i.d. sample from a bivariate distribution function that lies in the max-domain of attraction of an extreme value distribution. The asymptotic joint distribution of the standardized component-wise maxima $\bigvee_{i=1}^n X_i$ and $\bigvee_{i=1}^n Y_i$ is then characterized by the marginal extreme value indices and the tail copula $R$. We propose a procedure for constructing asymptotically distribution-free goodness-of-fit tests for the tail copula $R$. The procedure is based on a transformation of a suitable empirical process derived from a semi-parametric estimator of $R$. The transformed empirical process converges weakly to a standard Wiener process, paving the way for a multitude of asymptotically distribution-free goodness-of-fit tests. We also extend our results to the $m$-variate ($m>2$) case. In a simulation study we show that the limit theorems provide good approximations for finite samples and that tests based on the transformed empirical process have high power. Comment: Published at http://dx.doi.org/10.1214/14-AOS1304 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
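
As a rough illustration of the object being tested, the tail copula is commonly defined as $R(x,y) = \lim_{t \downarrow 0} t^{-1} P(1 - F_1(X) \le tx,\ 1 - F_2(Y) \le ty)$, where $F_1$ and $F_2$ are the marginal distribution functions. The sketch below computes a simple rank-based (nonparametric) estimate of $R$ from the $k$ largest observations. It is not the semi-parametric estimator or the transformed process of the paper; the function names and the choice of `k` are assumptions made for illustration only.

```python
def ranks(values):
    """Return 1-based ranks (1 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def empirical_tail_copula(xs, ys, k, x, y):
    """Rank-based estimate of R(x, y) built from the k most extreme observations.

    Counts the pairs whose ranks both fall in the upper tail and divides by k.
    """
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    hits = sum(
        1
        for i in range(n)
        if rank_x[i] > n - k * x and rank_y[i] > n - k * y
    )
    return hits / k


# Perfectly dependent toy data: the estimate should behave like min(x, y).
data = list(range(1000))
r_full = empirical_tail_copula(data, data, k=50, x=1.0, y=1.0)
r_half = empirical_tail_copula(data, data, k=50, x=0.5, y=1.0)
```

For the comonotone toy data the two estimates agree with $\min(x,y)$, which is the tail copula under complete tail dependence.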

    Flexible modelling in statistics: past, present and future

    In times where more and more data become available and where the data exhibit rather complex structures (significant departure from symmetry, heavy or light tails), flexible modelling has become an essential task for statisticians as well as researchers and practitioners from domains such as economics, finance or environmental sciences. This is reflected by the wealth of existing proposals for flexible distributions; well-known examples are Azzalini's skew-normal, Tukey's $g$-and-$h$, mixture and two-piece distributions, to cite but a few. My aim in the present paper is to provide an introduction to this research field, intended to be useful both for novices and professionals of the domain. After a description of the research stream itself, I will narrate the gripping history of flexible modelling, starring emblematic heroes from the past such as Edgeworth and Pearson, then depict three of the most used flexible families of distributions, and finally provide an outlook on future flexible modelling research by posing challenging open questions. Comment: 27 pages, 4 figure

    Change Point Detection and Estimation in Sequences of Dependent Random Variables

    Two change point detection and estimation procedures for sequences of dependent binary random variables are proposed and their asymptotic properties are explored. The two procedures are a dependent cumulative sum statistic (DCUSUM) and a dependent likelihood ratio test (LRT) statistic, which are generalizations of the independent CUSUM and LRT statistics. A one-step Markov dependence is assumed between consecutive variables in the sequence, and the DCUSUM and dependent LRT are shown to have substantially better size and power than their independent counterparts. In most cases, a comparison of the dependent procedures via simulation shows that the dependent LRT provides a more powerful test, while the DCUSUM test has better size performance. The asymptotic distribution of the DCUSUM test is found to be a weighted sum of squared Brownian bridge processes, and an approximation to calculate p-values is discussed. A Worsley-type upper bound for p-values is provided as an alternative. The asymptotic distribution of the dependent LRT is unknown, but a simulation study finds that its tail probabilities are empirically bounded by those of chi-square random variables with 6 and 7 degrees of freedom. A bootstrap algorithm to estimate p-values for the dependent LRT is discussed. Extensions of these procedures to multiple sequences and multinomial random variables are discussed, and a new statistic, the maximal change count statistic, is proposed. An application of the multiple sequence procedures to clustered time series models is provided. The asymptotic properties of the generalized procedures are reserved for future research.
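
For orientation, here is a minimal sketch of the classical CUSUM statistic for a binary sequence under independence, that is, the baseline that the DCUSUM is said to generalize. It is not the dependent statistic from the abstract. The function name and the particular normalization (dividing by the pooled standard deviation times $\sqrt{n}$) are assumptions chosen for illustration.

```python
import math


def cusum_change_point(bits):
    """Independent-case CUSUM for a 0/1 sequence.

    Returns (statistic, k_hat), where k_hat is the number of observations
    before the estimated change. Returns (0.0, None) for a constant sequence.
    """
    n = len(bits)
    total = sum(bits)
    p_hat = total / n
    sigma = math.sqrt(p_hat * (1.0 - p_hat))  # pooled Bernoulli std. deviation
    if sigma == 0.0:
        return 0.0, None

    best_stat, best_k = 0.0, None
    partial = 0
    for k in range(1, n):
        partial += bits[k - 1]
        # Deviation of the partial sum from its expected value under no change.
        stat = abs(partial - k * total / n) / (sigma * math.sqrt(n))
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k


# Toy sequence with an abrupt change after the 50th observation.
sequence = [0] * 50 + [1] * 50
statistic, change_at = cusum_change_point(sequence)
```

Large values of the statistic point to a change in the success probability, and the maximizing index serves as the change point estimate. Under Markov dependence this independent version can be expected to be miscalibrated, which is the situation the dependent procedures address.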

    Social Network Analysis with sna

    Modern social network analysis, the analysis of relational data arising from social systems, is a computationally intensive area of research. Here, we provide an overview of a software package which provides support for a range of network analytic functionality within the R statistical computing environment. General categories of currently supported functionality are described, and brief examples of package syntax and usage are shown.

    The realization problem for tail correlation functions

    For a stochastic process $\{X_t\}_{t \in T}$ with identical one-dimensional margins and upper endpoint $\tau_{\text{up}}$, its tail correlation function (TCF) is defined through $\chi^{(X)}(s,t) = \lim_{\tau \to \tau_{\text{up}}} P(X_s > \tau \mid X_t > \tau)$. It is a popular bivariate summary measure that has been frequently used in the literature in order to assess tail dependence. In this article, we study its realization problem. We show that the set of all TCFs on $T \times T$ coincides with the set of TCFs stemming from a subclass of max-stable processes and can be completely characterized by a system of affine inequalities. Basic closure properties of the set of TCFs and regularity implications of the continuity of $\chi$ are derived. If $T$ is finite, the set of TCFs on $T \times T$ forms a convex polytope of $\lvert T \rvert \times \lvert T \rvert$ matrices. Several general results reveal its complex geometric structure. Up to $\lvert T \rvert = 6$ a reduced system of necessary and sufficient conditions for being a TCF is determined. None of these conditions will become obsolete as $\lvert T \rvert \geq 3$ grows. Comment: 42 pages, 7 Table
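
To make the definition of the TCF concrete, the sketch below computes its finite-threshold empirical analogue for two coordinates of a process: the fraction of observations with $X_t > \tau$ that also have $X_s > \tau$. The function name and the use of a fixed threshold in place of the limit $\tau \to \tau_{\text{up}}$ are assumptions for illustration, and nothing here reproduces the realization results of the article.

```python
def empirical_tail_correlation(xs, xt, threshold):
    """Empirical version of P(X_s > threshold | X_t > threshold).

    xs and xt are paired observations of X_s and X_t. Returns None when no
    observation of X_t exceeds the threshold.
    """
    conditioned = [a for a, b in zip(xs, xt) if b > threshold]
    if not conditioned:
        return None
    joint = sum(1 for a in conditioned if a > threshold)
    return joint / len(conditioned)


# Small hand-checkable example: X_t exceeds 4 three times, X_s joins it once.
obs_s = [1, 5, 6, 2]
obs_t = [5, 6, 1, 7]
chi_hat = empirical_tail_correlation(obs_s, obs_t, threshold=4)
chi_self = empirical_tail_correlation(obs_t, obs_t, threshold=4)
```

In practice the threshold would be taken as a high quantile of the common margin, and the estimate read as an approximation to $\chi^{(X)}(s,t)$.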

    Efficient Statistics, in High Dimensions, from Truncated Samples

    We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy, the parameters of a multivariate normal distribution from truncated samples. Truncated samples from a $d$-variate normal $\mathcal{N}(\mathbf{\mu},\mathbf{\Sigma})$ means that a sample is revealed only if it falls in some subset $S \subseteq \mathbb{R}^d$; otherwise the samples are hidden and their count in proportion to the revealed samples is also hidden. We show that the mean $\mathbf{\mu}$ and covariance matrix $\mathbf{\Sigma}$ can be estimated with arbitrary accuracy in polynomial time, as long as we have oracle access to $S$, and $S$ has non-trivial measure under the unknown $d$-variate normal distribution. Additionally, we show that without oracle access to $S$, any non-trivial estimation is impossible. Comment: to appear at 59th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 201
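
The truncation model can be illustrated in one dimension: draws from a normal distribution are revealed only when a membership oracle for $S$ accepts them, and the naive sample mean of the revealed draws is then biased. The sketch below shows only this data model and the resulting bias. It does not implement the estimation algorithm of the paper, and the names `truncated_samples` and `in_S` are hypothetical.

```python
import random


def truncated_samples(mu, sigma, in_S, count, rng):
    """Draw from N(mu, sigma^2) and keep only the draws accepted by in_S.

    in_S plays the role of the membership oracle for the set S. The number
    of rejected (hidden) draws is deliberately not recorded.
    """
    revealed = []
    while len(revealed) < count:
        x = rng.gauss(mu, sigma)
        if in_S(x):
            revealed.append(x)
    return revealed


rng = random.Random(0)  # fixed seed so the illustration is repeatable
samples = truncated_samples(0.0, 1.0, lambda x: x > 0.0, 20000, rng)

# Naive estimate that ignores truncation. For S = (0, inf) and a standard
# normal, the mean of the truncated law is sqrt(2 / pi), roughly 0.80,
# far from the true mean of 0.
naive_mean = sum(samples) / len(samples)
```

The gap between `naive_mean` and the true mean is what a truncation-aware estimator has to correct, using the oracle for $S$ rather than any knowledge of how many draws were hidden.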