
    Generalized Method-of-Moments for Rank Aggregation

    In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consist of full rankings over alternatives. Our technique is based on breaking the full rankings into pairwise comparisons, and then computing parameters that satisfy a set of generalized moment conditions. We identify conditions for the output of GMM to be unique, and identify a general class of consistent and inconsistent breakings. We then show by theory and experiments that our algorithms run significantly faster than the classical Minorize-Maximization (MM) algorithm, while achieving competitive statistical efficiency.
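    The rank-breaking idea in this abstract can be illustrated with a minimal sketch: break each full ranking into all implied pairwise comparisons, then estimate Plackett-Luce weights as the stationary distribution of a comparison Markov chain (one common GMM-style estimator; the function names and the full breaking are illustrative assumptions, not the paper's exact algorithm).

```python
import numpy as np

def break_rankings(rankings, n_items):
    """Full breaking: turn each ranking into all implied pairwise comparisons."""
    wins = np.zeros((n_items, n_items))  # wins[i, j] = times i ranked above j
    for r in rankings:
        for a in range(len(r)):
            for b in range(a + 1, len(r)):
                wins[r[a], r[b]] += 1
    return wins

def gmm_plackett_luce(wins, iters=2000):
    """Estimate PL weights as the stationary distribution of a Markov chain
    whose transitions follow the pairwise loss fractions."""
    n = wins.shape[0]
    total = wins + wins.T
    # Transition i -> j proportional to the fraction of comparisons j won over i.
    P = np.where(total > 0, wins.T / np.maximum(total, 1.0), 0.0) / n
    np.fill_diagonal(P, 0.0)
    P += np.diag(1.0 - P.sum(axis=1))  # self-loops make rows sum to 1
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):              # power iteration to the stationary dist.
        pi = pi @ P
    return pi / pi.sum()
```

    Stronger items lose fewer pairwise comparisons, so the chain accumulates stationary mass on them; this avoids the iterative MM updates entirely.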

    Pareto versus lognormal: a maximum entropy test

    It is commonly found that distributions that seem to be lognormal over a broad range change to a power-law (Pareto) distribution for the last few percentiles. The distributions of many physical, natural, and social events (earthquake size, species abundance, income and wealth, as well as file, city, and firm sizes) display this structure. We present a test for the occurrence of power-law tails in statistical distributions based on maximum entropy. This methodology allows one to identify the true data-generating process even when it is neither lognormal nor Pareto. The maximum entropy approach is then compared with other widely used methods and applied to different levels of aggregation of complex systems. Our results provide support for the theory that distributions with lognormal body and Pareto tail can be generated as mixtures of lognormally distributed units.
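    The lognormal-body-versus-Pareto-tail question can be made concrete with a simple maximum-likelihood comparison on the upper tail. This is not the paper's maximum-entropy test; it is a hedged sketch of the underlying contrast, with the tail fraction and function name chosen for illustration.

```python
import numpy as np

def tail_fit_comparison(x, tail_frac=0.05):
    """Compare Pareto vs lognormal fits on the upper tail of a sample.
    Returns (pareto_tail_index, log-likelihood difference); a positive
    difference favors a Pareto tail. Illustrative only: the lognormal is
    fit to the tail's log-values without truncation correction."""
    x = np.sort(np.asarray(x, float))
    k = max(int(tail_frac * len(x)), 10)
    tail = x[-k:]
    xmin = tail[0]
    # Pareto (power-law) MLE on the tail: alpha = k / sum(log(x/xmin))
    alpha = k / np.sum(np.log(tail / xmin))
    ll_pareto = k * np.log(alpha / xmin) - (alpha + 1) * np.sum(np.log(tail / xmin))
    # Lognormal MLE on the same tail (fit mu, sigma to the log-values)
    logs = np.log(tail)
    mu, sigma = logs.mean(), logs.std()
    ll_lognorm = np.sum(-logs - 0.5 * np.log(2 * np.pi * sigma**2)
                        - (logs - mu) ** 2 / (2 * sigma**2))
    return alpha, ll_pareto - ll_lognorm
```

    On data whose tail really is a power law, the recovered tail index is close to the true exponent, since the conditional distribution of a Pareto sample above a threshold is again Pareto with the same exponent.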

    Rigorous statistical detection and characterization of a deviation from the Gutenberg-Richter distribution above magnitude 8 in subduction zones

    We present a quantitative statistical test for the presence of a crossover c0 in the Gutenberg-Richter distribution of earthquake seismic moments, separating the usual power law regime for seismic moments less than c0 from another faster decaying regime beyond c0. Our method is based on the transformation of the ordered sample of seismic moments into a series with uniform distribution under the condition of no crossover. The bootstrap method allows us to estimate the statistical significance of the null hypothesis H0 of an absence of crossover (c0 = infinity). When H0 is rejected, we estimate the crossover c0 using two different competing models for the second regime beyond c0 and the bootstrap method. For the catalog obtained by aggregating 14 subduction zones of the Circum Pacific Seismic Belt, our estimate of the crossover point is log(c0) = 28.14 ± 0.40 (c0 in dyne-cm), corresponding to a crossover magnitude mW = 8.1 ± 0.3. For separate subduction zones, the corresponding estimates are much more uncertain, so that the null hypothesis of an identical crossover for all subduction zones cannot be rejected. Such a large value of the crossover magnitude makes it difficult to associate it directly with a seismogenic thickness as proposed by many different authors in the past. Our measure of c0 may substantiate the concept that the localization of strong shear deformation could propagate significantly in the lower crust and upper mantle, thus increasing the effective size beyond which one should expect a change of regime. Comment: 40 pages, including 5 tables and 19 figures.
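    The test structure described here can be sketched as a parametric bootstrap: under H0 (pure power law, no crossover) the probability-integral transform of the ordered sample is uniform, and deviations are measured against simulated pure-Pareto catalogs. The statistic below is a Kolmogorov-Smirnov-type distance, an assumed stand-in for the paper's exact uniformity measure.

```python
import numpy as np

def crossover_test(moments, n_boot=500, seed=0):
    """Bootstrap test of H0 'pure Pareto tail, no crossover'.
    Under H0, U_i = 1 - (xmin / x_i)**alpha is uniform on [0, 1];
    a crossover to faster decay inflates the KS-type distance."""
    x = np.sort(np.asarray(moments, float))
    n = len(x)
    xmin = x[0]
    alpha = n / np.sum(np.log(x / xmin))  # Pareto MLE under H0

    def ks_stat(sample):
        # Re-estimate the exponent on each sample (parametric bootstrap).
        a = len(sample) / np.sum(np.log(sample / sample.min()))
        u = np.sort(1.0 - (sample.min() / sample) ** a)
        grid = np.arange(1, len(sample) + 1) / len(sample)
        return np.max(np.abs(u - grid))

    d_obs = ks_stat(x)
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_boot):
        sim = xmin * (1.0 - rng.random(n)) ** (-1.0 / alpha)  # Pareto draws
        if ks_stat(sim) >= d_obs:
            count += 1
    return d_obs, (count + 1) / (n_boot + 1)  # bootstrap p-value
```

    A sample whose upper tail is cut off (a crude proxy for a faster-decaying second regime) produces a much larger distance than any simulated pure-Pareto catalog, so the bootstrap p-value pins to its minimum and H0 is rejected.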

    Minimax-optimal Inference from Partial Rankings

    This paper studies the problem of inferring a global preference based on the partial rankings provided by many users over different subsets of items according to the Plackett-Luce model. A question of particular interest is how to optimally assign items to users for ranking and how many item assignments are needed to achieve a target estimation error. For a given assignment of items to users, we first derive an oracle lower bound of the estimation error that holds even for the more general Thurstone models. Then we show that the Cramér-Rao lower bound and our upper bounds inversely depend on the spectral gap of the Laplacian of an appropriately defined comparison graph. When the system is allowed to choose the item assignment, we propose a random assignment scheme. Our oracle lower bound and upper bounds imply that this scheme is minimax-optimal up to a logarithmic factor among all assignment schemes, and that the lower bound can be achieved by the maximum likelihood estimator as well as popular rank-breaking schemes that decompose partial rankings into pairwise comparisons. The numerical experiments corroborate our theoretical findings. Comment: 16 pages, 2 figures.
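    The comparison-graph quantity the bounds depend on is easy to compute: items are vertices, two items share an edge when some user ranks both, and the spectral gap is the second-smallest eigenvalue of the graph Laplacian. A minimal unweighted sketch (the paper's graph may weight edges by comparison counts):

```python
import numpy as np

def comparison_graph_spectral_gap(assignments, n_items):
    """Second-smallest Laplacian eigenvalue of the comparison graph.
    assignments: one iterable of item indices per user. The gap is zero
    iff the graph is disconnected, i.e. some items are never compared."""
    A = np.zeros((n_items, n_items))
    for items in assignments:
        for i in items:
            for j in items:
                if i != j:
                    A[i, j] = 1.0          # i and j appear in the same ranking
    L = np.diag(A.sum(axis=1)) - A          # unnormalized graph Laplacian
    eig = np.sort(np.linalg.eigvalsh(L))
    return eig[1]
```

    A single user ranking all items yields a complete graph with the largest possible gap, while assignments that split the items into disjoint groups have gap zero, matching the intuition that estimation error blows up when groups are never compared against each other.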