Generalized Method-of-Moments for Rank Aggregation
In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives. Our technique is based on breaking the full rankings into pairwise comparisons and then computing parameters that satisfy a set of generalized moment conditions. We identify conditions under which the output of GMM is unique, and characterize a general class of consistent and inconsistent breakings. We then show by theory and experiments that our algorithms run significantly faster than the classical Minorize-Maximization (MM) algorithm while achieving competitive statistical efficiency.
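One member of this GMM family, under full breaking, reduces to a stationary-distribution computation over pairwise win counts. A minimal sketch (function names, the uniform 1/n lazy-chain normalization, and the eigenvector solve are illustrative choices, not the paper's exact construction):

```python
import numpy as np

def break_full_rankings(rankings, n):
    """Full breaking: count, for every ordered pair (i, j), how often
    item i is ranked above item j across the full rankings."""
    wins = np.zeros((n, n))
    for r in rankings:
        for a in range(len(r)):
            for b in range(a + 1, len(r)):
                wins[r[a], r[b]] += 1.0
    return wins

def gmm_full_breaking(wins):
    """Solve one instance of the moment conditions: take the Plackett-Luce
    weights to be the stationary distribution of a Markov chain whose
    transition i -> j is proportional to how often j beats i.
    (Illustrative construction; the normalization is an assumption.)"""
    n = wins.shape[0]
    total = wins + wins.T
    with np.errstate(invalid="ignore", divide="ignore"):
        p = np.where(total > 0, wins / total, 0.0)  # p[i, j] = P(i beats j)
    P = p.T / n                        # move toward items that beat you
    np.fill_diagonal(P, 0.0)
    P += np.diag(1.0 - P.sum(axis=1))  # self-loops keep rows stochastic
    vals, vecs = np.linalg.eig(P.T)    # stationary dist = left eigenvector
    pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return pi / pi.sum()

# Toy data: item 0 usually beats item 1, both always beat item 2.
rankings = [[0, 1, 2]] * 8 + [[1, 0, 2]] * 2
theta = gmm_full_breaking(break_full_rankings(rankings, 3))
```

Because the estimate is a single eigenvector computation rather than an iterative MM loop, its speed advantage on large ranking datasets is plausible from the construction alone.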
Pareto versus lognormal: a maximum entropy test
It is commonly found that distributions that seem to be lognormal over a broad range change to a power-law (Pareto) distribution for the last few percentiles. The distributions of many physical, natural, and social events (earthquake size, species abundance, income and wealth, as well as file, city, and firm sizes) display this structure. We present a test for the occurrence of power-law tails in statistical distributions based on maximum entropy. This methodology allows one to identify the true data-generating process even when it is neither lognormal nor Pareto. The maximum entropy approach is then compared with other widely used methods and applied to different levels of aggregation of complex systems. Our results provide support for the theory that distributions with a lognormal body and a Pareto tail can be generated as mixtures of lognormally distributed units.
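The abstract does not spell out the maximum-entropy statistic itself; as a hedged illustration of the underlying model comparison, one can contrast Pareto and lognormal log-likelihoods on the upper tail. The tail threshold, the Hill estimator for the exponent, and fitting the lognormal on the full sample are all assumptions of this sketch, not the paper's method:

```python
import numpy as np
from math import erf

def tail_loglik(data, q=0.95):
    """Compare Pareto vs lognormal log-likelihood on the upper tail
    (illustrative likelihood comparison, not a maximum-entropy test).
    The tail is everything above the q-quantile, an assumed choice."""
    data = np.sort(np.asarray(data, dtype=float))
    xmin = np.quantile(data, q)
    tail = data[data > xmin]
    # Pareto above xmin: density alpha * xmin^alpha / x^(alpha+1),
    # with alpha estimated by the Hill MLE.
    alpha = len(tail) / np.sum(np.log(tail / xmin))
    ll_pareto = np.sum(np.log(alpha) + alpha * np.log(xmin)
                       - (alpha + 1) * np.log(tail))
    # Lognormal fitted on the whole sample, evaluated on the tail and
    # conditioned on exceeding xmin via the survival function.
    mu, sigma = np.mean(np.log(data)), np.std(np.log(data))
    z = (np.log(tail) - mu) / sigma
    log_pdf = -np.log(tail * sigma * np.sqrt(2 * np.pi)) - z ** 2 / 2
    surv = 0.5 * (1 - erf((np.log(xmin) - mu) / (sigma * np.sqrt(2))))
    ll_lognorm = np.sum(log_pdf - np.log(surv))
    return ll_pareto, ll_lognorm

# Synthetic Pareto(alpha = 1.5) sample: the Pareto tail fit should win.
rng = np.random.default_rng(0)
pareto_sample = rng.uniform(size=5000) ** (-1.0 / 1.5)
ll_p, ll_ln = tail_loglik(pareto_sample)
```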
Rigorous statistical detection and characterization of a deviation from the Gutenberg-Richter distribution above magnitude 8 in subduction zones
We present a quantitative statistical test for the presence of a crossover c0
in the Gutenberg-Richter distribution of earthquake seismic moments, separating
the usual power law regime for seismic moments less than c0 from another faster
decaying regime beyond c0. Our method is based on the transformation of the
ordered sample of seismic moments into a series with uniform distribution under
the condition of no crossover. The bootstrap method allows us to estimate the
statistical significance of the null hypothesis H0 of an absence of crossover
(c0=infinity). When H0 is rejected, we estimate the crossover c0 using two
different competing models for the second regime beyond c0 and the bootstrap
method. For the catalog obtained by aggregating 14 subduction zones of the
Circum Pacific Seismic Belt, our estimate of the crossover point is log(c0)
=28.14 +- 0.40 (c0 in dyne-cm), corresponding to a crossover magnitude mW=8.1
+- 0.3. For separate subduction zones, the corresponding estimates are much
more uncertain, so that the null hypothesis of an identical crossover for all
subduction zones cannot be rejected. Such a large value of the crossover
magnitude makes it difficult to associate it directly with a seismogenic
thickness as proposed by many different authors in the past. Our measure of c0
may substantiate the concept that the localization of strong shear deformation
could propagate significantly in the lower crust and upper mantle, thus
increasing the effective size beyond which one should expect a change of
regime.
Comment: PDF document of 40 pages including 5 tables and 19 figures
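The uniformization-plus-bootstrap idea can be sketched as follows. This is a simplified stand-in: the Hill MLE for the exponent, a Kolmogorov-Smirnov distance to uniformity, and a plain parametric bootstrap are my choices, not necessarily the paper's exact statistic:

```python
import numpy as np

def uniformize(moments, beta, m_min):
    """Under a pure Gutenberg-Richter (Pareto) law with index beta,
    U = (m_min / M) ** beta is uniform on (0, 1)."""
    return (m_min / np.asarray(moments)) ** beta

def ks_uniform(u):
    """Kolmogorov-Smirnov distance of a sample from Uniform(0, 1)."""
    u = np.sort(u)
    grid = np.arange(1, len(u) + 1) / len(u)
    return max(np.max(grid - u), np.max(u - (grid - 1 / len(u))))

def bootstrap_crossover_test(moments, m_min, n_boot=200, seed=0):
    """Bootstrap p-value for H0: no crossover (pure power law beyond
    m_min). Fit beta by the Hill MLE, map the sample to (0, 1), and
    compare its KS distance with those of synthetic pure-Pareto samples
    refitted the same way."""
    rng = np.random.default_rng(seed)
    m = np.asarray(moments, dtype=float)
    m = m[m >= m_min]
    n = len(m)
    beta = n / np.sum(np.log(m / m_min))                 # Hill MLE
    d_obs = ks_uniform(uniformize(m, beta, m_min))
    d_boot = []
    for _ in range(n_boot):
        synth = m_min * rng.uniform(size=n) ** (-1 / beta)  # pure Pareto
        b = n / np.sum(np.log(synth / m_min))               # refit, as on data
        d_boot.append(ks_uniform(uniformize(synth, b, m_min)))
    return float(np.mean(np.asarray(d_boot) >= d_obs))      # p-value

# Usage on a synthetic pure-Pareto catalog (H0 true), beta ~ 2/3 as is
# typical for Gutenberg-Richter seismic-moment statistics.
rng = np.random.default_rng(1)
sample = rng.uniform(size=400) ** (-1.0 / 0.67)
pval = bootstrap_crossover_test(sample, m_min=1.0)
```

A small p-value would indicate that the observed tail deviates from the pure power law more than bootstrap samples do, i.e. evidence for a crossover.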
Minimax-optimal Inference from Partial Rankings
This paper studies the problem of inferring a global preference based on the
partial rankings provided by many users over different subsets of items
according to the Plackett-Luce model. A question of particular interest is how
to optimally assign items to users for ranking and how many item assignments
are needed to achieve a target estimation error. For a given assignment of
items to users, we first derive an oracle lower bound of the estimation error
that holds even for the more general Thurstone models. Then we show that the
Cramér-Rao lower bound and our upper bounds inversely depend on the spectral
gap of the Laplacian of an appropriately defined comparison graph. When the
system is allowed to choose the item assignment, we propose a random assignment
scheme. Our oracle lower bound and upper bounds imply that this scheme is
minimax-optimal up to a logarithmic factor among all assignment schemes, and that the
lower bound can be achieved by the maximum likelihood estimator as well as
popular rank-breaking schemes that decompose partial rankings into pairwise
comparisons. The numerical experiments corroborate our theoretical findings.
Comment: 16 pages, 2 figures
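The spectral gap that governs these bounds is straightforward to compute for a given assignment. A sketch, assuming each user's assigned subset contributes a unit-weight edge between every pair of its items (the paper's exact edge weighting may differ):

```python
import numpy as np
from itertools import combinations

def comparison_laplacian(assignments, n_items):
    """Graph Laplacian of the comparison graph: items are vertices and
    each user's assigned subset adds a unit-weight edge between every
    pair of items in that subset (the weighting is an assumption)."""
    A = np.zeros((n_items, n_items))
    for subset in assignments:
        for i, j in combinations(subset, 2):
            A[i, j] += 1.0
            A[j, i] += 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_gap(L):
    """Second-smallest Laplacian eigenvalue (algebraic connectivity);
    zero iff the comparison graph is disconnected."""
    return float(np.sort(np.linalg.eigvalsh(L))[1])

# Random assignment scheme: each of 20 users ranks a random 3-item
# subset of 6 items, mirroring the abstract's random-assignment idea.
rng = np.random.default_rng(0)
assign = [rng.choice(6, size=3, replace=False) for _ in range(20)]
gap = spectral_gap(comparison_laplacian(assign, 6))
```

A larger gap means a better-connected comparison graph, which by the abstract's bounds translates into a smaller estimation error for the same number of item assignments.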