41 research outputs found

    The coupon collector urn model with unequal probabilities in ecology and evolution

    Get PDF
    The sequential sampling of populations with unequal probabilities and with replacement in a closed population is a recurrent problem in ecology and evolution. Examples range from biodiversity sampling and epidemiology to the estimation of signal repertoire in animal communication. Many of these questions can be reformulated as urn problems, often as special cases of the coupon collector problem, most simply expressed as the number of coupons that must be collected to have a complete set. We aimed to apply the coupon collector model in a comprehensive manner to one example—hosts (balls) being searched (draws) and parasitized (ball colour change) by parasitic wasps—to evaluate the influence of differences in sampling probabilities between items on collection speed. Based on the model of a complete multinomial process over time, we define the distribution, distribution function, expectation and variance of the number of hosts parasitized after a given time, as well as the inverse problem, estimating the sampling effort. We develop the relationship between the risk distribution on the set of hosts and the speed of parasitization and propose a more elegant proof of the weak stochastic dominance among speeds of parasitization, using the concept of Schur convexity and the ‘Robin Hood transfer’ numerical operation. Numerical examples are provided and a conjecture about strong dominance—an ordering characteristic of random variables—is proposed. The speed at which new items are discovered is a function of the entire shape of the sampling probability distribution. Comparing variances alone is not sufficient to compare the speeds associated with different distributions, as is generally assumed in ecological studies.
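    A minimal sketch in Python (not the authors' code), assuming i.i.d. draws with unequal host probabilities p: the expected number of distinct hosts parasitized after t draws has the standard closed form sum_i (1 - (1 - p_i)^t), which a short Monte Carlo run can check.

        # Unequal-probability coupon collector: hosts are drawn i.i.d. with
        # probabilities p, and we count the distinct hosts hit after t draws.
        # The probability vectors below are illustrative, not data from the paper.
        import numpy as np

        def expected_distinct(p, t):
            """Closed-form E[# distinct hosts after t draws] = sum_i 1 - (1 - p_i)^t."""
            p = np.asarray(p, dtype=float)
            return np.sum(1.0 - (1.0 - p) ** t)

        def simulate_distinct(p, t, n_runs=10_000, seed=0):
            """Monte Carlo estimate of the same expectation, as a sanity check."""
            rng = np.random.default_rng(seed)
            k = len(p)
            counts = [np.unique(rng.choice(k, size=t, p=p)).size for _ in range(n_runs)]
            return float(np.mean(counts))

        p_uneven = [0.4, 0.3, 0.2, 0.1]      # unequal host "risks"
        p_even = [0.25, 0.25, 0.25, 0.25]    # fully equalised via Robin Hood transfers
        for p in (p_uneven, p_even):
            print(expected_distinct(p, t=5), simulate_distinct(p, t=5))

    For every t the more even vector gives the larger expectation, consistent with the Schur-convexity ('Robin Hood transfer') ordering the abstract refers to: equalising sampling probabilities speeds up collection.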

    Lorenz-based quantitative risk management

    Get PDF
    In this thesis, we address problems of quantitative risk management using a specific set of tools that go under the name of Lorenz curve and inequality indices, developed to describe the socio-economic variability of a random variable. Quantitative risk management deals with the estimation of the uncertainty that is embedded in the activities of banks and other financial players due, for example, to market fluctuations. Since the well-being of such financial players is fundamental for the correct functioning of the economic system, an accurate description and estimation of such uncertainty is crucial.
    Applied Probability; Numerical Analysis
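    A minimal sketch in Python, not taken from the thesis: the empirical Lorenz curve of a non-negative sample and the Gini index it induces, the basic inequality objects the work builds on. The lognormal test data are an illustrative assumption.

        # Empirical Lorenz curve: fraction L(F) of the total amount held by the
        # "poorest" fraction F of observations, plus the Gini index
        # (twice the area between the diagonal and the Lorenz curve).
        import numpy as np

        def lorenz_curve(x):
            x = np.sort(np.asarray(x, dtype=float))
            cum = np.cumsum(x)
            F = np.arange(1, x.size + 1) / x.size        # population share
            L = cum / cum[-1]                            # share of the total
            return np.concatenate(([0.0], F)), np.concatenate(([0.0], L))

        def gini(x):
            F, L = lorenz_curve(x)
            area = np.sum((L[1:] + L[:-1]) * np.diff(F)) / 2.0   # trapezoid rule
            return 1.0 - 2.0 * area

        losses = np.random.default_rng(0).lognormal(mean=0.0, sigma=1.0, size=100_000)
        print(gini(losses))   # heavier-tailed losses push the index toward 1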

    Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods

    Full text link
    The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for a human's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk-neutral to worst-case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human's underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk-averse to risk-neutral in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.
    Comment: Submitted to International Journal of Robotics Research; Revision 1: (i) Clarified minor technical points; (ii) Revised proof for Theorem 3 to hold under weaker assumptions; (iii) Added additional figures and expanded discussions to improve readability.
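    A hedged sketch in Python of the risk-measure ingredient only: the empirical Conditional Value-at-Risk (CVaR), a standard coherent risk measure that sweeps from the risk-neutral expectation (alpha = 0) to the worst case (alpha near 1). The paper's actual contribution, inferring the risk measure and cost function by linear programming and maximum likelihood, is not reproduced here; the exponential cost samples are an illustrative assumption.

        # Empirical CVaR of a sample of costs: the mean of the worst
        # (1 - alpha) fraction of outcomes. alpha = 0 recovers the plain
        # expectation (risk neutral); alpha -> 1 approaches the worst case.
        import numpy as np

        def empirical_cvar(costs, alpha):
            costs = np.sort(np.asarray(costs, dtype=float))
            k = max(int(np.ceil((1.0 - alpha) * costs.size)), 1)   # tail samples kept
            return float(costs[-k:].mean())

        rng = np.random.default_rng(1)
        costs = rng.exponential(scale=1.0, size=100_000)   # simulated trajectory costs
        for alpha in (0.0, 0.5, 0.9, 0.99):
            print(alpha, empirical_cvar(costs, alpha))     # grows with risk aversion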

    Polynomial methods in statistical inference: Theory and practice

    Get PDF
    Recent advances in genetics, computer vision, and text mining are accompanied by the analysis of data coming from a large domain, where the domain size is comparable to or larger than the number of samples. In this dissertation, we apply polynomial methods to several statistical questions with a rich history and wide applications. The goal is to understand the fundamental limits of these problems in the large-domain regime, and to design sample-optimal and time-efficient algorithms with provable guarantees.
    The first part investigates the problem of property estimation. Consider the problem of estimating the Shannon entropy of a distribution over $k$ elements from $n$ independent samples. We obtain the minimax mean-square error within universal multiplicative constant factors if $n$ exceeds a constant factor of $k/\log k$; otherwise there exists no consistent estimator. This refines the recent result on the minimal sample size for consistent entropy estimation. The apparatus of best polynomial approximation plays a key role both in the construction of optimal estimators and, via a duality argument, in the minimax lower bound. We also consider the problem of estimating the support size of a discrete distribution whose minimum non-zero mass is at least $1/k$. Under the independent sampling model, we show that the sample complexity, i.e., the minimal sample size to achieve an additive error of $\epsilon k$ with probability at least 0.1, is within universal constant factors of $\frac{k}{\log k}\log^2\frac{1}{\epsilon}$, which improves the state-of-the-art result of $\frac{k}{\epsilon^2 \log k}$. A similar characterization of the minimax risk is also obtained. Our procedure is a linear estimator based on the Chebyshev polynomial and its approximation-theoretic properties, which can be evaluated in $O(n + \log^2 k)$ time and attains the sample complexity within constant factors. The superiority of the proposed estimator in terms of accuracy, computational efficiency and scalability is demonstrated on a variety of synthetic and real datasets. When the distribution is supported on a discrete set, estimating the support size is also known as the distinct elements problem, where the goal is to estimate the number of distinct colors in an urn containing $k$ balls based on $n$ samples drawn with replacement. Based on discrete polynomial approximation and interpolation, we propose an estimator with an additive error guarantee that achieves the optimal sample complexity within $O(\log\log k)$ factors, and in fact within constant factors for most cases. The estimator can be computed in $O(n)$ time for an accurate estimation. The result also applies to sampling without replacement provided the sample size is a vanishing fraction of the urn size. One of the key auxiliary results is a sharp bound on the minimum singular value of a real rectangular Vandermonde matrix, which might be of independent interest.
    The second part studies the problem of learning Gaussian mixtures. The method of moments is one of the most widely used methods in statistics for parameter estimation, by means of solving the system of equations that match the population and estimated moments. However, in practice, and especially for the important case of mixture models, one frequently needs to contend with the difficulties of non-existence or non-uniqueness of statistically meaningful solutions, as well as the high computational cost of solving large polynomial systems. Moreover, theoretical analysis of the method of moments is mainly confined to asymptotic-normality-style results established under strong assumptions. We consider estimating a $k$-component Gaussian location mixture with a common (possibly unknown) variance parameter. To overcome the aforementioned theoretical and algorithmic hurdles, a crucial step is to denoise the moment estimates by projecting onto the truncated moment space (via semidefinite programming) before solving the method-of-moments equations. Not only does this regularization ensure existence and uniqueness of solutions, it also yields fast solvers by means of Gauss quadrature. Furthermore, by proving new moment comparison theorems in the Wasserstein distance via polynomial interpolation and majorization techniques, we establish the statistical guarantees and adaptive optimality of the proposed procedure, as well as an oracle inequality in misspecified models. These results can also be viewed as provable algorithms for the generalized method of moments, which involves non-convex optimization and lacks theoretical guarantees.
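    A minimal sketch in Python of one ingredient only: recovering a k-atom mixing distribution from its first 2k moments by a Gauss-quadrature (Prony-type) step. The dissertation's key additions, the semidefinite-programming projection that denoises the empirical moments and the handling of the Gaussian convolution, are not reproduced; the two-atom example is an illustrative assumption.

        # Gauss quadrature from moments: the atoms are the roots of the monic
        # degree-k orthogonal polynomial of the underlying measure, obtained by
        # solving a Hankel linear system; the weights then solve a Vandermonde system.
        import numpy as np

        def quadrature_from_moments(m, k):
            """m = [m_0, ..., m_{2k-1}]; return (atoms, weights) of a k-atom measure."""
            m = np.asarray(m, dtype=float)
            H = np.array([[m[i + j] for j in range(k)] for i in range(k)])  # Hankel matrix
            c = np.linalg.solve(H, -m[k:2 * k])          # orthogonality conditions
            poly = np.concatenate(([1.0], c[::-1]))      # x^k + c_{k-1} x^{k-1} + ... + c_0
            atoms = np.roots(poly)
            V = np.vander(atoms, N=k, increasing=True).T
            weights = np.linalg.solve(V, m[:k])          # match the first k moments
            return atoms, weights

        # Exact moments of a two-atom measure: atoms -1 and 2, weights 0.3 and 0.7.
        atoms_true, w_true = np.array([-1.0, 2.0]), np.array([0.3, 0.7])
        m = [float(np.sum(w_true * atoms_true ** i)) for i in range(4)]
        print(quadrature_from_moments(m, k=2))           # recovers atoms and weights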

    Author index to volumes 301–400

    Get PDF