
    On Power-Law Distributed Balls in Bins and its Applications to View Size Estimation

    The view size estimation plays an important role in query optimization. It has been observed that many data follow a power law distribution. In this paper, we consider the balls in bins problem where we place balls into $N$ bins when the bin selection probabilities follow a power law distribution. As a generalization to the coupon collector's problem, we address the problem of determining the expected number of balls that need to be thrown in order to have at least one ball in each of the $N$ bins. We prove that $\Theta\left(\frac{N^{\alpha} \ln N}{c_N^{\alpha}}\right)$ balls are needed to achieve this, where $\alpha$ is the parameter of the power law distribution and $c_N^{\alpha} = \frac{\alpha-1}{\alpha-N^{\alpha-1}}$ for $\alpha \neq 1$ and $c_N^{\alpha} = \frac{1}{\ln N}$ for $\alpha = 1$. Next, when fixing the number of balls that are thrown to $T$, we provide closed form upper and lower bounds on the expected number of bins that have at least one occupant. For $n$ large and $\alpha > 1$, we prove that our bounds are tight up to a constant factor of $\left(\frac{\alpha}{\alpha-1}\right)^{1-\frac{1}{\alpha}} \leq e^{1/e} \simeq 1.4$.
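
    The covering question in the abstract can be explored numerically. The following is a minimal Python sketch, not the authors' method: it assumes the bin selection probabilities have the form $p_i \propto i^{-\alpha}$ for $i = 1, \dots, N$ (the abstract does not spell out the exact form), and all function names are hypothetical. It estimates by simulation the mean number of throws until every bin is occupied, and evaluates the expression $N^{\alpha} \ln N / c$ with $c$ taken as the exact normalizing constant of the assumed distribution rather than the closed form quoted above.

```python
import bisect
import itertools
import math
import random


def power_law_probs(n_bins, alpha):
    """Assumed form: p_i proportional to i**(-alpha) for i = 1..n_bins."""
    weights = [i ** (-alpha) for i in range(1, n_bins + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def throws_until_full(probs, rng):
    """Throw balls one at a time until every bin holds at least one ball."""
    cumulative = list(itertools.accumulate(probs))
    empty = set(range(len(probs)))
    throws = 0
    while empty:
        # Inverse-CDF sampling; min() guards against float round-off at the top end.
        idx = min(bisect.bisect_left(cumulative, rng.random()), len(probs) - 1)
        empty.discard(idx)
        throws += 1
    return throws


def mean_throws(n_bins, alpha, trials=200, seed=0):
    """Monte Carlo estimate of the expected number of throws to cover all bins."""
    rng = random.Random(seed)
    probs = power_law_probs(n_bins, alpha)
    return sum(throws_until_full(probs, rng) for _ in range(trials)) / trials


def scaling_term(n_bins, alpha):
    """N^alpha * ln N / c, with c = 1 / sum_i i^(-alpha) (assumed normalizer)."""
    c = 1.0 / sum(i ** (-alpha) for i in range(1, n_bins + 1))
    return n_bins ** alpha * math.log(n_bins) / c


if __name__ == "__main__":
    for alpha in (0.0, 1.0, 1.5):
        print(alpha, mean_throws(20, alpha), scaling_term(20, alpha))
```

    With `alpha = 0.0` the distribution is uniform and the simulation reduces to the classical coupon collector's problem, which gives a quick sanity check. Since the abstract's result is a $\Theta$ bound, the simulated mean and the scaling term should only be expected to agree up to a constant factor.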
