
Reading policies for joins: An asymptotic analysis

Abstract

Suppose that $m_n$ observations are made from the distribution $\mathbf{R}$ and $n-m_n$ from the distribution $\mathbf{S}$. Associate with each pair, $x$ from $\mathbf{R}$ and $y$ from $\mathbf{S}$, a nonnegative score $\phi(x,y)$. An optimal reading policy is one that yields a sequence $m_n$ that maximizes $\mathbb{E}(M(n))$, the expected sum of the $(n-m_n)m_n$ observed scores, uniformly in $n$. The alternating policy, which switches between the two sources, is the optimal nonadaptive policy. In contrast, the greedy policy, which chooses its source to maximize the expected gain on the next step, is shown to be the optimal policy. Asymptotics are provided for the case where the $\mathbf{R}$ and $\mathbf{S}$ distributions are discrete and $\phi(x,y)=1$ or $0$ according as $x=y$ or not (i.e., the observations match). Specifically, an invariance result is proved which guarantees that for a wide class of policies, including the alternating and the greedy, the variable $M(n)$ obeys the same CLT and LIL. A more delicate analysis of the sequence $\mathbb{E}(M(n))$ and the sample paths of $M(n)$, for both alternating and greedy, reveals the slender sense in which the latter policy is asymptotically superior to the former, as well as a sense of equivalence of the two and robustness of the former.

Comment: Published at http://dx.doi.org/10.1214/105051606000000646 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).
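To make the matching setting concrete, the following minimal Python sketch (function names are hypothetical, not from the paper) simulates the two policies named in the abstract for discrete $\mathbf{R}$ and $\mathbf{S}$ with $\phi(x,y)=1$ if $x=y$ and $0$ otherwise: $M(n)$ is the running count of matching cross pairs, the alternating policy switches between the two sources, and the greedy policy reads from whichever source has the larger expected one-step gain.

```python
import random
from collections import Counter

def simulate(policy, n, R_vals, R_probs, S_vals, S_probs, seed=0):
    """Read n observations under `policy`; return M(n), the number of
    matching (x, y) pairs among all observed cross pairs."""
    rng = random.Random(seed)
    cR, cS = Counter(), Counter()  # counts of observed values per source
    M = 0
    for _ in range(n):
        src = policy(cR, cS, R_vals, R_probs, S_vals, S_probs)
        if src == "R":
            x = rng.choices(R_vals, R_probs)[0]
            M += cS[x]             # a new x matches every earlier y == x
            cR[x] += 1
        else:
            y = rng.choices(S_vals, S_probs)[0]
            M += cR[y]             # a new y matches every earlier x == y
            cS[y] += 1
    return M

def alternating(cR, cS, *_):
    # nonadaptive: keep the two sample counts balanced
    return "R" if sum(cR.values()) <= sum(cS.values()) else "S"

def greedy(cR, cS, R_vals, R_probs, S_vals, S_probs):
    # adaptive: expected one-step gain from each source, given counts so far
    gain_R = sum(p * cS[v] for v, p in zip(R_vals, R_probs))
    gain_S = sum(p * cR[v] for v, p in zip(S_vals, S_probs))
    return "R" if gain_R >= gain_S else "S"
```

For example, `simulate(greedy, 1000, [0, 1], [0.5, 0.5], [0, 1], [0.5, 0.5])` runs the greedy policy on two uniform two-point distributions; comparing it against `simulate(alternating, ...)` over many seeds illustrates the narrow asymptotic edge of greedy over alternating that the paper quantifies.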
