Subset Sampling and Its Extensions

Huang, Jinchao; Wang, Sibo

Publication date: 21 July 2023

Abstract

This paper studies the \emph{subset sampling} problem. The input is a set $\mathcal{S}$ of $n$ records together with a function $\textbf{p}$ that assigns each record $v\in\mathcal{S}$ a probability $\textbf{p}(v)$. A query returns a random subset $X$ of $\mathcal{S}$, where each record $v\in\mathcal{S}$ is sampled into $X$ independently with probability $\textbf{p}(v)$. The goal is to store $\mathcal{S}$ in a data structure to answer queries efficiently. If $\mathcal{S}$ fits in memory, the problem is interesting when $\mathcal{S}$ is dynamic. We develop a dynamic data structure with $\mathcal{O}(1+\mu_{\mathcal{S}})$ expected \emph{query} time, $\mathcal{O}(n)$ space and $\mathcal{O}(1)$ amortized expected \emph{update}, \emph{insert} and \emph{delete} time, where $\mu_{\mathcal{S}}=\sum_{v\in\mathcal{S}}\textbf{p}(v)$. The query time and space are optimal. If $\mathcal{S}$ does not fit in memory, the problem is difficult even if $\mathcal{S}$ is static. Under this scenario, we present an I/O-efficient algorithm that answers a \emph{query} in $\mathcal{O}\left((\log^*_B n)/B+(\mu_\mathcal{S}/B)\log_{M/B} (n/B)\right)$ amortized expected I/Os using $\mathcal{O}(n/B)$ space, where $M$ is the memory size, $B$ is the block size and $\log^*_B n$ is the number of iterative $\log_2(.)$ operations we need to perform on $n$ before going below $B$. In addition, when each record is associated with a real-valued key, we extend the \emph{subset sampling} problem to the \emph{range subset sampling} problem, in which we require that the keys of the sampled records fall within a specified input range $[a,b]$. For this extension, we provide a solution under the dynamic setting, with $\mathcal{O}(\log n+\mu_{\mathcal{S}\cap[a,b]})$ expected \emph{query} time, $\mathcal{O}(n)$ space and $\mathcal{O}(\log n)$ amortized expected \emph{update}, \emph{insert} and \emph{delete} time.

Comment: 17 pages
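To make the problem statement concrete, the following is a minimal Python sketch of the naive baseline only: one independent coin flip per record, costing O(n) per query regardless of the expected output size. It is not the data structure developed in the paper. The function names (`subset_sample`, `range_subset_sample`, `expected_size`, `log_star_B`) and the dictionary-based representation of the records, probabilities and keys are assumptions made for illustration.

```python
import math
import random


def subset_sample(probs, rng=random):
    """Naive baseline (not the paper's structure): flip one independent
    coin per record, so every query costs O(n) time.

    probs maps each record v to its probability p(v) in [0, 1].
    """
    return [v for v, p in probs.items() if rng.random() < p]


def range_subset_sample(keys, probs, a, b, rng=random):
    """Naive range variant: only records whose key lies in [a, b] are
    eligible; each eligible record is still sampled independently."""
    return [v for v, p in probs.items()
            if a <= keys[v] <= b and rng.random() < p]


def expected_size(probs):
    """mu_S: the sum of p(v) over all records, i.e. the expected |X|."""
    return sum(probs.values())


def log_star_B(n, B):
    """Number of times log2 must be applied to n before the value
    drops below B (the quantity written log^*_B n in the abstract)."""
    if B <= 0:
        raise ValueError("B must be positive")
    count = 0
    x = float(n)
    while x >= B:
        x = math.log2(x)
        count += 1
    return count
```

The gap between this O(n) baseline and the stated O(1 + mu_S) query bound is what makes the problem interesting: when the probabilities are small, the expected output size mu_S can be far below n.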

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2307.11585

Last time updated on 28/07/2023