Streaming Weighted Sampling over Join Queries
Join queries are a fundamental database tool, capturing a range of tasks that involve linking heterogeneous data sources. However, with massive table sizes, it is often impractical to keep these in memory, and we can only take one or a few streaming passes over them. Moreover, building out the full join result (e.g., linking heterogeneous data sources along quasi-identifiers) can lead to a combinatorial explosion of results due to many-to-many links. Random sampling is a natural tool to boil this oversized result down to a representative subset with well-understood statistical properties, but turns out to be a challenging task due to the combinatorial nature of the sampling domain. Existing techniques in the literature focus solely on the setting with tabular data residing in main memory, and do not address aspects such as stream operation, weighted sampling and more general join operators that are urgently needed in a modern data processing context. The main contribution of this work is to meet these needs with more lightweight practical approaches. First, a bijection between the sampling problem and a graph problem is introduced to support weighted sampling and common join operators. Second, the sampling techniques are refined to minimise the number of streaming passes. Third, techniques are presented to deal with very large tables under limited memory. Finally, the proposed techniques are compared to existing approaches that rely on database indices, and the results indicate substantial memory savings, reduced runtimes for ad-hoc queries and competitive amortised runtimes.
Zeros of random tropical polynomials, random polytopes and stick-breaking
For , let be independent and identically
distributed random variables with distribution with support .
The number of zeros of the random tropical polynomials is also the number of faces of the lower convex
hull of the random points in . We show that this
number, , satisfies a central limit theorem when has polynomial decay
near . Specifically, if near behaves like a
distribution for some , then has the same asymptotics as the
number of renewals on the interval of a renewal process with
inter-arrival distribution . Our proof draws on connections
between random partitions, renewal theory and random polytopes. In particular,
we obtain generalizations and simple proofs of the central limit theorem for
the number of vertices of the convex hull of uniform random points in a
square. Our work leads to many open problems in stochastic tropical geometry,
the study of functionals and intersections of random tropical varieties.
Comment: 22 pages, 5 figures