Efficiently Answering Durability Prediction Queries
We consider a class of queries called durability prediction queries that
arise commonly in predictive analytics, where we use a given predictive model
to answer questions about possible futures to inform our decisions. Examples of
durability prediction queries include "what is the probability that this
financial product will keep losing money over the next 12 quarters before
turning a profit?" and "what is the chance that our proposed server cluster
will fail to meet the required service-level agreement before its term ends?"
We devise a
general method called Multi-Level Splitting Sampling (MLSS) that can
efficiently handle complex queries and complex models, including those
involving black-box functions, as long as the models allow us to simulate
possible futures step by step. Our method addresses the inefficiency of
standard Monte Carlo (MC) methods by applying the idea of importance splitting
to let one "promising" sample path prefix generate multiple "offspring" paths,
thereby directing simulation efforts toward more promising paths. We propose
practical techniques for designing splitting strategies, freeing users from
manual tuning. Experiments show that our approach achieves unbiased estimates
and the same error guarantees as standard MC while reducing cost by an order of
magnitude.
Comment: in SIGMOD 202
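The splitting idea can be illustrated on a toy model. This is a minimal sketch under stated assumptions, not the MLSS implementation. The model is a hypothetical Gaussian random walk, for example a cumulative loss. The query is the probability that the walk reaches a threshold within a fixed horizon. The hand-picked intermediate levels and the fixed splitting factor are illustrative choices, as are the names `run_to_level`, `mc_estimate`, and `splitting_estimate`. Plain MC simulates every path independently. In the splitting version, each path prefix that reaches an intermediate level spawns `split` offspring that continue from the same state. The final hit count is then divided by `n_roots * split ** (len(levels) - 1)` so the estimate stays unbiased.

```python
import random


def run_to_level(t, x, level, horizon, mu, sigma, rng):
    """Advance one path of the hypothetical toy walk from state (t, x) until
    x >= level or the time horizon is reached. Returns final (t, x) and a hit flag."""
    while x < level and t < horizon:
        t += 1
        x += rng.gauss(mu, sigma)  # one simulated step of the assumed model
    return t, x, x >= level


def mc_estimate(n_paths, threshold, horizon, mu, sigma, seed=0):
    """Standard Monte Carlo: simulate independent full paths and count hits."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        _, _, hit = run_to_level(0, 0.0, threshold, horizon, mu, sigma, rng)
        hits += hit
    return hits / n_paths


def splitting_estimate(n_roots, levels, split, horizon, mu, sigma, seed=0):
    """Fixed-factor multilevel splitting. `levels` is increasing and its last
    entry is the query threshold; every prefix that reaches an intermediate
    level is cloned into `split` offspring continuing from the same state."""
    assert levels, "need at least the final threshold"
    rng = random.Random(seed)
    frontier = [(0, 0.0)] * n_roots  # entrance states (time, value)
    for i, level in enumerate(levels):
        reached = []
        for t, x in frontier:
            t2, x2, hit = run_to_level(t, x, level, horizon, mu, sigma, rng)
            if hit:
                reached.append((t2, x2))
        if i == len(levels) - 1:
            # By the Markov property each root yields split**(m-1) * P(hit)
            # final hits in expectation, so this normalisation is unbiased.
            return len(reached) / (n_roots * split ** (len(levels) - 1))
        # Split: each promising prefix generates `split` offspring paths.
        frontier = [state for state in reached for _ in range(split)]


print(mc_estimate(20000, 3.0, 10, 0.0, 1.0, seed=1))
print(splitting_estimate(2000, [1.0, 2.0, 3.0], 3, 10, 0.0, 1.0, seed=2))
```

With `sigma=0.0` the walk is deterministic, so both estimators return exactly 1.0 or 0.0, which makes a quick sanity check. For a rare threshold, plain MC may see few or no hits. The splitting version instead keeps spending simulation effort on prefixes that have already made progress. Picking the levels and the splitting factor by hand is the manual tuning that the abstract's strategy-design techniques are meant to remove.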
Durable Top-K Instant-Stamped Temporal Records with User-Specified Scoring Functions
A way of finding interesting or exceptional records from instant-stamped
temporal data is to consider their "durability," or, intuitively speaking, how
well they compare with other records that arrived earlier or later, and how
long they retain their supremacy. For example, people are naturally fascinated
by claims with long durability, such as: "On January 22, 2006, Kobe Bryant
dropped 81 points against the Toronto Raptors. Since then, this scoring record has
yet to be broken." In general, given a sequence of instant-stamped records,
suppose that we can rank them by a user-specified scoring function, which
may consider multiple attributes of a record to compute a single score for
ranking. This paper studies "durable top-k queries", which find records whose
scores were within the top-k among those records within a "durability window"
of a given length, e.g., a 10-year window starting or ending at the timestamp of
the record. The parameter k, the length of the durability window, and parameters
of the scoring function (which capture user preference) can all be given at the
query time. We illustrate why this problem formulation yields more meaningful
answers in some practical situations than other similar types of queries
considered previously. We propose new algorithms for solving this problem, and
provide a comprehensive theoretical analysis of the complexities of the problem
itself and of our algorithms. Our algorithms vastly outperform various
baselines (by up to two orders of magnitude on real and synthetic datasets).
Comment: in ICDE 202
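The query semantics can be made concrete with a brute-force sketch that checks each record against every record in its window. It follows the definition in the abstract directly and is not one of the paper's algorithms. The following are assumptions:

* Records are `(timestamp, attributes)` pairs.
* The durability window is closed: `[t - tau, t]` when looking back, `[t, t + tau]` when looking forward.
* A record is within the top-k when fewer than k records in its window score strictly higher, so ties favour the record.
* `linear_score` is a hypothetical user-specified scoring function whose weights act as query-time preference parameters.

The names `durable_top_k`, `linear_score`, and `tau` are illustrative.

```python
from bisect import bisect_left, bisect_right
from typing import Callable, List, Sequence, Tuple

# Hypothetical record layout: (timestamp, attribute vector).
Record = Tuple[float, Sequence[float]]


def linear_score(weights: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Example user-specified scoring function: a weighted sum of attributes."""
    def score(attrs: Sequence[float]) -> float:
        return sum(w * a for w, a in zip(weights, attrs))
    return score


def durable_top_k(records: List[Record],
                  score: Callable[[Sequence[float]], float],
                  k: int, tau: float, look_back: bool = True) -> List[Record]:
    """Brute-force durable top-k: keep each record whose score is within the
    top-k among the records in its durability window of length tau."""
    recs = sorted(records, key=lambda r: r[0])
    times = [t for t, _ in recs]
    scores = [score(attrs) for _, attrs in recs]
    result = []
    for i, t in enumerate(times):
        if look_back:  # window ending at the record: [t - tau, t]
            lo, hi = bisect_left(times, t - tau), bisect_right(times, t)
        else:          # window starting at the record: [t, t + tau]
            lo, hi = bisect_left(times, t), bisect_right(times, t + tau)
        # Within the top-k: fewer than k window records score strictly higher.
        higher = sum(1 for j in range(lo, hi) if scores[j] > scores[i])
        if higher < k:
            result.append(recs[i])
    return result


games = [(1, [10.0]), (2, [30.0]), (3, [20.0]), (4, [25.0]), (5, [5.0])]
print(durable_top_k(games, linear_score([1.0]), k=1, tau=2))
# → [(1, [10.0]), (2, [30.0])]
```

With one attribute, such as points scored in a game, weights `[1.0]`, `k=1`, and a look-back window, the query returns the performances that were the best within their window. This sketch needs no preprocessing when the weights, `k`, or `tau` change at query time. The cost is O(n log n + n·w) work, where w is the largest number of records in any window.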