Active sequential hypothesis testing
Consider a decision maker who is responsible for dynamically collecting
observations so as to enhance, in a speedy manner, his information about an
underlying phenomenon of interest, while accounting for the penalty of a wrong
declaration. Due to the sequential nature of the problem, the decision maker
relies on his current information state to adaptively select the most
"informative" sensing action among the available ones. In this paper, using
results from dynamic programming, lower bounds for the optimal total cost are
established.
established. The lower bounds characterize the fundamental limits on the
maximum achievable information acquisition rate and the optimal reliability.
Moreover, upper bounds are obtained via an analysis of two heuristic policies
for dynamic selection of actions. The first proposed heuristic is shown to
achieve asymptotic optimality in the sense of Chernoff: the relative
difference between the total cost achieved by the proposed policy and the
optimal total cost approaches zero as the penalty of a wrong declaration (and
hence the number of collected samples) increases. The second heuristic is
shown to achieve asymptotic optimality only
in a limited setting such as the problem of a noisy dynamic search. However, by
considering the dependency on the number of hypotheses, under a technical
condition, this second heuristic is shown to achieve a nonzero information
acquisition rate, establishing a lower bound for the maximum achievable rate
and error exponent. In the case of a noisy dynamic search with size-independent
noise, the obtained nonzero rate and error exponent are shown to be maximum.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1144 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
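To make the selection rule concrete, here is a minimal Python sketch of one such adaptive policy: a Bayesian decision maker who, at each step, picks the sensing action with the largest expected information gain and declares once the posterior clears a confidence threshold tied to the penalty. The likelihood matrix, binary observations, and stopping rule are illustrative assumptions, not the paper's exact model or heuristics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (illustrative): M hypotheses, A binary-output sensing actions;
# lik[a, m] = P(observation = 1 | action a, hypothesis m).
M, A = 4, 3
lik = rng.uniform(0.1, 0.9, size=(A, M))
true_h = 2
penalty = 1e4                      # penalty of a wrong declaration
threshold = 1.0 - 1.0 / penalty    # declare once the posterior is this confident

def expected_info_gain(belief, a):
    """Mutual information I(hypothesis; observation) for sensing action a."""
    p1 = float(belief @ lik[a])    # P(obs = 1) under the current belief
    mi = 0.0
    for obs, p_obs in ((1, p1), (0, 1.0 - p1)):
        if p_obs <= 0.0:
            continue
        post = belief * (lik[a] if obs else (1.0 - lik[a])) / p_obs
        nz = post > 0
        mi += p_obs * np.sum(post[nz] * np.log(post[nz] / belief[nz]))
    return mi

belief = np.full(M, 1.0 / M)       # uniform prior over hypotheses
for n in range(10_000):
    if belief.max() >= threshold:
        break
    a = max(range(A), key=lambda a: expected_info_gain(belief, a))
    obs = rng.random() < lik[a, true_h]            # sample the true hypothesis
    belief *= lik[a] if obs else (1.0 - lik[a])    # Bayesian update
    belief /= belief.sum()

print(f"declared hypothesis {belief.argmax()} after {n} observations")
```

Raising the penalty pushes the threshold toward 1, so the policy collects more samples before declaring, which is the regime in which Chernoff-style asymptotic optimality is assessed.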
Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain
Real-world data typically contain repeated and periodic patterns. This
suggests that they can be effectively represented and compressed using only a
few coefficients of an appropriate basis (e.g., Fourier or wavelets).
However, distance estimation when the data are represented using different sets
of coefficients is still a largely unexplored area. This work studies the
optimization problems related to obtaining the tightest lower/upper
bound on Euclidean distances when each data object is potentially compressed
using a different set of orthonormal coefficients. Our technique leads to
tighter distance estimates, which translate into more accurate search,
learning, and mining operations directly in the compressed domain.
We formulate the problem of estimating lower/upper distance bounds as an
optimization problem. We establish the properties of optimal solutions, and
leverage the theoretical analysis to develop a fast algorithm to obtain an
exact solution to the problem. The suggested solution provides the
tightest estimate of the Euclidean (L2) norm or the correlation. We show that typical
data-analysis operations, such as k-NN search or k-Means clustering, can
operate more accurately using the proposed compression and distance
reconstruction technique. We compare it with many other prevalent compression
and reconstruction techniques, including random projections and PCA-based
techniques. We highlight a surprising result, namely that when the data are
highly sparse in some basis, our technique may even outperform PCA-based
compression.
The contributions of this work are generic as our methodology is applicable
to any sequential or high-dimensional data as well as to any orthogonal data
transformation used for the underlying data compression scheme.
Comment: 25 pages, 20 figures, accepted in VLDB
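As a point of contrast with the paper's optimal bounds, the Python sketch below computes crude triangle-inequality lower/upper bounds on the Euclidean distance between two objects, each kept as a different subset of orthonormal coefficients. The dictionary representation, residual norms, and example values are assumptions for illustration; the paper's contribution is precisely to replace such loose bounds with the tightest ones via an exact optimization.

```python
import numpy as np

def simple_distance_bounds(x_coef, y_coef, x_resid, y_resid):
    """Crude lower/upper bounds on ||x - y||_2 when x and y are each stored as
    a (possibly different) sparse set of orthonormal coefficients.

    x_coef, y_coef : dicts mapping coefficient index -> kept coefficient value
    x_resid, y_resid : L2 norms of the discarded coefficients of x and y

    These triangle-inequality bounds are only an illustration; the paper
    derives the *tightest* bounds by solving an optimization problem.
    """
    idx = set(x_coef) | set(y_coef)
    # Distance between the truncated vectors (missing entries treated as 0).
    d_known = np.sqrt(sum((x_coef.get(i, 0.0) - y_coef.get(i, 0.0)) ** 2
                          for i in idx))
    # Writing x = x_kept + x_resid (disjoint supports) and likewise for y,
    # the triangle inequality gives both directions of the bound.
    upper = d_known + x_resid + y_resid
    lower = max(d_known - x_resid - y_resid, 0.0)
    return lower, upper

# Example: x keeps coefficients {0, 1}, y keeps {1, 2}.
lo, up = simple_distance_bounds({0: 3.0, 1: 1.0}, {1: 0.5, 2: 2.0}, 0.4, 0.6)
print(f"{lo:.3f} <= ||x - y|| <= {up:.3f}")
```

Tighter bounds of the kind the paper derives shrink the interval [lower, upper], which directly reduces the number of candidates a k-NN search must verify against uncompressed data.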
Bounding Optimality Gap in Stochastic Optimization via Bagging: Statistical Efficiency and Stability
We study a statistical method to estimate the optimal value and the
optimality gap of a given solution for stochastic optimization, as an
assessment of solution quality. Our approach is based on bootstrap
aggregating, or
bagging, resampled sample average approximation (SAA). We show how this
approach leads to valid statistical confidence bounds for non-smooth
optimization. We also demonstrate its statistical efficiency and stability,
which are especially desirable in limited-data situations, and compare these
properties with some existing methods. We present our theory that views SAA as
a kernel in an infinite-order symmetric statistic, which can be approximated
via bagging. We substantiate our theoretical findings with numerical results.
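The following Python sketch illustrates the basic bagged-SAA point estimate on a toy non-smooth problem, min_x E|x - Z|, whose SAA over a sample is solved by the sample median. The problem, resample size k, number of resamples B, and candidate solution are illustrative assumptions; the paper's actual contribution includes the statistical confidence-bound construction, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy non-smooth stochastic program (illustrative): min_x E|x - Z|.
def saa_optimal_value(sample):
    x_star = np.median(sample)                 # SAA optimizer for this problem
    return np.mean(np.abs(x_star - sample))    # SAA optimal value

data = rng.normal(loc=0.0, scale=1.0, size=200)   # limited data
x_hat = 0.5                                       # candidate solution to assess

# Bagged estimate of the optimal value: average the SAA optimal value over
# B bootstrap resamples of size k drawn from the data (resampled SAA).
B, k = 1000, 50
bagged = np.mean([saa_optimal_value(rng.choice(data, size=k, replace=True))
                  for _ in range(B)])

obj_hat = np.mean(np.abs(x_hat - data))    # plug-in objective value of x_hat
print(f"estimated optimality gap of x_hat: {obj_hat - bagged:.4f}")
```

Because the SAA optimal value is biased downward, the resulting gap estimate errs on the conservative side, which is the direction needed for a valid upper confidence bound on the true gap.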