541 research outputs found
Dynamic Tensor Clustering
Dynamic tensor data are becoming prevalent in numerous applications. Existing
tensor clustering methods either fail to account for the dynamic nature of the
data, or are inapplicable to a general-order tensor. Also, there is often a gap
between statistical guarantee and computational efficiency for existing tensor
clustering solutions. In this article, we aim to bridge this gap by proposing a
new dynamic tensor clustering method, which takes into account both sparsity
and fusion structures, and enjoys strong statistical guarantees as well as high
computational efficiency. Our proposal is based upon a new structured tensor
factorization that encourages both sparsity and smoothness in parameters along
the specified tensor modes. Computationally, we develop a highly efficient
optimization algorithm that benefits from substantial dimension reduction. In
theory, we first establish a non-asymptotic error bound for the estimator from
the structured tensor factorization. Built upon this error bound, we then
derive the rate of convergence of the estimated cluster centers, and show that
the estimated clusters recover the true cluster structures with a high
probability. Moreover, our proposed method can be naturally extended to
co-clustering of multiple modes of the tensor data. The efficacy of our
approach is illustrated via simulations and a brain dynamic functional
connectivity analysis from an autism spectrum disorder study.
Comment: Accepted at Journal of the American Statistical Association
Provable Sparse Tensor Decomposition
We propose a novel sparse tensor decomposition method, namely the Tensor
Truncated Power (TTP) method, that incorporates variable selection into the
estimation of decomposition components. The sparsity is achieved via an
efficient truncation step embedded in the tensor power iteration. Our method
applies to a broad family of high dimensional latent variable models, including
high dimensional Gaussian mixture and mixtures of sparse regressions. A
thorough theoretical investigation is further conducted. In particular, we show
that the final decomposition estimator is guaranteed to achieve a local
statistical rate, and further strengthen it to the global statistical rate by
introducing a proper initialization procedure. In high dimensional regimes, the
obtained statistical rate significantly improves upon those shown for existing
non-sparse decomposition methods. The empirical advantages of TTP are confirmed
in extensive simulations and two real applications of click-through rate
prediction and high-dimensional gene clustering.
Comment: To Appear in JRSS-
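The truncation step embedded in the tensor power iteration can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a symmetric third-order tensor, a single rank-one component, hard truncation to the `s` largest-magnitude entries, and a reasonably good starting vector. The names `truncate`, `truncated_power_iteration`, and `demo` are hypothetical.

```python
import numpy as np

def truncate(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out

def truncated_power_iteration(T, s, init, n_iter=50):
    """Rank-one sketch for a d x d x d tensor T (assumed roughly symmetric)."""
    u = init / np.linalg.norm(init)
    for _ in range(n_iter):
        v = np.einsum("ijk,j,k->i", T, u, u)  # power step: contract T with u twice
        v = truncate(v, s)                    # sparsity via truncation
        norm = np.linalg.norm(v)
        if norm == 0:
            break
        u = v / norm                          # renormalize to unit length
    weight = float(np.einsum("ijk,i,j,k->", T, u, u, u))
    return weight, u

def demo():
    """Toy example: noisy rank-one tensor with an s-sparse component."""
    rng = np.random.default_rng(0)
    d, s = 30, 5
    u_true = np.zeros(d)
    u_true[:s] = 1 / np.sqrt(s)
    T = 5.0 * np.einsum("i,j,k->ijk", u_true, u_true, u_true)
    T += 0.01 * rng.standard_normal((d, d, d))    # small additive noise
    init = u_true + 0.1 * rng.standard_normal(d)  # assumed warm start
    return u_true, truncated_power_iteration(T, s, init)
```

Calling `demo()` returns the true sparse component together with the estimated weight and component. The warm start stands in for the initialization procedure mentioned in the abstract, which this sketch does not reproduce.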
Utility Theory of Synthetic Data Generation
Synthetic data algorithms are widely employed in industry to generate
artificial data for downstream learning tasks. While existing research
primarily focuses on empirically evaluating the utility of synthetic data, its
theoretical understanding is largely lacking. This paper bridges the
practice-theory gap by establishing relevant utility theory in a statistical
learning framework. It considers two utility metrics: generalization and
ranking of models trained on synthetic data. The former is defined as the
generalization difference between models trained on synthetic and on real data.
By deriving analytical bounds for this utility metric, we demonstrate that the
synthetic feature distribution does not need to be similar to that of real data
to ensure comparable generalization of synthetic models, provided the models
are properly specified in downstream learning tasks. The latter utility metric
studies the relative performance of models trained on synthetic data. In
particular, we discover that the distribution of synthetic data need not
be similar to the real one to ensure consistent model comparison.
Interestingly, consistent model comparison is still achievable even when
synthetic responses are not well generated, as long as downstream models are
separable by a generalization gap. Finally, extensive experiments on
non-parametric models and deep neural networks have been conducted to validate
these theoretical findings.
Contextual Dynamic Pricing with Strategic Buyers
Personalized pricing, which involves tailoring prices based on individual
characteristics, is commonly used by firms to implement a consumer-specific
pricing policy. In this process, buyers can also strategically manipulate their
feature data to obtain a lower price, incurring certain manipulation costs.
Such strategic behavior can hinder firms from maximizing their profits. In this
paper, we study the contextual dynamic pricing problem with strategic buyers.
The seller observes not the buyer's true feature, but a manipulated feature
resulting from the buyer's strategic behavior. In addition, the seller does not
observe the buyers' valuation of the product, but only a binary response
indicating whether a sale happens or not. Recognizing these challenges, we
propose a strategic dynamic pricing policy that incorporates the buyers'
strategic behavior into the online learning to maximize the seller's cumulative
revenue. We first prove that existing non-strategic pricing policies that
neglect the buyers' strategic behavior incur regret that is linear in
the total time horizon, indicating that these policies are no better
than a random pricing policy. We then establish that our proposed policy
achieves a sublinear regret upper bound. Importantly, our
policy is not a mere amalgamation of existing dynamic pricing policies and
strategic behavior handling algorithms. Our policy can also accommodate the
scenario when the marginal cost of manipulation is unknown in advance. To
account for it, we simultaneously estimate the valuation parameter and the cost
parameter in the online pricing policy, which is shown to also achieve a
sublinear regret bound. Extensive experiments support our theoretical
developments and demonstrate the superior performance of our policy compared to
other pricing policies that are unaware of the strategic behaviors.