Online Active Linear Regression via Thresholding
We consider the problem of online active learning to collect data for
regression modeling. Specifically, we consider a decision maker with a limited
experimentation budget who must efficiently learn an underlying linear
population model. Our main contribution is a novel threshold-based algorithm
for selecting the most informative observations; we characterize its performance
and fundamental lower bounds. We extend the algorithm and its guarantees to
sparse linear regression in high-dimensional settings. Simulations suggest the
algorithm is remarkably robust: it provides significant benefits over passive
random sampling in real-world datasets that exhibit high nonlinearity and high
dimensionality --- significantly reducing both the mean and variance of the
squared error.
Comment: Published in AAAI 201
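The threshold idea in the abstract above can be illustrated with a minimal sketch. The abstract does not state the selection rule, so everything here is an assumption: a one-dimensional model through the origin, a rule that requests a label only when `abs(x)` reaches a fixed `threshold`, and the hypothetical helper names `threshold_select`, `ols_through_origin`, and `make_stream`.

```python
import random


def threshold_select(stream, budget, threshold):
    """Request a label only when |x| >= threshold, until the budget is spent.

    `stream` yields (x, label_oracle) pairs; calling label_oracle() "pays"
    for one observation. The rule itself is an assumed stand-in.
    """
    chosen = []
    for x, label_oracle in stream:
        if len(chosen) >= budget:
            break
        if abs(x) >= threshold:
            chosen.append((x, label_oracle()))
    return chosen


def ols_through_origin(pairs):
    """Least-squares slope for the model y = beta * x (no intercept)."""
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    return sxy / sxx


def make_stream(n, beta, noise_sd, rng):
    """Synthetic online stream: Gaussian covariates, linear response."""
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Default argument binds the current x to this oracle.
        yield x, (lambda x=x: beta * x + rng.gauss(0.0, noise_sd))


rng = random.Random(0)
labelled = threshold_select(make_stream(1000, 2.0, 0.5, rng), budget=20, threshold=1.5)
print(len(labelled), ols_through_origin(labelled))
```

Points with large `abs(x)` carry more information about the slope, which is why a norm threshold is a plausible stand-in for "most informative observations"; the published algorithm may differ in how the threshold is chosen and how covariates are scaled.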
$\ell_1$-regression with Heavy-tailed Distributions
In this paper, we consider the problem of linear regression with heavy-tailed
distributions. Unlike previous studies, which measure performance with the
squared loss, we use the absolute loss, which estimates the conditional
median. To address the challenge that both the input
and output could be heavy-tailed, we propose a truncated minimization problem,
and demonstrate that it enjoys an $\widetilde{O}(\sqrt{d/n})$ excess risk,
where $d$ is the dimensionality and $n$ is the number of samples. Compared with
traditional work on $\ell_1$-regression, the main advantage of our result is
that we achieve a high-probability risk bound without exponential moment
conditions on the input and output. Furthermore, if the input is bounded, we
show that classical empirical risk minimization suffices for
$\ell_1$-regression even when the output is heavy-tailed.
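As a rough illustration of what a truncated absolute-loss objective can look like, the sketch below applies a soft truncation to each absolute residual before averaging. The abstract does not give the form of the truncated problem, so the Catoni-style function `psi`, the scale parameter `alpha`, the scalar model, and the grid-search solver `fit_scalar` are all assumptions made for this example.

```python
import math


def psi(u):
    """Assumed soft truncation for u >= 0: close to u near 0, logarithmic for large u."""
    return math.log(1.0 + u + 0.5 * u * u)


def truncated_l1_objective(w, data, alpha):
    """Average truncated absolute residual for the scalar model y ~ w * x."""
    total = sum(psi(alpha * abs(y - w * x)) for x, y in data)
    return total / (alpha * len(data))


def fit_scalar(data, alpha, lo, hi, steps=1001):
    """Grid search over w in [lo, hi]; a simple solver chosen for this sketch only."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda w: truncated_l1_objective(w, data, alpha))


# Clean points on the line y = 3x, plus one extreme outlier.
clean = [(x, 3.0 * x) for x in (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]
data = clean + [(1.0, 1000.0)]
print(fit_scalar(data, alpha=1.0, lo=0.0, hi=10.0))
```

Because `psi` grows only logarithmically, a single extreme residual contributes a bounded amount of pull on the fit, which is the intuition behind truncation under heavy tails. The grid search is used because the truncated objective need not be convex.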
L2P: An Algorithm for Estimating Heavy-tailed Outcomes
Many real-world prediction tasks have outcome variables with characteristically
heavy-tailed distributions. Examples include copies of books sold,
auction prices of art pieces, and demand for commodities in warehouses.
Learning the heavy-tailed distribution allows "big and rare" instances (e.g.,
best-sellers) to be predicted accurately. Most existing approaches are not
designed to learn heavy-tailed distributions; thus, they heavily
under-predict such instances. To tackle this problem, we introduce Learning to
Place (L2P), which exploits the pairwise relationships between instances for
learning. In its training phase, L2P learns a pairwise preference classifier:
is instance A > instance B? In its placing phase, L2P obtains a prediction by
placing the new instance among the known instances. Based on its placement, the
new instance is then assigned a value for its outcome variable. Experiments on
real data show that L2P outperforms competing approaches in terms of accuracy
and the ability to reproduce the heavy-tailed outcome distribution. In addition, L2P
provides an interpretable model by placing each predicted instance in relation
to its comparable neighbors. Interpretable models are highly desirable when
lives and treasure are at stake.
Comment: 9 pages, 6 figures, 2 tables. Nature of changes from previous version:
1. Added complexity analysis in Section 2.2.
2. Datasets changed.
3. Added LambdaMART to the baseline methods, along with a brief discussion of why LambdaMART failed on our problem.
4. Figures updated.
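The two phases described in the L2P abstract (train a pairwise preference classifier, then place a new instance among the known ones) can be sketched as follows. The abstract does not specify the classifier or how a placement is turned into a value, so this example assumes a perceptron on feature differences and reads the prediction off the sorted training outcomes at the placed rank. The names `train_pairwise` and `place` are hypothetical.

```python
def train_pairwise(X, y, epochs=20):
    """Perceptron on feature differences (an assumed stand-in classifier).

    After training, sign(w . (a - b)) predicts whether outcome(a) > outcome(b).
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for i in range(len(X)):
            for j in range(len(X)):
                if y[i] == y[j]:
                    continue  # ties carry no preference
                diff = [a - b for a, b in zip(X[i], X[j])]
                target = 1 if y[i] > y[j] else -1
                score = sum(wk * dk for wk, dk in zip(w, diff))
                if target * score <= 0:  # misclassified pair: perceptron update
                    w = [wk + target * dk for wk, dk in zip(w, diff)]
    return w


def place(w, X, y, x_new):
    """Count the known instances x_new is predicted to exceed, then read a value.

    The value rule (midpoint of the two neighbouring sorted outcomes) is an
    assumption of this sketch.
    """
    wins = sum(
        1 for xi in X
        if sum(wk * (a - b) for wk, a, b in zip(w, x_new, xi)) > 0
    )
    ranked = sorted(y)
    if wins == 0:
        return ranked[0]
    if wins == len(ranked):
        return ranked[-1]
    return 0.5 * (ranked[wins - 1] + ranked[wins])


X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1.0, 10.0, 100.0, 1000.0, 10000.0]  # heavy-tailed outcomes
w = train_pairwise(X, y)
print(place(w, X, y, [3.5]))
```

Because the prediction is taken from observed outcomes at the placed rank, large values in the tail remain reachable, and the neighbouring instances at that rank give a direct explanation of the prediction.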