12 research outputs found

    Efficient Data Representation by Selecting Prototypes with Importance Weights

    Full text link
    Prototypical examples that best summarizes and compactly represents an underlying complex data distribution communicate meaningful insights to humans in domains where simple explanations are hard to extract. In this paper we present algorithms with strong theoretical guarantees to mine these data sets and select prototypes a.k.a. representatives that optimally describes them. Our work notably generalizes the recent work by Kim et al. (2016) where in addition to selecting prototypes, we also associate non-negative weights which are indicative of their importance. This extension provides a single coherent framework under which both prototypes and criticisms (i.e. outliers) can be found. Furthermore, our framework works for any symmetric positive definite kernel thus addressing one of the key open questions laid out in Kim et al. (2016). By establishing that our objective function enjoys a key property of that of weak submodularity, we present a fast ProtoDash algorithm and also derive approximation guarantees for the same. We demonstrate the efficacy of our method on diverse domains such as retail, digit recognition (MNIST) and on publicly available 40 health questionnaires obtained from the Center for Disease Control (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively based on expert feedback and recently published scientific studies on public health, thus showcasing the power of our technique in providing actionability (for retail), utility (for MNIST) and insight (on CDC datasets) which arguably are the hallmarks of an effective data mining method.Comment: Accepted for publication in International Conference on Data Mining (ICDM) 201

    Data drift correction via time-varying importance weight estimator

    Full text link
    Real-world deployment of machine learning models is challenging when data evolves over time. And data does evolve over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a novel time-varying importance weight estimator that can detect gradual shifts in the distribution of data. Such an importance weight estimator allows the training method to selectively sample past data -- not just similar data from the past like a standard importance weight estimator would but also data that evolved in a similar fashion in the past. Our time-varying importance weight is quite general. We demonstrate different ways of implementing it that exploit some known structure in the evolution of data. We demonstrate and evaluate this approach on a variety of problems ranging from supervised learning tasks (multiple image classification datasets) where the data undergoes a sequence of gradual shifts of our design to reinforcement learning tasks (robotic manipulation and continuous control) where data undergoes a shift organically as the policy or the task changes

    Stratified Learning: a general-purpose statistical method for improved learning under Covariate Shift

    Get PDF
    Covariate shift arises when the labelled training (source) data is not representative of the unlabelled (target) data due to systematic differences in the covariate distributions. A supervised model trained on the source data subject to covariate shift may suffer from poor generalization on the target data. We propose a novel, statistically principled and theoretically justified method to improve learning under covariate shift conditions, based on propensity score stratification, a well-established methodology in causal inference. We show that the effects of covariate shift can be reduced or altogether eliminated by conditioning on propensity scores. In practice, this is achieved by fitting learners on subgroups ("strata") constructed by partitioning the data based on the estimated propensity scores, leading to balanced covariates and much-improved target prediction. We demonstrate the effectiveness of our general-purpose method on contemporary research questions in observational cosmology, and on additional benchmark examples, matching or outperforming state-of-the-art importance weighting methods, widely studied in the covariate shift literature. We obtain the best reported AUC (0.958) on the updated "Supernovae photometric classification challenge" and improve upon existing conditional density estimation of galaxy redshift from Sloan Data Sky Survey (SDSS) data

    The out-of-sample prediction error of the square-root-LASSO and related estimators

    Full text link
    We study the classical problem of predicting an outcome variable, YY, using a linear combination of a dd-dimensional covariate vector, X\mathbf{X}. We are interested in linear predictors whose coefficients solve: % \begin{align*} \inf_{\boldsymbol{\beta} \in \mathbb{R}^d} \left( \mathbb{E}_{\mathbb{P}_n} \left[ \left(Y-\mathbf{X}^{\top}\beta \right)^r \right] \right)^{1/r} +\delta \, \rho\left(\boldsymbol{\beta}\right), \end{align*} where δ>0\delta>0 is a regularization parameter, ρ:RdR+\rho:\mathbb{R}^d\to \mathbb{R}_+ is a convex penalty function, Pn\mathbb{P}_n is the empirical distribution of the data, and r1r\geq 1. We present three sets of new results. First, we provide conditions under which linear predictors based on these estimators % solve a \emph{distributionally robust optimization} problem: they minimize the worst-case prediction error over distributions that are close to each other in a type of \emph{max-sliced Wasserstein metric}. Second, we provide a detailed finite-sample and asymptotic analysis of the statistical properties of the balls of distributions over which the worst-case prediction error is analyzed. Third, we use the distributionally robust optimality and our statistical analysis to present i) an oracle recommendation for the choice of regularization parameter, δ\delta, that guarantees good out-of-sample prediction error; and ii) a test-statistic to rank the out-of-sample performance of two different linear estimators. None of our results rely on sparsity assumptions about the true data generating process; thus, they broaden the scope of use of the square-root lasso and related estimators in prediction problems
    corecore