
    The Sample Complexity of Approximate Rejection Sampling with Applications to Smoothed Online Learning

    Suppose we are given access to $n$ independent samples from a distribution $\mu$, and we wish to output one of them with the goal of making the output distributed as close as possible to a target distribution $\nu$. In this work we show that the optimal total variation distance as a function of $n$ is given by $\tilde\Theta\!\left(\frac{D}{f'(n)}\right)$ over the class of all pairs $\nu, \mu$ with a bounded $f$-divergence $D_f(\nu\|\mu) \leq D$. Previously, this question was studied only for the case when the Radon-Nikodym derivative of $\nu$ with respect to $\mu$ is uniformly bounded. We then consider an application in the seemingly very different field of smoothed online learning, where we show that recent results on the minimax regret and the regret of oracle-efficient algorithms still hold even under relaxed constraints on the adversary (a bounded $f$-divergence, as opposed to a bounded Radon-Nikodym derivative). Finally, we study the efficacy of importance sampling for mean estimates uniform over a function class and compare importance sampling with rejection sampling.
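
    A minimal Python sketch of the setup described above (not the paper's algorithm): draw $n$ proposals from $\mu$, accept each with probability proportional to a capped density ratio $d\nu/d\mu$, and fall back to an arbitrary proposal if all are rejected. The pair $\mu = N(0,1)$, $\nu = N(1,1)$, the cap, and the fallback rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ratio(x):
    """Radon-Nikodym derivative dnu/dmu for nu = N(1,1), mu = N(0,1)."""
    return np.exp(x - 0.5)

def approx_reject(samples, cap):
    """Scan the n proposals; accept each with prob min(ratio, cap)/cap.
    If nothing is accepted, fall back to the last proposal; the cap and
    the fallback are what make the output only approximately nu."""
    for x in samples:
        if rng.random() < min(ratio(x), cap) / cap:
            return x
    return samples[-1]

n, cap = 200, 8.0
out = np.array([approx_reject(rng.standard_normal(n), cap) for _ in range(20000)])
print(f"output mean ~ {out.mean():.3f} (target nu has mean 1.0)")
```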

    Tight bounds for maximum $\ell_1$-margin classifiers

    Popular iterative algorithms such as boosting methods and coordinate descent on linear models converge to the maximum $\ell_1$-margin classifier, a.k.a. the sparse hard-margin SVM, in high-dimensional regimes where the data is linearly separable. Previous works consistently show that many estimators relying on the $\ell_1$-norm achieve improved statistical rates for hard sparse ground truths. We show that, surprisingly, this adaptivity does not apply to the maximum $\ell_1$-margin classifier in a standard discriminative setting. In particular, for the noiseless setting, we prove tight upper and lower bounds for the prediction error that match the existing rate of order $\frac{\|w^*\|_1^{2/3}}{n^{1/3}}$ for general ground truths. To complete the picture, we show that when interpolating noisy observations, the error vanishes at a rate of order $\frac{1}{\sqrt{\log(d/n)}}$. We are therefore the first to show benign overfitting for the maximum $\ell_1$-margin classifier.
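
    As a hedged illustration (not code from the paper), the maximum $\ell_1$-margin classifier on separable data can be computed exactly as a linear program, minimizing $\|w\|_1$ subject to $y_i \langle x_i, w \rangle \ge 1$; boosting and coordinate descent converge to the same solution in this regime. The sparse ground truth, dimensions, and solver below are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, d = 40, 200                                # high-dimensional, separable
w_star = np.zeros(d); w_star[:3] = 1.0        # hard-sparse ground truth
X = rng.standard_normal((n, d))
y = np.sign(X @ w_star)

# Split w = u - v with u, v >= 0, so ||w||_1 = sum(u + v) at the optimum.
c = np.ones(2 * d)
A_ub = -y[:, None] * np.hstack([X, -X])       # encodes y_i x_i^T(u - v) >= 1
b_ub = -np.ones(n)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
w_hat = res.x[:d] - res.x[d:]
margin = (y * (X @ w_hat)).min() / np.abs(w_hat).sum()
print(f"l1-margin = {margin:.4f}, nonzeros in w_hat: {(np.abs(w_hat) > 1e-8).sum()}")
```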

    A Bootstrap Hypothesis Test for High-Dimensional Mean Vectors

    This paper is concerned with testing global null hypotheses about population mean vectors of high-dimensional data. Current tests require either strong mixing (independence) conditions on the individual components of the high-dimensional data or high-order moment conditions. In this paper, we propose a novel class of bootstrap hypothesis tests based on $\ell_p$-statistics with $p \in [1, \infty]$ which requires neither of these assumptions. We study the asymptotic size, unbiasedness, consistency, and Bahadur slope of these tests. Capitalizing on these theoretical insights, we develop a modified bootstrap test with improved power properties and a self-normalized bootstrap test for elliptically distributed data. We then propose two novel bias correction procedures to improve the accuracy of the bootstrap test in finite samples, which leverage measure concentration and hypercontractivity properties of $\ell_p$-norms in high dimensions. Numerical experiments support our theoretical results in finite samples.
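
    A minimal sketch of the general recipe (not the paper's specific tests or bias corrections): calibrate the $\ell_\infty$-statistic $\sqrt{n}\,\|\bar{X}\|_\infty$ with a Gaussian-multiplier bootstrap. The dimensions, multiplier choice, and level below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, B, alpha = 100, 500, 2000, 0.05
X = rng.standard_normal((n, d))               # data; H0: mean zero holds here

xbar = X.mean(axis=0)
T = np.sqrt(n) * np.abs(xbar).max()           # l_inf test statistic

Xc = X - xbar                                 # centered rows
E = rng.standard_normal((B, n))               # Gaussian multiplier weights
Tb = np.abs(E @ Xc / np.sqrt(n)).max(axis=1)  # B bootstrap statistics
crit = np.quantile(Tb, 1 - alpha)
print(f"T = {T:.3f}, critical value = {crit:.3f}, reject H0: {T > crit}")
```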

    Structured Mixture Models

    Finite mixture models are a staple of model-based clustering approaches for distinguishing subgroups. A common choice is the finite Gaussian mixture model, whose degrees of freedom scale quadratically with the data dimension. Methods in the literature often tackle the degrees of freedom of the Gaussian mixture model by sharing parameters of the eigendecompositions of the covariance matrices across all mixture components. We posit finite Gaussian mixture models with alternate forms of parameter sharing, imposing additional structure on the parameters: for example, sharing parameters with other components as a convex combination of the corresponding parent components, or imposing a sequence of hierarchical clustering structures in orthogonal subspaces with common parameters across levels. Estimation procedures using the Expectation-Maximization (EM) algorithm are derived throughout, with applications to simulated and real-world datasets. Moreover, the proposed model structures have an interpretable meaning that can shed light on clustering analyses performed by practitioners in the context of their data. The EM algorithm is a popular estimation method for handling latent data, such as in finite mixture models where component memberships are often latent. One aspect of the EM algorithm that hampers estimation is its slow rate of convergence, which affects the estimation of finite Gaussian mixture models. To explore avenues of improvement, we investigate extrapolation of the sequence of conditional expectations, which admits general EM procedures with minimal modifications for many common models. With the same mindset of accelerating iterative algorithms, we also examine the use of approximate sketching methods for estimating generalized linear models via iteratively re-weighted least squares, with emphasis on practical data-infrastructure constraints. We propose a sketching method that controls both data-transfer and computation costs, the former of which is often overlooked in asymptotic complexity analyses; it achieves an approximate result in much faster wall-clock time than the exact solution on real-world hardware and can estimate standard errors in addition to point estimates.
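
    As one concrete, simplified instance of parameter sharing (illustrative only; this is not one of the thesis's proposed structures), tying a single covariance matrix across all components removes the quadratic per-component covariance cost. A minimal EM sketch for this tied-covariance Gaussian mixture, with simulated data as an assumption:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(4, 1, (150, 2))])
n, K = X.shape[0], 2

pi = np.full(K, 1 / K)
mu = X[rng.choice(n, K, replace=False)]
Sigma = np.cov(X.T)                           # the single shared covariance

for _ in range(50):
    # E-step: responsibilities under the tied-covariance model.
    dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma)
                            for k in range(K)])
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: per-component weights and means, one pooled covariance.
    Nk = r.sum(axis=0)
    pi, mu = Nk / n, (r.T @ X) / Nk[:, None]
    Sigma = sum((r[:, k, None] * (X - mu[k])).T @ (X - mu[k])
                for k in range(K)) / n
print("estimated means:\n", np.round(mu, 2))
```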

    Data-driven Piecewise Affine Decision Rules for Stochastic Programming with Covariate Information

    Focusing on stochastic programming (SP) with covariate information, this paper proposes an empirical risk minimization (ERM) method embedded within a nonconvex piecewise affine decision rule (PADR), which aims to learn the direct mapping from features to optimal decisions. We establish a nonasymptotic consistency result for our PADR-based ERM model for unconstrained problems and an asymptotic consistency result for constrained ones. To solve the nonconvex and nondifferentiable ERM problem, we develop an enhanced stochastic majorization-minimization algorithm and establish asymptotic convergence to (composite strong) directional stationarity, along with a complexity analysis. We show that the proposed PADR-based ERM method applies to a broad class of nonconvex SP problems with theoretical consistency guarantees and computational tractability. Our numerical study demonstrates the superior performance of PADR-based ERM methods compared to state-of-the-art approaches under various settings, with significantly lower costs, less computation time, and robustness to feature dimension and to nonlinearity of the underlying dependency.
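
    A toy sketch of the PADR parameterization (illustrative; the paper's enhanced stochastic majorization-minimization algorithm is not reproduced here): a piecewise affine rule is written as a difference of two max-affine functions and trained by plain subgradient descent on a newsvendor-style cost. All problem data, the number of pieces, and the step size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def padr(x, A, b, C, e):
    """PADR d(x) = max_i(A_i x + b_i) - max_j(C_j x + e_j): a difference
    of max-affine functions, which covers continuous piecewise affine maps."""
    return (x @ A.T + b).max(-1) - (x @ C.T + e).max(-1)

n, p, k = 500, 3, 4                           # samples, features, pieces
X = rng.standard_normal((n, p))
demand = np.abs(X[:, 0]) + 0.1 * rng.standard_normal(n)

A, b = rng.standard_normal((k, p)), np.zeros(k)
C, e = rng.standard_normal((k, p)), np.zeros(k)
h, s = 1.0, 3.0                               # holding / shortage costs
lr = 0.05
for _ in range(300):
    q = padr(X, A, b, C, e)
    g = np.where(q > demand, h, -s) / n       # newsvendor subgradient in q
    iA = (X @ A.T + b).argmax(1)              # active piece of each max
    iC = (X @ C.T + e).argmax(1)
    for i in range(k):
        A[i] -= lr * (g * (iA == i)) @ X
        b[i] -= lr * (g * (iA == i)).sum()
        C[i] += lr * (g * (iC == i)) @ X
        e[i] += lr * (g * (iC == i)).sum()

q = padr(X, A, b, C, e)
cost = np.mean(np.maximum(q - demand, 0) * h + np.maximum(demand - q, 0) * s)
print(f"final empirical newsvendor cost: {cost:.3f}")
```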

    Functional Finite Mixture Modelling and Estimation

    Functional data analysis is a branch of statistics that studies models for information represented by functions. Meanwhile, finite mixture models serve as a cornerstone of cluster analysis, offering a flexible probabilistic framework for the representation of heterogeneous data. These models posit that the observed data are drawn from a mixture of several different probability distributions from the same family, each conventionally thought to represent a distinct group within the overall population. However, their representation in terms of densities makes their application to function-valued random variables, the foundation of functional data analysis, difficult. Herein, we utilize density surrogates derived from the Karhunen-Loève expansion to circumvent this discrepancy and develop functional finite mixture models for the clustering of functional data. Models are developed for real-valued and vector-valued functions of a single variable. Estimation of all models is done using the expectation-maximization algorithm, and extensive simulations and data examples are provided to demonstrate the properties and performance of the methodologies. Additionally, we present a new estimation approach to be used in tandem with the stochastic expectation-maximization algorithm. This method offers increased estimation precision, as a function of the algorithm's chain length, compared to averaging the chain. Asymptotic properties of the estimator are derived, and simulation studies demonstrate its performance.
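
    A condensed sketch of the density-surrogate idea (illustrative, not the thesis's estimators): truncate the Karhunen-Loève expansion of discretized curves via an SVD of the centered data matrix, then cluster the resulting score vectors with an ordinary finite Gaussian mixture. The simulated curves and the truncation level are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 100)
# Two functional groups: noisy sine-like vs. cosine-like curves.
curves = np.vstack([np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((60, t.size)),
                    np.cos(2 * np.pi * t) + 0.3 * rng.standard_normal((60, t.size))])

# Truncated Karhunen-Loeve expansion via SVD of the centered curves.
U, s, _ = np.linalg.svd(curves - curves.mean(axis=0), full_matrices=False)
q = 3                                         # truncation level
scores = U[:, :q] * s[:q]                     # score vectors (n x q)

# Cluster the finite-dimensional scores with an ordinary Gaussian mixture.
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(scores)
print("cluster sizes:", np.bincount(labels))
```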

    New methods for fixed-margin binary matrix sampling, Fréchet covariance, and MANOVA tests for random objects in multiple metric spaces

    Many approaches to the analysis of network data essentially view the data as Euclidean and apply standard multivariate techniques. In this dissertation, we refrain from this approach, exploring two alternate approaches to the analysis of networks and other structured data. The first approach seeks to determine how unique an observed simple, directed network is by comparing it to like networks which share its degree distribution. Generating networks for comparison requires sampling from the space of all binary matrices with the prescribed row and column margins, since enumeration of all such matrices is often infeasible for even moderately sized networks with 20-50 nodes. We propose two new sampling methods for this problem. First, we extend two Markov chain Monte Carlo methods to sample from the space non-uniformly, allowing flexibility in the case that some networks are more likely than others. We show that non-uniform sampling could impede the MCMC process, but in certain special cases is still valid. Critically, we illustrate the differential conclusions that could be drawn from uniform vs. non-uniform sampling. Second, we develop a generalized divide-and-conquer approach which recursively divides matrices into smaller subproblems that are much easier to count and sample. Each division step reveals interesting mathematics involving the enumeration of integer partitions and points in convex lattice polytopes. The second broad approach we explore is comparing random objects in metric spaces lacking a coordinate system. Traditional definitions of the mean and variance no longer apply, and standard statistical tests have needed reconceptualization in terms of only distances in the metric space. We consider the multivariate setting where random objects exist in multiple metric spaces, which can be thought of as distinct views of the random object. We define the notion of Fréchet covariance to measure dependence between two metric spaces, and establish consistency for the sample estimator. We then propose several tests for differences in means and covariance matrices among two or more groups in multiple metric spaces, and compare their performance on scenarios involving random probability distributions and networks with node covariates.
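
    For the first problem, the classical uniform baseline that the dissertation builds on can be sketched in a few lines: a checkerboard-swap Markov chain whose moves preserve all row and column margins. This sketch samples (approximately) uniformly and ignores the structural-zero diagonal needed for simple directed graphs; the dissertation's non-uniform and divide-and-conquer methods go beyond it.

```python
import numpy as np

rng = np.random.default_rng(6)

def checkerboard_step(M):
    """Pick two rows and two columns; if the 2x2 submatrix is a checkerboard
    ([[1,0],[0,1]] or [[0,1],[1,0]]), flip it. The flip preserves every
    row and column margin."""
    r = rng.choice(M.shape[0], 2, replace=False)
    c = rng.choice(M.shape[1], 2, replace=False)
    sub = M[np.ix_(r, c)]
    if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
        M[np.ix_(r, c)] = 1 - sub

M = (rng.random((8, 8)) < 0.4).astype(int)    # arbitrary starting matrix
rows, cols = M.sum(1).copy(), M.sum(0).copy()
for _ in range(10000):                        # burn-in / mixing steps
    checkerboard_step(M)
assert (M.sum(1) == rows).all() and (M.sum(0) == cols).all()
print(M)
```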

    Fully Automated Parameter Estimation for Mixtures of Factor Analyzers

    Mixture models are a family of statistical models that can effectively model datasets with underlying sub-population structures. This thesis focuses on one particular mixture model, the Mixtures of Factor Analyzers (MFA) model [Ghahramani et al., 1997], a multivariate clustering model more parsimonious than the well-known Gaussian mixture model (GMM). The MFA model has two hyperparameters: g, the number of components, and q, the number of factors per component. When these are assumed to be known in advance, approximate maximum likelihood estimates for the remaining model parameters can be obtained using Expectation Maximisation (EM)-type algorithms [Dempster et al., 1977; Ghahramani et al., 1997; McLachlan and Peel, 2000; Zhao and Yu, 2008]. This work reviews methods for fitting the MFA model in the more realistic case where its two hyperparameters are not known a priori. A systematic comparison of seven methods for fitting the MFA model when its hyperparameters are unknown is conducted. The methods are compared on their ability to infer the two hyperparameters accurately, as well as on general model fit, clustering accuracy, and the time taken to fit the model. The results suggest that a naive grid search over both hyperparameters performs best on all of the metrics except fitting time. The Infinite Mixtures of Infinite Factor Analyzers (IMIFA) algorithm [Murphy et al., 2020] also performs well on most of the metrics. However, like the naive search, IMIFA is very computationally intensive. The Automatic Mixture of Factor Analyzers (AMFA) algorithm [Wang and Lin, 2020] is a viable alternative when available computation time is limited, as it often performs comparably to the naive search and IMIFA, but with greatly reduced computation times. To facilitate the comparison, the R package autoMFA is created, which implements five methods for the automated fitting of the MFA model and is available on the Comprehensive R Archive Network (CRAN). A limitation of the MFA model is its inability to capture asymmetrical cluster shapes, a consequence of using multivariate Gaussian component densities. The Mixtures of Mean-Variance Mixture of Normal Distribution Factor Analyzers (MMVMNFA) family is proposed as a generalisation of the MFA model which permits asymmetrical component densities. A new EM-type algorithm for parameter estimation of MMVMNFA models is developed. Based on its performance in the comparison, the AMFA algorithm is selected and generalised to the MMVMNFA family. Six specific instances of the MMVMNFA family are considered, and the steps of the EM-type algorithm are derived for each. The Julia package FactorMixtures is created, containing implementations of each of these algorithms. The six instances are tested on two synthetic datasets and two real-world datasets, where their superior ability to capture heavy-tailed data and data exhibiting multivariate skewness is demonstrated in comparison to the standard MFA model, which cannot effectively capture either of these properties.
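
    A toy Python version of the naive grid search that the comparison favours (illustrative; the thesis itself works in R and Julia, e.g. via autoMFA): fit a model for every (g, q) pair and keep the BIC-best. To stay self-contained, the inner fit is a mixture of probabilistic PCA (isotropic noise), a simplified relative of the MFA model with a closed-form eigendecomposition M-step, and the free-parameter count is the naive one that ignores rotational non-identifiability.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(7)
d = 5
X = np.vstack([rng.normal(0, 1, (100, d)), rng.normal(3, 1, (100, d))])
n = X.shape[0]

def fit_mppca(X, g, q, iters=60):
    """EM for a mixture of probabilistic PCA; returns the log-likelihood."""
    n, d = X.shape
    pi = np.full(g, 1 / g)
    mu = X[rng.choice(n, g, replace=False)]
    C = [np.cov(X.T) + np.eye(d) for _ in range(g)]
    for _ in range(iters):
        dens = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], C[j])
                                for j in range(g)])
        ll = np.log(dens.sum(1)).sum()
        r = dens / dens.sum(1, keepdims=True)
        Nj = r.sum(0)
        pi, mu = Nj / n, (r.T @ X) / Nj[:, None]
        for j in range(g):
            S = ((r[:, j, None] * (X - mu[j])).T @ (X - mu[j])) / Nj[j]
            lam, U = np.linalg.eigh(S)
            lam, U = lam[::-1], U[:, ::-1]    # descending eigenvalues
            s2 = lam[q:].mean()               # noise = mean discarded eigenvalue
            W = U[:, :q] * np.sqrt(np.maximum(lam[:q] - s2, 1e-9))
            C[j] = W @ W.T + s2 * np.eye(d)
    return ll

best = None
for g in (1, 2, 3):
    for q in (1, 2):
        k = (g - 1) + g * d + g * (d * q + 1)  # naive free-parameter count
        bic = -2 * fit_mppca(X, g, q) + k * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, g, q)
print(f"BIC-selected hyperparameters: g = {best[1]}, q = {best[2]}")
```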

    Sparse PCA for Multi-Block Data
