
    Oracle posterior contraction rates under hierarchical priors

    We offer a general Bayes-theoretic framework to derive posterior contraction rates under a hierarchical prior design: the first-step prior serves to assess the model selection uncertainty, and the second-step prior quantifies the prior belief on the strength of the signals within the model chosen in the first step. In particular, we establish non-asymptotic oracle posterior contraction rates under (i) a local Gaussianity condition on the log likelihood ratio of the statistical experiment, (ii) a local entropy condition on the dimensionality of the models, and (iii) a sufficient mass condition on the second-step prior near the best approximating signal for each model. The first-step prior can be designed generically. The posterior distribution enjoys Gaussian tail behavior, and therefore the resulting posterior mean also satisfies an oracle inequality, automatically serving as an adaptive point estimator in a frequentist sense. Model mis-specification is allowed in these oracle rates. The local Gaussianity condition offers a unified non-asymptotic Gaussian quantification of the experiments, and can be easily verified in various experiments considered in [GvdV07a] and beyond. The general results are applied to various problems, including: (i) trace regression, (ii) shape-restricted isotonic/convex regression, (iii) high-dimensional partially linear regression, (iv) covariance matrix estimation in the sparse factor model, (v) detection of non-smooth polytopal image boundaries, and (vi) intensity estimation in a Poisson point process model. These new results serve either as theoretical justification for practical prior proposals in the literature, or as illustrations of a generic construction scheme for (nearly) minimax adaptive estimators in complicated experiments.
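    As a minimal illustration of this two-step design (notation and the specific weight choice are ours, not taken from the abstract), such a hierarchical prior can be written as \begin{align*} \Pi = \sum_{m \in \mathcal{M}} \lambda_m \, \Pi_m, \qquad \lambda_m \propto \exp\big(-c \cdot \mathrm{dim}(m)\big), \end{align*} where the first-step weights $\lambda_m$ account for model selection uncertainty (here via a generic complexity penalty) and each second-step prior $\Pi_m$ places sufficient mass near the best approximating signal within model $m$.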

    Set structured global empirical risk minimizers are rate optimal in general dimensions

    Entropy integrals are widely used as a powerful empirical process tool to obtain upper bounds for the rates of convergence of global empirical risk minimizers (ERMs), in standard settings such as density estimation and regression. The upper bound for the convergence rates thus obtained typically matches the minimax lower bound when the entropy integral converges, but admits a strict gap compared to the lower bound when it diverges. Birgé and Massart [BM93] provided a striking example showing that such a gap is real with the entropy structure alone: for a variant of the natural Hölder class with low regularity, the global ERM actually converges at the rate predicted by the entropy integral, which deviates substantially from the lower bound. This counter-example has spawned a long-standing negative position on the use of global ERMs in the regime where the entropy integral diverges, as they are heuristically believed to converge at a sub-optimal rate in a variety of models. The present paper demonstrates that this gap can be closed if the models admit a certain degree of `set structure' in addition to the entropy structure. In other words, the global ERMs in such set-structured models are indeed rate-optimal, matching the lower bound even when the entropy integral diverges. The models with set structures we investigate include (i) image and edge estimation, (ii) binary classification, (iii) multiple isotonic regression, and (iv) $s$-concave density estimation, all in general dimensions where the entropy integral diverges. Here set structures are interpreted broadly, in the sense that the complexity of the underlying models can be essentially captured by the size of the empirical process over a certain class of measurable sets, for which matching upper and lower bounds are obtained to facilitate the derivation of sharp convergence rates for the associated global ERMs. Comment: 42 pages
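    For orientation (our calibration of the classical bounds, not quoted from the paper), if the entropy of the model satisfies $\log N(\epsilon, \mathcal{F}, L_2) \asymp \epsilon^{-\alpha}$, then the entropy-integral technique yields for the global ERM \begin{align*} \text{rate} \lesssim \begin{cases} n^{-1/(2+\alpha)}, & \alpha < 2 \quad (\text{entropy integral converges}),\\ n^{-1/(2\alpha)}, & \alpha > 2 \quad (\text{entropy integral diverges}), \end{cases} \end{align*} and the gap discussed above concerns the second, divergent regime.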

    NegCut: Automatic Image Segmentation based on MRF-MAP

    Solving the maximum a posteriori problem on a Markov random field (MRF-MAP) is a prevailing approach in recent interactive image segmentation tools. Although its computational target is mathematically explicit and its segmentation quality is impressive, MRF-MAP is hard to solve without interactive information from users, so it is rarely adopted in fully automatic settings to date. In this paper, we present an automatic image segmentation algorithm, NegCut, based on an approximation to MRF-MAP. We first prove that MRF-MAP is NP-hard when the probabilistic models are unknown, and then present an approximation in the form of minimum cuts on graphs with negative weights. Finally, the binary segmentation is obtained from the largest eigenvector of the target matrix, computed with a tuned version of the Lanczos eigensolver. The algorithm is shown to be competitive in segmentation quality in our experiments. Comment: Since it's an unlucky failure due to a length-limit violation, I'd like to save it on arXiv as a record. Any suggestions are welcome.
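    The spectral step described above can be sketched as follows (a minimal illustration under our own assumptions: the weight matrix here is a toy stand-in for the paper's MRF-MAP-derived matrix, which may carry negative weights).

    # Minimal sketch: binary segmentation from the largest eigenvector of a
    # symmetric weight matrix, computed with a Lanczos-type eigensolver.
    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.linalg import eigsh  # Lanczos-based symmetric eigensolver

    def binary_segmentation(weights):
        # 'LA' requests the largest algebraic eigenvalue; eigsh runs Lanczos iterations.
        _, vecs = eigsh(weights, k=1, which='LA')
        leading = vecs[:, 0]
        return (leading >= 0).astype(int)  # binary labels {0, 1}

    # Toy usage: a random symmetric matrix with both positive and negative weights.
    rng = np.random.default_rng(0)
    w = rng.standard_normal((50, 50))
    w = csr_matrix((w + w.T) / 2.0)
    labels = binary_segmentation(w)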

    A Simple Unsupervised Color Image Segmentation Method based on MRF-MAP

    Color image segmentation is an important topic in the image processing field. MRF-MAP is often adopted in unsupervised segmentation methods, but their performance is far behind that of recent interactive segmentation tools supervised by user inputs. Furthermore, the existing unsupervised methods also suffer from low efficiency and a high risk of being trapped in local optima, because MRF-MAP is currently solved by iterative frameworks with inaccurate initial color distribution models. To address these problems, this letter designs an efficient method to calculate the energy functions approximately in a non-iterative fashion, and proposes a new binary segmentation algorithm based on a slightly tuned Lanczos eigensolver. The experiments demonstrate that the new algorithm achieves competitive performance compared with two state-of-the-art segmentation methods. Comment: Submitted to IEEE SP

    Multivariate convex regression: global risk bounds and adaptation

    We study the problem of estimating a multivariate convex function defined on a convex body in a regression setting with random design. We are interested in optimal rates of convergence under a squared global continuous $l_2$ loss in the multivariate setting ($d \geq 2$). One crucial fact is that the minimax risks depend heavily on the shape of the support of the regression function. It is shown that the global minimax risk is on the order of $n^{-2/(d+1)}$ when the support is sufficiently smooth, and of the order $n^{-4/(d+4)}$ when the support is a polytope. Such differences in rates are due to difficulties in estimating the regression function near the boundary of smooth regions. We then study the natural bounded least squares estimators (BLSE): we show that the BLSE nearly attains the optimal rates of convergence in low dimensions, while suffering rate-inefficiency in high dimensions. We show, by a local entropy method, that the BLSE adapts nearly parametrically to polyhedral functions when the support is polyhedral in low dimensions. We also show that the boundedness constraint cannot be dropped when risk is assessed via the continuous $l_2$ loss. Given the rate sub-optimality of the BLSE in higher dimensions, we further study rate-efficient adaptive estimation procedures. Two general model selection methods are developed to provide sieved adaptive estimators (SAE) that achieve nearly optimal rates of convergence for particular "regular" classes of convex functions, while maintaining nearly parametric rate-adaptivity to polyhedral functions in arbitrary dimensions. Interestingly, the uniform boundedness constraint is unnecessary when risks are measured in discrete $l_2$ norms. Comment: 75 pages
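    For concreteness (our arithmetic based on the rates quoted above), the two rates coincide in dimension $d=2$ and separate for $d \geq 3$: \begin{align*} d=2: \; n^{-2/(d+1)} = n^{-4/(d+4)} = n^{-2/3}; \qquad d=3: \; n^{-2/(d+1)} = n^{-1/2} \;\text{ vs. }\; n^{-4/(d+4)} = n^{-4/7}, \end{align*} so for $d \geq 3$ a polytopal support permits a strictly faster global minimax rate than a smooth support.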

    Convergence rates of least squares regression estimators with heavy-tailed errors

    We study the performance of the Least Squares Estimator (LSE) in a general nonparametric regression model, when the errors are independent of the covariates but may only have a $p$-th moment ($p \geq 1$). In such a heavy-tailed regression setting, we show that if the model satisfies a standard `entropy condition' with exponent $\alpha \in (0,2)$, then the $L_2$ loss of the LSE converges at a rate \begin{align*} \mathcal{O}_{\mathbf{P}}\big(n^{-\frac{1}{2+\alpha}} \vee n^{-\frac{1}{2}+\frac{1}{2p}}\big). \end{align*} Such a rate cannot be improved under the entropy condition alone. This rate quantifies both some positive and negative aspects of the LSE in a heavy-tailed regression setting. On the positive side, as long as the errors have $p \geq 1+2/\alpha$ moments, the $L_2$ loss of the LSE converges at the same rate as if the errors were Gaussian. On the negative side, if $p < 1+2/\alpha$, there are (many) hard models at any entropy level $\alpha$ for which the $L_2$ loss of the LSE converges at a strictly slower rate than that of other robust estimators. The validity of the above rate relies crucially on the independence of the covariates and the errors. In fact, the $L_2$ loss of the LSE can converge arbitrarily slowly when this independence fails. The key technical ingredient is a new multiplier inequality that gives sharp bounds for the `multiplier empirical process' associated with the LSE. We further give an application to the sparse linear regression model with heavy-tailed covariates and errors to demonstrate the scope of this new inequality. Comment: 50 pages, 1 figure
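    As a worked instance of the displayed rate (our arithmetic), take entropy exponent $\alpha = 1$: the rate becomes $n^{-1/3} \vee n^{-\frac{1}{2}+\frac{1}{2p}}$, which equals the Gaussian-error rate $n^{-1/3}$ as soon as $p \geq 1 + 2/\alpha = 3$, and degrades to $n^{-\frac{1}{2}+\frac{1}{2p}}$ for $1 \leq p < 3$; that is, three error moments suffice for the LSE to attain the Gaussian-error rate at this entropy level.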

    Better Image Segmentation by Exploiting Dense Semantic Predictions

    It is well accepted that image segmentation can benefit from utilizing multilevel cues. This paper focuses on utilizing FCNN-based dense semantic predictions in bottom-up image segmentation, arguing that semantic cues should be taken into account from the very beginning. In this way, we avoid, as far as possible, merging regions of similar appearance but distinct semantic categories, and the semantic inefficiency problem is thereby handled. We also propose a straightforward way to use contour cues to suppress the noise in multilevel cues, and thus improve the segmentation robustness. The evaluation on BSDS500 shows that we obtain competitive region and boundary performance. Furthermore, since all individual regions can be assigned appropriate semantic labels during the computation, we are able to extract adjusted semantic segmentations. The experiment on Pascal VOC 2012 shows that our results improve on the original semantic segmentations derived directly from the dense predictions.

    Complex sampling designs: uniform limit theorems and applications

    In this paper, we develop a general approach to proving global and local uniform limit theorems for the Horvitz-Thompson empirical process arising from complex sampling designs. Global theorems such as Glivenko-Cantelli and Donsker theorems, and local theorems such as local asymptotic modulus and related ratio-type limit theorems, are proved for both the Horvitz-Thompson empirical process and its calibrated version. Limit theorems for other variants and their conditional versions are also established. Our approach reveals an interesting feature: the problem of deriving uniform limit theorems for the Horvitz-Thompson empirical process is essentially no harder than the problem of establishing the corresponding finite-dimensional limit theorems. These global and local uniform limit theorems are then applied to important statistical problems, including (i) $M$-estimation, (ii) $Z$-estimation, and (iii) the frequentist theory of Bayes procedures, all with weighted likelihood, to illustrate their wide applicability. Comment: 46 pages
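    For readers unfamiliar with the object, a standard form of the Horvitz-Thompson empirical measure (notation ours) over a finite population $X_1,\ldots,X_N$, with sample inclusion indicators $\xi_i$ and first-order inclusion probabilities $\pi_i = \mathbb{E}[\xi_i]$, is \begin{align*} \mathbb{P}_N^{\pi} f = \frac{1}{N} \sum_{i=1}^{N} \frac{\xi_i}{\pi_i}\, f(X_i), \end{align*} which is design-unbiased for the population mean $N^{-1}\sum_{i=1}^N f(X_i)$; the uniform limit theorems above concern the process $f \mapsto \mathbb{P}_N^{\pi} f$ indexed by a class of functions.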

    Robustness of shape-restricted regression estimators: an envelope perspective

    Classical least squares estimators are well-known to be robust with respect to moment assumptions on the error distribution in a wide variety of finite-dimensional statistical problems; generally only a second moment assumption is required for least squares estimators to maintain the same rate of convergence that they would satisfy if the errors were assumed to be Gaussian. In this paper, we give a geometric characterization of the robustness of shape-restricted least squares estimators (LSEs) to error distributions with an $L_{2,1}$ moment, in terms of the `localized envelopes' of the model. This envelope perspective gives a systematic approach to proving oracle inequalities for the LSEs in shape-restricted regression problems in the random design setting, under a minimal $L_{2,1}$ moment assumption on the errors. The canonical isotonic and convex regression models, and a more challenging additive regression model with shape constraints, are studied in detail. Strikingly, in the additive model both the adaptation and robustness properties of the LSE can be preserved, up to error distributions with an $L_{2,1}$ moment, for estimating the shape-constrained proxy of the marginal $L_2$ projection of the true regression function. This holds essentially regardless of whether or not the additive model structure is correctly specified. The new envelope perspective goes beyond shape-constrained models. Indeed, at a general level, the localized envelopes give a sharp characterization of the convergence rate of the $L_2$ loss of the LSE between the worst-case rate, as suggested by the recent work of the authors [25], and the best possible parametric rate. Comment: 44 pages, 1 figure
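    For reference (a standard definition, not quoted from the abstract), the $L_{2,1}$ moment condition on an error variable $\epsilon$ requires \begin{align*} \|\epsilon\|_{2,1} = \int_0^\infty \sqrt{\mathbb{P}(|\epsilon| > t)}\, \mathrm{d}t < \infty, \end{align*} a condition slightly stronger than a finite second moment but weaker than a finite $(2+\delta)$-th moment for any $\delta > 0$.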

    Limit distribution theory for block estimators in multiple isotonic regression

    We study limit distributions for the tuning-free max-min block estimator originally proposed in [FLN17] in the problem of multiple isotonic regression, under both fixed lattice design and random design settings. We show that, if the regression function $f_0$ admits vanishing derivatives up to order $\alpha_k$ along the $k$-th dimension ($k=1,\ldots,d$) at a fixed point $x_0 \in (0,1)^d$, and the errors have variance $\sigma^2$, then the max-min block estimator $\hat{f}_n$ satisfies \begin{align*} (n_\ast/\sigma^2)^{\frac{1}{2+\sum_{k \in \mathcal{D}_\ast} \alpha_k^{-1}}}\big(\hat{f}_n(x_0)-f_0(x_0)\big)\rightsquigarrow \mathbb{C}(f_0,x_0). \end{align*} Here $\mathcal{D}_\ast$ and $n_\ast$, depending on $\{\alpha_k\}$ and the design points, are the set of all `effective dimensions' and the size of the `effective samples' that drive the asymptotic limit distribution, respectively. If, furthermore, either the $\{\alpha_k\}$ are relatively prime to each other or all mixed derivatives of $f_0$ of a certain critical order vanish at $x_0$, then the limit distribution can be represented as $\mathbb{C}(f_0,x_0) =_d K(f_0,x_0) \cdot \mathbb{D}_{\alpha}$, where $K(f_0,x_0)$ is a constant depending on the local structure of the regression function $f_0$ at $x_0$, and $\mathbb{D}_{\alpha}$ is a non-standard limit distribution generalizing the well-known Chernoff distribution in univariate problems. The above limit theorem is also shown to be optimal, both in terms of the local rate of convergence and the dependence on the unknown regression function whenever such dependence is explicit (i.e. $K(f_0,x_0)$), for the full range of $\{\alpha_k\}$ in a local asymptotic minimax sense. Comment: 55 pages
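    For context, one common formulation of the max-min block estimator (notation ours; details may differ from [FLN17]) in the lattice design case is \begin{align*} \hat{f}_n(x_0) = \max_{u \leq x_0} \, \min_{v \geq x_0} \, \bar{Y}_{[u,v]}, \qquad \bar{Y}_{[u,v]} = \frac{1}{\#\{i: X_i \in [u,v]\}} \sum_{i: X_i \in [u,v]} Y_i, \end{align*} where $[u,v] = \prod_{k=1}^d [u_k, v_k]$ ranges over hyperrectangles containing $x_0$; no tuning parameter is required.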