Oracle posterior contraction rates under hierarchical priors
We offer a general Bayes-theoretic framework to derive posterior contraction
rates under a hierarchical prior design: the first-step prior serves to assess
the model selection uncertainty, and the second-step prior quantifies the prior
belief on the strength of the signals within the model chosen from the first
step. In particular, we establish non-asymptotic oracle posterior contraction
rates under (i) a local Gaussianity condition on the log likelihood ratio of
the statistical experiment, (ii) a local entropy condition on the
dimensionality of the models, and (iii) a sufficient mass condition on the
second-step prior near the best approximating signal for each model. The
first-step prior can be designed generically. The posterior distribution enjoys
Gaussian tail behavior and therefore the resulting posterior mean also
satisfies an oracle inequality, automatically serving as an adaptive point
estimator in a frequentist sense. Model mis-specification is allowed in these
oracle rates.
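Schematically, the two-step design described above can be expressed as follows (a sketch in assumed notation, not necessarily the paper's exact construction): given candidate models $\{\mathcal{M}_m\}$, a first-step prior $\lambda = (\lambda_m)$ on the model index and second-step priors $\Pi_m$ on the signals within each model $\mathcal{M}_m$ combine into the hierarchical prior \begin{align*} \Pi(\cdot) = \sum_{m} \lambda_m \, \Pi_m(\cdot). \end{align*}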
The local Gaussianity condition provides a unified non-asymptotic Gaussian quantification of the experiments, and can be easily verified in the various experiments considered in [GvdV07a] and beyond. The general results are
applied in various problems including: (i) trace regression, (ii)
shape-restricted isotonic/convex regression, (iii) high-dimensional partially
linear regression, (iv) covariance matrix estimation in the sparse factor
model, (v) detection of non-smooth polytopal image boundary, and (vi) intensity
estimation in a Poisson point process model. These new results serve either as
theoretical justification of practical prior proposals in the literature, or as
an illustration of the generic construction scheme of a (nearly) minimax
adaptive estimator for a complicated experiment.
Set structured global empirical risk minimizers are rate optimal in general dimensions
Entropy integrals are widely used as a powerful empirical process tool to
obtain upper bounds for the rates of convergence of global empirical risk
minimizers (ERMs), in standard settings such as density estimation and
regression. The upper bound for the convergence rates thus obtained typically
matches the minimax lower bound when the entropy integral converges, but admits
a strict gap compared to the lower bound when it diverges. Birg\'e and Massart
[BM93] provided a striking example showing that such a gap is real with the
entropy structure alone: for a variant of the natural H\"older class with low
regularity, the global ERM actually converges at the rate predicted by the
entropy integral that substantially deviates from the lower bound. The
counter-example has spawned a long-standing negative position on the use of
global ERMs in the regime where the entropy integral diverges, as they are
heuristically believed to converge at a sub-optimal rate in a variety of
models.
The present paper demonstrates that this gap can be closed if the models
admit a certain degree of `set structures' in addition to the entropy structure.
In other words, the global ERMs in such set structured models will indeed be
rate-optimal, matching the lower bound even when the entropy integral diverges.
The models with set structures we investigate include (i) image and edge
estimation, (ii) binary classification, (iii) multiple isotonic regression,
(iv) $s$-concave density estimation, all in general dimensions when the entropy
integral diverges. Here set structures are interpreted broadly in the sense
that the complexity of the underlying models can be essentially captured by the
size of the empirical process over certain class of measurable sets, for which
matching upper and lower bounds are obtained to facilitate the derivation of
sharp convergence rates for the associated global ERMs.
Comment: 42 pages
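To make the `set structures' above concrete (in assumed notation): with $\mathbb{P}_n$ the empirical measure, $P$ the population measure, and $\mathcal{C}$ a class of measurable sets attached to the model, the driving quantity is the size of the empirical process over sets, \begin{align*} \sup_{C \in \mathcal{C}} \big|(\mathbb{P}_n - P)(C)\big|, \end{align*} for which matching upper and lower bounds yield the sharp rates for the associated global ERMs.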
NegCut: Automatic Image Segmentation based on MRF-MAP
Solving the maximum a posteriori problem on a Markov random field (MRF-MAP) is a prevailing method in recent interactive image segmentation tools. Although its computational target is mathematically explicit and its segmentation quality impressive, MRF-MAP is hard to solve without interactive information from users, so it has rarely been adopted in fully automatic settings to date. In this paper, we present an automatic image segmentation algorithm, NegCut, based on an approximation to MRF-MAP. First, we prove that MRF-MAP is
NP-hard when the probabilistic models are unknown, and then present an
approximation function in the form of minimum cuts on graphs with negative
weights. Finally, the binary segmentation is taken from the largest eigenvector
of the target matrix, computed with a tuned version of the Lanczos eigensolver. The algorithm is shown to be competitive in segmentation quality in our experiments.
Comment: Since it's an unlucky failure about a length-limit violation, I'd like to save it on arXiv as a record. Any suggestions are welcome
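The eigenvector step above admits a short sketch. The following minimal Python illustration (not the authors' implementation) assumes the target matrix is supplied as a symmetric sparse matrix W, extracts its leading eigenvector with SciPy's Lanczos solver, and thresholds the entries to obtain a binary segmentation.

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import eigsh

def binary_segmentation(W, shape):
    # Lanczos iteration for the algebraically largest eigenpair of the
    # symmetric matrix W (which may carry negative edge weights).
    _, eigenvectors = eigsh(W, k=1, which="LA")
    v = eigenvectors[:, 0]
    # The sign pattern of the leading eigenvector induces a two-class labeling.
    labels = (v >= 0).astype(np.uint8)
    return labels.reshape(shape)

if __name__ == "__main__":
    h, w = 8, 8  # a toy 8x8 "image"; W would come from the graph construction
    n = h * w
    A = sparse_random(n, n, density=0.05, random_state=0)
    W = 0.5 * (A + A.T)  # symmetrized random sparse stand-in for the target matrix
    print(binary_segmentation(W.tocsc(), (h, w)))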
A Simple Unsupervised Color Image Segmentation Method based on MRF-MAP
Color image segmentation is an important topic in the image processing field.
MRF-MAP is often adopted in unsupervised segmentation methods, but their performance lags far behind recent interactive segmentation tools supervised by user inputs. Furthermore, the existing unsupervised methods suffer from low efficiency and a high risk of being trapped in local optima, because MRF-MAP is currently solved by iterative frameworks initialized with inaccurate color distribution models. To address these problems, this letter designs an efficient method to calculate the energy functions approximately in a non-iterative fashion, and proposes a new binary segmentation algorithm based on a slightly tuned Lanczos eigensolver. The experiments demonstrate that the new algorithm achieves competitive performance compared with two state-of-the-art segmentation methods.
Comment: Submitted to IEEE SP
Multivariate convex regression: global risk bounds and adaptation
We study the problem of estimating a multivariate convex function defined on
a convex body in a regression setting with random design. We are interested in
optimal rates of convergence under a squared global continuous loss in the multivariate setting. One crucial fact is that the minimax risks depend heavily on the shape of the support of the regression function: the global minimax risk is of a strictly slower order when the support is sufficiently smooth than when the support is a polytope. Such differences in rates are due to difficulties in estimating the regression function near the boundary of smooth regions.
We then study the natural bounded least squares estimators (BLSE): we show
that the BLSE nearly attains the optimal rates of convergence in low
dimensions, while suffering rate-inefficiency in high dimensions. We show that
the BLSE adapts nearly parametrically to polyhedral functions when the support
is polyhedral in low dimensions by a local entropy method. We also show that
the boundedness constraint cannot be dropped when risk is assessed via the continuous loss.
Given rate sub-optimality of the BLSE in higher dimensions, we further study
rate-efficient adaptive estimation procedures. Two general model selection
methods are developed to provide sieved adaptive estimators (SAE) that achieve
nearly optimal rates of convergence for particular "regular" classes of convex
functions, while maintaining nearly parametric rate-adaptivity to polyhedral
functions in arbitrary dimensions. Interestingly, the uniform boundedness
constraint is unnecessary when risks are measured in discrete norms.
Comment: 75 pages
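For concreteness, a plausible form of the BLSE discussed above, in assumed notation (design points $X_i$, responses $Y_i$, a convex body $\Omega$, and a user-chosen bound $B$; the paper's exact definition may differ in details): \begin{align*} \hat{f}_n \in \mathop{\mathrm{arg\,min}}_{f:\Omega\to\mathbb{R}\ \text{convex},\ \|f\|_\infty \le B}\ \sum_{i=1}^n \big(Y_i - f(X_i)\big)^2. \end{align*}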
Convergence rates of least squares regression estimators with heavy-tailed errors
We study the performance of the Least Squares Estimator (LSE) in a general
nonparametric regression model, when the errors are independent of the
covariates but may only have a $p$-th moment ($p \ge 1$). In such a heavy-tailed regression setting, we show that if the model satisfies a standard `entropy condition' with exponent $\alpha \in (0,2)$, then the $L_2$ loss of
the LSE converges at a rate \begin{align*}
\mathcal{O}_{\mathbf{P}}\big(n^{-\frac{1}{2+\alpha}} \vee
n^{-\frac{1}{2}+\frac{1}{2p}}\big). \end{align*} Such a rate cannot be improved
under the entropy condition alone.
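A one-line calculation shows where the two regimes discussed below meet: the two terms in the rate balance exactly when \begin{align*} \frac{1}{2+\alpha} = \frac{1}{2} - \frac{1}{2p} \iff \frac{1}{2p} = \frac{\alpha}{2(2+\alpha)} \iff p = 1 + \frac{2}{\alpha}, \end{align*} so $p = 1 + 2/\alpha$ is the moment threshold separating the Gaussian-rate regime from the genuinely heavy-tailed regime.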
This rate quantifies both some positive and negative aspects of the LSE in a
heavy-tailed regression setting. On the positive side, as long as the errors have $p \ge 1 + 2/\alpha$ moments, the $L_2$ loss of the LSE converges at the same rate as if the errors were Gaussian. On the negative side, if $p < 1 + 2/\alpha$, there are (many) hard models at any entropy level $\alpha$ for which the $L_2$ loss of the LSE converges at a strictly slower rate than other
robust estimators.
The validity of the above rate relies crucially on the independence of the
covariates and the errors. In fact, the $L_2$ loss of the LSE can converge
arbitrarily slowly when the independence fails.
The key technical ingredient is a new multiplier inequality that gives sharp
bounds for the `multiplier empirical process' associated with the LSE. We
further give an application to the sparse linear regression model with
heavy-tailed covariates and errors to demonstrate the scope of this new
inequality.
Comment: 50 pages, 1 figure
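For reference, the `multiplier empirical process' mentioned above is, in its standard form (assumed notation: errors $\xi_i$ acting as multipliers over a function class $\mathcal{F}$), \begin{align*} \mathbb{E}\sup_{f \in \mathcal{F}}\Big|\frac{1}{\sqrt{n}}\sum_{i=1}^n \xi_i f(X_i)\Big|; \end{align*} the new inequality gives sharp bounds on such quantities when the multipliers possess only low moments.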
Better Image Segmentation by Exploiting Dense Semantic Predictions
It is well accepted that image segmentation can benefit from utilizing
multilevel cues. This paper focuses on utilizing FCNN-based dense semantic predictions in bottom-up image segmentation, arguing that semantic cues should be taken into account from the very beginning. This helps avoid merging regions that are similar in appearance but belong to distinct semantic categories. The problem of semantic inefficiency is also handled. We further propose a straightforward way to use the contour cues to suppress the noise in the multilevel cues, thereby improving the segmentation robustness. The evaluation on the BSDS500 shows that we obtain
competitive region and boundary performance. Furthermore, since all individual
regions can be assigned with appropriate semantic labels during the
computation, we are capable of extracting the adjusted semantic segmentations.
The experiment on Pascal VOC 2012 shows our improvement over the original semantic segmentations derived directly from the dense predictions.
Complex sampling designs: uniform limit theorems and applications
In this paper, we develop a general approach to proving global and local
uniform limit theorems for the Horvitz-Thompson empirical process arising from
complex sampling designs. Global theorems such as Glivenko-Cantelli and Donsker
theorems, and local theorems such as local asymptotic modulus and related
ratio-type limit theorems are proved for both the Horvitz-Thompson empirical
process, and its calibrated version. Limit theorems of other variants and their
conditional versions are also established. Our approach reveals an interesting
feature: the problem of deriving uniform limit theorems for the
Horvitz-Thompson empirical process is essentially no harder than the problem of
establishing the corresponding finite-dimensional limit theorems. These global
and local uniform limit theorems are then applied to important statistical
problems including (i) $M$-estimation, (ii) $Z$-estimation, and (iii) frequentist
theory of Bayes procedures, all with weighted likelihood, to illustrate their
wide applicability.
Comment: 46 pages
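For reference, the Horvitz-Thompson empirical process studied above takes the following standard form (assumed notation: sampling indicators $\xi_i$, first-order inclusion probabilities $\pi_i$, and a function class $\mathcal{F}$): \begin{align*} f \mapsto \frac{1}{\sqrt{n}}\sum_{i=1}^n \Big(\frac{\xi_i}{\pi_i} f(X_i) - Pf\Big), \qquad f \in \mathcal{F}. \end{align*}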
Robustness of shape-restricted regression estimators: an envelope perspective
Classical least squares estimators are well-known to be robust with respect
to moment assumptions concerning the error distribution in a wide variety of
finite-dimensional statistical problems; generally only a second moment
assumption is required for least squares estimators to maintain the same rate
of convergence that they would satisfy if the errors were assumed to be
Gaussian. In this paper, we give a geometric characterization of the robustness
of shape-restricted least squares estimators (LSEs) to error distributions with
an $L_{2,1}$ moment, in terms of the `localized envelopes' of the model.
This envelope perspective gives a systematic approach to proving oracle
inequalities for the LSEs in shape-restricted regression problems in the random
design setting, under a minimal moment assumption on the errors. The
canonical isotonic and convex regression models, and a more challenging
additive regression model with shape constraints are studied in detail.
Strikingly enough, in the additive model both the adaptation and robustness
properties of the LSE can be preserved, up to error distributions with an $L_{2,1}$ moment, for estimating the shape-constrained proxy of the marginal
projection of the true regression function. This holds essentially
regardless of whether or not the additive model structure is correctly
specified.
The new envelope perspective goes beyond shape constrained models. Indeed, at
a general level, the localized envelopes give a sharp characterization of the
convergence rate of the $L_2$ loss of the LSE between the worst-case rate as suggested by the recent work of the authors [25], and the best possible parametric rate.
Comment: 44 pages, 1 figure
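A plausible formalization of the `localized envelopes' above, in assumed notation (the paper's exact definition may differ): for a model $\mathcal{F}$, a reference point $f_0$, and a localization radius $\delta > 0$, \begin{align*} F_\delta(x) := \sup\big\{|f(x) - f_0(x)| : f \in \mathcal{F},\ \|f - f_0\|_{L_2} \le \delta\big\}, \end{align*} with the robustness characterization driven by the size of $F_\delta$ as $\delta$ varies.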
Limit distribution theory for block estimators in multiple isotonic regression
We study limit distributions for the tuning-free max-min block estimator
originally proposed in [FLN17] in the problem of multiple isotonic regression,
under both fixed lattice design and random design settings. We show that, if
the regression function $f_0$ admits vanishing derivatives up to order $\alpha_k$ along the $k$-th dimension ($1 \le k \le d$) at a fixed point $x_0$, and the errors have variance $\sigma^2$, then the max-min block estimator $\hat{f}_n$ satisfies \begin{align*}
(n_\ast/\sigma^2)^{\frac{1}{2+\sum_{k \in \mathcal{D}_\ast} \alpha_k^{-1}}}\big(\hat{f}_n(x_0)-f_0(x_0)\big)\rightsquigarrow \mathbb{C}(f_0,x_0). \end{align*} Here $\mathcal{D}_\ast$ and $n_\ast$, depending on $\{\alpha_k\}$ and the design points, are the set of all `effective dimensions' and the size of `effective samples' that drive the asymptotic limiting distribution, respectively. If furthermore either the $\{\alpha_k\}_{k \in \mathcal{D}_\ast}$ are relatively prime to each other or all mixed derivatives of $f_0$ of certain critical order vanish at $x_0$, then the limiting distribution can be represented as $\mathbb{C}(f_0,x_0) = K(f_0,x_0) \cdot \mathbb{D}$, where $K(f_0,x_0)$ is a constant depending on the local structure of the regression function at $x_0$, and $\mathbb{D}$ is a non-standard limiting distribution generalizing the well-known Chernoff distribution in univariate problems. The above limit theorem is also shown to be optimal, both in terms of the local rate of convergence and the dependence on the unknown regression function whenever such dependence is explicit, for the full range of $\{\alpha_k\}$ in a local asymptotic minimax sense.
Comment: 55 pages
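For orientation, the max-min block estimator takes the following standard form in multiple isotonic regression (assumed notation: design points $X_i$ and responses $Y_i$, with $[u,v] = \{x : u \le x \le v\}$ in the coordinatewise partial order): \begin{align*} \hat{f}_n(x_0) = \max_{u \le x_0}\ \min_{v \ge x_0}\ \frac{\sum_{i: X_i \in [u,v]} Y_i}{\#\{i: X_i \in [u,v]\}}. \end{align*}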
- …